Dave's nasmjf Dev Log 07
Created: 2022-07-22
This is an entry in my developer’s log series written between December 2021 and August 2022 (started project in September). I wrote these as I completed my port of the JONESFORTH assembly language Forth interpreter.
So the error in TCFA (which gets the code address for a dictionary entry) has been fixed. Now let's see why the COLON definition is crashing with a segfault. I type my usual test word ": FIVE 5 ;".

    80          cld    ; Clear the "direction flag" which means the string
    (gdb) c
    Continuing.
    : FIVE 5 ;

(We'll skip everything that we now know works from log06.txt.)

First, INTERPRET checks STATE to see if we're executing or compiling (we're executing).

    code_INTERPRET.check_state () at nasmjf.asm:239
    239         mov edx,[var_STATE]
    240         test edx,edx
    241         jz .execute             ; Jump if executing.

Then we check to see if we are executing a literal value or a word (it's a word ":" (COLON)).

    code_INTERPRET.execute () at nasmjf.asm:254
    254         mov ecx,[interpret_is_lit] ; Literal?
    255         test ecx,ecx               ; Literal?
    256         jnz .do_literal

Now we jump to the code pointed to at the beginning of COLON, which is DOCOL. (Increasingly, as we define more words with other words rather than pure machine language, they'll start with DOCOL.)

Here I double-check that we're about to jump to the first pointer in COLON and that it points to DOCOL.

    260         jmp [eax]
    (gdb) info symbol $eax
    COLON in section .data of /home/dave/nasmjf/nasmjf
    (gdb) info symbol *$eax
    DOCOL in section .text of /home/dave/nasmjf/nasmjf

Here's DOCOL. I may have stepped through this before but it's worth looking at again since it's crucial to understanding this type of Forth implementation. Given the glacial pace at which I'm porting this, I need lots of reminders!

This cheatsheet is currently in a comment at the top of my jonesforth.asm:

    esi - next forth word address to execute
    ebp - return stack for forth word addresses

In the source, PUSHRSP and POPRSP usually handle the ebp register, which we're using for the return stack pointer (RSP).

DOCOL's first two lines are both from the PUSHRSP macro (you can see that they have lower line numbers). That handles ebp. Then DOCOL advances esi to the next word pointer (in COLON after DOCOL itself).
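
As a reminder of what those two register updates amount to, here is a rough Python model of DOCOL. It's only a sketch under a few assumptions: the register pushed by PUSHRSP is taken to be esi (as in the original JONESFORTH), and the addresses, the `mem` dictionary, and the `docol` function are all made up for illustration rather than taken from nasmjf.asm.

```python
# Toy model of DOCOL. Memory is a dict of 32-bit cells keyed by address.
CELL = 4  # bytes per cell on 32-bit x86


def docol(regs, mem):
    """Model PUSHRSP (assumed to push esi), then point esi past DOCOL's cell."""
    # lea ebp, [ebp-4] / mov [ebp], esi  -> push esi on the return stack
    regs["ebp"] -= CELL
    mem[regs["ebp"]] = regs["esi"]
    # add eax, 4 / mov esi, eax  -> esi now points at the first body cell
    regs["eax"] += CELL
    regs["esi"] = regs["eax"]


# Hypothetical addresses, chosen only for this demo.
COLON = 0x1000  # the cell at COLON holds the pointer to DOCOL
regs = {"eax": COLON, "esi": 0x2000, "ebp": 0x9000}
mem = {}

docol(regs, mem)
print(hex(regs["esi"]))       # 0x1004, i.e. COLON + 4
print(hex(mem[regs["ebp"]]))  # 0x2000, the old esi saved on the return stack
```

The saved esi is what EXIT would later pop back off the return stack to resume the caller.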

    DOCOL () at nasmjf.asm:40
    40          lea ebp, [ebp-4]   ; "load effective address" of next stack position
    41          mov [ebp], %1      ; "push" the register value to the address at ebp
    70          add eax, 4         ; eax points to DOCOL (me!) in word definition. Go to next.
    71          mov esi, eax       ; Put the next word pointer into esi

Let's see if that's right. Here's the entire definition of COLON. We don't see DOCOL here because it's inserted by the DEFWORD macro, but it comes right before FWORD. (By the way, FWORD is just WORD, but I can't have a symbol called "WORD" in NASM because it's a reserved keyword.)

    DEFWORD ":",1,,COLON
    dd FWORD
    dd CREATE
    dd LIT, DOCOL, COMMA
    dd LATEST, FETCH, HIDDEN
    dd RBRAC
    dd EXIT

I think it's super-cool that Forth exposes all of the primitives needed to create (or replace!) the COLON compiler so you can use them in the interpreter yourself. Truly a no-holds-barred language.

At any rate, the pointer in esi should be the next one in COLON and it should point to WORD (well, FWORD).

    (gdb) info symbol $esi
    COLON + 4 in section .data of /home/dave/nasmjf/nasmjf
    (gdb) info symbol *$esi
    FWORD in section .data of /home/dave/nasmjf/nasmjf

Great! And then the NEXT macro puts the address pointed to by esi into eax, increments esi to the next word pointer, and jumps to the address *pointed to* by the address now in eax.

HAVE YOU GOT THAT???

This is made even more confusing by the lodsd instruction. The mnemonic stands for "load string doubleword". The idea is that you can use it to load a "string" of values by repeatedly calling lodsd (or lodsb for byte, etc.). What it actually does is load 4 bytes from the address in esi into eax and then increment esi by 4.

(By the way, I've come to _loathe_ the terms "word", "double", "long", etc. I'm okay with "byte" because it's come to mean "8 bits" pretty universally in the year 2022.
If I were king, we would just use the byte count for these sizes like:

    b  = 1 byte  = 8 bits
    b2 = 2 bytes = 16 bits
    b4 = 4 bytes = 32 bits
    b8 = 8 bytes = 64 bits

and "lodsd" would become "lodsb4". Well, lods* would probably have a better mnemonic. But you get the idea. Anyway, harping on x86 is, like, a full-time job and it ain't gonna get this Forth port done.)

    27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.

And did it work?

    28          jmp [eax] ; Jump to whatever code we're now pointing at.
    (gdb) info symbol *$eax
    code_FWORD in section .text of /home/dave/nasmjf/nasmjf

Yup! It's jumping to WORD. (As another aside, it occurs to me that "WORD" is a really confusing name for this Forth word - it just tokenizes a space-delimited string from input. Otherwise, it doesn't have anything to do with Forth's concept of "words" as executable code stored in a "dictionary".)

    code_FWORD () at nasmjf.asm:302
    302         call _WORD

Cool, so now I'll skip stepping through WORD/KEY as we gather the string "FIVE" (the name of the word I'm trying to define) from input.

    _WORD.skip_non_words () at nasmjf.asm:309
    309         call _KEY               ; get next key, returned in %eax
    ...
    325         mov ecx, edi            ; return it

I have to admit, I don't understand why I can't access the memory at word_buffer.

    (gdb) x/s (int)word_buffer
    0x45564946:     <error: Cannot access memory at address 0x45564946>

Wait a dang second, 45 56 49 46 isn't an address, it's the actual characters 'E' 'V' 'I' 'F' (or "FIVE" stored little-endian)! Grrrr... another gripe - the way GDB treats symbols just confuses me. I like how NASM does it: foo is always an address, [foo] is always the value AT that address. It's very consistent.

Next night: gosh darn it! I remembered. You gotta put a '&' in front of "variables" to get the address...and that includes when you're trying to use the 'examine' ('x') command to format and view memory using the variable name.
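
The 0x45564946 surprise is easy to double-check outside of GDB. Here's a quick Python sketch (nothing in it comes from the nasmjf source) showing that the first four bytes of the buffer, read as one little-endian 32-bit integer, give exactly that bogus "address":

```python
# The value GDB tried to use as an address.
value = 0x45564946

# Lay the integer out in memory order on a little-endian machine.
raw = value.to_bytes(4, "little")
print(raw)  # b'FIVE'

# And the other direction: the bytes "FIVE" read as a 32-bit integer.
print(hex(int.from_bytes(b"FIVE", "little")))  # 0x45564946
```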

    (gdb) p &word_buffer
    $1 = (<data variable, no debug info> *) 0x804a068 <word_buffer>
    (gdb) x/4c &word_buffer
    0x804a068 <word_buffer>:        70 'F'  73 'I'  86 'V'  69 'E'

At any rate, looks good. WORD returns "FIVE".

    code_FWORD () at nasmjf.asm:303
    303         push edi                ; push base address
    304         push ecx                ; push length

And with any luck, now we'll be headed to the next word in the COLON definition, CREATE.

    code_FWORD () at nasmjf.asm:27
    27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.
    28          jmp [eax] ; Jump to whatever code we're now pointing at.
    (gdb) info symbol $eax
    CREATE in section .data of /home/dave/nasmjf/nasmjf
    (gdb) info symbol *$eax
    code_CREATE in section .text of /home/dave/nasmjf/nasmjf

Yay! Now CREATE makes the header (dictionary link, name, flags) portion of the word we're compiling.

In the next log, we'll see if CREATE works and then try to track down which word is causing a segfault when COLON runs.
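
As a recap of the two NEXT hops traced in this log (into FWORD, then into CREATE), here is a small Python model of the lodsd / jmp [eax] pair. It's a sketch only: the addresses are invented, the `mem` and `code` dictionaries stand in for data and machine code, and `make_primitive` and `next_` are hypothetical helpers, not names from nasmjf.asm.

```python
CELL = 4
mem = {}    # address -> 32-bit cell (made-up addresses)
code = {}   # code address -> Python function standing in for machine code
trace = []  # records which "machine code" ran, in order


def make_primitive(name, word_addr, code_addr):
    """Define a word whose first cell (the codeword) points at its code."""
    mem[word_addr] = code_addr
    code[code_addr] = lambda: trace.append(name)


make_primitive("FWORD", 0x2000, 0x8000)
make_primitive("CREATE", 0x2010, 0x8010)

# The start of a COLON-like body: a list of pointers to other words.
COLON_BODY = 0x1004
mem[COLON_BODY] = 0x2000         # dd FWORD
mem[COLON_BODY + CELL] = 0x2010  # dd CREATE

regs = {"eax": 0, "esi": COLON_BODY}


def next_():
    # lodsd: eax = [esi], then esi += 4
    regs["eax"] = mem[regs["esi"]]
    regs["esi"] += CELL
    # jmp [eax]: eax holds a word's address; jump to the code its first cell names
    code[mem[regs["eax"]]]()


# In the real thing each primitive ends with NEXT; here we call it by hand.
next_()
next_()
print(trace)  # ['FWORD', 'CREATE']
```

The double lookup in `next_` is the "pointed to by the address now in eax" part: one fetch gets the word, a second fetch gets the code to run.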