Dave's nasmjf Dev Log 02
Created: 2022-07-22
This is an entry in my developer’s log series written between December 2021 and August 2022 (started project in September). I wrote these as I completed my port of the JONESFORTH assembly language Forth interpreter.

Tonight, we'll see how much of the FIND word works. FIND looks for words in the "dictionary" of defined Forth words, which is a linked list. The interpreter uses it to look up the addresses of the word implementations so it can "compile" them into new word definitions.

    Reading symbols from nasmjf...
    Breakpoint 1 at 0x804900e: file nasmjf.asm, line 80.

Now that I'm using GNU Screen with windows for Vim and GDB that will close if either application exits, I need to reload the file in GDB when I make changes (previously I was just restarting GDB).

    (gdb) file nasmjf
    Reading symbols from nasmjf...

No need to step through everything up to this point because I already know it works: WORD collects a word entered through STDIN. So I break when we enter the implementation for FIND and then continue (run the program). The "foo" below is where nasmjf is asking for input and I type "foo" and hit enter.

    (gdb) break _FIND
    Breakpoint 2 at 0x804920b: file nasmjf.asm, line 485.
    (gdb) c
    Continuing.
    foo

Our breakpoint triggers. Now we're in FIND. It checks if we've run out of entries.

    Breakpoint 2, _FIND () at nasmjf.asm:485
    485     push esi                ; _FIND! Save esi, we'll use this reg for string comparison
    488     mov edx,[var_LATEST]    ; LATEST points to name header of the latest word in the dictionary
    _FIND.test_word () at nasmjf.asm:490
    490     test edx,edx            ; NULL pointer? (end of the linked list)
    491     je .not_found

And then I think this is clever: instead of immediately checking if the name strings match, it checks the precalculated and stored length of the name first. Much more efficient.

    496     xor eax,eax
    497     mov al, [edx+4]         ; al = flags+length field
    498     and al,(F_HIDDEN|F_LENMASK) ; al = name length
    499     cmp cl,al               ; Length is the same?
    500     jne .prev_word          ; nope, try prev

And that's what happens here: the length doesn't match, so we move to the previous word in the linked list. And it starts over at .test_word...
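
As an aside, the lookup loop traced so far can be sketched in C. This is only a rough model, not the nasmjf code: the `struct entry` layout, the `find` and `find_demo` names, and the flag values (taken from what I understand JONESFORTH to use) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag values as I understand JONESFORTH to define them; assumptions here. */
#define F_HIDDEN  0x20
#define F_LENMASK 0x1f

/* Hypothetical C view of one dictionary entry (the real one is packed bytes). */
struct entry {
    struct entry *link;       /* previous word; NULL ends the list */
    unsigned char flags_len;  /* flag bits plus name length */
    const char *name;         /* name characters */
};

/* Walk back from `latest`, checking the stored length before the name. */
struct entry *find(struct entry *latest, const char *name, size_t len)
{
    for (struct entry *e = latest; e != NULL; e = e->link) {
        /* Cheap test first. Leaving F_HIDDEN in the mask means a hidden
           word can never equal a real length (lengths are below 32). */
        if ((size_t)(e->flags_len & (F_HIDDEN | F_LENMASK)) != len)
            continue;               /* like `jne .prev_word` */
        if (memcmp(e->name, name, len) == 0)
            return e;               /* length and bytes both match */
    }
    return NULL;                    /* ran off the end: not found */
}

/* Tiny three-word dictionary to exercise find(). */
void find_demo(void)
{
    struct entry drop = { NULL,  4, "DROP" };
    struct entry dup  = { &drop, 3 | F_HIDDEN, "DUP" };  /* hidden word */
    struct entry swap = { &dup,  4, "SWAP" };

    assert(find(&swap, "DROP", 4) == &drop);
    assert(find(&swap, "DUP", 3) == NULL);   /* hidden, so skipped */
    assert(find(&swap, "foo", 3) == NULL);   /* no visible 3-letter word */
}
```

Calling `find_demo()` from a `main` runs the checks. Back in GDB, the trace picks up at the hop to the previous word: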

    _FIND.prev_word () at nasmjf.asm:517
    517     mov edx,[edx]           ; Move back through the link field to the previous word
    518     jmp .test_word          ; loop, test prev word
    _FIND.test_word () at nasmjf.asm:490
    490     test edx,edx            ; NULL pointer? (end of the linked list)

So I set a new breakpoint back in INTERPRET right after FIND returns to see how a "not found" condition is handled.

    (gdb) break 215
    Breakpoint 3 at 0x8049043: file nasmjf.asm, line 215.
    (gdb) c
    Continuing.

    Breakpoint 3, code_INTERPRET () at nasmjf.asm:215
    215     test eax,eax            ; Found?

If FIND fails, INTERPRET checks if the input is a numeric literal.

    216     jz .try_literal
    code_INTERPRET.try_literal () at nasmjf.asm:230
    230     inc byte [interpret_is_lit] ; DID NOT MATCH a word, trying literal number
    231     call _NUMBER            ; Returns the parsed number in %eax, %ecx > 0 if error
    _NUMBER () at nasmjf.asm:407
    407     xor eax,eax
    408     xor ebx,ebx
    410     test ecx,ecx            ; trying to parse a zero-length string is an error, but returns 0
    411     jz .return

It's neat how Forth supports numeric input in the base of your choice without any extra syntax. Just set BASE.

    413     mov edx, [var_BASE]     ; get BASE (in dl)
    416     mov bl,[edi]            ; bl = first character in string
    417     inc edi
    418     push eax                ; push 0 on stack
    _NUMBER () at nasmjf.asm:419
    419     cmp bl,'-'              ; negative number?
    420     jnz .convert_char
    _NUMBER.convert_char () at nasmjf.asm:435
    435     sub bl,'0'              ; < '0'?
    436     jb .negate
    437     cmp bl,10               ; <= '9'?
    438     jb .compare_base
    439     sub bl,17               ; < 'A'? (17 is 'A'-'0')
    440     jb .negate
    441     add bl,10
    _NUMBER.compare_base () at nasmjf.asm:444
    444     cmp bl,dl               ; >= BASE?
    445     jge .negate
    _NUMBER.negate () at nasmjf.asm:453
    453     pop ebx
    _NUMBER.negate () at nasmjf.asm:454
    454     test ebx,ebx
    455     jz .return
    _NUMBER.return () at nasmjf.asm:459
    459     ret

Coming back from NUMBER, a value > 0 in ecx indicates an error in trying to parse a numeric value.
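
To make the digit conversion easier to follow, here is a rough C sketch of the same idea. It is a model under assumptions, not a translation of the nasmjf source: `parse_number` and `number_demo` are hypothetical names, the handling of a lone "-" is my guess at what the untraced branch does, and overflow is ignored. The return value plays the role of ecx (characters left unparsed) and `*out` plays the role of eax.

```c
#include <assert.h>
#include <stddef.h>

/* Parse `len` characters of `s` in `base`. Returns how many characters
   were left unparsed (0 means success) and stores the value in *out. */
size_t parse_number(const char *s, size_t len, int base, int *out)
{
    int value = 0;
    int negative = 0;

    *out = 0;
    if (len == 0)
        return 0;                 /* zero-length: an error, but returns 0 */

    if (*s == '-') {
        negative = 1;
        s++;
        if (--len == 0)
            return 1;             /* assumed: a lone "-" is an error */
    }

    while (len > 0) {
        int digit = *s - '0';
        if (digit < 0)
            break;                /* below '0' */
        if (digit >= 10) {
            digit -= 17;          /* 17 is 'A' - '0' */
            if (digit < 0)
                break;            /* between '9' and 'A' */
            digit += 10;          /* 'A' becomes 10, 'B' becomes 11, ... */
        }
        if (digit >= base)
            break;                /* not a digit in this base */
        value = value * base + digit;
        s++;
        len--;
    }

    *out = negative ? -value : value;
    return len;                   /* leftover characters signal an error */
}

/* A few inputs, including the "foo" from the session above. */
void number_demo(void)
{
    int n;
    assert(parse_number("42", 2, 10, &n) == 0 && n == 42);
    assert(parse_number("-7", 2, 10, &n) == 0 && n == -7);
    assert(parse_number("FF", 2, 16, &n) == 0 && n == 255);
    assert(parse_number("foo", 3, 10, &n) == 3);  /* nothing parsed */
}
```

With "foo" in base ten, the very first character converts to 47, which is not below BASE, so all three characters are left over. Back in the debugger, INTERPRET checks that leftover count: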

    code_INTERPRET.try_literal () at nasmjf.asm:232
    232     test ecx,ecx
    233     jnz .parse_error

And sure enough, "foo" was not a valid base-ten (the default) value, so we jump to the parse_error section. This should print an error message.

    code_INTERPRET.parse_error () at nasmjf.asm:267
    267     mov ebx,2               ; 1st param: stderr
    268     mov ecx,errmsg          ; 2nd param: error message
    269     mov edx,(errmsgend - errmsg) ; 3rd param: length of string
    270     mov eax,[__NR_write]    ; write syscall

But oops! Looks like I've got an error.

    Program received signal SIGSEGV, Segmentation fault.
    code_INTERPRET.parse_error () at nasmjf.asm:270
    270     mov eax,[__NR_write]    ; write syscall

The next evening, I load it up again to see what's going on...

    Reading symbols from nasmjf...
    (gdb) break code_INTERPRET.parse_error
    Breakpoint 2 at 0x80490a6: file nasmjf.asm, line 267.
    (gdb) cont
    Continuing.
    foo

    Breakpoint 2, code_INTERPRET.parse_error () at nasmjf.asm:267
    267     mov ebx,2               ; 1st param: stderr
    268     mov ecx,errmsg          ; 2nd param: error message
    269     mov edx,(errmsgend - errmsg) ; 3rd param: length of string

First I try to print the value at errmsg as a string. It should be the string "PARSE ERROR: ".

    (gdb) x/s $ecx
    0x804a315 <errmsg>:     ""

Weird. Let's look at the first 4 bytes:

    (gdb) x/4x $ecx
    0x804a315 <errmsg>:     0x00    0x00    0x00    0x53

Weird! Looking at stuff...

    (gdb) info addr errmsg
    Symbol "errmsg" is at 0x804a315 in a file compiled without debugging.
    (gdb) info addr errmsgend
    Symbol "errmsgend" is at 0x804a322 in a file compiled without debugging.
    (gdb) x/10c $ecx
    0x804a315 <errmsg>:     0 '\000'  0 '\000'  0 '\000'  83 'S'  69 'E'  32 ' '  69 'E'  82 'R'
    0x804a31d:      82 'R'  79 'O'

Huh, so I've basically got "---SE ERROR: " (where '-' is NUL). Something is happening to the first three bytes of my string. Or is this some alignment issue? I'll see...

To be continued.