Dave's nasmjf Dev Log 02

Created: 2022-07-22


This is an entry in my developer’s log series written between December 2021 and August 2022 (started project in September). I wrote these as I completed my port of the JONESFORTH assembly language Forth interpreter.

← Previous 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Next →
    Tonight, we'll see how much of the FIND word works. FIND looks for
    words in the "dictionary" of defined Forth words, which is stored as
    a linked list. The interpreter uses it to look up the addresses of
    word implementations so it can "compile" them into new word definitions.

Reading symbols from nasmjf...
Breakpoint 1 at 0x804900e: file nasmjf.asm, line 80.

    Now that I'm using GNU Screen with windows for Vim and GDB that
    will close if either application exits, I need to reload the
    file in GDB when I make changes (previously I was just restarting
    GDB).

(gdb) file nasmjf
Reading symbols from nasmjf...

    No need to step through everything until this point because I
    already know it works: WORD collects a word entered through STDIN.

    So I break when we enter the implementation for FIND and then
    continue (run the program). The "foo" below is where nasmjf is
    asking for input and I type "foo" and hit enter.

(gdb) break _FIND
Breakpoint 2 at 0x804920b: file nasmjf.asm, line 485.
(gdb) c
Continuing.
foo

    Our breakpoint triggers. Now we're in FIND. It checks if we've
    run out of entries.

Breakpoint 2, _FIND () at nasmjf.asm:485
485         push esi                ; _FIND! Save esi, we'll use this reg for string comparison
488         mov edx,[var_LATEST]    ; LATEST points to name header of the latest word in the dictionary
_FIND.test_word () at nasmjf.asm:490
490             test edx,edx            ; NULL pointer?  (end of the linked list)
491         je .not_found

        And then I think this is clever: instead of immediately
        checking if the name strings match, it checks the precalculated
        and stored length of the name first. Much more efficient.

496         xor eax,eax
497         mov al, [edx+4]           ; al = flags+length field
498         and al,(F_HIDDEN|F_LENMASK) ; al = name length
499         cmp cl,al        ; Length is the same?
500         jne .prev_word          ; nope, try prev

        And that's what happens here: the length doesn't match, so
        we move to the previous word in the linked list. And it
        starts over at .test_word...

_FIND.prev_word () at nasmjf.asm:517
517         mov edx,[edx]           ; Move back through the link field to the previous word
518         jmp .test_word          ; loop, test prev word
_FIND.test_word () at nasmjf.asm:490
490             test edx,edx            ; NULL pointer?  (end of the linked list)
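
    The same walk can be sketched in Python. This is only a rough model,
    not part of nasmjf: the Entry class, make_entry and find names are
    made up for illustration, and the flag values are assumed to match
    the original JONESFORTH (F_IMMED=0x80, F_HIDDEN=0x20, F_LENMASK=0x1f).
    It shows the cheap length comparison happening before the name
    comparison, and why leaving F_HIDDEN in the mask makes hidden words
    unfindable.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed flag values (from the original JONESFORTH; nasmjf may differ).
F_IMMED = 0x80
F_HIDDEN = 0x20
F_LENMASK = 0x1F


@dataclass
class Entry:
    link: Optional["Entry"]  # previous word; None marks the end of the list
    flags_len: int           # flags in the high bits, name length in the low 5
    name: bytes


def make_entry(link, name, flags=0):
    """Hypothetical helper: build a header with the length stored up front."""
    return Entry(link, flags | len(name), name)


def find(latest, name):
    """Walk back from the latest entry; return the match or None."""
    entry = latest
    while entry is not None:                      # test edx,edx / je .not_found
        stored = entry.flags_len & (F_HIDDEN | F_LENMASK)
        # A hidden word keeps its F_HIDDEN bit here, so "stored" can never
        # equal a real name length (which is at most 31).
        if stored == len(name) and entry.name == name:
            return entry
        entry = entry.link                        # mov edx,[edx]
    return None


# A tiny made-up dictionary: DUP <- DROP <- SECRET (hidden) <- ; (immediate)
dup = make_entry(None, b"DUP")
drop = make_entry(dup, b"DROP")
secret = make_entry(drop, b"SECRET", F_HIDDEN)
latest = make_entry(secret, b";", F_IMMED)
```

    With this toy dictionary, find(latest, b"DROP") returns the DROP
    entry, while find(latest, b"foo") runs off the end of the list and
    returns None, which is the "not found" case followed next.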

    So I set a new breakpoint back in INTERPRET right after FIND
    returns to see how a "not found" condition is handled.

(gdb) break 215
Breakpoint 3 at 0x8049043: file nasmjf.asm, line 215.
(gdb) c
Continuing.
Breakpoint 3, code_INTERPRET () at nasmjf.asm:215
215         test eax,eax            ; Found?

    If FIND fails, INTERPRET checks if the input is a numeric literal.

216         jz .try_literal
code_INTERPRET.try_literal () at nasmjf.asm:230
230         inc byte [interpret_is_lit] ; DID NOT MATCH a word, trying literal number
231         call _NUMBER            ; Returns the parsed number in %eax, %ecx > 0 if error
_NUMBER () at nasmjf.asm:407
407         xor eax,eax
408         xor ebx,ebx
410         test ecx,ecx            ; trying to parse a zero-length string is an error, but returns 0
411         jz .return

    It's neat how Forth supports numeric input in the base
    of your choice without any extra syntax. Just set BASE.

413         mov edx, [var_BASE]    ; get BASE (in dl)
416         mov bl,[edi]            ; bl = first character in string
417         inc edi
418         push eax                ; push 0 on stack
_NUMBER () at nasmjf.asm:419
419         cmp bl,'-'              ; negative number?
420         jnz .convert_char
_NUMBER.convert_char () at nasmjf.asm:435
435         sub bl,'0'              ; < '0'?
436         jb .negate
437         cmp bl,10        ; <= '9'?
438         jb .compare_base
439         sub bl,17              ; < 'A'? (17 is 'A'-'0')
440         jb .negate
441         add bl,10
_NUMBER.compare_base () at nasmjf.asm:444
444             cmp bl,dl               ; >= BASE?
445         jge .negate
_NUMBER.negate () at nasmjf.asm:453
453         pop ebx
_NUMBER.negate () at nasmjf.asm:454
454         test ebx,ebx
455         jz .return
_NUMBER.return () at nasmjf.asm:459
459         ret
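
    To make the digit arithmetic above easier to follow, here is a rough
    Python model of the same conversion. It is a sketch, not nasmjf code:
    the number function and its return shape (value, unparsed count) are
    made up to mirror eax and ecx, it assumes ASCII input, and the value
    returned alongside an error is not meant to match what the assembly
    leaves in eax.

```python
def number(text, base=10):
    """Return (value, unparsed); unparsed > 0 signals a parse error."""
    value = 0
    remaining = len(text)
    if remaining == 0:
        return 0, 0              # zero-length input: returns 0
    pos = 0
    negative = text[0] == "-"
    if negative:
        pos += 1
        remaining -= 1
        if remaining == 0:
            return 0, 1          # the string was only "-"
    while remaining > 0:
        digit = ord(text[pos]) - ord("0")   # sub bl,'0'
        if digit < 0:                       # below '0'
            break
        if digit >= 10:                     # not '0'..'9'
            digit -= 17                     # 17 is 'A' - '0'
            if digit < 0:                   # between '9' and 'A'
                break
            digit += 10                     # 'A' becomes 10, 'B' 11, ...
        if digit >= base:                   # cmp bl,dl / jge .negate
            break
        value = value * base + digit
        pos += 1
        remaining -= 1
    return (-value if negative else value), remaining
```

    For "foo" the first character 'f' works out to 102 - 48 - 17 + 10 = 47,
    which is not below a base of 10, so the loop stops immediately and
    number("foo") gives (0, 3): three unparsed characters, matching the
    non-zero ecx seen after the call returns.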

    Coming back from NUMBER, a value > 0 in ecx indicates an error
    in trying to parse a numeric value.

code_INTERPRET.try_literal () at nasmjf.asm:232
232         test ecx,ecx
233         jnz .parse_error

    And sure enough, "foo" was not a valid base-ten (the default)
    value, so we jump to the parse_error section. This should
    print an error message.

code_INTERPRET.parse_error () at nasmjf.asm:267
267         mov ebx,2               ; 1st param: stderr
268         mov ecx,errmsg          ; 2nd param: error message
269         mov edx,(errmsgend - errmsg) ; 3rd param: length of string
270         mov eax,[__NR_write]    ; write syscall

    But oops! Looks like I've got an error.

Program received signal SIGSEGV, Segmentation fault.
code_INTERPRET.parse_error () at nasmjf.asm:270
270         mov eax,[__NR_write]    ; write syscall

    The next evening, I load it up again to see what's going on...

Reading symbols from nasmjf...
(gdb) break code_INTERPRET.parse_error
Breakpoint 2 at 0x80490a6: file nasmjf.asm, line 267.
(gdb) cont
Continuing.
foo

Breakpoint 2, code_INTERPRET.parse_error () at nasmjf.asm:267
267         mov ebx,2               ; 1st param: stderr
268         mov ecx,errmsg          ; 2nd param: error message
269         mov edx,(errmsgend - errmsg) ; 3rd param: length of string

    First I try to print the value at errmsg as a string. It
    should be the string "PARSE ERROR: ".

(gdb) x/s $ecx
0x804a315 <errmsg>:     ""

    Weird. Let's look at the first 4 bytes:

(gdb) x/4x $ecx
0x804a315 <errmsg>:     0x00    0x00    0x00    0x53

    Weird! Looking at stuff...

(gdb) info addr errmsg
Symbol "errmsg" is at 0x804a315 in a file compiled without debugging.
(gdb) info addr errmsgend
Symbol "errmsgend" is at 0x804a322 in a file compiled without debugging.
(gdb) x/10c $ecx
0x804a315 <errmsg>:     0 '\000'        0 '\000'        0 '\000'        83 'S'  69 'E'  32 ' '  69 'E'      82 'R'
0x804a31d:      82 'R'  79 'O'

    Huh, so I've basically got "---SE ERROR: " (where '-' is NUL). Something
    is happening to the first three bytes of my string. Or is this some
    alignment issue? I'll see... To be continued.
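
    As a sanity check on the numbers in this session, the short Python
    snippet below redoes the arithmetic. The addresses and byte values
    are copied from the GDB output above; the variable names are made up
    for illustration. It confirms that errmsgend - errmsg is the right
    length for "PARSE ERROR: " and that only the first three bytes differ
    from what was expected.

```python
ERRMSG = 0x804A315       # from `info addr errmsg`
ERRMSGEND = 0x804A322    # from `info addr errmsgend`
expected = b"PARSE ERROR: "

# Length that would be passed to the write syscall in edx.
length = ERRMSGEND - ERRMSG

# The ten bytes shown by `x/10c $ecx`.
observed = bytes([0, 0, 0, 83, 69, 32, 69, 82, 82, 79])

# Offsets where memory differs from the expected string.
mismatches = [i for i, (got, want) in enumerate(zip(observed, expected))
              if got != want]

print(length, len(expected), mismatches)
```

    The symbol addresses are 13 bytes apart, the same as the length of
    the expected string, and the mismatches are at offsets 0, 1 and 2.
    The leading NUL byte is also why x/s printed an empty string: GDB
    stops at the first zero byte.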