Dave's nasmjf Dev Log 07

Created: 2022-07-22
This is an entry in my developer’s log series written between December 2021 and August 2022 (started project in September). I wrote these as I completed my port of the JONESFORTH assembly language Forth interpreter.
← Previous 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Next →
    So the error in TCFA (which gets the code address for
    a dictionary entry) has been fixed.

    Now let's see why the COLON definition is crashing with
    a segfault.

    I type my usual test word ": FIVE 5 ;".


80          cld    ; Clear the "direction flag" which means the string
(gdb) c
Continuing.
: FIVE 5 ;

    (We'll skip everything that we now know works from log06.txt.)

    First, INTERPRET checks STATE to see if we're executing
    or compiling (we're executing).

code_INTERPRET.check_state () at nasmjf.asm:239
239         mov edx,[var_STATE]
240         test edx,edx
241         jz .execute             ; Jump if executing.

    Then we check to see if we are executing a literal value
    or a word (it's a word ":" (COLON)).

code_INTERPRET.execute () at nasmjf.asm:254
254         mov ecx,[interpret_is_lit] ; Literal?
255         test ecx,ecx               ; Literal?
256         jnz .do_literal

    Now we jump to the code pointed to at the beginning
    of COLON, which is DOCOL. (Increasingly, as we define
    more words with other words rather than pure machine
    language, they'll start with DOCOL.)

    Here I double-check that we're about t jump to the
    first pointe in COLON and that it points to DOCOL.

260         jmp [eax]
(gdb) info symbol $eax
COLON in section .data of /home/dave/nasmjf/nasmjf
(gdb) info symbol *$eax
DOCOL in section .text of /home/dave/nasmjf/nasmjf


    Here's DOCOL. I may have stepped through this before
    but it's worth looking at again since it's crucial
    to understanding this type of Forth implementation.

    Given the glacial pace at which I'm porting this, I need
    lots of reminders!

    This cheatsheet is currently in a comment at the top
    of my jonesforth.asm:

       esi - next forth word address to execute
       ebp - return stack for forth word addresses

    In the source, PUSHRSP and POPRSP usually handle the
    ebp register, which we're using for the return stack
    pointer (RSP).

    DOCOL's first two lines are both from the PUSHRSP macro
    (you can see that they have lower line numbers).

    That handles ebp.

    Then DOCOL advances esi to the next word pointer
    (in COLON after DOCOL itself).

DOCOL () at nasmjf.asm:40
40          lea ebp, [ebp-4]   ; "load effective address" of next stack position
41          mov [ebp], %1      ; "push" the register value to the address at ebp
70          add eax, 4      ; eax points to DOCOL (me!) in word definition. Go to next.
71          mov esi, eax    ; Put the next word pointer into esi

    Let's see if that's right. Here's the entire definition
    of COLON. We don't see DOCOL here because it's inserted
    by the DEFWORD macro, but it comes right before FWORD.
    (By the way, FWORD is just WORD, but I can't have a
    symbol called "WORD" in NASM because it's a reserved
    keyword.)

        DEFWORD ":",1,,COLON
        dd FWORD
        dd CREATE
        dd LIT, DOCOL, COMMA
        dd LATEST, FETCH, HIDDEN
        dd RBRAC
        dd EXIT

    I think it's super-cool that Forth exposes all of the
    primitives needed to create (or replace!) the COLON
    compiler so can you can use them in the interpreter
    yourself. Truly a no-holds-barred language.

    At any rate, the pointer in esi should be the next one
    in COLON and it should point to WORD (well, FWORD).

(gdb) info symbol $esi
COLON + 4 in section .data of /home/dave/nasmjf/nasmjf
(gdb) info symbol *$esi
FWORD in section .data of /home/dave/nasmjf/nasmjf

    Great! And then the NEXT macro puts the address pointed
    to by esi into eax, increments esi to next word pointer,
    and jumps to the address *pointed to* by the address now
    in eax. HAVE YOU GOT THAT???

    This is made even more confusing by the lodsd instruction.
    The mnemonic stands for "load string doubleword". The idea
    is that you can use it to load a "string" of values
    by repeatedly calling lodsd (or loadsb for byte, etc.).
    What it actually does is load 4 bytes from the address at
    esi into eax and then increments esi by 4.

    (By the way, I've come to _loathe_ the terms "word", "double"
    "long", etc. I'm okay with "byte" because it's come to
    mean "8 bits" pretty universally in the year 2022. If
    I were king, we would just use the byte count for these sizes
    like:
        b  = 1 byte  = 8 bits
        b2 = 2 bytes = 16 bits
        b4 = 4 bytes = 32 bits
        b8 = 8 bytes = 64 bits
    and "lodsd" would become "lodsb4". Well, lods* would probably
    have a better mnemonic. But you get the idea. Anyway, harping
    on x86 is, like, a full-time job and it ain't gonna get this
    Forth port done.)



27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.

    And did it work?

28          jmp [eax] ; Jump to whatever code we're now pointing at.
(gdb) info symbol *$eax
code_FWORD in section .text of /home/dave/nasmjf/nasmjf

    Yup! It's jumping to WORD.

    (As another aside, it occurs to me that "WORD" is
    a really confusing name for this Forth word - it just
    tokenizes a space-delimited string from input. Otherwise,
    it doesn't have anything to do with Forth's concept
    of "words" as executable code stored in a "dictionary".)

code_FWORD () at nasmjf.asm:302
302         call _WORD

    Cool, so now I'll skip stepping through WORD/KEY as
    we gather the string "FIVE" (the name of the word I'm
    trying to define) from input.

_WORD.skip_non_words () at nasmjf.asm:309
309         call _KEY               ; get next key, returned in %eax
            ...
325         mov ecx, edi            ; return it

    I have to admit, I don't understand why I can't access
    the memory at word_buffer.

(gdb) x/s (int)word_buffer
0x45564946:     <error: Cannot access memory at address 0x45564946>

    Wait a dang second, 45 56 49 46 isn't an address, it's
    the actual characters 'E' 'V' 'I' 'F' (or "FIVE" stored
    little-endian)!

    Grrrr... another gripe - the way GDB treats symbols
    just confuses me. I like how NASM does it: foo is
    always an address, [foo] is always the value AT that
    address. It's very consistent.

    Next night: gosh darn it! I remembered. You gotta put
    a '&' in front of "variables" to get the address...and
    that includes when you're trying to use the 'examine'
    ('x') command to format and view memory using the variable
    name.

(gdb) p &word_buffer
$1 = (<data variable, no debug info> *) 0x804a068 <word_buffer>
(gdb) x/4c &word_buffer
0x804a068 <word_buffer>:        70 'F'  73 'I'  86 'V'  69 'E'

    At any rate, looks good. WORD returns "FIVE".

code_FWORD () at nasmjf.asm:303
303         push edi                ; push base address
304         push ecx                ; push length

    And with any luck, now we'll be headed to the next word in
    the COLON definition, CREATE.

code_FWORD () at nasmjf.asm:27
27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.
28          jmp [eax] ; Jump to whatever code we're now pointing at.
(gdb) info symbol $eax
CREATE in section .data of /home/dave/nasmjf/nasmjf
(gdb) info symbol *$eax
code_CREATE in section .text of /home/dave/nasmjf/nasmjf

    Yay!

    Now CREATE makes the header (dictionary link, name, flags)
    portion of the word we're compiling.

    In the next log, we'll see if CREATE works and then try to
    track down which word is causing a segfault when COLON runs.
← Previous 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Next →