Dave's nasmjf Dev Log 16

Created: 2022-07-22

nasmjf home

This is an entry in my developer’s log series written between December 2021 and August 2022 (started project in September). I wrote these as I completed my port of the JONESFORTH assembly language Forth interpreter.

← Previous 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Next →
    Warning, the examples with variables in this log are
    all wrong. This update explains:

        !!!!!!!!!!!!!!!!!!!! Update !!!!!!!!!!!!!!!!!!!!
        ! In log19.txt, I realize that my variable     !
        ! handling is wrong. Variables should leave    !
        ! their addresses on the stack, not their      !
        ! values! We need FETCH to get the value from  !
        ! the address!                                 !
        !!!!!!!!!!!!!!!!!!!! Update !!!!!!!!!!!!!!!!!!!!

    Two new words add to the word "compiler" abilities of
    the interpreter:

        IMMEDIATE  sets the latest word to be "immediate"
        HIDE       takes the next "word" of input, looks it up
                   in the dictionary, and then sets that word
                   to be hidden (via the word HIDDEN)

    HIDE seems the easiest to test, so we'll start with that:

: emit2 EMIT EMIT ;
66 65 emit2
AB
HIDE emit2
66 65 emit2
PARSE ERROR: 66 65 emit2

    That worked: we get the parse error because emit2 has been
    hidden and is no longer found in the dictionary.

    The HIDDEN word used by HIDE actually toggles the hidden state,
    so can we call HIDE again to unhide the word?

HIDE emit2

Program received signal SIGSEGV, Segmentation fault.
code_HIDDEN () at nasmjf.asm:635
635         xor [edi], word F_HIDDEN  ; Toggle the HIDDEN bit in place.

    Oh, ha ha, no, of course not. It's hidden, so HIDE can't
    find it (and since there's absolutely no error checking,
    we crash trying to toggle the bit in some random memory
    location).

    I guess we could use LATEST and HIDDEN to manually toggle
    it back, but I can't be bothered tonight.

    Onward to IMMEDIATE.

: ab 66 65 EMIT EMIT ;
IMMEDIATE
: foo 1000 . ;
foo
foo
55 EMIT
;

^C
Program received signal SIGINT, Interrupt.
_WORD.skip_non_words () at nasmjf.asm:339
339         call _KEY               ; get next key, returned in %eax

   Something went wrong. I had to Ctrl+C to end the
   program. It was merrily taking input, but nothing
   would execute, not even Ctrl+D to end the input
   and exit.

   Let's try that again and verify we're toggling the
   right word...

(gdb) r
Starting program: /home/dave/nasmjf/nasmjf
(gdb) c
Continuing.
LATEST 4 + C@ .
6
LATEST 5 + C@ EMIT
L

    Okay, just sanity checking LATEST - it points to
    a word with 6 letters in the name and starts with
    the letter "L" (it's LATEST itself).

    I'll define my 'ab' word again, try it out (it should
    print the letters "AB"), and check LATEST again...

: ab 66 65 EMIT EMIT ;
ab
AB
LATEST 4 + C@ .
2
LATEST 5 + C@ EMIT
a
IMMEDIATE
ab

    Drat! Then it locked up again. So IMMEDIATE is
    definitely not working right.

    Next night: okay, let's see what's going on...

(gdb) break code_IMMEDIATE
Breakpoint 2 at 0x80494ec: file nasmjf.asm, line 1097.
(gdb) c
Continuing.
: ab 66 65 EMIT EMIT ;
ab
AB
IMMEDIATE

Breakpoint 2, code_IMMEDIATE () at nasmjf.asm:1097
(gdb) p/x (int)var_LATEST
$1 = 0x804e000
(gdb) x/10c (int)var_LATEST
0x804e000:   ...  2 '\002' 97 'a'  98 'b' ...

    So that's right - LATEST points at word 'ab'...

1098        add edi, 4                ; Point to name/flags byte.
1099        xor byte [edi], F_IMMED   ; Toggle the IMMED bit.
(gdb) p/x $edi
$2 = 0x804a6b0

    That's a dead giveaway, the address in register
    edi should now be LATEST + 4.  But it's actually
    the _address_ of LATEST + 4!

(gdb) p/x (int)var_LATEST
$3 = 0x804e000

    It still takes me a bit before I see it...

(gdb) disass 1099
No function contains specified address.
(gdb) disass code_IMMEDIATE
Dump of assembler code for function code_IMMEDIATE:
   0x080494ec <+0>:     mov    edi,0x804a6ac   <--- should be PTR
   0x080494f1 <+5>:     add    edi,0x4
=> 0x080494f4 <+8>:     xor    BYTE PTR [edi],0x80
   0x080494f7 <+11>:    lods   eax,DWORD PTR ds:[esi]
   0x080494f8 <+12>:    jmp    DWORD PTR [eax]
End of assembler dump.

    I finally see it.

    I have

        mov edi, var_LATEST

    where I should have

        mov edi, [var_LATEST]

    (so of course it wasn't working after that. LATEST was
    incremented and no longer pointed at word. All further
    interpretation would fail to match!)

    With that fixed, it should work...

(gdb) load
(gdb) r
Starting program: /home/dave/nasmjf/nasmjf
: ab 66 65 EMIT EMIT ;
ab
AB
IMMEDIATE

    So now 'ab' should execute as soon as the interpreter
    sees it, even in compile mode:

: five 5 . ab ;
AB
five
5

    Yeah! The call to 'ab' executed at "compile time" rather
    than "run time" for the new word 'five'. Using this, we
    could add new language features to FORTH in FORTH.

    Next, the TICK (single quote ') word gets the address
    of a word (supplied after the ' so it doesn't execute.
    this is the same trick LIT uses).

    This implementation can only work at compile time because
    the interpreter needs to turn the word that follows as
    a 4-byte address for ' to be able to read and then hope
    over that value. Just for fun, let's try to print the
    address of the EMIT word outside of the compile state:

' EMIT .

Program received signal SIGSEGV, Segmentation fault.

    See?

    Now let's use it the same way, but in a new compiled word:

: addrofemit ' EMIT . ;
addrofemit
134521260

    Looks like it worked, but is that address correct?

(gdb) info addr EMIT
Symbol "EMIT" is at 0x804a1ac in a file compiled without debugging.
(gdb) p/d 0x804a1ac
$1 = 134521260

    Yup!

    Then the next night, I've got a really exciting one,
    BRANCH0.

    But first, I'm trying to figure out how to even test
    BRANCH, let alone its conditional big brother!

    I even worked it out on paper the next morning, and I'm
    still not seeing why this doesn't work:

: foo 65 EMIT BRANCH -12 ;
foo
A
Program received signal SIGSEGV, Segmentation fault.
code_BRANCH () at nasmjf.asm:27
27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.

    By my calculations, esi starts off pointing at the
    offset number's instruction (-12), then we should be
    branching back to "65":

          0    "-12"
         -4    BRANCH
         -8    EMIT
        -12    "65"

    And I've had a bummer of a time trying to step through
    it by breaking on BRANCH because that word is used
    (correctly) as part of the interpreter loop.

    So I'm going to copy BRANCH with the silly name BRUNCH
    and see why it's not correct!

        DEFCODE "BRUNCH",6,,BRUNCH
        add esi, [esi]
        NEXT

    Should be pretty simple, right? It's just a one-liner!

(gdb) break code_BRUNCH
(gdb) c
Continuing.
: foo 65 EMIT BRUNCH -12 ;
foo
A
Breakpoint 2, code_BRUNCH () at nasmjf.asm:251
251         add esi, [esi]          ; add the offset to the instruction pointer

    Okay, now let's thoroughly examine this. We're going to
    add the negative number stored where esi points to FROM
    esi. Where does esi point?

(gdb) p/x $esi
$3 = 0x804e01c
(gdb) x/x $esi
0x804e01c:      0x0804a0f0
(gdb) info sym *$esi
LIT in section .data of /home/dave/nasmjf/nasmjf
(gdb) x/b $esi+4
0x804e020:      -12 '\364'

    Yup, we can see that esi points to the address of LIT
    followed by the value -12. As expected.

(gdb) s
27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.

    Now that's run, where does esi point now?

(gdb) info sym *$esi
Cannot access memory at address 0x1009810c
(gdb) p/x $esi
$4 = 0x1009810c

    What? That address isn't right. It should be 12 less
    than before, not...oh wait...

(gdb) disass code_BRUNCH
Dump of assembler code for function code_BRUNCH:
   0x08049054 <+0>:     add    esi,DWORD PTR [esi]
=> 0x08049056 <+2>:     lods   eax,DWORD PTR ds:[esi]
   0x08049057 <+3>:     jmp    DWORD PTR [eax]
End of assembler dump.

    Now I see it. We subtracted the address of LIT, not
    the -12 that follows it. No wonder I got a segfault.

    So how do I get the value -12 right after BRANCH?

    Next night: okay, so I reviewed the ported words
    so far and I'm pretty sure COMMA (,) fits the bill.
    It "compiles" the value on the stack to the current
    position...

: foo 65 EMIT BRUNCH -12 , ;
foo
A
Breakpoint 3, code_BRUNCH () at nasmjf.asm:251
251         add esi, [esi]          ; add the offset to the instruction pointer
(gdb) x/x **$esi
0x8049228 <code_LIT>:   0xffad50ad

    ...no, dang it, that doesn't work either, and for
    the same reason. Sure, ',' will store whatever's on
    the stack, but we're still getting LIT -12 compiled
    first when we're compiling.

    So I really don't see any easy way to test BRANCH,
    let alone 0BRANCH with an arbitrary snippet of code
    at this point. :-(

    Next evening: I've also just ported LITSTRING and TELL,
    two more primitives that appear hard to test because
    I'm not sure how to compile literal values into memory
    yet.

    So, this would be a pretty big let-down way to end a
    log file but...

    IT APPEARS THAT I'VE PORTED ALL OF THE ASSEMBLY!

    Yeah, so starting with the next log, I'm going to
    start feeding jonesforth.f, which is the second half
    of the language implementation implemented in itself,
    into my port and fix the inevitable bugs.

    It's been about six months of slowly chipping away
    at this port nearly every single evening. I can
    barely believe this stage has arrived. This is so
    cool. 8-)
← Previous 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Next →