Dave's nasmjf Dev Log 21

Created: 2022-07-22

nasmjf home

This is an entry in my developer’s log series written between December 2021 and August 2022 (started project in September). I wrote these as I completed my port of the JONESFORTH assembly language Forth interpreter.

← Previous 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Next →
    The last log saw yet _another_ bug in my DEFVAR macro.
    Sheesh. Hopefully I have variables working correctly
    now?

    As I mentioned last night, the next words look like a
    real mixture of things. Let's jump into it:

    WITHIN is a conditional that tests if a number is
    between two other numbers. This highlights the advantage
    of such a flexible language (and the simplicity of the
    syntax). I'll format this slightly to make a bit of a
    truth table out of it to demonstrate how within works:

1 2 3 WITHIN .  0
2 3 1 WITHIN .  0
3 1 2 WITHIN .  0
2 1 3 WITHIN .  1  <-- 2 is within 1 and 3

    DEPTH gives us the depth of the stack:

1 2
.S
2 1
DEPTH .
8

    Huh? Oh, it's in bytes, not number of items!

    ALIGNED rounds a number up to the next multiple of 4,
    which aligns addresses on 32-bit (4 byte) computers.
    It's easy to test:

1 ALIGNED .
4
7 ALIGNED .
8

    ALIGN performs ALIGNED on HERE. Here's the definition.

        : ALIGN HERE @ ALIGNED HERE ! ;

    I have a sense for when this would be used while
    compiling, but don't really feel like trying to make up
    a test for it. I'm content to see how it's used later
    and test *that*.

    And now, the moment we've all been waiting for: strings!

    First, we need a new primitive to store the bytes of the
    string. The word ',' (COMMA) does this for word-sized
    (four bytes, the size of a 32-bit address) data. As with
    the other byte-sized words, the name is prepended with a
    letter 'C'.

    So let's compare 'C,':

        : C,
                HERE @ C!	( store the character in the compiled image )
                1 HERE +!	( increment HERE pointer by 1 byte )
        ;

    ...with the assembly definition of ',':

        mov edi, [var_HERE]
        stosd                  ; puts the value in eax at edi, increments edi
        mov [var_HERE], edi

    Same thing, only stosd increments edi (HERE) by 4 rather
    than 1.

    I've learned just to see these primitives in action.
    Trying to use _some_ them on their own is challenging
    and surpisingly unrewarding.

    Next, S" is a word that stores the characters which
    follow as a string until it hits the end quote: ".

    Note that there has to be a space after S" or it
    wouldn't be matched as the correct word. However, the
    final quote is NOT a word, it's just the special
    character S" is looking for.

    Finally, S" pushes the address of the start of the
    string onto the stack, followed by the length of the
    string.

    Okay, I think I've got all of that:

S" Hello World"
.
11

    There we go. "Hello World" is 11 characters long.

DUP
HEX
.
804EADC
@ EMIT
H

    It has been stored in memory at a particular address,
    which means we can retrieve the string from there.
    (I've EMITted the 'H' from Hello.)

    In compile mode, it's stored in the word being compiled,
    along with the LITSTRING word we defined in assembly.

    Bug where in memory is this stored when in immediate
    mode? Jones explains that this implementation stores it
    in the same place where we compile words. So I guess we
    can do this:

HERE @ .
804EADC
HERE @ @ EMIT
H
HERE @ 1 + @ EMIT
e

    And since HERE clearly hasn't been moved to a point
    after the string, that means it is temporary. It will be
    overwritten as soon as we define a new word!

    A close relative to S" (in fact, it uses S" in compiling
    mode), is the ." word, which is Forth's print string
    word. I'm excited to have this:

." Hello World!"
Hello World!

    I've updated the README now that I can do this. :-)

    And the good stuff keeps coming. The next words allow us
    to define our own constants and variables. Not only
    that, there's a handy explanation for each which would
    have saved me some trouble before had I bothered to look
    ahead (or, you know, actually learned the language I was
    going to implement).

	10 CONSTANT TEN
	VARIABLE FOO

	When TEN is executed, it leaves the integer 10 on the stack
        When FOO is executed, it leaves the address of FOO on the stack

    Let's try a constant:

42 CONSTANT answer
." The answer is " answer .
The answer is 42

    And a variable:

VARIABLE foo
9000 foo !
foo @ .
9000

    The definition of CONSTANT is pretty easy. It's
    basically the same as (and functionally equivalent to)
    this:

: answer 42 ;
." The answer is " answer .
The answer is 42

    But variables needs to allocate some memory and store its
    address. Two simple utilities aid in this:

        ALLOT - advances HERE by the amount on the stack and
                leaves the previous HERE on the stack

        CELLS - multiplies the number on the stack by the
                natural address size of the machine (4 bytes
                for our 32-bit implementation)

    What's neat about CELLS is it shows how you can build up
    words that read rather like a natural language:

HEX
HERE @ .
804EB30
5 CELLS ALLOT
.
804EB30
HERE @ .
804EB44

    And that looks like 20 bytes...hey, wait a second. Now
    that I can easiy store values, I'll let Forth figure it
    out:

HERE @ CONSTANT previous
5 CELLS ALLOT CONSTANT new
previous .
804EB78
new .
804EB98
new previous - .
20

    VALUE is like VARIABLE, except that the result is a word
    which leaves its value on the stack like a constant
    instead of its address. Unlike a constant, it can be
    updated by another word, TO.

        10 VALUE foo   create foo, set to 10
        20 TO foo      update foo to 20

    Sounds good:

10 VALUE foo
foo .
10
20 TO foo
PARSE ERROR: 20 TO

    Huh? Oh! Right between these two word definitions is
    where I have it stop reading jonesforth.f on load.

    Guess it's time to figure out the next bug in my port.

    I'll try reading all lines again:

        %assign __lines_of_jf_to_read 10000


PARSE ERROR:    ( look it up in the dictionary )
        >DFA
PARSE ERROR:    ( look it up in the dictionary )
        >DFA

Program received signal SIGSEGV, Segmentation fault.
_COMMA () at nasmjf.asm:688
688         stosd     ; puts the value in eax at edi, increments edi
(gdb)

    Okay, so the PARSE ERROR message prints out the word
    that caused the trouble, here ">DFA", after a buffer's
    worth of context.

    Ha ha, how silly. I simply missed that word in my port.
    It's a simple definition since we already have >CFA,
    which returns the codeword for a word pointer. >DFA just
    has to advance 4 bytes to the "data" (so-called threaded
    word addresses) after the codeword.

    Okay, that's defined. Now can we run all of
    jonesforth.f?

Program received signal SIGSEGV, Segmentation fault.
_COMMA () at nasmjf.asm:697
697         stosd                  ; puts the value in eax at edi, increments edi
(gdb)

    Drat! Nope, still segfaulting. And in COMMA (',') again.

    I wonder if I've overflowed some memory limitation? How
    to check that...hmmm... Well, comma stores where HERE
    points, and that's in memory reserved with Linux's brk
    syscall. How about I bump that from 0x16000 to 0x64000
    bytes:

        add eax, 0x64000  ; add our desired number of bytes to break addr

    Nope, exact same error:

_COMMA () at nasmjf.asm:697

    And same thing if I add another zero to the number. So
    much for an easy answer. So I guess, ideally, I would
    break when COMMA is trying to stosd at an address in edi
    that is outside the reserved FORTH data area. But first
    I need to know what that area is.

    I'm going to add some custom FORTH variables to capture
    this so it'll be easy to examine. I'll test in gdb
    first:

Breakpoint 2, _start () at nasmjf.asm:103
103         xor ebx, ebx
104         mov eax, __NR_brk         ; syscall brk
105         int 0x80
106         mov [var_HERE], eax       ; eax has start addr of data segment
(gdb) p/x $eax
$4 = 0x804e000
107         mov [var_CSTART], eax     ; store info: start address of data segment
108         add eax, 0x16000          ; add our desired number of bytes to break addr
(gdb) p/x (int)var_HERE
$5 = 0x804e000
(gdb) p/x (int)var_CSTART
$6 = 0x804e000

    So far so good, CSTART contains the start address of the
    data area.

109         mov ebx, eax              ; reserve memory by setting this new break addr
(gdb) p/x $eax
$7 = 0x80b2000
110         mov [var_CEND], eax       ; store info: end address of data segment
111         mov eax, __NR_brk         ; syscall brk again
(gdb) p/x $eax
$8 = 0x80b2000
112         int 0x80
(gdb) p/x $eax
117         mov ecx, 0                ; LOADJF read only flag for open
(gdb) p/x $eax
$10 = 0x80b2000
(gdb) p/x (int)var_CEND
$11 = 0x80b2000

    That looks right. CEND contains the end address of the
    data segment. Did I get my requested 0x64000 bytes?

    Let's use the new FORTH vars to find out:

(gdb) c
Continuing.
HEX CSTART @ . CEND @ . CEND @ CSTART @ - .
804E000 80B2000 64000

    Looking good. Now to catch the bad address being used in
    COMMA. I added two compares and an ".oops" label:

        _COMMA:
        mov edi, [var_HERE]
        cmp edi, [var_CSTART]
        jl .oops
        cmp edi, [var_CEND]
        jg .oops
        stosd
        mov [var_HERE], edi
        ret
    .oops:
        nop

(gdb) break _COMMA.oops
Breakpoint 2 at 0x8049343: file nasmjf.asm, line 707.
(gdb) c
Continuing.

Breakpoint 2, _COMMA.oops () at nasmjf.asm:707
707         nop
(gdb) p var_HERE
'var_HERE' has unknown type; cast it to its declared type
(gdb) p (int)var_HERE
$1 = 61368
(gdb) p (int)var_CSTART
$2 = 134537216
(gdb) p (int)var_CEND
$3 = 134627328

    Okay, so HERE has been set to an invalid address
    somehow. I wish backtraces worked. Then I'd be able to
    see which word this came from.

    I'm tempted to divide and conquer...and it looks like
    if I stop execution of jonesforth.f right before the
    definition of SEE, it doesn't segfault.

    So I'll continue testing 'til there and then tackle the
    problem head-on.

    Anyway, where was I? Oh yeah, VALUE!

    To quote myself:

    "VALUE is like VARIABLE, except that the result is a word
    which leaves its value on the stack like a constant
    instead of its address. Unlike a constant, it can be
    updated by another word, TO."

        10 VALUE foo   create foo, set to 10
        20 TO foo      update foo to 20

    Sounds good:

10 VALUE foo
foo .
10
20 TO foo
foo .
20

    That's better. I'll continue with the word testing in
    the next log.
← Previous 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Next →