Dave's nasmjf Dev Log 21
Created: 2022-07-22
This is an entry in my developer’s log series written between December 2021 and August 2022 (started project in September). I wrote these as I completed my port of the JONESFORTH assembly language Forth interpreter.
The last log saw yet _another_ bug in my DEFVAR macro.
Sheesh. Hopefully I have variables working correctly
now?
As I mentioned last night, the next words look like a
real mixture of things. Let's jump into it:
WITHIN is a conditional that tests if a number is
between two other numbers. This highlights the advantage
of such a flexible language (and the simplicity of the
syntax). I'll format this slightly to make a bit of a
truth table out of it to demonstrate how WITHIN works:
1 2 3 WITHIN . 0
2 3 1 WITHIN . 0
3 1 2 WITHIN . 0
2 1 3 WITHIN . 1 <-- 2 is within 1 and 3
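For anyone who finds the stack order hard to read, here's
the same truth table as a tiny Python sketch. It assumes
the argument order is "c a b WITHIN" and that the range is
half-open (a <= c < b), which is how I read Jones's
comment. The function name is just mine for illustration.

```python
def within(c, a, b):
    """Forth-style flag: 1 if a <= c < b, else 0 (assumed half-open range)."""
    return 1 if a <= c < b else 0


# Same argument order as the Forth lines above: c a b WITHIN
for c, a, b in [(1, 2, 3), (2, 3, 1), (3, 1, 2), (2, 1, 3)]:
    print(c, a, b, "WITHIN ->", within(c, a, b))
```

Only the last row prints 1, matching the Forth session.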
DEPTH gives us the depth of the stack:
1 2
.S
2 1
DEPTH .
8
Huh? Oh, it's in bytes, not number of items!
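Getting an item count out of that is just a division by
the cell size. A minimal sketch, assuming 4-byte cells as
in this 32-bit port (CELL and depth_in_items are made-up
names, not anything from the Forth source):

```python
CELL = 4  # bytes per stack cell on a 32-bit Forth (assumption)


def depth_in_items(depth_bytes):
    """Convert a byte-based DEPTH into a count of stack items."""
    return depth_bytes // CELL


print(depth_in_items(8))  # the two items (1 and 2) from the session above
```

In Forth itself, something like "DEPTH 4 /" should
presumably do the same job.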
ALIGNED rounds a number up to the next multiple of 4
(a number that is already a multiple of 4 is left alone),
which aligns addresses on 32-bit (4 byte) computers.
It's easy to test:
1 ALIGNED .
4
7 ALIGNED .
8
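The usual trick for this kind of rounding is "add 3, then
clear the low two bits". Here's a sketch of that in
Python. I'm assuming that's what ALIGNED does under the
hood; the function name is only for illustration.

```python
def aligned(n):
    """Round n up to a multiple of 4 by adding 3 and masking off the low two bits."""
    return (n + 3) & ~3


for n in (1, 4, 7, 8):
    print(n, "->", aligned(n))
```

The 1 and 7 cases match the Forth results above, and 4 and
8 come back unchanged.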
ALIGN performs ALIGNED on HERE. Here's the definition.
: ALIGN HERE @ ALIGNED HERE ! ;
I have a sense for when this would be used while
compiling, but don't really feel like trying to make up
a test for it. I'm content to see how it's used later
and test *that*.
And now, the moment we've all been waiting for: strings!
First, we need a new primitive to store the bytes of the
string. The word ',' (COMMA) does this for word-sized
(four bytes, the size of a 32-bit address) data. The new
word does the same for a single byte and, as with the
other byte-sized words, its name is prefixed with a
letter 'C'.
So let's compare 'C,':
: C,
HERE @ C! ( store the character in the compiled image )
1 HERE +! ( increment HERE pointer by 1 byte )
;
...with the assembly definition of ',':
mov edi, [var_HERE]
stosd ; puts the value in eax at edi, increments edi
mov [var_HERE], edi
Same thing, only stosd increments edi (HERE) by 4 rather
than 1.
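To make the difference concrete, here's a rough Python
model of the two words. It's only a sketch: the
Dictionary class, its tiny 64-byte memory, and the method
names are all invented for illustration. The one real
assumption carried over from x86 is little-endian byte
order for the 4-byte store.

```python
import struct


class Dictionary:
    """Toy model of the compiled image with a HERE pointer."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.here = 0

    def comma(self, value):
        """Like ',': store a 32-bit little-endian cell at HERE, advance by 4."""
        struct.pack_into("<I", self.mem, self.here, value & 0xFFFFFFFF)
        self.here += 4

    def c_comma(self, byte):
        """Like 'C,': store a single byte at HERE, advance by 1."""
        self.mem[self.here] = byte & 0xFF
        self.here += 1


d = Dictionary()
d.comma(0x11223344)
d.c_comma(ord("H"))
print(d.here)           # HERE after one cell and one byte
print(d.mem[:5].hex())  # the bytes that were written
```

After one ',' and one 'C,' the pointer has moved 5 bytes,
which is also why ALIGN exists.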
I've learned just to see these primitives in action.
Trying to use _some_ of them on their own is challenging
and surprisingly unrewarding.
Next, S" is a word that stores the characters which
follow as a string until it hits the end quote: ".
Note that there has to be a space after S" or it
wouldn't be matched as the correct word. However, the
final quote is NOT a word, it's just the special
character S" is looking for.
Finally, S" pushes the address of the start of the
string onto the stack, followed by the length of the
string.
Okay, I think I've got all of that:
S" Hello World"
.
11
There we go. "Hello World" is 11 characters long.
DUP
HEX
.
804EADC
@ EMIT
H
It has been stored in memory at a particular address,
which means we can retrieve the string from there.
(I've EMITted the 'H' from Hello.)
In compile mode, it's stored in the word being compiled,
along with the LITSTRING word we defined in assembly.
But where in memory is this stored when in immediate
mode? Jones explains that this implementation stores it
in the same place where we compile words. So I guess we
can do this:
HERE @ .
804EADC
HERE @ @ EMIT
H
HERE @ 1 + @ EMIT
e
And since HERE clearly hasn't been moved to a point
after the string, that means it is temporary. It will be
overwritten as soon as we define a new word!
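Here's a small Python sketch of why that string is
temporary. It models my understanding of the
immediate-mode behavior (copy the bytes to HERE, return
address and length, do NOT advance HERE). The class and
method names are hypothetical.

```python
class Dictionary:
    """Toy model of the data area with a HERE pointer."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.here = 0

    def s_quote_immediate(self, text):
        """Copy the string to HERE but leave HERE where it was."""
        data = text.encode("ascii")
        self.mem[self.here:self.here + len(data)] = data
        return self.here, len(data)

    def c_comma(self, byte):
        """Store one byte at HERE and advance HERE by 1."""
        self.mem[self.here] = byte & 0xFF
        self.here += 1


d = Dictionary()
addr, length = d.s_quote_immediate("Hello World")
print(length, chr(d.mem[addr]), chr(d.mem[addr + 1]))

d.c_comma(ord("X"))      # anything compiled next lands on top of the string
print(chr(d.mem[addr]))  # the 'H' is gone
```

The first compiled byte clobbers the 'H', which is the
"overwritten as soon as we define a new word" effect.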
A close relative to S" (in fact, it uses S" in compiling
mode), is the ." word, which is Forth's print string
word. I'm excited to have this:
." Hello World!"
Hello World!
I've updated the README now that I can do this. :-)
And the good stuff keeps coming. The next words allow us
to define our own constants and variables. Not only
that, there's a handy explanation for each which would
have saved me some trouble before had I bothered to look
ahead (or, you know, actually learned the language I was
going to implement).
10 CONSTANT TEN
VARIABLE FOO
When TEN is executed, it leaves the integer 10 on the stack
When FOO is executed, it leaves the address of FOO on the stack
Let's try a constant:
42 CONSTANT answer
." The answer is " answer .
The answer is 42
And a variable:
VARIABLE foo
9000 foo !
foo @ .
9000
The definition of CONSTANT is pretty easy. It's
basically the same as (and functionally equivalent to)
this:
: answer 42 ;
." The answer is " answer .
The answer is 42
But a variable needs to allocate some memory and store
its address. Two simple utilities aid in this:
ALLOT - advances HERE by the amount on the stack and
leaves the previous HERE on the stack
CELLS - multiplies the number on the stack by the
natural address size of the machine (4 bytes
for our 32-bit implementation)
What's neat about CELLS is it shows how you can build up
words that read rather like a natural language:
HEX
HERE @ .
804EB30
5 CELLS ALLOT
.
804EB30
HERE @ .
804EB44
And that looks like 20 bytes...hey, wait a second. Now
that I can easily store values, I'll let Forth figure it
out:
HERE @ CONSTANT previous
5 CELLS ALLOT CONSTANT new
previous .
804EB78
new .
804EB98
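new previous - .
20
(Careful, though: we're still in HEX, so that 20 is 0x20,
or 32 bytes, not 20. And because ALLOT leaves the
*previous* HERE on the stack, 'new' is HERE from just
before the allotment. The difference is most likely the
space taken by the 'previous' constant's own dictionary
entry. The manual check above is the one that shows the
5 cells: 804EB44 - 804EB30 = 0x14 = 20 bytes.)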
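For checking the arithmetic outside of Forth, here's a
little Python model of CELLS and ALLOT that replays the
first experiment (the one starting at 804EB30). It
assumes 4-byte cells and that ALLOT hands back the old
HERE, as described above. The names are invented.

```python
CELL = 4  # bytes per cell on this 32-bit implementation


def cells(n):
    """Like CELLS: convert a cell count into a byte count."""
    return n * CELL


class Dictionary:
    """Toy model that only tracks the HERE pointer."""

    def __init__(self, here):
        self.here = here

    def allot(self, n):
        """Like ALLOT: advance HERE by n bytes, return the previous HERE."""
        old = self.here
        self.here += n
        return old


d = Dictionary(here=0x804EB30)
old = d.allot(cells(5))
print(hex(old), hex(d.here), d.here - old)
```

The value handed back is the old HERE, and the gap to the
new HERE is 20 bytes in decimal (14 in hex).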
VALUE is like VARIABLE, except that the result is a word
which leaves its value on the stack like a constant
instead of its address. Unlike a constant, it can be
updated by another word, TO.
10 VALUE foo create foo, set to 10
20 TO foo update foo to 20
Sounds good:
10 VALUE foo
foo .
10
20 TO foo
PARSE ERROR: 20 TO
Huh? Oh! Right between these two word definitions is
where I have it stop reading jonesforth.f on load.
Guess it's time to figure out the next bug in my port.
I'll try reading all lines again:
%assign __lines_of_jf_to_read 10000
PARSE ERROR: ( look it up in the dictionary )
>DFA
PARSE ERROR: ( look it up in the dictionary )
>DFA
Program received signal SIGSEGV, Segmentation fault.
_COMMA () at nasmjf.asm:688
688 stosd ; puts the value in eax at edi, increments edi
(gdb)
Okay, so the PARSE ERROR message prints out the word
that caused the trouble, here ">DFA", after a buffer's
worth of context.
Ha ha, how silly. I simply missed that word in my port.
It's a simple definition since we already have >CFA,
which takes a pointer to a word's dictionary entry and
returns the address of its codeword. >DFA just has to
advance 4 bytes past the codeword to the "data"
(so-called threaded word addresses) that follows it.
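Here's a Python sketch of that pointer arithmetic. It
assumes the JONESFORTH-style entry layout as I understand
it: 4-byte link, one length/flags byte (length in the low
5 bits), the name, padding to a 4-byte boundary, then the
codeword. The helper names and the sample values are
made up.

```python
import struct

F_LENMASK = 0x1F  # low 5 bits of the length/flags byte hold the name length


def build_entry(mem, addr, link, name, codeword):
    """Write link, length byte, name, and codeword into mem at addr."""
    struct.pack_into("<I", mem, addr, link)
    mem[addr + 4] = len(name)
    mem[addr + 5:addr + 5 + len(name)] = name.encode("ascii")
    cfa = (addr + 5 + len(name) + 3) & ~3  # pad to a 4-byte boundary
    struct.pack_into("<I", mem, cfa, codeword)


def to_cfa(mem, entry):
    """Like >CFA: skip link, length byte, name, and padding."""
    length = mem[entry + 4] & F_LENMASK
    return (entry + 4 + 1 + length + 3) & ~3


def to_dfa(mem, entry):
    """Like >DFA: the data starts one cell after the codeword."""
    return to_cfa(mem, entry) + 4


mem = bytearray(64)
build_entry(mem, 0, link=0, name="DOUBLE", codeword=0xC0DE)
print(to_cfa(mem, 0), to_dfa(mem, 0))
```

For a 6-letter name the codeword lands at offset 12 and
the data at offset 16.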
Okay, that's defined. Now can we run all of
jonesforth.f?
Program received signal SIGSEGV, Segmentation fault.
_COMMA () at nasmjf.asm:697
697 stosd ; puts the value in eax at edi, increments edi
(gdb)
Drat! Nope, still segfaulting. And in COMMA (',') again.
I wonder if I've overflowed some memory limitation? How
to check that...hmmm... Well, comma stores where HERE
points, and that's in memory reserved with Linux's brk
syscall. How about I bump that from 0x16000 to 0x64000
bytes:
add eax, 0x64000 ; add our desired number of bytes to break addr
Nope, exact same error:
_COMMA () at nasmjf.asm:697
And same thing if I add another zero to the number. So
much for an easy answer. So I guess, ideally, I would
break when COMMA is trying to stosd at an address in edi
that is outside the reserved FORTH data area. But first
I need to know what that area is.
I'm going to add some custom FORTH variables to capture
this so it'll be easy to examine. I'll test in gdb
first:
Breakpoint 2, _start () at nasmjf.asm:103
103 xor ebx, ebx
104 mov eax, __NR_brk ; syscall brk
105 int 0x80
106 mov [var_HERE], eax ; eax has start addr of data segment
(gdb) p/x $eax
$4 = 0x804e000
107 mov [var_CSTART], eax ; store info: start address of data segment
108 add eax, 0x16000 ; add our desired number of bytes to break addr
(gdb) p/x (int)var_HERE
$5 = 0x804e000
(gdb) p/x (int)var_CSTART
$6 = 0x804e000
So far so good, CSTART contains the start address of the
data area.
109 mov ebx, eax ; reserve memory by setting this new break addr
(gdb) p/x $eax
$7 = 0x80b2000
110 mov [var_CEND], eax ; store info: end address of data segment
111 mov eax, __NR_brk ; syscall brk again
(gdb) p/x $eax
$8 = 0x80b2000
112 int 0x80
(gdb) p/x $eax
117 mov ecx, 0 ; LOADJF read only flag for open
(gdb) p/x $eax
$10 = 0x80b2000
(gdb) p/x (int)var_CEND
$11 = 0x80b2000
That looks right. CEND contains the end address of the
data segment. Did I get my requested 0x64000 bytes?
Let's use the new FORTH vars to find out:
(gdb) c
Continuing.
HEX CSTART @ . CEND @ . CEND @ CSTART @ - .
804E000 80B2000 64000
Looking good. Now to catch the bad address being used in
COMMA. I added two compares and an ".oops" label:
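_COMMA:
mov edi, [var_HERE]   ; edi = address ',' is about to store to
cmp edi, [var_CSTART] ; below the start of the data area?
jl .oops              ; (signed compare)
cmp edi, [var_CEND]   ; past the end of the data area?
jg .oops              ; (signed compare)
stosd                 ; puts the value in eax at edi, increments edi
mov [var_HERE], edi
ret
.oops:
nop                   ; does nothing, just a spot for a breakpoint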
(gdb) break _COMMA.oops
Breakpoint 2 at 0x8049343: file nasmjf.asm, line 707.
(gdb) c
Continuing.
Breakpoint 2, _COMMA.oops () at nasmjf.asm:707
707 nop
(gdb) p var_HERE
'var_HERE' has unknown type; cast it to its declared type
(gdb) p (int)var_HERE
$1 = 61368
(gdb) p (int)var_CSTART
$2 = 134537216
(gdb) p (int)var_CEND
$3 = 134627328
Okay, so HERE has been set to an invalid address
somehow. I wish backtraces worked. Then I'd be able to
see which word this came from.
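Those decimal values are easier to judge in hex. Here's a
quick Python check that converts them and applies the
same inclusive range test as the two compares in COMMA.
The function name is mine. The numbers are copied
straight from the gdb output above.

```python
def in_data_area(here, cstart, cend):
    """Same test as the two compares: CSTART <= HERE <= CEND."""
    return cstart <= here <= cend


here, cstart, cend = 61368, 134537216, 134627328

print(hex(here), hex(cstart), hex(cend))
print(hex(cend - cstart))                 # size of the data area in this run
print(in_data_area(here, cstart, cend))
```

HERE comes out as 0xefb8, nowhere near the 0x804e000
start of the data area. The area in this particular run
works out to 0x16000 bytes.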
I'm tempted to divide and conquer...and it looks like
if I stop execution of jonesforth.f right before the
definition of SEE, it doesn't segfault.
So I'll continue testing 'til there and then tackle the
problem head-on.
Anyway, where was I? Oh yeah, VALUE!
To quote myself:
"VALUE is like VARIABLE, except that the result is a word
which leaves its value on the stack like a constant
instead of its address. Unlike a constant, it can be
updated by another word, TO."
10 VALUE foo create foo, set to 10
20 TO foo update foo to 20
Sounds good:
10 VALUE foo
foo .
10
20 TO foo
foo .
20
That's better. I'll continue with the word testing in
the next log.