Dave's nasmjf Dev Log 22
Created: 2022-07-22
This is an entry in my developer’s log series written between December 2021 and August 2022 (started project in September). I wrote these as I completed my port of the JONESFORTH assembly language Forth interpreter.
We'll pick up right where we left off with a convenience
word to be used with VALUEs that works just like += in
"C-like" languages:
+TO adds to the value
50 VALUE foo foo . 50
100 TO foo foo . 100
10 +TO foo foo . 110
And now some exciting introspection words:
ID. prints name of word at address
?HIDDEN returns truthy value if word hidden
?IMMEDIATE returns truthy value if word immediate
LATEST @ ID.
foo
LATEST @ DUP DUP ID. SPACE ?IMMEDIATE . ?HIDDEN .
foo 0 0
LATEST @ @ DUP DUP ID. SPACE ?IMMEDIATE . ?HIDDEN .
CFA> 0 0
LATEST @ @ @ DUP DUP ID. SPACE ?IMMEDIATE . ?HIDDEN .
ENDCASE 128 0
And
WORDS prints all non-hidden words
WORDS
foo CFA> ENDCASE ENDOF OF CASE DUMP FORGET WORDS ?IMMEDIATE ?HIDDEN ID. +TO TO VALUE VARIABLE CELLS
ALLOT CONSTANT ." S" C, ALIGN ALIGNED DEPTH WITHIN ? U. . .R U.R UWIDTH .S U. HEX DECIMAL SPACES PIC
K TUCK NIP ( UNLESS REPEAT WHILE AGAIN UNTIL BEGIN ELSE THEN IF RECURSE [COMPILE] '.' '-' '0' 'A' '"
' ')' '(' ';' ':' LITERAL NOT FALSE TRUE NEGATE SPACE CR BL '\n' MOD / LATEST CEND CSTART BASE S0 HE
RE STATE O_NONBLOCK O_APPEND O_TRUNC O_EXCL O_CREAT O_RDWR O_WRONLY O_RDONLY SYS_BRK SYS_CREAT SYS_W
RITE SYS_READ SYS_CLOSE SYS_OPEN SYS_EXIT F_LENMASK F_HIDDEN F_IMMED DOCOL R0 VERSION SYSCALL0 SYSCA
LL1 SYSCALL2 SYSCALL3 EXECUTE CHAR HIDE IMMEDIATE DSP! DSP@ RDROP RSP! RSP@ R> >R CMOVE C@C! C@ C! -
! +! @ ! INVERT XOR OR AND 0>= 0<= 0> 0< 0<> 0= >= <= > < <> = /MOD * - + 4- 4+ 1- 1+ ?DUP 2SWAP 2DU
P 2DROP -ROT ROT OVER DUP SWAP DROP PRINTWORD . EMIT ; : EXIT HIDDEN ] [ , CREATE FIND LIT NUMBER >D
FA >CFA KEY WORD INTERPRET TELL LITSTRING 0BRANCH BRANCH ' gtfo QUIT
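Why does ?IMMEDIATE print 128 for ENDCASE rather than 1 or -1? Here is a rough sketch, in Python rather than Forth, of how I understand the length/flags byte in each dictionary header. The constants are the F_IMMED, F_HIDDEN, and F_LENMASK values from the original jonesforth.S, which I'm assuming the port keeps unchanged. The decode_flags helper is just a name made up for this illustration.

```python
# Flag values from the original jonesforth.S (assumed unchanged in the port).
F_IMMED = 0x80    # word runs even while compiling
F_HIDDEN = 0x20   # word is skipped by FIND and WORDS
F_LENMASK = 0x1F  # low five bits hold the name length

def decode_flags(byte):
    """Split a header's length/flags byte into its three parts (hypothetical helper)."""
    return {
        "length": byte & F_LENMASK,
        # Like ?IMMEDIATE and ?HIDDEN, report the raw masked bit, not a clean boolean.
        "immediate": byte & F_IMMED,
        "hidden": byte & F_HIDDEN,
    }

# ENDCASE is a 7-letter immediate word, so its byte would be 0x80 | 7.
print(decode_flags(0x80 | 7))
# foo is a 3-letter ordinary word.
print(decode_flags(3))
```

So "truthy" here just means "the flag bit itself", which is why the immediate flag shows up as 128.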
The next one is pretty crazy. You would think that this
would just hide or "forget" the word CR. But it doesn't,
it forgets CR and *everything after* it!
FORGET CR
So now WORDS should only show up to BL:
WORDS
PARSE ERROR: WORDS
Huh? Oh, ha ha ha. Right.
Time to reload.
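For the record, here is a toy model of what just happened, in Python rather than Forth. As I understand it, FORGET points LATEST back at the word defined just before the one named (and resets HERE), so everything newer simply falls off the end of the dictionary. The word list below is a heavily abbreviated stand-in for the real dictionary, and forget is a made-up helper for illustration only.

```python
def forget(dictionary, name):
    """Model of FORGET: drop `name` and everything defined after it.

    `dictionary` is a list of word names, oldest first.
    """
    # Search from the newest end, the way FIND walks back from LATEST.
    for index in range(len(dictionary) - 1, -1, -1):
        if dictionary[index] == name:
            return dictionary[:index]
    # Model choice only: the real word does no such checking.
    raise LookupError(name)

# Abbreviated stand-in for the dictionary, oldest word first.
words = ["BL", "CR", "SPACE", "NEGATE", "WORDS", "FORGET"]

after = forget(words, "CR")
print(after)             # only the words older than CR survive
print("WORDS" in after)  # WORDS was defined after CR, so it is gone too
```

WORDS was defined long after CR, so it got forgotten along with everything else, hence the parse error.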
Let's see that again, but maybe not "forget" quite so
much:
LATEST @ ID.
CFA>
: foo ." Hello" ;
foo
Program received signal SIGSEGV, Segmentation fault.
code_LITSTRING () at nasmjf.asm:28
28 jmp [eax] ; Jump to whatever code we're now pointing at.
Argh! I can hardly believe this is the first time I've
tried compiling a word that contains a print string
statement. But I guess so. So my LITSTRING assembly
port of the original must contain a bug?
Two nights later: debugging this with GDB *sucks*
because LITSTRING is written in assembly, but the
higher-level S" and ." are written in Forth, so it's
really hard to break right when I want to. On top of
that, trying to examine Forth constructs with GDB feels
like trying to eat soup with a fork. I know it's a
programmable fork and I could probably make it an
awesome soup fork, but I'd rather work on my soup
instead of my utensils.
Hmmm... you know what? The next word I was going to test
after FORGET is DUMP, which is supposed to give a hex
dump of memory.
Maybe I can use it to help debug LITSTRING and friends?
First, let's see how it works:
DUMP ( addr len -- )
DUMP is used to dump out the contents of memory,
in the 'traditional' hexdump format.
So we give it an address of memory and a number of bytes
to print. I wonder what it will look like if I create a
simple word and display it?
This one has some repetition so maybe I can see the
pattern:
: foobar 5 . 5 . ;
foobar
5 5
And let's dump 64 bytes of memory starting at the
latest definition:
LATEST @ 64 DUMP
804EF20 C4 EE 4 8 6 66 6F 6F 62 61 72 0 5A 90 4 8 .....foobar.Z...
804EF30 38 A1 4 8 5 0 0 0 2C E7 4 8 38 A1 4 8 8.......,...8...
804EF40 5 0 0 0 2C E7 4 8 9C A1 4 8 0 0 0 0 ....,...........
804EF50 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ................
Gasp! Oh my goodness! It's beautiful. So handy. And DUMP
is less than 50 lines of Forth?
Okay, my bitter annoyance is melting away.
I can totally read my word.
The little-endian data is annoying, but
C4 EE 4 8
is 804EEC4, so that'll be the link to the previous word.
I really wish these were grouped by 4-byte chunks and
put into big-endian order...and then run ID. on each
recognizable address (since I know they're all
0804xxxx)...
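Until DUMP learns that trick, the regrouping can be sketched outside of Forth. This is a small Python illustration, not part of nasmjf: it takes the first 48 bytes of the dump above, reads them as little-endian 32-bit cells, and pulls the name out of the header. The cells helper is a made-up name, and the name length mask is F_LENMASK from the original jonesforth.S.

```python
# First three rows of the dump above, with each byte padded to two hex digits.
dump = bytes.fromhex(
    "C4EE0408" "06666F6F" "62617200" "5A900408"
    "38A10408" "05000000" "2CE70408" "38A10408"
    "05000000" "2CE70408" "9CA10408" "00000000"
)
BASE = 0x804EF20  # address of the first byte, taken from the dump

def cells(data):
    """Read `data` as a list of little-endian 32-bit cells (hypothetical helper)."""
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]

# One cell per line: address, then the value in "readable" order.
for offset, cell in zip(range(0, len(dump), 4), cells(dump)):
    print(f"{BASE + offset:08X}  {cell:08X}")

# The header is: 4-byte link, 1 length/flags byte, then the name itself.
name_length = dump[4] & 0x1F
print(dump[5:5 + name_length].decode("ascii"))
```

Read this way, the first cell is the link 0804EEC4, the name cells are followed by one more address, and then the pattern 0804A138, 5, 0804E72C shows up twice before a final address. Going by the definition of foobar, I'd guess those are the codeword, then LIT 5 and . twice, then EXIT, but I haven't run ID. on them to check.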
Okay, after midnight now.
I figured it out after verifying every single statement
in the compile-mode execution of S" and wouldn't you
know it was the very last statement that was wrecking
everything?
At the very end, it calls ALIGN to get the HERE pointer
to the next 4-byte boundary (after writing an address,
length, and string to memory).
ALIGN calls ALIGNED which uses "3 INVERT" to create a
mask to zero out the last two bits of an address.
Well, it turns out my INVERT word definition was doing
this:
not word [esp]
which only flips the low 16 bits of the 32 bit value on
the stack. Instead, I needed this:
not dword [esp]
to operate on the full 32 bit number.
So let me take a moment here to rant:
x86 terminology blows! A "word" should be the natural
address size on the architecture. But in x86 land,
"word" is stuck at 16 bits! So 32 bits is a dword and 64
bits is a qword and so on. Argh! This is not the first
time I've been bitten by this garbage.
************************
*** "word" is a lie! ***
************************
This would *definitely* explain all the segfaults since
ALIGN was masking off half of HERE and setting it to an
invalid address!
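To convince myself, here is the arithmetic modeled in Python (not the real assembly). It assumes ALIGNED is the usual "3 + 3 INVERT AND" from the original JONESFORTH source, and the HERE value is made up, merely chosen to sit in the same 0804xxxx range as the dump above. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def invert32(x):
    """Model of `not dword [esp]`: flip all 32 bits of the cell."""
    return ~x & MASK32

def invert16_in_place(x):
    """Model of `not word [esp]`: flip only the low 16 bits, leave the top half alone."""
    return (x & 0xFFFF0000) | (~x & 0xFFFF)

def aligned(addr, invert):
    """Model of ALIGNED, assumed to be `3 + 3 INVERT AND`."""
    return (addr + 3) & invert(3)

here = 0x0804EF4D  # made-up, unaligned HERE value for illustration

print(f"mask, fixed INVERT:  {invert32(3):08X}")
print(f"mask, buggy INVERT:  {invert16_in_place(3):08X}")
print(f"HERE, fixed INVERT:  {aligned(here, invert32):08X}")
print(f"HERE, buggy INVERT:  {aligned(here, invert16_in_place):08X}")
```

With the buggy version the mask comes out as 0000FFFC instead of FFFFFFFC, so the AND wipes out the whole top half of the address.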
Do I dare get my hopes up?
Yeah, why not...
JONESFORTH VERSION 1
20643 CELLS REMAINING
OK
Yes! It works!
: foo ." Hello World!!!" ;
foo
Hello World!!!
Hello World, indeed.
As far as I know, my NASM interpreter port is complete
now. I'll continue with the word testing. And I have a
lot of cleanup to do. But this is a great night for
nasmjf!
I'll start the next log with CASE statement testing.