Dave's nasmjf Dev Log 22
Created: 2022-07-22
This is an entry in my developer’s log series written between December 2021 and August 2022 (started project in September). I wrote these as I completed my port of the JONESFORTH assembly language Forth interpreter.
We'll pick up right where we left off with a convenience word to be used with VALUEs that works just like += in "C-like" languages: +TO adds to the value 50 VALUE foo foo . 50 100 TO foo foo . 100 10 +TO foo foo . 110 And now some exciting introspection words: ID. prints name of word at address ?HIDDEN returns truthy value if word hidden ?IMMEDATE returns truthy value if word immediate LATEST @ ID. foo LATEST @ DUP DUP ID. SPACE ?IMMEDIATE . ?HIDDEN . foo 0 0 LATEST @ @ DUP DUP ID. SPACE ?IMMEDIATE . ?HIDDEN . CFA> 0 0 LATEST @ @ @ DUP DUP ID. SPACE ?IMMEDIATE . ?HIDDEN . ENDCASE 128 0 And WORDS prints all non-hidden words WORDS foo CFA> ENDCASE ENDOF OF CASE DUMP FORGET WORDS ?IMMEDIATE ?HIDDEN ID. +TO TO VALUE VARIABLE CELLS ALLOT CONSTANT ." S" C, ALIGN ALIGNED DEPTH WITHIN ? U. . .R U.R UWIDTH .S U. HEX DECIMAL SPACES PIC K TUCK NIP ( UNLESS REPEAT WHILE AGAIN UNTIL BEGIN ELSE THEN IF RECURSE [COMPILE] '.' '-' '0' 'A' '" ' ')' '(' ';' ':' LITERAL NOT FALSE TRUE NEGATE SPACE CR BL '\n' MOD / LATEST CEND CSTART BASE S0 HE RE STATE O_NONBLOCK O_APPEND O_TRUNC O_EXCL O_CREAT O_RDWR O_WRONLY O_RDONLY SYS_BRK SYS_CREAT SYS_W RITE SYS_READ SYS_CLOSE SYS_OPEN SYS_EXIT F_LENMASK F_HIDDEN F_IMMED DOCOL R0 VERSION SYSCALL0 SYSCA LL1 SYSCALL2 SYSCALL3 EXECUTE CHAR HIDE IMMEDIATE DSP! DSP@ RDROP RSP! RSP@ R> >R CMOVE C@C! C@ C! - ! +! @ ! INVERT XOR OR AND 0>= 0<= 0> 0< 0<> 0= >= <= > < <> = /MOD * - + 4- 4+ 1- 1+ ?DUP 2SWAP 2DU P 2DROP -ROT ROT OVER DUP SWAP DROP PRINTWORD . EMIT ; : EXIT HIDDEN ] [ , CREATE FIND LIT NUMBER >D FA >CFA KEY WORD INTERPRET TELL LITSTRING 0BRANCH BRANCH ' gtfo QUIT The next one is pretty crazy. You would think that this would just hide or "forget" the word CR. But it doesn't, it forgets CR and *everything after* it! FORGET CR So now WORDS should only show up to BL: WORDS PARSE ERROR: WORDS Huh? Oh, ha ha ha. Right. Time to reload. Let's see that again, but maybe not "forget" quite so much: LATEST @ ID. CFA> : foo ." Hello" ; foo Program received signal SIGSEGV, Segmentation fault. code_LITSTRING () at nasmjf.asm:28 28 jmp [eax] ; Jump to whatever code we're now pointing at. Argh! I can hardly believe this is the first time I've tried compiling a word that contains a print string statement. But I guess so. So my LITSTRING assembly port of the original must contain a bug? Two nights later: debugging this with GDB *sucks* because LITSTRING is written in assembly, but the higher-level S" and ." are written in Forth, so it's really hard to break right when I want to. On top of that, trying to examine Forth constructs with GDB feels like trying to eat soup with a fork. I know it's a programmable fork and I could probably make it an awesome soup fork, but I'd rather work on my soup instead of my utensils. Hmmm... you know what? The next word I was going to test after FORGET is DUMP, which is supposed to give a hex dump of memory. Maybe I can use it to help debug LITSTRING and friends? First, let's see how it works: DUMP ( addr len -- ) DUMP is used to dump out the contents of memory, in the 'traditional' hexdump format. So we give it an address of memory and a number of bytes to print. I wonder it will look like if I create a simple word and display it? This one has some repetition so maybe I can see the pattern: : foobar 5 . 5 . ; foobar 5 5 And let's dump 64 bytes of memory starting at the laatest definition: LATEST @ 64 DUMP 804EF20 C4 EE 4 8 6 66 6F 6F 62 61 72 0 5A 90 4 8 .....foobar.Z... 804EF30 38 A1 4 8 5 0 0 0 2C E7 4 8 38 A1 4 8 8.......,...8... 804EF40 5 0 0 0 2C E7 4 8 9C A1 4 8 0 0 0 0 ....,........... 804EF50 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ................ Gasp! Oh my goodness! It's beautiful. So handy. And DUMP is less than 50 lines of Forth? Okay, my bitter annoyance is melting away. I can totally read my word. The little-endian data is annoying, but C4 EE 4 8 is 804EEC4, so that'll be the link to the previous word. I really wish these were grouped by 4-byte chunks and put into big-endian order...and then run ID. on each recognizable address (since I know they're all 0804xxxx)... Okay, after midnight now. I figured it out after verifying every single statement in the compile-mode execution of S" and wouldn't you know it was the very last statement that was wrecking everything? At the very end, it calls ALIGN to get the HERE pointer to the next 4-byte boundary (after writing an address, length, and string to memory). ALIGN calls ALIGNED which uses "3 INVERT" to create a mask to zero out the last two bits of an address. Well, it turns out my INVERT word definition was doing this: not word [esp] which gives you a 16 bit number. Instead, I needed this: not dword [esp] to operate on the full 32 bit number. So let me take another moment here to rant for a second: x86 terminoloy blows! A "word" should be the natural address size on the architecture. But in x86 land, "word" is stuck at 16 bits! So 32 bits is a dword and 64 bits is a qword and so on. Argh! This is not the first time I've been bitten by this garbage. ************************ *** "word" is a lie! *** ************************ This would *definitely* explain all the segfaults since ALIGN was masking off half of HERE and setting it to an invalid address! Do I dare get my hopes up? Yeah, why not... JONESFORTH VERSION 1 20643 CELLS REMAINING OK Yes! It works! : foo ." Hello World!!!" ; foo Hello World!!! Hello World, indeed. As far as I know, my NASM interpreter port is complete now. I'll continue with the word testing. And I have a lot of cleanup to do. But this is a great night for nasmjf! I'll start the next log with CASE statement testing