Dave's nasmjf Dev Log 21
Created: 2022-07-22
This is an entry in my developer’s log series written between December 2021 and August 2022 (started project in September). I wrote these as I completed my port of the JONESFORTH assembly language Forth interpreter.
The last log saw yet _another_ bug in my DEFVAR macro. Sheesh. Hopefully I have variables working correctly now?

As I mentioned last night, the next words look like a real mixture of things. Let's jump into it:

WITHIN is a conditional that tests if a number is between two other numbers. This highlights the advantage of such a flexible language (and the simplicity of the syntax).

I'll format this slightly to make a bit of a truth table out of it to demonstrate how WITHIN works:

    1 2 3 WITHIN .   0
    2 3 1 WITHIN .   0
    3 1 2 WITHIN .   0
    2 1 3 WITHIN .   1   <-- 2 is within 1 and 3

DEPTH gives us the depth of the stack:

    1 2 .S
    2 1
    DEPTH .
    8

Huh? Oh, it's in bytes, not number of items!

ALIGNED rounds a number up to the next multiple of 4, which aligns addresses on 32-bit (4 byte) computers. It's easy to test:

    1 ALIGNED .
    4
    7 ALIGNED .
    8

ALIGN performs ALIGNED on HERE. Here's the definition:

    : ALIGN HERE @ ALIGNED HERE ! ;

I have a sense for when this would be used while compiling, but don't really feel like trying to make up a test for it. I'm content to see how it's used later and test *that*.

And now, the moment we've all been waiting for: strings!

First, we need a new primitive to store the bytes of the string. The word ',' (COMMA) does this for word-sized (four bytes, the size of a 32-bit address) data. As with the other byte-sized words, the name is prefixed with the letter 'C'. So let's compare 'C,':

    : C,
        HERE @ C!   ( store the character in the compiled image )
        1 HERE +!   ( increment HERE pointer by 1 byte )
    ;

...with the assembly definition of ',':

    mov edi, [var_HERE]
    stosd                  ; puts the value in eax at edi, increments edi
    mov [var_HERE], edi

Same thing, only stosd increments edi (HERE) by 4 rather than 1.

I've learned to just see these primitives in action. Trying to use _some_ of them on their own is challenging and surprisingly unrewarding.

Next, S" is a word that stores the characters which follow as a string until it hits the end quote: ".
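
To make the byte-versus-cell difference between 'C,' and ',' concrete, here is a toy Python model. It is an illustration only, not code from JONESFORTH or from the port: the `DataArea` class, its method names, and the 64-byte buffer are all made up, and it assumes little-endian 32-bit cells as on x86. It also reuses the round-up-to-4 behavior of ALIGNED shown above.

```python
import struct

CELL = 4  # bytes per cell on the 32-bit target (assumption: x86, little-endian)


class DataArea:
    """Toy model of the data area: a byte buffer plus a HERE offset."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.here = 0

    def comma(self, value):
        """Model of ',': store one 32-bit cell at HERE, advance HERE by 4."""
        struct.pack_into("<I", self.mem, self.here, value & 0xFFFFFFFF)
        self.here += CELL

    def c_comma(self, byte):
        """Model of 'C,': store one byte at HERE, advance HERE by 1."""
        self.mem[self.here] = byte & 0xFF
        self.here += 1


def aligned(n):
    """Model of ALIGNED: round n up to the next multiple of 4."""
    return (n + CELL - 1) & ~(CELL - 1)


area = DataArea()
area.comma(0x12345678)        # one cell: HERE goes from 0 to 4
for ch in b"Hello World":     # eleven bytes: HERE goes from 4 to 15
    area.c_comma(ch)

print(area.here)              # 15
print(aligned(area.here))     # 16
```

Storing a string one byte at a time leaves HERE on an odd boundary, which is the sort of situation where rounding back up to a multiple of 4 becomes useful.
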
Note that there has to be a space after S" or it wouldn't be matched as the correct word. However, the final quote is NOT a word, it's just the special character S" is looking for.

Finally, S" pushes the address of the start of the string onto the stack, followed by the length of the string.

Okay, I think I've got all of that:

    S" Hello World" .
    11

There we go. "Hello World" is 11 characters long.

    DUP HEX .
    804EADC
    @ EMIT
    H

It has been stored in memory at a particular address, which means we can retrieve the string from there. (I've EMITted the 'H' from Hello.)

In compile mode, it's stored in the word being compiled, along with the LITSTRING word we defined in assembly. But where in memory is this stored when in immediate mode? Jones explains that this implementation stores it in the same place where we compile words. So I guess we can do this:

    HERE @ .
    804EADC
    HERE @ @ EMIT
    H
    HERE @ 1 + @ EMIT
    e

And since HERE clearly hasn't been moved to a point after the string, that means it is temporary. It will be overwritten as soon as we define a new word!

A close relative of S" (in fact, it uses S" in compiling mode) is the ." word, which is Forth's print string word. I'm excited to have this:

    ." Hello World!"
    Hello World!

I've updated the README now that I can do this. :-)

And the good stuff keeps coming. The next words allow us to define our own constants and variables. Not only that, there's a handy explanation for each which would have saved me some trouble before had I bothered to look ahead (or, you know, actually learned the language I was going to implement).

    10 CONSTANT TEN
    VARIABLE FOO

    When TEN is executed, it leaves the integer 10 on the stack
    When FOO is executed, it leaves the address of FOO on the stack

Let's try a constant:

    42 CONSTANT answer
    ." The answer is " answer .
    The answer is 42

And a variable:

    VARIABLE foo
    9000 foo !
    foo @ .
    9000

The definition of CONSTANT is pretty easy.
It's basically the same as (and functionally equivalent to) this:

    : answer 42 ;
    ." The answer is " answer .
    The answer is 42

But a variable needs to allocate some memory and store its address. Two simple utilities aid in this:

    ALLOT - advances HERE by the amount on the stack and leaves the previous HERE on the stack
    CELLS - multiplies the number on the stack by the natural address size of the machine (4 bytes for our 32-bit implementation)

What's neat about CELLS is it shows how you can build up words that read rather like a natural language:

    HEX
    HERE @ .
    804EB30
    5 CELLS ALLOT .
    804EB30
    HERE @ .
    804EB44

And that looks like 20 bytes...hey, wait a second. Now that I can easily store values, I'll let Forth figure it out:

    HERE @ CONSTANT previous
    5 CELLS ALLOT CONSTANT new
    previous .
    804EB78
    new .
    804EB98
    new previous - .
    20

VALUE is like VARIABLE, except that the result is a word which leaves its value on the stack like a constant instead of its address. Unlike a constant, it can be updated by another word, TO.

    10 VALUE foo    create foo, set to 10
    20 TO foo       update foo to 20

Sounds good:

    10 VALUE foo
    foo .
    10
    20 TO foo
    PARSE ERROR: 20 TO

Huh? Oh! Right between these two word definitions is where I have it stop reading jonesforth.f on load. Guess it's time to figure out the next bug in my port. I'll try reading all lines again:

    %assign __lines_of_jf_to_read 10000

    PARSE ERROR: ( look it up in the dictionary ) >DFA
    PARSE ERROR: ( look it up in the dictionary ) >DFA

    Program received signal SIGSEGV, Segmentation fault.
    _COMMA () at nasmjf.asm:688
    688         stosd                  ; puts the value in eax at edi, increments edi
    (gdb)

Okay, so the PARSE ERROR message prints out the word that caused the trouble, here ">DFA", after a buffer's worth of context.

Ha ha, how silly. I simply missed that word in my port. It's a simple definition since we already have >CFA, which returns the address of the codeword for a word pointer.
>DFA just has to advance 4 bytes to the "data" (so-called threaded word addresses) after the codeword.

Okay, that's defined. Now can we run all of jonesforth.f?

    Program received signal SIGSEGV, Segmentation fault.
    _COMMA () at nasmjf.asm:697
    697         stosd                  ; puts the value in eax at edi, increments edi
    (gdb)

Drat! Nope, still segfaulting. And in COMMA (',') again. I wonder if I've overflowed some memory limitation? How to check that...hmmm...

Well, comma stores where HERE points, and that's in memory reserved with Linux's brk syscall. How about I bump that from 0x16000 to 0x64000 bytes:

    add eax, 0x64000        ; add our desired number of bytes to break addr

Nope, exact same error:

    _COMMA () at nasmjf.asm:697

And same thing if I add another zero to the number. So much for an easy answer.

So I guess, ideally, I would break when COMMA is trying to stosd at an address in edi that is outside the reserved FORTH data area. But first I need to know what that area is. I'm going to add some custom FORTH variables to capture this so it'll be easy to examine. I'll test in gdb first:

    Breakpoint 2, _start () at nasmjf.asm:103
    103         xor ebx, ebx
    104         mov eax, __NR_brk         ; syscall brk
    105         int 0x80
    106         mov [var_HERE], eax       ; eax has start addr of data segment
    (gdb) p/x $eax
    $4 = 0x804e000
    107         mov [var_CSTART], eax     ; store info: start address of data segment
    108         add eax, 0x16000          ; add our desired number of bytes to break addr
    (gdb) p/x (int)var_HERE
    $5 = 0x804e000
    (gdb) p/x (int)var_CSTART
    $6 = 0x804e000

So far so good, CSTART contains the start address of the data area.

    109         mov ebx, eax              ; reserve memory by setting this new break addr
    (gdb) p/x $eax
    $7 = 0x80b2000
    110         mov [var_CEND], eax       ; store info: end address of data segment
    111         mov eax, __NR_brk         ; syscall brk again
    (gdb) p/x $eax
    $8 = 0x80b2000
    112         int 0x80
    (gdb) p/x $eax
    117         mov ecx, 0                ; LOADJF read only flag for open
    (gdb) p/x $eax
    $10 = 0x80b2000
    (gdb) p/x (int)var_CEND
    $11 = 0x80b2000

That looks right.
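
The "is this address inside the data area?" test I'm after can be sketched in a few lines of Python. This is a model only, not the assembly that goes into the port. It assumes the data area is the half-open range from CSTART up to (but not including) CEND and that ',' writes a full 4-byte cell. The function name `cell_store_ok` and the stray address 0x1000 are made up for illustration; the two constants are the values gdb printed above.

```python
CSTART = 0x804E000  # start of the data area, as printed by gdb above
CEND = 0x80B2000    # end of the data area (the new break address), also from gdb
CELL = 4            # ',' stores one 4-byte cell


def cell_store_ok(addr, start=CSTART, end=CEND):
    """Return True if a 4-byte store at addr stays inside [start, end)."""
    return start <= addr and addr + CELL <= end


print(cell_store_ok(CSTART))        # True: first cell of the area
print(cell_store_ok(CEND - CELL))   # True: last full cell of the area
print(cell_store_ok(CEND))          # False: would write past the end
print(cell_store_ok(0x1000))        # False: hypothetical stray address
```

Checking `addr + CELL <= end` rather than just `addr <= end` is a deliberate choice in this sketch: a store that starts in the last three bytes of the area would still spill past the break.
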
CEND contains the end address of the data segment. Did I get my requested 0x64000 bytes? Let's use the new FORTH vars to find out:

    (gdb) c
    Continuing.
    HEX CSTART @ . CEND @ . CEND @ CSTART @ - .
    804E000 80B2000 64000

Looking good. Now to catch the bad address being used in COMMA. I added two compares and an ".oops" label:

    _COMMA:
        mov edi, [var_HERE]
        cmp edi, [var_CSTART]
        jl .oops               ; HERE is below the start of the data area
        cmp edi, [var_CEND]
        jg .oops               ; HERE is above the end of the data area
        stosd
        mov [var_HERE], edi
        ret
    .oops:
        nop                    ; somewhere to hang a breakpoint

    (gdb) break _COMMA.oops
    Breakpoint 2 at 0x8049343: file nasmjf.asm, line 707.
    (gdb) c
    Continuing.

    Breakpoint 2, _COMMA.oops () at nasmjf.asm:707
    707         nop
    (gdb) p var_HERE
    'var_HERE' has unknown type; cast it to its declared type
    (gdb) p (int)var_HERE
    $1 = 61368
    (gdb) p (int)var_CSTART
    $2 = 134537216
    (gdb) p (int)var_CEND
    $3 = 134627328

Okay, so HERE (61368, or 0xEFB8) is nowhere near CSTART (134537216, or 0x804E000). It has been set to an invalid address somehow.

I wish backtraces worked. Then I'd be able to see which word this came from.

I'm tempted to divide and conquer...and it looks like if I stop execution of jonesforth.f right before the definition of SEE, it doesn't segfault. So I'll continue testing 'til there and then tackle the problem head-on.

Anyway, where was I? Oh yeah, VALUE! To quote myself:

"VALUE is like VARIABLE, except that the result is a word which leaves its value on the stack like a constant instead of its address. Unlike a constant, it can be updated by another word, TO."

    10 VALUE foo    create foo, set to 10
    20 TO foo       update foo to 20

Sounds good:

    10 VALUE foo
    foo .
    10
    20 TO foo
    foo .
    20

That's better. I'll continue with the word testing in the next log.