nasmjf

A NASM assembler port of JONESFORTH
git clone http://ratfactor.com/repos/nasmjf/nasmjf.git

nasmjf/devlog/log07.txt

So the error in TCFA (which gets the code address for
a dictionary entry) has been fixed.

Now let's see why the COLON definition is crashing with
a segfault.

I type my usual test word ": FIVE 5 ;".


    80          cld    ; Clear the "direction flag" which means the string
    (gdb) c
    Continuing.
    : FIVE 5 ;

(We'll skip everything that we now know works from log06.txt.)

First, INTERPRET checks STATE to see if we're executing
or compiling (we're executing).

    code_INTERPRET.check_state () at nasmjf.asm:239
    239         mov edx,[var_STATE]
    240         test edx,edx
    241         jz .execute             ; Jump if executing.

Then we check to see if we are executing a literal value
or a word (it's a word ":" (COLON)).

    code_INTERPRET.execute () at nasmjf.asm:254
    254         mov ecx,[interpret_is_lit] ; Literal?
    255         test ecx,ecx               ; Literal?
    256         jnz .do_literal

Now we jump to the code pointed to at the beginning
of COLON, which is DOCOL. (Increasingly, as we define
more words with other words rather than pure machine
language, they'll start with DOCOL.)

Here I double-check that we're about to jump to the
first pointer in COLON and that it points to DOCOL.

    260         jmp [eax]
    (gdb) info symbol $eax
    COLON in section .data of /home/dave/nasmjf/nasmjf
    (gdb) info symbol *$eax
    DOCOL in section .text of /home/dave/nasmjf/nasmjf


Here's DOCOL. I may have stepped through this before
but it's worth looking at again since it's crucial
to understanding this type of Forth implementation.

Given the glacial pace at which I'm porting this, I need
lots of reminders!

This cheatsheet is currently in a comment at the top
of my jonesforth.asm:

    esi - next forth word address to execute
    ebp - return stack for forth word addresses

In the source, PUSHRSP and POPRSP usually handle the
ebp register, which we're using for the return stack
pointer (RSP).
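
As a rough illustration of what PUSHRSP and POPRSP do with ebp, here is
a toy Python model. It's a sketch, not the port's code: the ReturnStack
class, its method names, the starting address 0x1000, and the dict
standing in for memory are all made up. The pop side is assumed to be
the mirror image of the push (read the cell, then move ebp back up by 4).

```python
# Toy model of a return stack that grows downward in memory.
# All names and addresses here are hypothetical.
class ReturnStack:
    def __init__(self, top=0x1000):
        self.ebp = top      # return stack pointer (RSP)
        self.mem = {}       # address -> 4-byte cell

    def pushrsp(self, value):
        self.ebp -= 4                 # lea ebp, [ebp-4]
        self.mem[self.ebp] = value    # mov [ebp], reg

    def poprsp(self):
        value = self.mem[self.ebp]    # assumed: mov reg, [ebp]
        self.ebp += 4                 # assumed: lea ebp, [ebp+4]
        return value


rs = ReturnStack()
rs.pushrsp(0x0804A000)    # pretend this is a saved esi
saved = rs.poprsp()
```

After the push/pop pair, ebp is back where it started and saved holds
the value that was pushed.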

DOCOL's first two lines are both from the PUSHRSP macro
(you can see that they have lower line numbers).

That handles ebp.

Then DOCOL advances esi to the next word pointer
(in COLON after DOCOL itself).

    DOCOL () at nasmjf.asm:40
    40          lea ebp, [ebp-4]   ; "load effective address" of next stack position
    41          mov [ebp], %1      ; "push" the register value to the address at ebp
    70          add eax, 4         ; eax points to DOCOL (me!) in word definition. Go to next.
    71          mov esi, eax       ; Put the next word pointer into esi

Let's see if that's right. Here's the entire definition
of COLON. We don't see DOCOL here because it's inserted
by the DEFWORD macro, but it comes right before FWORD.
(By the way, FWORD is just WORD, but I can't have a
symbol called "WORD" in NASM because it's a reserved
keyword.)

    DEFWORD ":",1,,COLON
    dd FWORD
    dd CREATE
    dd LIT, DOCOL, COMMA
    dd LATEST, FETCH, HIDDEN
    dd RBRAC
    dd EXIT

I think it's super-cool that Forth exposes all of the
primitives needed to create (or replace!) the COLON
compiler so you can use them in the interpreter
yourself. Truly a no-holds-barred language.

At any rate, the pointer in esi should be the next one
in COLON and it should point to WORD (well, FWORD).

    (gdb) info symbol $esi
    COLON + 4 in section .data of /home/dave/nasmjf/nasmjf
    (gdb) info symbol *$esi
    FWORD in section .data of /home/dave/nasmjf/nasmjf

Great! And then the NEXT macro puts the address pointed
to by esi into eax, increments esi to the next word pointer,
and jumps to the address *pointed to* by the address now
in eax. HAVE YOU GOT THAT???

This is made even more confusing by the lodsd instruction.
The mnemonic stands for "load string doubleword". The idea
is that you can use it to load a "string" of values
by repeatedly calling lodsd (or lodsb for byte, etc.).
What it actually does is load 4 bytes from the address at
esi into eax and then increment esi by 4.

(By the way, I've come to _loathe_ the terms "word", "double",
"long", etc. I'm okay with "byte" because it's come to
mean "8 bits" pretty universally in the year 2022. If
I were king, we would just use the byte count for these sizes
like:
    b  = 1 byte  = 8 bits
    b2 = 2 bytes = 16 bits
    b4 = 4 bytes = 32 bits
    b8 = 8 bytes = 64 bits
and "lodsd" would become "lodsb4". Well, lods* would probably
have a better mnemonic. But you get the idea. Anyway, harping
on x86 is, like, a full-time job and it ain't gonna get this
Forth port done.)



    27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.

And did it work?

    28          jmp [eax] ; Jump to whatever code we're now pointing at.
    (gdb) info symbol *$eax
    code_FWORD in section .text of /home/dave/nasmjf/nasmjf

Yup! It's jumping to WORD.

(As another aside, it occurs to me that "WORD" is
a really confusing name for this Forth word - it just
tokenizes a space-delimited string from input. Otherwise,
it doesn't have anything to do with Forth's concept
of "words" as executable code stored in a "dictionary".)

    code_FWORD () at nasmjf.asm:302
    302         call _WORD

Cool, so now I'll skip stepping through WORD/KEY as
we gather the string "FIVE" (the name of the word I'm
trying to define) from input.

    _WORD.skip_non_words () at nasmjf.asm:309
    309         call _KEY               ; get next key, returned in %eax
    ...
    325         mov ecx, edi            ; return it

I have to admit, I don't understand why I can't access
the memory at word_buffer.

    (gdb) x/s (int)word_buffer
    0x45564946: <error: Cannot access memory at address 0x45564946>

Wait a dang second, 45 56 49 46 isn't an address, it's
the actual characters 'E' 'V' 'I' 'F' (or "FIVE" stored
little-endian)!
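
To double-check the little-endian reading, here's a small Python sketch
(not part of the port) that packs the 32-bit value GDB printed back into
its four bytes in memory order. It only uses the standard struct module;
the variable names are made up.

```python
import struct

value = 0x45564946               # what GDB showed as the "address"

# "<I" = little-endian unsigned 32-bit: least significant byte comes first
raw = struct.pack("<I", value)
text = raw.decode("ascii")

print(text)                      # prints FIVE
```

Going the other way, int.from_bytes(b"FIVE", "little") gives back
0x45564946, which is why the first four bytes of the buffer look like
a bogus pointer when read as an int.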

Grrrr... another gripe - the way GDB treats symbols
just confuses me. I like how NASM does it: foo is
always an address, [foo] is always the value AT that
address. It's very consistent.

Next night: gosh darn it! I remembered. You gotta put
a '&' in front of "variables" to get the address...and
that includes when you're trying to use the 'examine'
('x') command to format and view memory using the variable
name.

    (gdb) p &word_buffer
    $1 = (<data variable, no debug info> *) 0x804a068 <word_buffer>
    (gdb) x/4c &word_buffer
    0x804a068 <word_buffer>: 70 'F'  73 'I'  86 'V'  69 'E'

At any rate, looks good. WORD returns "FIVE".

    code_FWORD () at nasmjf.asm:303
    303         push edi                ; push base address
    304         push ecx                ; push length

And with any luck, now we'll be headed to the next word in
the COLON definition, CREATE.

    code_FWORD () at nasmjf.asm:27
    27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.
    28          jmp [eax] ; Jump to whatever code we're now pointing at.
    (gdb) info symbol $eax
    CREATE in section .data of /home/dave/nasmjf/nasmjf
    (gdb) info symbol *$eax
    code_CREATE in section .text of /home/dave/nasmjf/nasmjf

Yay!

Now CREATE makes the header (dictionary link, name, flags)
portion of the word we're compiling.

In the next log, we'll see if CREATE works and then try to
track down which word is causing a segfault when COLON runs.
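
To recap the mechanics traced in this log, here's a toy Python model of
the DOCOL/NEXT dance. Everything in it is an assumption made for
illustration: the addresses, the define helper, the shortened three-cell
body for COLON (the real one is longer), and using esi == 0 as a
"back to the interpreter" marker. Treat it as a sketch of the double
indirection, not as how nasmjf is laid out.

```python
# Fake .text addresses standing in for machine code entry points.
DOCOL_ADDR, CODE_FWORD, CODE_CREATE, CODE_EXIT = 0x100, 0x104, 0x108, 0x10C
NAMES = {CODE_FWORD: "code_FWORD", CODE_CREATE: "code_CREATE"}

mem = {}  # fake .data: address -> 4-byte cell


def define(addr, codeword, body=()):
    """Lay out a word: first cell is the code pointer, then the body cells."""
    mem[addr] = codeword
    for i, cell in enumerate(body, start=1):
        mem[addr + 4 * i] = cell
    return addr


FWORD = define(0x200, CODE_FWORD)
CREATE = define(0x210, CODE_CREATE)
EXIT = define(0x220, CODE_EXIT)
COLON = define(0x300, DOCOL_ADDR, [FWORD, CREATE, EXIT])  # shortened body


def run(word):
    esi, rstack, trace = 0, [], []   # esi == 0: made-up "done" marker
    eax = word
    target = mem[eax]                # jmp [eax] (as INTERPRET does)
    while True:
        if target == DOCOL_ADDR:
            rstack.append(esi)       # PUSHRSP esi
            esi = eax + 4            # add eax, 4 / mov esi, eax
            trace.append("DOCOL")
        elif target == CODE_EXIT:
            esi = rstack.pop()       # POPRSP esi
            trace.append("EXIT")
            if esi == 0:
                return trace
        else:
            trace.append(NAMES[target])  # a primitive "runs"
        # NEXT
        eax = mem[esi]               # lodsd: eax = [esi] ...
        esi += 4                     # ... and esi += 4
        target = mem[eax]            # jmp [eax]


print(run(COLON))  # ['DOCOL', 'code_FWORD', 'code_CREATE', 'EXIT']
```

The two lookups in NEXT (mem[esi], then mem[eax]) are the "address
pointed to by the address" part: esi walks the word pointers in the
body, and each word's first cell says where its code lives.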