nasmjf

A NASM assembler port of JONESFORTH
git clone http://ratfactor.com/repos/nasmjf/nasmjf.git

nasmjf/devlog/log16.txt

Warning, the examples with variables in this log are
all wrong. This update explains:

    !!!!!!!!!!!!!!!!!!!! Update !!!!!!!!!!!!!!!!!!!!
    ! In log19.txt, I realize that my variable     !
    ! handling is wrong. Variables should leave    !
    ! their addresses on the stack, not their      !
    ! values! We need FETCH to get the value from  !
    ! the address!                                 !
    !!!!!!!!!!!!!!!!!!!! Update !!!!!!!!!!!!!!!!!!!!

Two new words add to the word "compiler" abilities of
the interpreter:

    IMMEDIATE  sets the latest word to be "immediate"
    HIDE       takes the next "word" of input, looks it up
               in the dictionary, and then sets that word
               to be hidden (via the word HIDDEN)

HIDE seems the easiest to test, so we'll start with that:

    : emit2 EMIT EMIT ;
    66 65 emit2
    AB
    HIDE emit2
    66 65 emit2
    PARSE ERROR: 66 65 emit2

That worked: we get the parse error because emit2 has been
hidden and is no longer found in the dictionary.

The HIDDEN word used by HIDE actually toggles the hidden state,
so can we call HIDE again to unhide the word?

    HIDE emit2

    Program received signal SIGSEGV, Segmentation fault.
    code_HIDDEN () at nasmjf.asm:635
    635         xor [edi], word F_HIDDEN  ; Toggle the HIDDEN bit in place.

Oh, ha ha, no, of course not. It's hidden, so HIDE can't
find it (and since there's absolutely no error checking,
we crash trying to toggle the bit in some random memory
location).

I guess we could use LATEST and HIDDEN to manually toggle
it back, but I can't be bothered tonight.

Onward to IMMEDIATE.

    : ab 66 65 EMIT EMIT ;
    IMMEDIATE
    : foo 1000 . ;
    foo
    foo
    55 EMIT
    ;

    ^C
    Program received signal SIGINT, Interrupt.
    _WORD.skip_non_words () at nasmjf.asm:339
    339         call _KEY               ; get next key, returned in %eax

Something went wrong. I had to Ctrl+C to end the
program.
It was merrily taking input, but nothing
would execute, not even Ctrl+D to end the input
and exit.

Let's try that again and verify we're toggling the
right word...

    (gdb) r
    Starting program: /home/dave/nasmjf/nasmjf
    (gdb) c
    Continuing.
    LATEST 4 + C@ .
    6
    LATEST 5 + C@ EMIT
    L

Okay, just sanity checking LATEST - it points to
a word with 6 letters in the name and starts with
the letter "L" (it's LATEST itself).

I'll define my 'ab' word again, try it out (it should
print the letters "AB"), and check LATEST again...

    : ab 66 65 EMIT EMIT ;
    ab
    AB
    LATEST 4 + C@ .
    2
    LATEST 5 + C@ EMIT
    a
    IMMEDIATE
    ab

Drat! Then it locked up again. So IMMEDIATE is
definitely not working right.

Next night: okay, let's see what's going on...

    (gdb) break code_IMMEDIATE
    Breakpoint 2 at 0x80494ec: file nasmjf.asm, line 1097.
    (gdb) c
    Continuing.
    : ab 66 65 EMIT EMIT ;
    ab
    AB
    IMMEDIATE

    Breakpoint 2, code_IMMEDIATE () at nasmjf.asm:1097
    (gdb) p/x (int)var_LATEST
    $1 = 0x804e000
    (gdb) x/10c (int)var_LATEST
    0x804e000: ... 2 '\002' 97 'a' 98 'b' ...

So that's right - LATEST points at word 'ab'...

    1098        add edi, 4              ; Point to name/flags byte.
    1099        xor byte [edi], F_IMMED ; Toggle the IMMED bit.
    (gdb) p/x $edi
    $2 = 0x804a6b0

That's a dead giveaway, the address in register
edi should now be LATEST + 4. But it's actually
the _address_ of LATEST + 4!

    (gdb) p/x (int)var_LATEST
    $3 = 0x804e000

It still takes me a bit before I see it...

    (gdb) disass 1099
    No function contains specified address.
    (gdb) disass code_IMMEDIATE
    Dump of assembler code for function code_IMMEDIATE:
       0x080494ec <+0>:     mov    edi,0x804a6ac   <--- should be PTR
       0x080494f1 <+5>:     add    edi,0x4
    => 0x080494f4 <+8>:     xor    BYTE PTR [edi],0x80
       0x080494f7 <+11>:    lods   eax,DWORD PTR ds:[esi]
       0x080494f8 <+12>:    jmp    DWORD PTR [eax]
    End of assembler dump.

I finally see it.

I have

    mov edi, var_LATEST

where I should have

    mov edi, [var_LATEST]

(So of course it wasn't working after that. LATEST was
incremented and no longer pointed at a word. All further
interpretation would fail to match!)

With that fixed, it should work...

    (gdb) load
    (gdb) r
    Starting program: /home/dave/nasmjf/nasmjf
    : ab 66 65 EMIT EMIT ;
    ab
    AB
    IMMEDIATE

So now 'ab' should execute as soon as the interpreter
sees it, even in compile mode:

    : five 5 . ab ;
    AB
    five
    5

Yeah! The call to 'ab' executed at "compile time" rather
than "run time" for the new word 'five'. Using this, we
could add new language features to FORTH in FORTH.

Next, the TICK (single quote ') word gets the address
of a word (supplied after the ' so it doesn't execute;
this is the same trick LIT uses).

This implementation can only work at compile time because
the interpreter needs to compile the word that follows into
a 4-byte address for ' to be able to read and then hop
over that value. Just for fun, let's try to print the
address of the EMIT word outside of the compile state:

    ' EMIT .

    Program received signal SIGSEGV, Segmentation fault.

See?

Now let's use it the same way, but in a new compiled word:

    : addrofemit ' EMIT . ;
    addrofemit
    134521260

Looks like it worked, but is that address correct?

    (gdb) info addr EMIT
    Symbol "EMIT" is at 0x804a1ac in a file compiled without debugging.
    (gdb) p/d 0x804a1ac
    $1 = 134521260

Yup!

Then the next night, I've got a really exciting one,
0BRANCH.

But first, I'm trying to figure out how to even test
BRANCH, let alone its conditional big brother!

I even worked it out on paper the next morning, and I'm
still not seeing why this doesn't work:

    : foo 65 EMIT BRANCH -12 ;
    foo
    A
    Program received signal SIGSEGV, Segmentation fault.
    code_BRANCH () at nasmjf.asm:27
    27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.

By my calculations, esi starts off pointing at the
offset number's instruction (-12), then we should be
branching back to "65":

      0  "-12"
     -4  BRANCH
     -8  EMIT
    -12  "65"

And I've had a bummer of a time trying to step through
it by breaking on BRANCH because that word is used
(correctly) as part of the interpreter loop.

So I'm going to copy BRANCH with the silly name BRUNCH
and see why it's not correct!

    DEFCODE "BRUNCH",6,,BRUNCH
        add esi, [esi]
    NEXT

Should be pretty simple, right? It's just a one-liner!

    (gdb) break code_BRUNCH
    (gdb) c
    Continuing.
    : foo 65 EMIT BRUNCH -12 ;
    foo
    A
    Breakpoint 2, code_BRUNCH () at nasmjf.asm:251
    251         add esi, [esi]          ; add the offset to the instruction pointer

Okay, now let's thoroughly examine this. We're going to
add the negative number stored where esi points TO
esi. Where does esi point?

    (gdb) p/x $esi
    $3 = 0x804e01c
    (gdb) x/x $esi
    0x804e01c:      0x0804a0f0
    (gdb) info sym *$esi
    LIT in section .data of /home/dave/nasmjf/nasmjf
    (gdb) x/b $esi+4
    0x804e020:      -12 '\364'

Yup, we can see that esi points to the address of LIT
followed by the value -12. As expected.

    (gdb) s
    27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.

Now that's run, where does esi point now?

    (gdb) info sym *$esi
    Cannot access memory at address 0x1009810c
    (gdb) p/x $esi
    $4 = 0x1009810c

What? That address isn't right. It should be 12 less
than before, not...oh wait...

    (gdb) disass code_BRUNCH
    Dump of assembler code for function code_BRUNCH:
       0x08049054 <+0>:     add    esi,DWORD PTR [esi]
    => 0x08049056 <+2>:     lods   eax,DWORD PTR ds:[esi]
       0x08049057 <+3>:     jmp    DWORD PTR [eax]
    End of assembler dump.

Now I see it. We added the address of LIT, not
the -12 that follows it. No wonder I got a segfault.

So how do I get the value -12 right after BRANCH?

Next night: okay, so I reviewed the ported words
so far and I'm pretty sure COMMA (,) fits the bill.
It "compiles" the value on the stack to the current
position...

    : foo 65 EMIT BRUNCH -12 , ;
    foo
    A
    Breakpoint 3, code_BRUNCH () at nasmjf.asm:251
    251         add esi, [esi]          ; add the offset to the instruction pointer
    (gdb) x/x **$esi
    0x8049228 <code_LIT>:   0xffad50ad

...no, dang it, that doesn't work either, and for
the same reason. Sure, ',' will store whatever's on
the stack, but we're still getting LIT -12 compiled
first when we're compiling.

So I really don't see any easy way to test BRANCH,
let alone 0BRANCH, with an arbitrary snippet of code
at this point. :-(

Next evening: I've also just ported LITSTRING and TELL,
two more primitives that appear hard to test because
I'm not sure how to compile literal values into memory
yet.

So, this would be a pretty big let-down way to end a
log file but...

IT APPEARS THAT I'VE PORTED ALL OF THE ASSEMBLY!

Yeah, so starting with the next log, I'm going to
start feeding jonesforth.f, which is the second half
of the language implementation implemented in itself,
into my port and fix the inevitable bugs.

It's been about six months of slowly chipping away
at this port nearly every single evening. I can
barely believe this stage has arrived. This is so
cool. 8-)