Ratfactor.com > Dave's Repos

nasmjf

A NASM assembler port of JONESFORTH
git clone http://ratfactor.com/repos/nasmjf/nasmjf.git

nasmjf/devlog/log21.txt

The last log saw yet _another_ bug in my DEFVAR macro.
Sheesh. Hopefully I have variables working correctly
now?

As I mentioned last night, the next words look like a
real mixture of things. Let's jump into it:

WITHIN is a conditional that tests whether a number is
between two other numbers: c a b WITHIN is true when
a <= c < b (the upper bound is excluded). This highlights
the advantage of such a flexible language (and the
simplicity of the syntax). I'll format this slightly to
make a bit of a truth table out of it to demonstrate how
WITHIN works:

    1 2 3 WITHIN .  0
    2 3 1 WITHIN .  0
    3 1 2 WITHIN .  0
    2 1 3 WITHIN .  1   <-- 2 is within 1 and 3

DEPTH gives us the depth of the stack:

    1 2
    .S
    2 1
    DEPTH .
    8

Huh? Oh, it's in bytes, not number of items! Two 4-byte
cells make 8.

ALIGNED rounds a number up to the next multiple of 4,
which aligns addresses to 4-byte boundaries on a 32-bit
machine. It's easy to test:

    1 ALIGNED .
    4
    7 ALIGNED .
    8

ALIGN performs ALIGNED on HERE. Here's the definition:

    : ALIGN HERE @ ALIGNED HERE ! ;

I have a sense for when this would be used while
compiling, but don't really feel like trying to make up
a test for it. I'm content to see how it's used later
and test *that*.

And now, the moment we've all been waiting for: strings!

First, we need a new primitive to store the bytes of the
string. The word ',' (COMMA) does this for word-sized
data (four bytes, the size of a 32-bit address). As with
the other byte-sized words (C@, C!), the byte version's
name is prefixed with the letter 'C'.

So let's compare 'C,':

    : C,
        HERE @ C!   ( store the character in the compiled image )
        1 HERE +!   ( increment HERE pointer by 1 byte )
    ;

...with the assembly definition of ',':

    mov edi, [var_HERE]  ; edi = HERE
    stosd                ; puts the value in eax at edi, increments edi
    mov [var_HERE], edi  ; HERE = edi

Same thing, only stosd increments edi (HERE) by 4 rather
than 1.

I've learned to just wait and see these primitives in action.
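
To make the arithmetic above concrete, here's a small Python model of WITHIN, ALIGNED, ALIGN, C, and ','. It's only a sketch under stated assumptions, not how nasmjf or JONESFORTH implement these words. It assumes JONESFORTH's WITHIN semantics (c a b WITHIN is true when a <= c < b) and little-endian 4-byte cells, as stored by stosd. A bytearray stands in for the compiled image. The Image class and its method names are made up for illustration.

```python
CELL = 4  # bytes per cell on a 32-bit machine


def within(c: int, a: int, b: int) -> int:
    """Model of 'c a b WITHIN': 1 (true) if a <= c < b, else 0."""
    return 1 if a <= c < b else 0


def aligned(addr: int) -> int:
    """Model of ALIGNED: (addr + 3) & ~3 rounds up to a multiple of 4."""
    return (addr + 3) & ~3


class Image:
    """Hypothetical stand-in for the compiled data area plus the HERE pointer."""

    def __init__(self, size: int = 64) -> None:
        self.mem = bytearray(size)
        self.here = 0  # offset of the next free byte

    def c_comma(self, byte: int) -> None:
        """C, : store one byte at HERE, then advance HERE by 1."""
        self.mem[self.here] = byte & 0xFF
        self.here += 1

    def comma(self, value: int) -> None:
        """',' : store a 4-byte little-endian cell at HERE (like stosd), advance by 4."""
        cell = (value & 0xFFFFFFFF).to_bytes(CELL, "little")
        self.mem[self.here:self.here + CELL] = cell
        self.here += CELL

    def align(self) -> None:
        """ALIGN: HERE @ ALIGNED HERE !"""
        self.here = aligned(self.here)


# The WITHIN truth table and ALIGNED tests from the log.
print([within(1, 2, 3), within(2, 3, 1), within(3, 1, 2), within(2, 1, 3)])  # [0, 0, 0, 1]
print(aligned(1), aligned(7))  # 4 8

img = Image()
for ch in b"Hi":
    img.c_comma(ch)   # HERE advances 1 byte per character
img.align()           # HERE rounds up from 2 to 4
img.comma(0x804EADC)  # HERE advances a whole cell
print(img.here)       # 8
```

The ALIGN call between the byte stores and the cell store is the kind of spot where the real ALIGN presumably earns its keep while compiling.
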
Trying to use _some_ of them on their own is challenging
and surprisingly unrewarding.

Next, S" is a word that stores the characters which
follow as a string until it hits the end quote: ".

Note that there has to be a space after S" or it
wouldn't be matched as the correct word. However, the
final quote is NOT a word; it's just the special
character S" is looking for.

Finally, S" pushes the address of the start of the
string onto the stack, followed by the length of the
string.

Okay, I think I've got all of that:

    S" Hello World"
    .
    11

There we go. "Hello World" is 11 characters long.

    DUP
    HEX
    .
    804EADC
    @ EMIT
    H

It has been stored in memory at a particular address,
which means we can retrieve the string from there.
(I've EMITted the 'H' from Hello.)

In compile mode, it's stored in the word being compiled,
along with the LITSTRING word we defined in assembly.

But where in memory is this stored when in immediate
mode? Jones explains that this implementation stores it
in the same place where we compile words. So I guess we
can do this:

    HERE @ .
    804EADC
    HERE @ @ EMIT
    H
    HERE @ 1 + @ EMIT
    e

And since HERE clearly hasn't been moved to a point
after the string, that means it is temporary. It will be
overwritten as soon as we define a new word!

A close relative of S" (in fact, it uses S" in compile
mode) is the ." word, Forth's print-string word. I'm
excited to have this:

    ." Hello World!"
    Hello World!

I've updated the README now that I can do this. :-)

And the good stuff keeps coming. The next words allow us
to define our own constants and variables.
Not only that, there's a handy explanation for each,
which would have saved me some trouble earlier had I
bothered to look ahead (or, you know, actually learned
the language I was going to implement):

    10 CONSTANT TEN
    VARIABLE FOO

    When TEN is executed, it leaves the integer 10 on the stack
    When FOO is executed, it leaves the address of FOO on the stack

Let's try a constant:

    42 CONSTANT answer
    ." The answer is " answer .
    The answer is 42

And a variable:

    VARIABLE foo
    9000 foo !
    foo @ .
    9000

The definition of CONSTANT is pretty easy. It's
basically the same as (and functionally equivalent to)
this:

    : answer 42 ;
    ." The answer is " answer .
    The answer is 42

But a variable needs to allocate some memory and store
its address. Two simple utilities aid in this:

    ALLOT - advances HERE by the amount on the stack and
            leaves the previous HERE on the stack

    CELLS - multiplies the number on the stack by the
            natural address size of the machine (4 bytes
            for our 32-bit implementation)

What's neat about CELLS is it shows how you can build up
words that read rather like a natural language:

    HEX
    HERE @ .
    804EB30
    5 CELLS ALLOT
    .
    804EB30
    HERE @ .
    804EB44

And that looks like 20 bytes (0x14)...hey, wait a second.
Now that I can easily store values, I'll let Forth figure
it out:

    HERE @ CONSTANT previous
    5 CELLS ALLOT CONSTANT new
    previous .
    804EB78
    new .
    804EB98
    new previous - .
    20

Careful, though: BASE is still HEX, so that 20 is 0x20,
or 32 bytes. And it isn't measuring the ALLOT at all.
HERE was read before CONSTANT compiled the definition of
previous, so new - previous is the size of that
definition: a header padded to 16 bytes plus four 4-byte
cells. My first measurement of 20 bytes stands.

VALUE is like VARIABLE, except that the result is a word
which leaves its value on the stack like a constant
instead of its address. Unlike a constant, it can be
updated by another word, TO.

    10 VALUE foo      create foo, set to 10
    20 TO foo         update foo to 20

Sounds good:

    10 VALUE foo
    foo .
    10
    20 TO foo
    PARSE ERROR: 20 TO

Huh? Oh! I have it stop reading jonesforth.f on load
right between the definitions of VALUE and TO.

Guess it's time to figure out the next bug in my port.

I'll try reading all lines again:

    %assign __lines_of_jf_to_read 10000

    PARSE ERROR: ( look it up in the dictionary )
    >DFA
    PARSE ERROR: ( look it up in the dictionary )
    >DFA

    Program received signal SIGSEGV, Segmentation fault.
    _COMMA () at nasmjf.asm:688
    688 stosd ; puts the value in eax at edi, increments edi
    (gdb)

Okay, so the PARSE ERROR message prints out the word
that caused the trouble, here ">DFA", after a buffer's
worth of context.

Ha ha, how silly. I simply missed that word in my port.
It's a simple definition since we already have >CFA,
which turns a pointer to a word's dictionary entry into
the address of its codeword. >DFA just has to advance 4
more bytes to the "data" (the threaded word addresses)
after the codeword.

Okay, that's defined. Now can we run all of
jonesforth.f?

    Program received signal SIGSEGV, Segmentation fault.
    _COMMA () at nasmjf.asm:697
    697 stosd ; puts the value in eax at edi, increments edi
    (gdb)

Drat! Nope, still segfaulting. And in COMMA (',') again.

I wonder if I've overflowed some memory limitation? How
to check that...hmm... Well, comma stores where HERE
points, and that's in memory reserved with Linux's brk
syscall. How about I bump that from 0x16000 to 0x64000
bytes:

    add eax, 0x64000 ; add our desired number of bytes to break addr

Nope, exact same error:

    _COMMA () at nasmjf.asm:697

And same thing if I add another zero to the number. So
much for an easy answer. So I guess, ideally, I would
break when COMMA is trying to stosd at an address in edi
that is outside the reserved FORTH data area. But first
I need to know what that area is.
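
Here's the bounds check I'm after, sketched in Python rather than assembly. This is a hypothetical model, not the nasmjf code. DataArea and its fields are made-up names, with cstart/cend mirroring the idea of a start and end address for the brk-reserved region. The numbers are an example start address of 0x804E000 and the 0x64000-byte reservation. Because stosd writes 4 bytes, the sketch also rejects a cell that starts inside the region but would spill past the end.

```python
CELL = 4  # stosd writes 4 bytes


class DataArea:
    """Hypothetical model of the brk-reserved region [cstart, cend) and HERE."""

    def __init__(self, cstart: int, size: int) -> None:
        self.cstart = cstart
        self.cend = cstart + size  # first address past the region
        self.mem = bytearray(size)
        self.here = cstart         # HERE starts at the bottom of the region

    def comma(self, value: int) -> None:
        """Like ',' but refuses to store a cell outside [cstart, cend)."""
        if not (self.cstart <= self.here and self.here + CELL <= self.cend):
            raise MemoryError(
                f"HERE={self.here:#x} outside {self.cstart:#x}..{self.cend:#x}"
            )
        offset = self.here - self.cstart
        self.mem[offset:offset + CELL] = (value & 0xFFFFFFFF).to_bytes(CELL, "little")
        self.here += CELL


area = DataArea(0x804E000, 0x64000)
area.comma(42)
print(hex(area.here))  # 0x804e004

area.here = 0x1000     # a bogus HERE far below the region
try:
    area.comma(1)
except MemoryError as err:
    print(err)         # HERE=0x1000 outside 0x804e000..0x80b2000
```

Raising instead of writing leaves HERE untouched, so whatever set it to the bad value is still there to inspect.
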

I'm going to add some custom FORTH variables to capture
this so it'll be easy to examine. I'll test in gdb
first:

    Breakpoint 2, _start () at nasmjf.asm:103
    103 xor ebx, ebx
    104 mov eax, __NR_brk ; syscall brk
    105 int 0x80
    106 mov [var_HERE], eax ; eax has start addr of data segment
    (gdb) p/x $eax
    $4 = 0x804e000
    107 mov [var_CSTART], eax ; store info: start address of data segment
    108 add eax, 0x16000 ; add our desired number of bytes to break addr
    (gdb) p/x (int)var_HERE
    $5 = 0x804e000
    (gdb) p/x (int)var_CSTART
    $6 = 0x804e000

So far so good: CSTART contains the start address of the
data area.

    109 mov ebx, eax ; reserve memory by setting this new break addr
    (gdb) p/x $eax
    $7 = 0x80b2000
    110 mov [var_CEND], eax ; store info: end address of data segment
    111 mov eax, __NR_brk ; syscall brk again
    (gdb) p/x $eax
    $8 = 0x80b2000
    112 int 0x80
    (gdb) p/x $eax
    117 mov ecx, 0 ; LOADJF read only flag for open
    (gdb) p/x $eax
    $10 = 0x80b2000
    (gdb) p/x (int)var_CEND
    $11 = 0x80b2000

That looks right. CEND contains the end address of the
data segment. Did I get my requested 0x64000 bytes?

Let's use the new FORTH vars to find out:

    (gdb) c
    Continuing.
    HEX CSTART @ . CEND @ . CEND @ CSTART @ - .
    804E000 80B2000 64000

Looking good. Now to catch the bad address being used in
COMMA. I added two compares and an ".oops" label:

    _COMMA:
        mov edi, [var_HERE]    ; edi = HERE, where the cell will go
        cmp edi, [var_CSTART]
        jb .oops               ; below the data area (unsigned compare for addresses)
        cmp edi, [var_CEND]
        jae .oops              ; at or past the end of the data area
        stosd                  ; store eax at [edi], edi += 4
        mov [var_HERE], edi    ; save the advanced HERE
        ret
    .oops:
        nop                    ; only a spot for a gdb breakpoint

    (gdb) break _COMMA.oops
    Breakpoint 2 at 0x8049343: file nasmjf.asm, line 707.
    (gdb) c
    Continuing.

    Breakpoint 2, _COMMA.oops () at nasmjf.asm:707
    707 nop
    (gdb) p var_HERE
    'var_HERE' has unknown type; cast it to its declared type
    (gdb) p (int)var_HERE
    $1 = 61368
    (gdb) p (int)var_CSTART
    $2 = 134537216
    (gdb) p (int)var_CEND
    $3 = 134627328

Okay, so HERE has been set to an invalid address
somehow: 61368 is 0xEFB8, way below CSTART (134537216 is
0x804E000). I wish backtraces worked. Then I'd be able
to see which word this came from.

I'm tempted to divide and conquer...and it looks like
if I stop execution of jonesforth.f right before the
definition of SEE, it doesn't segfault.

So I'll continue testing 'til there and then tackle the
problem head-on.

Anyway, where was I? Oh yeah, VALUE!

To quote myself:

"VALUE is like VARIABLE, except that the result is a word
which leaves its value on the stack like a constant
instead of its address. Unlike a constant, it can be
updated by another word, TO."

    10 VALUE foo      create foo, set to 10
    20 TO foo         update foo to 20

Sounds good:

    10 VALUE foo
    foo .
    10
    20 TO foo
    foo .
    20

That's better. I'll continue with the word testing in
the next log.
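
For reference, here's a toy Python model of how CONSTANT, VARIABLE, and VALUE differ. It assumes the JONESFORTH layout, where all three compile the same LIT x EXIT body after DOCOL and only x differs: a number, or the address of a freshly allotted cell. It also assumes TO finds the word, steps past the codeword (via >DFA), and overwrites the literal in place. The class, its methods, and the use of list indexes as addresses are all made up for illustration.

```python
from typing import Dict, List, Union

Cell = Union[str, int]  # a threaded-code cell: a word name or a literal number


class ToyDictionary:
    """Hypothetical model: every defined word's body is ['LIT', x, 'EXIT']."""

    def __init__(self) -> None:
        self.cells: List[int] = []              # memory handed out by ALLOT
        self.words: Dict[str, List[Cell]] = {}  # name -> threaded body

    def allot_cell(self) -> int:
        """1 CELLS ALLOT: reserve one cell and return its 'address' (a list index)."""
        self.cells.append(0)
        return len(self.cells) - 1

    def constant(self, name: str, n: int) -> None:
        self.words[name] = ["LIT", n, "EXIT"]                  # x is the number itself

    def variable(self, name: str) -> None:
        self.words[name] = ["LIT", self.allot_cell(), "EXIT"]  # x is an address

    def value(self, name: str, n: int) -> None:
        self.words[name] = ["LIT", n, "EXIT"]                  # same shape as CONSTANT

    def to(self, name: str, n: int) -> None:
        body = self.words[name]  # FIND the word
        body[1] = n              # overwrite the literal that follows LIT

    def execute(self, name: str) -> int:
        """Run the body: LIT pushes the following cell, EXIT returns."""
        op, x, end = self.words[name]
        assert op == "LIT" and end == "EXIT" and isinstance(x, int)
        return x

    def store(self, addr: int, n: int) -> None:  # !
        self.cells[addr] = n

    def fetch(self, addr: int) -> int:  # @
        return self.cells[addr]


d = ToyDictionary()
d.constant("answer", 42)         # 42 CONSTANT answer
d.variable("foo")                # VARIABLE foo
d.store(d.execute("foo"), 9000)  # 9000 foo !
d.value("bar", 10)               # 10 VALUE bar
d.to("bar", 20)                  # 20 TO bar
print(d.execute("answer"), d.fetch(d.execute("foo")), d.execute("bar"))  # 42 9000 20
```

In this model a VALUE is just a CONSTANT whose literal is allowed to be overwritten. That is also why TO has to locate the literal inside the word's body, which is where >DFA comes in.
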