
meow5

A stack-based pure inlining concatenative programming language written in NASM assembly
git clone http://ratfactor.com/repos/meow5/meow5.git

meow5/log02.txt


So log01.txt concluded with a nice little demonstration
of programmatically inlining machine code at runtime to
"compile" a program and run it.

The next step is to start to turn this into an actual
language by creating headers for words (I've decided
I'll use the Forth term "word" to refer to the functions
we create in this language).

    [x] Look up word length from header so it doesn't
        have to be manually created and sent to the
        inline function.

    [x] Look up word by stored ASCII name in header at
        runtime. That'll be exciting. I'll practically
        have a programming language at that point.

I think I'll use a linked list of words like many
traditional Forths, since that's what I learned how to
implement in my JONESFORTH port, nasmjf.

    Note: I added design-notes.txt to this repo because I
    have been having some on-going thoughts about how
    to implement this program as a whole, but they're
    not things I can act upon right away and I don't
    want to have to come back here searching in these
    logs to find them (or worse, forget about them
    entirely!)

Okay, now I've got #1 from the above list working. Instead
of a "header", I've got "tails" at the end of my words.
Ha ha, cats have tails. So this just keeps getting
better.

I did it that way because then it becomes trivial to get
the length of the machine code. Here's the definition of
the exit word now, with its tail:

    exit:
        mov ebx, 0              ; exit with happy 0
        mov eax, SYS_EXIT
        int 0x80
    exit_tail:
        dd 0                    ; null link is end of linked list
        dd (exit_tail - exit)   ; len of machine code
        db "exit", 0            ; name, null-terminated

So now I don't have to give the length of the word's
machine code to inline anymore, just the tail address.
inline gets the stored length and does all the rest!

Here's the new inline:

    ; inline function!
    ; input: esi - tail of the word to inline
    inline:
        mov edi, [here]     ; destination
        mov ecx, [esi + 4]  ; get len into ecx
        sub esi, ecx        ; sub len from esi (start of code)
        rep movsb           ; movsb copies from esi to esi+ecx into edi
        add edi, ecx        ; update here pointer...
        mov [here], edi     ; ...and store it
        ret

Still not too complicated. And I think this might even
be its final form?

Let's see if this works...

    Program received signal SIGSEGV, Segmentation fault.
    inline () at meow5.asm:67
    67      rep movsb

Darn it.

Oh, wait! It was inlining the meows just fine, it was
doing exit that failed. I simply hadn't updated it to
point to the tail yet. Simple mistake:

    ; inline exit
    mov esi, exit    <---- oops!
    call inline

needs to be:

    ; inline exit
    mov esi, exit_tail
    call inline

How about now...

    $ mrun
    Meow.
    Meow.
    Meow.
    Meow.
    Meow.

Awesome! Guess I can start making it find words by ASCII
name in the tails, searching by linked list. Very
exciting progress tonight!

I've got two more todos:

    [x] Add tails to anything that should be a word

    [ ] Make all words take params from the stack, not
        from pre-defined registers. Yes, we're losing
        some speed by going to main memory, but I have
        a feeling the stack is surely in CPU cache most
        of the time? I should look that up someday...

So I'm going to call my word that looks up other words
by string name by searching through a linked list of
words 'find', just like in Forth. (Well, except there
it's FIND, of course.)

Two nights later: I've written the 'find' word and added
tails to all of my words so far. But I've got a
segfault:

    dave@cygnus~/meow5$ mr
    ./build.sh: line 33: 1966 Segmentation fault ./$F

So it's GDB time:

    dave@cygnus~/meow5$ mb
    Reading symbols from meow5...
    ...
    143     push temp_meow_name ; the name string to find
    144     call find           ; answer will be in eax
    81      pop eax             ; first param from stack!
    84      mov ecx, [last]
    86      cmp ecx, 0          ; a null pointer (0) is end of list
    87      je .not_found
    93      lea edx, [ecx + 8]  ; set dictionary name pointer
    94      mov ebx, eax        ; (re)set name to find pointer

Okay, so here's where I'm comparing the search string to
be found against the first (well, last) word's name in
the linked list ("dictionary"). So let's see if I got
the name from the dictionary entry's "tail" correctly.

Oh, and here's my comment block from 'find' explaining
the register use:

    ; input:
    ;   stack -> eax
    ; register use:
    ;   eax - start of null-terminated name to find
    ;   ebx - name to find byte pointer
    ;   ecx - dictionary list pointer
    ;   edx - dictionary name byte pointer

The first thing in the tail should be a link to the next
word in the dictionary. The ecx register should have that
link:

    (gdb) x/a $ecx
    0x804908d <find_tail>: 0x8049052 <inline_tail>

Yup! That's right. The next word is 'inline'.

The next thing is the length of the word's machine code:

    (gdb) x/dw $ecx+4
    0x8049091: 39

39 bytes seems reasonable. Okay, the next thing should be
the null-terminated string of the word name:

    (gdb) x/s $ecx+8
    0x8049095: "find"

Yes!

And have I correctly pointed to the first byte of this
string in the edx register?

    (gdb) x/s $edx
    0x8049095: "find"

Wow, also yes!

Okay, so the next thing to confirm is that I have the
address of the string to match in register eax:

    (gdb) x/a $eax
    0x80490c1 <inline_a_meow+10>: 0x74e8308b

Oops! That's not right. That's an address somewhere in my
loop that inlines meow five times...

I see it now!

    143     push temp_meow_name ; the name string to find
    144     call find           ; answer will be in eax
    81      pop eax             ; first param from stack!

I forgot that 'call' will push the return address onto
the stack. Which is why I can't just pop my parameter.

I need to use the stack pointer and an offset to get the
value...

I use arrays as stacks all the time in higher level
languages, so a PUSH and POP are second nature to me.
But I must confess that in an assembly language context,
I get super confused by terms like "top", "bottom" and
"low" and "high".

So I prefer to make all of this SUPER CONCRETE. Here's
my own personal explanation:

    push eax ; containing 0xAAA
    push ebx ; containing 0xBBB
    push ecx ; containing 0xCCC
    push edx ; containing 0xDDD
    pop edx
    pop ecx

    The Stack:
    ----------
    0xAAA <-- esp + 4
    0xBBB <-- esp
    0xCCC <-- esp - 4
    0xDDD <-- esp - 8

Heck, I'm gonna verify that for myself right now with
all of you watching:

    (gdb) s
    125     mov eax, 0xAAA
    126     mov ebx, 0xBBB
    127     mov ecx, 0xCCC
    128     mov edx, 0xDDD
    129     push eax
    130     push ebx
    131     push ecx
    132     push edx
    133     pop edx
    134     pop ecx
    (gdb) x $esp + 4
    0xffffd77c: 0x00000aaa
    (gdb) x $esp
    0xffffd778: 0x00000bbb
    (gdb) x $esp - 4
    0xffffd774: 0x00000ccc
    (gdb) x $esp - 8
    0xffffd770: 0x00000ddd

Whew! At least I've got that much right. :-)

So my fix is this:

    mov eax, [esp + 4] ; first param from stack!

And now let's see what we've got in eax:

    (gdb) x/a $eax
    0x804a006 <temp_meow_name>: 0x776f656d

Perfect. And $ebx should be the same to begin with:

    (gdb) x/a $ebx
    0x804a006 <temp_meow_name>: 0x776f656d

Yup. Good so far.

...wait. This next line isn't right:

    96      cmp edx, ebx

What am I doing?
I'm comparing the two addresses here,
not the characters they point to. Even worse, I can't
compare two pointed-to *values* at the same time. I need
to actually store at least one of the two *values* to
compare in a register!

Sheesh. Lemme fix this up. Okay, so here's the new
register use, which I'm trying to make as conventional
as I know how...

    ; register use:
    ;   al  - to-find name character being checked
    ;   ebx - start of dict word's name string
    ;   ecx - byte offset counter (each string character)
    ;   edx - dictionary list pointer
    ;   ebp - start of to-find name string

And the code has changed quite a bit, so I'm gonna step
through it again:

    (gdb) s
    146     push temp_meow_name ; the name string to find
    147     call find           ; answer will be in eax
    find () at meow5.asm:80
    80      mov ebp, [esp + 4]  ; first param from stack!
    83      mov edx, [last]
    find.test_word () at meow5.asm:85
    85      cmp edx, 0          ; a null pointer (0) is end of list
    86      je .not_found
    92      lea ebx, [edx + 8]  ; set dict. word name pointer
    93      mov ecx, 0          ; reset byte offset counter

Okay, first the ebx register should now point to the
current dictionary word's name that we're gonna test:

    (gdb) x/s $ebx
    0x804909f: "find"

Good.

And the ebp register should point to the to-find name:

    (gdb) x/s $ebp
    0x804a006 <temp_meow_name>: "meow"

Good.

    find.compare_names_loop () at meow5.asm:95
    95      mov al, [ebp + ecx] ; get next to-find name byte
    96      cmp al, [ebx + ecx] ; compare with next dict word byte

Now the character in byte register al should be the first
one from the to-find name "meow":

    (gdb) p/c $al
    $2 = 109 'm'

Good.

And the character pointed to by ebx+ecx should be the
first one from the dict word "find":

    (gdb) x/c $ebx+$ecx
    0x804909f: 102 'f'

Good.
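
Aside: the address-versus-value mix-up is easy to reproduce outside of assembly. What follows is only a rough Python model, not the NASM from meow5. The `mem` bytearray, the offsets 10/20/40, and the `names_match` helper are all made up for illustration. It walks two null-terminated names with one shared offset counter, loosely following the al/ebx/ecx/ebp register plan:

```python
# Toy model of memory: null-terminated names at made-up offsets.
mem = bytearray(64)
mem[10:15] = b"find\x00"   # stand-in for one dictionary word's name
mem[20:25] = b"meow\x00"   # stand-in for another dictionary word's name
mem[40:45] = b"meow\x00"   # stand-in for the to-find name

def names_match(mem, ebx, ebp):
    """Compare the null-terminated strings at offsets ebx and ebp."""
    ecx = 0                          # byte offset counter
    while True:
        al = mem[ebp + ecx]          # load the to-find byte into a "register"
        if al != mem[ebx + ecx]:     # compare bytes, not addresses
            return False             # mismatch: caller moves to the next word
        if al == 0:                  # both strings ended at the same offset
            return True
        ecx += 1                     # advance to the next character

print(20 == 40)                      # comparing the "addresses": False
print(names_match(mem, 20, 40))      # comparing the contents: True
print(names_match(mem, 10, 40))      # "find" vs "meow": False
```

The first print is the `cmp edx, ebx` situation: two different addresses never compare equal, even when the strings stored there are identical.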

And since these don't match, the jump should take us to
the next word...

    97      jne .try_next_word  ; found a mismatch!
    find.try_next_word () at meow5.asm:102
    102     mov ecx, [ecx]      ; follow the tail! (linked list)
    Program received signal SIGSEGV, Segmentation fault.

Oh, right. Silly me. I'm storing the dictionary word
links in the edx register now, not ecx! I missed this
one...

Okay, how about now?

    find.try_next_word () at meow5.asm:103
    103     mov edx, [edx]      ; follow the tail! (linked list)
    (gdb) x/a $edx
    0x8049097 <find_tail>: 0x8049052 <inline_tail>
    (gdb) s
    104     jmp .test_word

That's better. Let's see if we're testing "meow" vs
"inline" now (well, 'm' vs 'i'):

    (gdb) p/c $al
    $1 = 109 'm'
    (gdb) x/c $ebx+$ecx
    0x804905a: 105 'i'

Good!

And the next word should be "meow", so 'm' vs 'm':

    (gdb) p/c $al
    $2 = 109 'm'
    (gdb) x/c $ebx+$ecx
    0x8049037: 109 'm'
    98      jne .try_next_word  ; found a mismatch!
    99      cmp al, 0           ; both hit 0 terminator at same time
    100     je .found_it
    find.try_next_word () at meow5.asm:103
    103     mov edx, [edx]      ; follow the tail! (linked list)

What?

Oh. <facepalm> It just dropped through. I forgot the

    jmp .compare_names_loop

at the end of my loop...

I'll spare you the second go where I had an infinite loop
because I had *also* forgotten to increment the ecx
register to check the next letter in the strings...

Okay, and now?

    Reading symbols from meow5...
    (gdb) break 97
    Breakpoint 1 at 0x8049081: file meow5.asm, line 97.
    1: /c $al = <error: No registers.>
    (gdb) r
    Starting program: /home/dave/meow5/meow5
    Breakpoint 1, find.compare_names_loop () at meow5.asm:97
    97      cmp al, [ebx + ecx] ; compare with next dict word byte
    (gdb) display /c *($ebx + $ecx)
    (gdb) display /c $al
    1: /c $al = 109 'm'
    2: /c *($ebx + $ecx) = 102 'f'
    (gdb) c
    Continuing.
    Breakpoint 1, find.compare_names_loop () at meow5.asm:97
    97      cmp al, [ebx + ecx] ; compare with next dict word byte
    1: /c $al = 109 'm'
    2: /c *($ebx + $ecx) = 105 'i'
    ...
    1: /c $al = 109 'm'
    2: /c *($ebx + $ecx) = 109 'm'
    ...
    1: /c $al = 101 'e'
    2: /c *($ebx + $ecx) = 101 'e'
    ...
    1: /c $al = 111 'o'
    2: /c *($ebx + $ecx) = 111 'o'
    ...
    1: /c $al = 119 'w'
    2: /c *($ebx + $ecx) = 119 'w'
    ...
    1: /c $al = 0 '\000'
    2: /c *($ebx + $ecx) = 0 '\000'
    (gdb) c
    Continuing.

    Program received signal SIGSEGV, Segmentation fault.
    inline_a_meow () at meow5.asm:152
    152     mov esi, [eax]      ; putting directly in reg for now

Yay! (Not the segfault, but the apparent correct matching
of the strings.)

Now let's see what's happening once we get a match,
because clearly eax is not getting returned with a valid
word tail address...

    (gdb) break find.found_it
    ...
    Breakpoint 1, find.found_it () at meow5.asm:113
    113     mov eax, ecx        ; pointer to tail of dictionary word

Gah! I see it. Another ecx that should be an edx. I
could have sworn I searched for these...

    Reading symbols from meow5...
    (gdb) break find.found_it
    Breakpoint 1 at 0x8049097: file meow5.asm, line 113.
    (gdb) r
    Starting program: /home/dave/meow5/meow5

    Breakpoint 1, find.found_it () at meow5.asm:113
    113     mov eax, edx        ; pointer to tail of dictionary word
    (gdb) p/a $edx
    $1 = 0x804902f <meow_tail>

That's better. So yeah, we definitely found the meow
word by string. Very cool.
Let's see what happens next...

    (gdb) s
    114     ret                 ; (using call/ret for now)
    (gdb)
    inline_a_meow () at meow5.asm:152
    152     mov esi, [eax]      ; putting directly in reg for now
    (gdb)
    153     call inline
    (gdb)
    inline () at meow5.asm:62
    62      mov edi, [here]     ; destination

Yes, very nice...

    Breakpoint 1, find.found_it () at meow5.asm:113
    Breakpoint 1, find.found_it () at meow5.asm:113
    Breakpoint 1, find.found_it () at meow5.asm:113
    Breakpoint 1, find.found_it () at meow5.asm:113

That's four more 'meow's getting inlined...

    Breakpoint 1, find.found_it () at meow5.asm:113

That's the 'exit'...

    113     mov eax, edx        ; pointer to tail of dictionary word
    (gdb) c
    Continuing.

    Program received signal SIGSEGV, Segmentation fault.
    inline () at meow5.asm:63
    63      mov ecx, [esi + 4]  ; get len into ecx

Wait, how did esi get the wrong value?

Oh jeez, I have these brackets around eax here:

    mov esi, [eax] ; putting directly in reg for now

But I want the address in eax, not the value it's pointing
to. Yet another easy fix:

    mov esi, eax ; putting directly in reg for now

You know what? I feel like this should be good now.

Let's do this:

    dave@cygnus~/meow5$ mr
    Meow.
    Meow.
    Meow.
    Meow.
    Meow.

Yes!

I'm now able to find words by string name in the
dictionary and "compile" them into memory and run them.

The only TODO "checkbox" I didn't check in this log was
this one:

    [ ] Make all words take params from the stack, not
        from pre-defined registers.

Which should be no problem. That'll be a nice easy way
to start the next log, so I'll see you in log03.txt
with that!
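
Aside: for anyone who wants to poke at the "tail" idea without an assembler, here is a minimal Python sketch of the layout this log describes (code bytes first, then a tail holding a 4-byte link, a 4-byte length, and a null-terminated name). It is a model under assumptions and not the meow5 source: `mem`, `out`, `add_word`, and `name_at` are invented for the sketch, offsets stand in for addresses, the "machine code" is just printable bytes, and offset 0 is reserved so that a 0 link can mean end of list.

```python
import struct

mem = bytearray(4)   # toy memory; offset 0 is reserved so 0 can mean "null"
last = 0             # offset of the newest word's tail
out = bytearray()    # where "compiled" code lands (plays the role of `here`)

def add_word(name, code):
    """Append a word's code, then its tail: link (4), len (4), name + NUL."""
    global last
    mem.extend(code)
    tail = len(mem)
    mem.extend(struct.pack("<II", last, len(code)))
    mem.extend(name.encode("ascii") + b"\x00")
    last = tail
    return tail

def name_at(tail):
    """Read the null-terminated name stored 8 bytes into the tail."""
    end = mem.index(0, tail + 8)
    return mem[tail + 8:end].decode("ascii")

def find(name):
    """Follow links starting at `last`; return the matching tail, or 0."""
    tail = last
    while tail != 0:
        if name_at(tail) == name:
            return tail
        (tail,) = struct.unpack_from("<I", mem, tail)   # follow the link
    return 0

def inline(tail):
    """Copy the code sitting just before the tail onto the end of `out`."""
    (length,) = struct.unpack_from("<I", mem, tail + 4)
    out.extend(mem[tail - length:tail])

add_word("meow", b"Meow.\n")     # stand-in bytes, not real machine code
add_word("exit", b"<exit>\n")
for _ in range(5):
    inline(find("meow"))
inline(find("exit"))
print(out.decode("ascii"), end="")   # five "Meow." lines, then "<exit>"
```

Note that `inline` needs only the tail offset: the stored length is enough to locate the start of the code, which is the reason given earlier for putting the metadata after the code rather than before it.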