Ratfactor.com > Dave's Repos

meow5

A stack-based pure inlining concatenative programming language written in NASM assembly
git clone http://ratfactor.com/repos/meow5/meow5.git

meow5/log05.txt

Howdy! So I've got immediate and compile modes basically
working. Now I've got two TODOs carrying over from the
last log and I'll add some new ones. I'm gonna put this
in the order I think they should probably be done, too:

    [ ] turn return_addr into a stack so I can call
        words from words
    [ ] make a "print" word and use it
    [ ] colon word store name somewhere
    [ ] semicolon should write tail

From there, I think the roadmap will be fairly clear and
largely driven by whatever I think will be useful or fun
to grow the language.

    * Print dictionary/word tails
    * How to handle variables and strings
    * Interactive REPL
    * ANSI colors
    * Print current stack (automatically in REPL?)
    * Write out any word as an ELF executable?
    * Other debugging features?
    * Some simple benchmarks - size vs speed

This whole thing is just an experiment in simplicity by
taking a concept to extremes (in a good way), so I want
to keep it tiny and comprehensible. Too many features
will detract from that, so I'll need to pick them
carefully.

Okay, I'll start by turning the single return address
I'm currently using for immediate word calls into a
proper return stack. Here are the BSS reservations:

    return_stack: resb 512
    return_ptr:   resb 4

And I've modified the CALLWORD and ENDWORD macros to
push and pop the return stack appropriately.

I think it's worth noting that I'm extremely not
concerned about performance in immediate mode because
I expect it to pretty much always be invoked by the
user. The compiled code doesn't use this return stack
because nothing ever gets called - it'll just be a
continuous flow of instructions!

Anyway, let's see what I've got when I try the new
return stack:

    Program received signal SIGSEGV, Segmentation fault.
    get_next_token () at meow5.asm:73
    73      mov dword [eax], %%return_to ; CALLWORD

Oh, wait! I forgot to initialize return_ptr so it
starts off pointing at the start of return_stack! So of
course it's gonna segfault - I'm trying to mov to
address 0!

    Program received signal SIGSEGV, Segmentation fault.
    0x0804a689 in return_ptr ()

Oops! And that one is because I was trying to jump to
the address in return_ptr, not to where it _pointed to_!

    Program received signal SIGSEGV, Segmentation fault.
    0x0804b000 in ?? ()

Yuck, that one will be in the compiled code.

I think I'll call it a night.

Next night: Nope! It was in immediate word execution and
it was pretty silly. I'm doing a bit of a hack to
perform immediate word execution - passing a register to
the CALLWORD macro instead of an address literal. Turns
out, the register I was using for this was eax. And,
naturally, that's the register I used to implement the
new return stack stuff in CALLWORD!

Changing this:

    CALLWORD eax

To this:

    CALLWORD ebx

And let's see if we can at least do what we could do
before I made my changes:

    $ mr
    Immediate Meow!
    Immediate Meow!
    Meow.
    Meow.
    Meow.

Nice. So then I can move on to new functionality. Let's
try a nested call. I'll make some simple test words:

    DEFWORD foo
        CALLWORD bar
        push 0 ; for exit
        CALLWORD exit
    ENDWORD foo, "foo", (IMMEDIATE)

    DEFWORD bar
        CALLWORD meow
    ENDWORD bar, "bar", (IMMEDIATE)

And calling them with this input string:

    db 'meow bar foo', 0

This should result in three immediate meows and an exit:

    $ mr
    Immediate Meow!
    Immediate Meow!
    Immediate Meow!

Wow, nice. So I can allow words to call other words now,
which will let me clean up some of the most redundant
bits of code (like string printing).

I think next will be a 'print' word that prints
null-terminated strings.
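To make the return stack idea concrete, here's a rough C model. This is not Meow5's actual code. It uses a toy instruction list where CALL pushes the resume point onto a 128-entry return stack (512 bytes' worth of 4-byte addresses) and RET pops it, which is roughly what the CALLWORD and ENDWORD macros now do. The opcodes, the `run` function and the program layout are all made up for illustration; the program replays the 'meow bar foo' test.

```c
#include <assert.h>
#include <stdio.h>

/* Toy opcodes for this model only; not Meow5's real machine code. */
enum op { OP_MEOW, OP_CALL, OP_RET, OP_HALT };

struct insn {
    enum op op;
    int arg;                        /* target pc for OP_CALL */
};

#define RSTACK_DEPTH 128            /* 512 bytes of 4-byte addresses */

static int return_stack[RSTACK_DEPTH];
static int *return_ptr;             /* zeroed (NULL), like an untouched BSS slot */

/* Runs the program from 'pc' until OP_HALT; returns how many meows it printed. */
static int run(const struct insn *prog, int pc) {
    int meows = 0;
    return_ptr = return_stack;      /* skip this and the first push writes to address 0 */
    for (;;) {
        const struct insn *i = &prog[pc];
        switch (i->op) {
        case OP_MEOW:
            puts("Immediate Meow!");
            meows++;
            pc++;
            break;
        case OP_CALL:               /* like CALLWORD: save where to resume */
            assert(return_ptr < return_stack + RSTACK_DEPTH);
            *return_ptr++ = pc + 1;
            pc = i->arg;
            break;
        case OP_RET:                /* like ENDWORD: pop the resume point, go there */
            assert(return_ptr > return_stack);
            pc = *--return_ptr;
            break;
        case OP_HALT:               /* stands in for 'exit' */
            return meows;
        default:
            return -1;
        }
    }
}

/* "meow bar foo": bar calls meow, foo calls bar and then exits. */
static const struct insn demo[] = {
    /* 0: meow  */ {OP_MEOW, 0}, {OP_RET, 0},
    /* 2: bar   */ {OP_CALL, 0}, {OP_RET, 0},
    /* 4: foo   */ {OP_CALL, 2}, {OP_HALT, 0},
    /* 6: input */ {OP_CALL, 0}, {OP_CALL, 2}, {OP_CALL, 4},
};
```

Calling `run(demo, 6)` prints three "Immediate Meow!" lines and returns 3. The nesting goes two calls deep inside 'foo', which a single saved return address couldn't handle.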

I'll have 'print' call a 'strlen' word to scan a
null-terminated string and get its length.

Since this is the first time I'll have a real word that
calls another word in Meow5, it occurs to me that this
version of 'print' can *only* be an immediate mode word
because my compiler won't be able to inline the 'strlen'
machine code inside of print. Thus, only words composed
entirely out of other words OR completely self-contained
can be COMPILE mode words???

So at some point, I'll need to re-define 'print' by
"compiling" it properly in the Meow5 fashion (inlining
'strlen') so it can be compiled into other words!

Ah, jeez, and then I've got a whole chicken-and-egg
problem where calling a "compiled" word needs to happen
after it's been compiled - so I can't use it until it's
been compiled. Which means I've got to be real careful
about the "bootstrapping" order of my definitions.

Well, that's what this whole wacky experiment is all
about and by keeping it small, I can re-write as needed.

MOVING ON, let's get this immediate mode-only print
working!!!

Okay, got it working. Here are 'strlen' and 'print'
stripped of comments to show how simple they are and how
nice and neat my token_not_found printing has become:

    DEFWORD strlen
        mov eax, [esp]
        mov ecx, 0
    .find_null:
        cmp byte [eax + ecx], 0
        je .done
        inc ecx
        jmp .find_null
    .done:
        push ecx
    ENDWORD strlen, "strlen", (IMMEDIATE | COMPILE)

    DEFWORD print
        CALLWORD strlen
        mov ebx, STDOUT
        pop edx
        pop ecx
        mov eax, SYS_WRITE
        int 0x80
    ENDWORD print, "print", (IMMEDIATE)

    ...

    .token_not_found:
        push not_found_str1
        CALLWORD print
        push token_buffer
        CALLWORD print
        push not_found_str2
        CALLWORD print
        CALLWORD exit

Having a print word to call reduced token_not_found from
over 20 lines of repetitious code to this.

But for the reasons I explained above, I can't use
'print' in my compiled 'meow' word or any other compiled
words. Which left me in a bit of a quandary.

How do I "inline" the 'strlen' code into 'print' in the
meow5 interpreter when it's being assembled (or when it
starts up)?

I _could_ define all compound words as strings and
actually run the interpreter on them at startup...

    db ": print strlen print_with_len ;"

...but I feel like that's too inefficient even for this
toy language.

Or I could limit myself to a CALLWORD depth of one.
Which would just require that I always call 'strlen'
before print:

    .token_not_found:
        push not_found_str1
        CALLWORD strlen
        CALLWORD print
        push token_buffer
        CALLWORD strlen
        CALLWORD print
        push not_found_str2
        CALLWORD strlen
        CALLWORD print
        CALLWORD exit

But yuck. And I feel like I'm likely to regret having
that limitation later when I'm adding more advanced
functionality.

So how do I inline in NASM itself?

Well, I've had all day to think about it and the answer
was kind of staring me right in the face: macros!

New TODO:

    [ ] Create word code macros so words can be inlined
        while in assembly.

I can put the 'strlen' code in a macro and then use that
to define a 'strlen' word _and_ embed it directly into
'print' as if I'd typed it.

Then I did the same thing with 'print' itself (I wonder
if there's a limit to the depth of these macros
containing macros in NASM?) and then I could use it in
the 'meow' words also.
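The C preprocessor can play the same trick, which might make the idea easier to see. STRLEN_CODE gets pasted into PRINT_CODE, which in turn gets pasted into a meow function, so nothing is ever called at runtime for those two steps. This is only an analogy. The macro and function names are hypothetical, and POSIX `write()` stands in for the SYS_WRITE syscall.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Same loop as the asm 'strlen': count bytes until the null terminator. */
#define STRLEN_CODE(str, len)               \
    do {                                    \
        (len) = 0;                          \
        while ((str)[(len)] != '\0')        \
            (len)++;                        \
    } while (0)

/* 'print' textually contains STRLEN_CODE instead of calling a function. */
#define PRINT_CODE(str, written)                        \
    do {                                                \
        size_t len_;                                    \
        STRLEN_CODE((str), len_);                       \
        (written) = write(STDOUT_FILENO, (str), len_);  \
    } while (0)

static size_t word_strlen(const char *s) {
    size_t n;
    STRLEN_CODE(s, n);
    return n;
}

static ssize_t word_print(const char *s) {
    ssize_t w;
    PRINT_CODE(s, w);
    return w;                       /* bytes written, or -1 on error */
}

/* 'meow' gets print (and therefore strlen) pasted in, two levels deep. */
static ssize_t word_meow(void) {
    ssize_t w;
    PRINT_CODE("Meow.\n", w);
    return w;
}
```

Nested macros like this are expanded entirely at build time, which is the property the NASM macros are being used for here.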

With the actual strlen and print code removed, here's
what all four words look like now:

    %macro strlen_code 0
        <code>
    %endmacro
    DEFWORD strlen
        strlen_code
    ENDWORD strlen, "strlen", (IMMEDIATE | COMPILE)

    %macro print_code 0
        strlen_code
        <code>
    %endmacro
    DEFWORD print ; (straddr) print ()
        print_code
    ENDWORD print, "print", (IMMEDIATE | COMPILE)

    DEFWORD imm_meow
        push imm_meow_str
        print_code
    ENDWORD imm_meow, "meow", (IMMEDIATE)

    DEFWORD meow
        push meow_str
        print_code
    ENDWORD meow, "meow", (COMPILE)

And does it still work?

    db 'meow : meow meow meow exit', 0

This should print one immediate meow, three compiled
meows and an exit:

    $ mr
    Immediate Meow!
    Meow.
    Meow.
    Meow.

You bet it does.

Using macros this way to inline code solves a lot of
problems. I'm even wondering if I can remove the
CALLWORD macros and the return apparatus, etc.?

Or wait, I still need to return when calling these in
immediate mode in the interpreter, right?

For now, I want to get the colon and semicolon words
working so I can make the _real_ meow5 word as foretold
by the prophets.

Many nights later: I got sick and had to skip some
evenings. Add to that the cascade of changes needed to
let semicolon (';') compile the tail of a new word, and
I've got a bit of a challenge getting these notes in
order.

At the moment, I've made the following changes:

1. The tiny three-instruction 'return' code that gets
   appended to the end of all word definitions (so they
   can be called directly as functions) has been turned
   into a macro so it can be included in the ENDWORD
   macro and as a word of its own so it can be inlined
   by semicolon.

2. The 'inline' word's machine code is now also in a
   macro so it can be included elsewhere (such as
   semicolon). So yeah, 'inline' gets inlined!

3. Same with 'get_token', which is now needed by colon
   (':') to save the name of a new word definition:

       : meow5 meow meow meow meow meow ;

   Semicolon compiles the token "meow5" into the new
   word's tail so 'find' can find it in the dictionary.

4. For now, I have 'copystr' including 'strlen' (I use
   null-terminated strings), but the usage in tail
   makes me think these should be separated to make
   them more generalized. More about that in a moment.

   (NOTE: No, copystr doesn't actually contain 'strlen',
   I was wrong about that. 'print' is the only word that
   contains 'strlen'.)

5. 'print' is used in a number of places, so that's
   been turned into a macro as well. Actually, it looks
   like I already noted that would be happening above.
   But I hadn't yet settled on ALL CAPS for the code
   macros. Here's the entire 'print' definition right
   now:

       ; Prints a null-terminated string by address on stack.
       %macro PRINT_CODE 0
           STRLEN_CODE         ; (after: straddr, len)
           mov ebx, STDOUT     ; write destination file
           pop edx             ; strlen
           pop ecx             ; start address
           mov eax, SYS_WRITE  ; syscall
           int 0x80            ; interrupt to linux!
       %endmacro
       DEFWORD print ; (straddr) print ()
           PRINT_CODE
       ENDWORD print, "print", (IMMEDIATE | COMPILE)

6. 'semicolon' is coming along nicely, but I was having
   trouble testing it until I realized I had completely
   forgotten to implement my new word flag: RUNCOMP.

   The RUNCOMP flag is my way of saying "this word runs
   immediately, even in COMPILE mode."

   It's exactly the same as Forth's "immediate" words,
   but I think my naming makes the meaning a little more
   explicit? At least, that's the intention.

7. I've gotten much more comfortable using the stack to
   store local values rather than trying to juggle
   registers or use global variables. My first real
   stack workout has been in 'semicolon', where I have
   ended up using it just a tiny amount, but I've
   already started to have minor revelations about how
   the stack helps me manage memory. Nothing I can put
   into words very well just yet, though.

So given that update, my current TODOs to get
"completely 100% legit word compilation from a string of
input" working are in this order:

    [ ] Create new word: 'get_flags' will put 32 bits of
        flags from a word's tail on the stack.
    [ ] Create new word: 'is_runcomp' will leave 0 on
        the stack if RUNCOMP flag is NOT set in flags.
    [ ] Get 'semicolon' just working for now, which
        currently means having two duplicate calls to
        'strlen', which is annoying, but if I change too
        much more, I'm going to have an impossible time
        debugging all of these changes at once!
    [ ] AFTER semicolon works, pull 'strlen' out of
        'copystr' and maaayyybeee out of 'print' too,
        though I'm on the fence about that one.
    [ ] And beyond that, I'm thinking 'copystr' may not
        even need to exist, because it's a specialized
        case of copying memory...and I've already got an
        instance of that in 'inline' - so maybe the two of
        them should share that functionality in something
        like a 'copymem' word?

(NOTE: After getting semicolon working, I realized that
I was being super premature about trying to factor
'strlen' out of 'copystr' and avoiding looping through
the word name string twice in semicolon. I mean, the
word name is only 32 bytes MAXIMUM right now. Also,
copystr doesn't even contain strlen - I had remembered
that incorrectly. So scratch those last two TODOs!)
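Here's a sketch of what 'get_flags' and 'is_runcomp' boil down to, written in C. The flag values and the tail layout are made up, since the log doesn't spell out either. The key point is that each flag is its own bit, so checking for RUNCOMP is a bitwise AND that leaves 0 when the flag is absent. The returned value stands in for what the real words push onto the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; Meow5's actual values may differ. */
#define IMMEDIATE 0x1u
#define COMPILE   0x2u
#define RUNCOMP   0x4u

/* Hypothetical tail layout, loosely following the log's description. */
struct tail {
    const struct tail *link;  /* previous word in the dictionary */
    char name[32];            /* null-terminated, 32 bytes max */
    uint32_t flags;           /* the 32 bits of flags */
};

/* 'get_flags': fetch the 32 bits of flags from a word's tail. */
static uint32_t get_flags(const struct tail *t) {
    return t->flags;
}

/* 'is_runcomp': 0 if RUNCOMP is NOT set, nonzero otherwise. */
static uint32_t is_runcomp(uint32_t flags) {
    return flags & RUNCOMP;
}
```

Because each flag is a separate bit, combining flags takes a bitwise OR: `IMMEDIATE | COMPILE` sets both bits, while `IMMEDIATE & COMPILE` evaluates to 0 and sets neither.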

Okay, now I've written the two new words and am calling
them to check the word's flags:

    ; We're in compile mode...
    CALLWORD get_flags
    CALLWORD is_runcomp
    pop eax                 ; get result
    cmp eax, 0              ; if NOT equal, word was RUNCOMP
    jne .exec_word          ; yup, RUNCOMP
    CALLWORD inline         ; nope, "compile" it.

But I've got something wrong, because obviously this
shouldn't be happening:

    $ mr
    Immediate Meow!
    Could not find word "meow5"

I could certainly figure this out with GDB, but that's
getting increasingly painful now that the interpreter is
becoming more complex.

So I think it's high time for more "print debugging" in
the interpreter itself.

First off, I've had a TODO to add the current mode to my
word not found error message. So I'll start with that:

    $ mr
    Immediate Meow!
    Could not find word "meow5" while looking in COMPILE mode.

Oh! I wasn't expecting that. We shouldn't be in COMPILE
mode after semicolon (;).

    Breakpoint 1, semicolon () at meow5.asm:331
    331         mov eax, [here]
    332         push eax ; push end of machine code to stack
    push () at meow5.asm:73
    73          mov eax, [return_ptr]     ; current return stack pos
    74          sub dword [return_ptr], 4 ; "pop" return stack
    75          jmp [eax]                 ; go to return addr!

Huh? It's calling 'return' right at the beginning of
semicolon? What is...oh, jeez, I had used RETURN_CODE
instead of just "return" - I wanted the label as the
address to inline, but I accidentally included the
return code itself. Okay, that was just a dumb mistake
made by a sleepy person.

Wrong:

    push RETURN_CODE ; push what to inline on stack

Fixed:

    push return ; push what to inline on stack

Moving on, now I have no doubt I'll be stepping through
semicolon for a while. Even though I worked it out on
paper, there are so many opportunities for mistakes.

Here we go...
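For reference, here's a rough C model of the compile-mode branch above. RUNCOMP words are reported back for immediate execution, and everything else has its machine code copied to `here`, the way 'inline' does with `rep movsb`. The `struct tail` fields, `compile_area`, the RUNCOMP value and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RUNCOMP 0x4u                   /* hypothetical bit value */

/* Hypothetical tail: points back at the word's code and records its size. */
struct tail {
    const uint8_t *code;               /* start of the word's machine code */
    uint32_t code_len;                 /* how many bytes 'inline' copies */
    uint32_t flags;
};

static uint8_t compile_area[4096];
static uint8_t *here = compile_area;   /* next free byte for compiled code */

/* 'inline': copy a word's machine code to 'here' and advance 'here'. */
static void inline_word(const struct tail *t) {
    assert(here + t->code_len <= compile_area + sizeof compile_area);
    memcpy(here, t->code, t->code_len);
    here += t->code_len;
}

/* Compile-mode dispatch: returns 1 if the caller should execute the
   word right now (RUNCOMP), or 0 after inlining its code at 'here'. */
static int compile_word(const struct tail *t) {
    if (t->flags & RUNCOMP)
        return 1;
    inline_word(t);
    return 0;
}
```

Note that `inline_word()` takes a pointer to the tail, because that's where the code length lives. Handing it a pointer to the machine code instead would make it read arbitrary code bytes as the length.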

    Program received signal SIGSEGV, Segmentation fault.
    semicolon () at meow5.asm:127
    127         rep movsb ; copy [esi]...[esi+ecx] into [edi]

That looks like 'inline' - so that'll be my use of the
INLINE_CODE macro in semicolon. Let's see...

    (gdb) break semicolon
    Breakpoint 1 at 0x8049397: file meow5.asm, line 331.
    Starting program: /home/dave/meow5/meow5
    Immediate Meow!
    Breakpoint 1, semicolon () at meow5.asm:331
    331         mov eax, [here]
    332         push eax ; push end of machine code to stack
    semicolon () at meow5.asm:339
    339         push return ; push what to inline on stack
    semicolon () at meow5.asm:121
    121         pop esi ; param1: tail of word to inline
    semicolon () at meow5.asm:122
    122         mov edi, [here] ; destination

We're in INLINE_CODE now. Let's see what we're currently
pointing to:

    123         mov eax, [esi + T_CODE_LEN] ; get len of code
    (gdb) p $eax
    $1 = -668105976

Uh...that's definitely not the correct length for the
'return' word's machine code...

Oh, another bone-headed mistake. Perhaps you saw it
above when I tried to fix it and were screaming at me?
Sorry, I couldn't hear you.

Still wrong:

    push return ; push what to inline on stack

Fixed for real this time:

    push tail_return ; push what to inline on stack

Yeah, I wasn't thinking straight. The label "return"
points to the beginning of the word's machine code. The
label "tail_return" points to the tail...which is what
the 'inline' word is expecting!

Alright, with that fixed:

    $ mr
    Immediate Meow!
    Could not find word "meow5" while looking in IMMEDIATE mode.

Yeah, still broken, but in a much better way. I can see
that semicolon has at least returned us to IMMEDIATE
mode.

Probably just a simple bug in semicolon compiling the
new word's tail...

...yup! Just an ebx where I should have had an eax. The
last word address wasn't being updated for the
dictionary linked list.

Crossing fingers...

    $ mr
    Immediate Meow!
    Meow.
    Meow.
    Meow.

Gasp! It works! In some ways, this is 100% of the
original project goal. Oh, wait, almost. That's just 3
meows. Ha ha. One second...

    input_buffer:
        db ': meow5 meow meow meow meow meow ; '
        db 'meow5 '
        db 'newline '
        db 'exit',0

And drum-roll please...

    $ mr
    Meow. Meow. Meow. Meow. Meow.

Yahoo!!!

That completes the TODO list I'd built up:

    [x] Turn return_addr into a stack so I can call words
        from words (oops! turns out that doesn't make
        sense when words are inlined...)
    [x] make a "print" word and use it
    [x] colon word store name somewhere
    [x] semicolon should write tail
    [x] Create word code macros so words can be inlined
        while in assembly.
    [x] Create new word: 'get_flags' will put 32 bits of
        flags from a word's tail on the stack.
    [x] Create new word: 'is_runcomp' will leave 0 on
        the stack if RUNCOMP flag is NOT set in flags.
    [x] Get 'semicolon' working.
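For the record, here's a minimal C model of the dictionary linked list that semicolon has to maintain. Each new tail links back to the previous word, and `last` then moves to the new tail. If `last` isn't updated (the ebx/eax bug above), 'find' never sees the new word. The struct layout and the `add_word`/`find` names are hypothetical; only the 32-byte name limit comes from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 32              /* names are 32 bytes max, per the log */

struct tail {
    struct tail *link;               /* previous word; NULL at the oldest */
    char name[NAME_MAX_LEN];         /* null-terminated word name */
};

static struct tail *last = NULL;     /* most recently defined word */

/* Roughly semicolon's last step: name the tail, link it in, update 'last'. */
static void add_word(struct tail *t, const char *name) {
    strncpy(t->name, name, NAME_MAX_LEN - 1);
    t->name[NAME_MAX_LEN - 1] = '\0';
    t->link = last;
    last = t;                        /* skip this and 'find' never sees t */
}

/* 'find': walk the list from newest to oldest; NULL if not found. */
static struct tail *find(const char *name) {
    for (struct tail *t = last; t != NULL; t = t->link)
        if (strcmp(t->name, name) == 0)
            return t;
    return NULL;
}
```

Walking from newest to oldest also means that a later definition with the same name shadows an earlier one. That's handy when two words share a name, as the immediate and compiled 'meow' words do.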