meow5

A stack-based pure inlining concatenative programming language written in NASM assembly
git clone http://ratfactor.com/repos/meow5/meow5.git

meow5/log04.txt

The last update was very exciting. Now I'm actually
reading a single word's definition from a string,
inlining all of the code into memory, and executing it.

To put it in concrete terms, this 'meow5' definition:

    "meow meow meow meow meow exit"

Was turned into this in memory:

    <meow word machine code>
    <meow word machine code>
    <meow word machine code>
    <meow word machine code>
    <meow word machine code>
    <exit word machine code>

The 'exit' word even pops the exit status code from the
stack. Between that and all of the meowing, we're
getting extremely "conCATenative" here. Sorry.

So I need to figure out what step comes next. I need to:

    1. Get user input from STDIN
    2. Figure out how immediate mode will work
       (currently, I start in compile mode and when
       that's done, I execute whatever was compiled!)
    3. Create the colon ':' and semicolon ';' words to
       toggle compile mode (and create word definitions!)

I would also like to have introspection and diagnostics
and visualizations as early in this project as possible!
But for now, I'm gonna stay the course towards an
absolutely minimal proof of concept. I want to be able
to type this:

    : meow5 meow meow meow meow meow exit ;
    meow5

And see (something like) this:

    Meow.
    Meow.
    Meow.
    Meow.
    Meow.
    BYE!
    $

So how about #2 and/or #3 from the list above - how
simple can the colon command be?

So I've updated the input string:

    db 'meow : meow5 meow meow meow meow meow ;
    meow5 exit', 0

(ignore the newline)

Which reads as:

    1. call meow right now in "immediate" mode
    2. : switches to compile mode and
    3. stores "meow5" as name
    4. inline 5 meow words
    5. ; writes tail (including saved name) and
    6. switches back to immediate mode
    7. call new meow5 word
    8. exit

I've also created a mode var and added imm/comp flags to
tails.
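The "inline everything" idea from the start of this log can be made concrete with a toy model. This is a Python sketch, not the NASM code: each word's "machine code" is stood in for by a short list of operations, and `DICTIONARY`, `compile_words`, and `execute` are hypothetical names. Compiling appends a copy of each word's code to one flat buffer; executing walks that buffer top to bottom, so control simply falls from the end of one word into the next. The data stack is pre-loaded with a 0 exit status, which is an assumption for the demo.

```python
# Toy model of meow5-style inlining (hypothetical names, not the real NASM).
# Each word's "machine code" is a list of (operation, argument) pairs.
DICTIONARY = {
    "meow": [("print", "Meow.")],
    "exit": [("exit", None)],  # pops the exit status from the data stack
}


def compile_words(source):
    """Inline every word: append a copy of its code to one flat buffer."""
    code = []
    for name in source.split():
        code.extend(DICTIONARY[name])
    return code


def execute(code, stack):
    """Run the flat buffer top to bottom; there are no calls or returns."""
    printed = []
    for op, arg in code:
        if op == "print":
            printed.append(arg)
        elif op == "exit":
            return printed, stack.pop()
    return printed, None


program = compile_words("meow meow meow meow meow exit")
printed, status = execute(program, stack=[0])
```

Here `program` holds six operations back to back, `printed` is five "Meow." strings, and `status` is the 0 that 'exit' popped. Nothing in the buffer says where one word ends and the next begins, which is exactly why a "return" can't live inside compiled code.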

todo:

    [ ] colon word store name somewhere
    [ ] find should also match mode flag (use &)
    [ ] semicolon should write tail
    [ ] immediate mode should find and exec words...somehow

Next two nights: Hmm...okay, so adding more words that
will execute as they're entered ("immediate" words) is
forcing me to deal with how they should return execution
to whatever called them.

To recap:

    * Compiled code in meow5 will be concatenated
      together, so there is no such thing as "return"
      _within_ a compiled word - execution truly just
      flows from the end of one word to the beginning of
      the next.

    * Many words (':' or 'colon' is an example) must be
      able to operate outside of a compiled word because
      they are needed to do the compiling!

    * Some words can execute _both_ ways in a single
      definition. 'exit' is my only example currently -
      it's simple because no part of the program needs
      to execute after it's done, of course.

    * A select few words will even need to be executed
      from within the meow5 binary itself (in assembly)
      to make the initial functionality of the
      interpreter available. 'find' and 'inline' are two
      such fundamental words.

    * I've slowly been converting all of the traditional
      procedure calls in this prototype into simple
      jumps and manually keeping track of a single level
      of return address.

Now the ':' command forces me to implement a return
stack for immediate execution, at the very least,
because it will need to call, for instance, 'get_token',
to get the name of the word being defined:

    : meow5 ... ;

Here 'meow5' is the name of the new word.

Anyway, after sleeping on it, I think I'll solve this by
having macros to start and end a word in assembly. In
addition to taking care of the housekeeping duties of
creating the tail metadata, they'll also set up return
jumping and stack poppin'. The length of the word in the
tail will NOT include the return stuff, so it won't be
included when the word is inlined.

Anyway, it makes sense in my head.

The basic word-making macros are easy enough:

    %macro DEFWORD 1 ; takes name of word to make
        %1:
    %endmacro

    %macro ENDWORD 3
        end_%1:
        ; todo: immediate "return" goes here
        tail_%1:
            dd LAST_WORD_TAIL ; linked list
            %define LAST_WORD_TAIL tail_%1
            dd (tail_%1 - %1) ; length of word
            dd %3 ; flags
            db %2, 0 ; name as string
    %endmacro

I tested this and I'll spare you the GDB walkthrough. It
works and I was able to execute this word from my input
string:

    DEFWORD foo
        mov eax, 42
    ENDWORD foo, "foo", IMMEDIATE

So I'll test a call/return action with this foo, then
convert them all.

It worked. Now converting...

Worked out some bugs.

Silly little mistakes.

Here's the thing: it's getting pretty annoying to have
to bust out GDB, guess where to set a breakpoint, step
through the code, try to remember the C-dominated syntax
to print stuff, etc., only to find out that I forgot to
add a line or I put the wrong thing in a string data
declaration.

Don't get me wrong, I'm grateful for GDB. It's been a
good tool and I know I should probably re-learn some of
its customization options.

But what I really want is better debugging in my program
itself.
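The ENDWORD macro above lays each tail out as a link to the previous tail, a code length, flags, and a NUL-terminated name, with the word's code sitting right before it. Here's a rough Python model of that layout, assuming the first version of the macro (before the return jump and code offset were added). A `bytearray` plays the role of memory and `struct` packs the `dd` fields as little-endian 32-bit values, as on x86; `Dictionary`, `defword`, `find`, `code_of`, and the numeric value of `IMMEDIATE` are all hypothetical.

```python
import struct

# Tail layout written by the first ENDWORD macro:
#   +0  dd link to previous tail   (LAST_WORD_TAIL)
#   +4  dd length of the word's code
#   +8  dd flags
#   +12 db name, 0
LINK, LENGTH, FLAGS, NAME = 0, 4, 8, 12
IMMEDIATE = 1  # hypothetical value; the log only names the flag


class Dictionary:
    def __init__(self):
        # Reserve address 0 so a link of 0 can mean "no more words".
        self.mem = bytearray(4)
        self.last_tail = 0

    def defword(self, name, code, flags):
        """Roughly what DEFWORD ... ENDWORD emit: the code, then its tail."""
        start = len(self.mem)                     # %1:
        self.mem += code
        tail = len(self.mem)                      # tail_%1:
        self.mem += struct.pack("<III", self.last_tail, tail - start, flags)
        self.mem += name.encode("ascii") + b"\0"  # db %2, 0
        self.last_tail = tail                     # %define LAST_WORD_TAIL tail_%1
        return tail

    def name_at(self, tail):
        end = self.mem.index(b"\0", tail + NAME)
        return self.mem[tail + NAME:end].decode("ascii")

    def find(self, name):
        """Walk the linked list from the newest tail back to the oldest."""
        tail = self.last_tail
        while tail != 0:
            if self.name_at(tail) == name:
                return tail
            (tail,) = struct.unpack_from("<I", self.mem, tail + LINK)
        return None  # the caller reports 'Could not find word "..."'

    def code_of(self, tail):
        """The code sits immediately before its tail, LENGTH bytes long."""
        (length,) = struct.unpack_from("<I", self.mem, tail + LENGTH)
        return bytes(self.mem[tail - length:tail])


d = Dictionary()
d.defword("meow", b"\x01\x02\x03", IMMEDIATE)  # stand-in code bytes
d.defword("exit", b"\x04", IMMEDIATE)
```

Note that `code_of` finds the code by counting back `LENGTH` bytes from the tail. That only works while nothing sits between the end of the code and the tail.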

So I've added "word not found" handling in the main
routine, so it goes like this:

    get_next_token:
        CALLWORD get_token
        if all done, jump to .run_it
        CALLWORD find
        if not found, jump to .token_not_found
        CALLWORD inline
        jmp get_next_token

    .run_it:
        jmp data_segment

    .token_not_found:
        print first part of error message
        print token name
        print last part of error message

I'll test it out:

    input_buffer_start:
        db 'honk meow meow meow meow meow exit', 0

    $ mr
    Could not find word "honk"

Excellent, that'll save me untold minutes of debugging
right there.

Now let's see if I've converted everything to my new
DEFWORD ... ENDWORD macros properly:

    $ mr
    Meow!
    Meow!
    Meow!
    Meow!
    Meow!
    Meow!
    Meow!
    ...

Oh no! I've got an infinite loop somehow.

Even though I'm putting in some of the "infrastructure"
for it, I'm not doing any immediate mode execution yet,
so it's nothing like that.

Nothing for it but to debug with GDB...

    (gdb) break get_next_token.run_it
    Breakpoint 1 at 0x80491c2: file meow5.asm, line 272.
    ...
    273             jmp data_segment ; jump to the "compiled" program
    0x0804a054 in data_segment ()
    (gdb)
    Single stepping until exit from function data_segment,
    which has no line number information.

Oh, right. There's no debugger info for the machine code
I've inlined into memory and executed.

All the more reason to have debugging tools built into
my program itself. But I don't have those yet, so at
least GDB can give me a disassembly:

    (gdb) disas &data_segment,&here
    Dump of assembler code from 0x804a054 to 0x804a454:
       0x0804a054 <data_segment+0>:  mov    $0x1,%ebx
    => 0x0804a059:  mov    $0x804a006,%ecx
       0x0804a05e:  mov    $0x6,%edx
       0x0804a063:  mov    $0x4,%eax
       0x0804a068:  int    $0x80
       0x0804a06a:  jmp    *0x804a459
       0x0804a070:  mov    $0x1,%ebx
       0x0804a075:  mov    $0x804a006,%ecx
       0x0804a07a:  mov    $0x6,%edx
       0x0804a07f:  mov    $0x4,%eax
       0x0804a084:  int    $0x80
       0x0804a086:  jmp    *0x804a459

    ... repeats three more times...

       0x0804a0e0:  pop    %ebx
       0x0804a0e1:  mov    $0x1,%eax
       0x0804a0e6:  int    $0x80
       0x0804a0e8:  jmp    *0x804a459
       0x0804a0ee:  add    %al,(%eax)
       0x0804a0f0:  add    %al,(%eax)

So the nice thing about 5 "meows" in a row is that the
repetition is really easy to spot.

The weird thing is that they all end with a jump back to
the exact same place near the beginning (but not exactly
at the beginning) of the inlined code.

Where is that jump coming from?

Oh, ha ha, I found it almost immediately. It's the
"return" that I put in my ENDWORD macro. That's not
supposed to be inlined with the "compiled" version of
words and it's due to a silly mistake.

The last line here:

    end_%1:
        jmp [return_addr]
    tail_%1:
        dd LAST_WORD_TAIL
        dd (tail_%1 - %1)

Should be:

        dd (end_%1 - %1)

So the jmp [return_addr] doesn't get inlined!

I'll fix that.

And now?

    (gdb) disas &data_segment,&here
    Dump of assembler code from 0x804a054 to 0x804a454:
       0x0804a054 <data_segment+0>:  push   %es
       0x0804a055:  mov    0x6ba0804,%al
       0x0804a05a:  add    %al,(%eax)
       0x0804a05c:  add    %bh,0x4(%eax)
       0x0804a062:  int    $0x80
       0x0804a064:  jmp    *0x804a459
       0x0804a06a:  push   %es
       0x0804a06b:  mov    0x6ba0804,%al
       0x0804a070:  add    %al,(%eax)
       0x0804a072:  add    %bh,0x4(%eax)
       0x0804a078:  int    $0x80
       0x0804a07a:  jmp    *0x804a459
       0x0804a080:  push   %es
       ...

What on earth? That ain't right.

Next night: ohhhh...crud. Yeah, the problem is due to
the "return" code at the end of each word. My
dirt-simple inline is going to need an additional
length: there's a distance from the tail to the
beginning of the machine code and a separate length of
the machine code. (They used to be the same thing.)
Inline was finding the start of the code by counting
that length back from the tail, so now it starts
copying 6 bytes (the size of the jmp) too late and
still copies the jmp at the end.

The DEFWORD and ENDWORD macros produce this for "meow":

    meow:
        ...
    end_meow:
        jmp [return_addr]
    tail_meow:
        ...
        dd (end_meow - meow)
        dd (tail_meow - meow)   <-- need to add this

And any other code that reads the tail (I guess that's
just 'find' right now?) will also need to be updated. I
wonder if I should be storing these "tail offsets" in
NASM macros as constants so I don't have to hunt them
down if they change in the future?

Yeah, I'll do that too. In addition to making changes
painless, it will make my intent clearer in the code
than bare offset numbers and a comment ever could.
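Here is a small Python model of that failure, meant as an illustration rather than the real code. It lays out one word as its code bytes, then the 6-byte return jump, then the tail, and compares copying with one number (start = tail minus length, which is what the broken output suggests inline was doing) against copying with the two numbers described above. The byte values are borrowed from the first disassembly (the opening `mov $0x1,%ebx; mov $0x804a006,%ecx` of meow and the `jmp *0x804a459`); `inline_old` and `inline_new` are hypothetical names.

```python
# One word laid out the way DEFWORD/ENDWORD now emit it. Illustrative bytes:
# the start of the meow code plus the 6-byte indirect return jump.
CODE = bytes([0xBB, 0x01, 0x00, 0x00, 0x00,      # mov ebx, 1
              0xB9, 0x06, 0xA0, 0x04, 0x08])     # mov ecx, 0x0804a006
RETURN_JUMP = bytes([0xFF, 0x25, 0x59, 0xA4, 0x04, 0x08])  # jmp [0x0804a459]

word_start = 0                          # meow:
memory = bytearray(CODE + RETURN_JUMP)  # end_meow: is between the two parts
tail = len(memory)                      # tail_meow:
code_len = len(CODE)                    # dd (end_meow - meow)
code_offset = tail - word_start         # dd (tail_meow - meow)


def inline_old(mem, tail, code_len):
    """One number: assume the code ends right where the tail begins."""
    start = tail - code_len
    return bytes(mem[start:start + code_len])


def inline_new(mem, tail, code_len, code_offset):
    """Two numbers: the offset finds the start, the length says how much."""
    start = tail - code_offset
    return bytes(mem[start:start + code_len])


old = inline_old(memory, tail, code_len)
new = inline_new(memory, tail, code_len, code_offset)
```

`new` is exactly `CODE`. `old` starts 6 bytes into the word, at the `0x06` byte from the middle of `mov $0x804a006,%ecx`, which a disassembler reads as `push %es`, and it still ends with the return jump. That matches the shape of the second disassembly.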

    ; Memory offsets for each item in tail:
    %define T_CODE_LEN    4
    %define T_CODE_OFFSET 8
    %define T_FLAGS       12
    %define T_NAME        16

Inline is re-worked to use the length and offset of the
machine code in relation to the tail address:

    DEFWORD inline
        pop esi                        ; param1: tail of word to inline
        mov edi, [here]                ; destination
        mov eax, [esi + T_CODE_LEN]    ; get len of code
        mov ebx, [esi + T_CODE_OFFSET] ; get distance from start of code to tail
        sub esi, ebx                   ; set start of code for movsb
        mov ecx, eax                   ; set len of code for movsb
        rep movsb                      ; copy ecx bytes from [esi] to [edi]
        add [here], eax                ; advance here past the copied code
    ENDWORD inline, "inline", (IMMEDIATE)

Crossing fingers...

    $ mr
    Meow.
    Meow.
    Meow.
    Meow.
    Meow.

Yay, working again!

Now I can try to do something _new_ with these changes:
find immediate mode and compile mode words.

And to _really_ do this right, I'll use the FORTH colon
word ':' as my immediate/compile mode separator.

Here's my new "input buffer" string:

    db 'meow meow : meow meow meow exit', 0

For now the definition of ':' will _just_ set the mode:

    DEFWORD colon
        mov dword [mode], COMPILE
    ENDWORD colon, ":", (IMMEDIATE)

And I've got two different definitions of 'meow' all
ready to go. They're both called "meow" in the
dictionary, but one of them has an IMMEDIATE flag and
the other has the COMPILE flag to specify which mode
they should match. The only difference is that they
print different strings.

If all goes well, the "input buffer" string I set above
should print two immediate meows, then compile three
compile-mode meows and an exit, and then run that...

    $ mr
    Immediate Meow!
    Immediate Meow!
    Meow.
    Meow.
    Meow.

Wow!
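The mode matching can be sketched in Python as well. This models the idea rather than the real program: `IMMEDIATE` and `COMPILE` get made-up bit values (the log names the flags but not their numbers), words are `(name, flags, action)` tuples searched newest-first like the tail list, and a word matches only if its name is equal and `flags & mode` is nonzero. Compile-mode words are collected in a list that stands in for the inlined machine code, which runs once the input is used up. `Interpreter` and its methods are hypothetical names.

```python
IMMEDIATE = 1  # hypothetical bit values: the log names these flags
COMPILE = 2    # but doesn't show what numbers they have


class Interpreter:
    def __init__(self):
        self.mode = IMMEDIATE
        self.output = []
        self.compiled = []  # stands in for the inlined machine code
        # Oldest first; find() scans from the newest back, like the tails.
        self.words = [
            ("meow", IMMEDIATE, lambda: self.output.append("Immediate Meow!")),
            ("meow", COMPILE, lambda: self.output.append("Meow.")),
            # The real 'exit' pops a status and exits; here it does nothing.
            ("exit", IMMEDIATE | COMPILE, lambda: None),
            (":", IMMEDIATE, self.colon),
        ]

    def colon(self):
        self.mode = COMPILE  # for now, ':' only switches the mode

    def find(self, name):
        for word_name, flags, action in reversed(self.words):
            if word_name == name and flags & self.mode:
                return action
        return None

    def run(self, source):
        for token in source.split():
            action = self.find(token)
            if action is None:
                raise LookupError(f'Could not find word "{token}"')
            if self.mode == IMMEDIATE:
                action()                      # immediate: run it right now
            else:
                self.compiled.append(action)  # compile: "inline" it
        for action in self.compiled:          # then run the compiled code
            action()
        return self.output


interp = Interpreter()
printed = interp.run("meow meow : meow meow meow exit")
```

Running the same input string as the log gives two "Immediate Meow!" lines followed by three "Meow." lines. Because `find` checks the flag as well as the name, two words can share the name "meow" without either one shadowing the other.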

So I guess I've done two of the four TODOs I set at the
start of this log above:

    [ ] colon word store name somewhere
    [x] find should also match mode flag (use &)
    [ ] semicolon should write tail
    [x] immediate mode should find and exec words...somehow

The colon word isn't storing the word name and there's
no semicolon yet, so I'm not adding the new words to the
dictionary yet, but I also made progress in other areas.

I'll start a new log now with the other two TODOs.

See you in log05.txt!