The last update was very exciting. Now I'm actually reading a single word's definition from a string, inlining all of the code into memory, and executing it. To put it in concrete terms, this 'meow5' definition: "meow meow meow meow meow exit" Was turned into this in memory: The 'exit' word even pops the exit status code from the stack. Between that and all of the meowing, we're getting extremely "conCATenative" here. Sorry. So I need to figure out what step comes next. I need to: 1. Get user input from STDIN 2. Figure out how immediate mode will work (currently, i start in compile mode and when that's done, I execute whatever was compiled!) 3. Create the colon ':' and semicolon ';' words to toggle compile mode (and create word definitions!) I would also like to have introspection and diagnostics and visualizations as early in this project as possible! But for now, I'm gonna stay the course towards an absolutely minimumal proof of concept. I want to be able to type this: : meow5 meow meow meow meow meow exit ; meow5 And see (something like) this: Meow. Meow. Meow. Meow. Meow. BYE! $ So how about #2 and/or #3 from the list above - how simple can the colon command be? So I've updated the input string: db 'meow : meow5 meow meow meow meow meow ; meow5 exit', 0 (ignore the newline) Which reads as: 1. call meow right now in "immediate" mode 2. : switches to compile mode and 3. store "meow5" as name 4. inline 5 meow words 5. ; writes tail (including saved name) and 6. switches back to immediate mode 7. call new meow5 word 8. exit and have created a mode var and added imm/comp flags to tails. todo: [ ] colon word store name somewhere [ ] find should also match mode flag (use &) [ ] semicolon should write tail [ ] immediate mode should find and exec words...somehow Next two nights: Hmm...okay, so adding more words that will execute as they're entered ("immediate" words) is forcing me to deal with how they should return execution to whatever called them. To recap: * Compiled code in meow5 will be concatenated together, so there is no such thing as "return" _within_ a compiled word - execution truly just flows from the end of one word to the beginning of the next. * Many words (':' or 'colon' is an example), which must be able to operate outside of a compiled word because it is needed to do the compiling! * Some words can execute _both_ ways in a single definition. 'exit' is my only example currently - it's simple because no part of the program needs to execute after it's done, of course. * A select few words will even need to be executed from within the meow5 binary itself (in assembly) to make the initial functionality of the interpreter available. 'find' and 'inline' are two such fundamental words. * I've slowly been converting all of the traditional procedure calls in this prototype into simple jumps and manually keeping track of a single level of return address. Now the ':' command forces me to implement a return stack for immediate execution, at the very least, because it will need to call, for instance, 'get_token', to get the name of the word being defined: : meow 5 ...; Here 'meow5' is the name of the new word. Anyway, after sleeping on it, I think I'll solve this by having macros to start and end a word in assembly. In addition to taking care of the housekeeping duties of creating the tail metadata, they'll also setup return jumping and stack poppin'. The length of the word in the tail will NOT include the return stuff so it won't be included when the word is inlined. Anyway, it makes sense in my head. The basic word-making macros are easy enough: %macro DEFWORD 1 ; takes name of word to make %1: %endmacro %macro ENDWORD 3 end_%1: ; todo: immediate "return" goes here tail_%1: dd LAST_WORD_TAIL ; linked list %define LAST_WORD_TAIL tail_%1 dd (tail_%1 - %1) ; length of word dd %3 ; flags db %2, 0 ; name as string %endmacro I tested this and I'll spare you the GDB walkthrough. It works and I was able to execute this word from my input string. DEFWORD foo mov eax, 42 ENDWORD foo, "foo", IMMEDIATE So I'll test a call/return action with this foo, then convert them all. It worked. Now converting... Worked out some bugs. Silly little mistakes. Here's the thing: it's getting pretty annoying to have to bust out GDB, guess where to set a break point, step through the code, try to remember the C-dominated syntax to print stuff, etc., only to find out that I forgot to add a line or I put the wrong thing in a string data declaration. Don't get me wrong, I'm grateful for GDB. It's been a good tool and I know I should probably re-learn some of its customization options. But what I really want is better debugging in my program itself. So I've added "word not found" handling in the main routine, so it goes like this: get_next_token: CALLWORD get_token if all done, jump to .run_it CALLWORD find if not found, jump to .token_not_found CALLWORD inline jmp get_next_token .run_it: jmp data_segment .token_not_found: print first part of error message print token name print last part of error message I'll test it out: input_buffer_start: db 'honk meow meow meow meow meow exit', 0 $ mr Could not find word "honk" Excellent, that'll save me untold minutes of debugging right there. Now let's see if I've converted everthing to my new macros DEFWORD ... ENDWORD properly: $ mr Meow! Meow! Meow! Meow! Meow! Meow! Meow! ... Oh no! I've got an infinite loop somehow. Even though I'm putting in some of the "infrastructure" for it, I'm not doing any immediate mode execution yet, so it's nothing like that. Nothing for it but to debug with GDB... (gdb) break get_next_token.run_it Breakpoint 1 at 0x80491c2: file meow5.asm, line 272. ... 273 jmp data_segment ; jump to the "compiled" program 0x0804a054 in data_segment () (gdb) Single stepping until exit from function data_segment, which has no line number information. Oh, right. There's no debugger info for the machine code I've inlined into memory and executed. All the more reason to have debugging tools built into my program itself. But I don't have those yet, so at least GDB can give me a disassembly: (gdb) disas &data_segment,&here Dump of assembler code from 0x804a054 to 0x804a454: 0x0804a054 : mov $0x1,%ebx => 0x0804a059: mov $0x804a006,%ecx 0x0804a05e: mov $0x6,%edx 0x0804a063: mov $0x4,%eax 0x0804a068: int $0x80 0x0804a06a: jmp *0x804a459 0x0804a070: mov $0x1,%ebx 0x0804a075: mov $0x804a006,%ecx 0x0804a07a: mov $0x6,%edx 0x0804a07f: mov $0x4,%eax 0x0804a084: int $0x80 0x0804a086: jmp *0x804a459 ... repeats three more times... 0x0804a0e0: pop %ebx 0x0804a0e1: mov $0x1,%eax 0x0804a0e6: int $0x80 0x0804a0e8: jmp *0x804a459 0x0804a0ee: add %al,(%eax) 0x0804a0f0: add %al,(%eax) So the nice thing about 5 "meows" in a row is that the repetition is really easy to spot. The weird thing is that they all end with a jump back to the exact same place near the beginning (but not exactly at the begining) of the inlined code. Where is that jump coming from? Oh, ha ha, I found it almost immediately. It's the "return" that I put in my ENDWORD macro. That's not supposed to be inlined with the "compiled" version of words and it's due to a silly mistake. The last line here: end_%1: jmp [return_addr] tail_%1: dd LAST_WORD_TAIL dd (tail_%1 - %1) Should be: dd (end_%1 - %1) So the jmp [return_addr] doesn't get inlined! I'll fix that. And now? (gdb) disas &data_segment,&here Dump of assembler code from 0x804a054 to 0x804a454: 0x0804a054 : push %es 0x0804a055: mov 0x6ba0804,%al 0x0804a05a: add %al,(%eax) 0x0804a05c: add %bh,0x4(%eax) 0x0804a062: int $0x80 0x0804a064: jmp *0x804a459 0x0804a06a: push %es 0x0804a06b: mov 0x6ba0804,%al 0x0804a070: add %al,(%eax) 0x0804a072: add %bh,0x4(%eax) 0x0804a078: int $0x80 0x0804a07a: jmp *0x804a459 0x0804a080: push %es ... What on earth? That ain't right. Next night: ohhhh...crud. Yeah, the problem is due to the "return" code at the end of each word. My dirt-simple inline is going to need an additional length: there's a distance from the tail to the beginning of the machine code and a separate length of the machine code. (They used to be the same thing.) The DEFWORD macro produces this for "meow": meow: ... end_meow: jmp [return_addr] tail_meow: ... dd (end_meow - meow) dd (tail_meow - meow) <-- need to add this And any other code that reads the tail (I guess that's just 'find' right now?) will also need to be updated. I wonder if I should be storing these "tail offsets" in NASM macros as constants so I don't have to hunt them down if they change in the future? Yeah, I'll do that too. In addition to making changes painless, it will make my intent clearer in the code than bare offset numbers and a comment ever could. ; Memory offsets for each item in tail: %define T_CODE_LEN 4 %define T_CODE_OFFSET 8 %define T_FLAGS 12 %define T_NAME 16 Inline is re-worked to use the length and offset of the machine code in relation to the tail address: DEFWORD inline pop esi ; param1: tail of word to inline mov edi, [here] ; destination mov eax, [esi + T_CODE_LEN] ; get len of code mov ebx, [esi + T_CODE_OFFSET] ; get start of code sub esi, ebx ; set start of code for movsb mov ecx, eax ; set len of code for movsb rep movsb ; copy [esi]...[esi+ecx] into [edi] add [here], eax ; save current position ENDWORD inline, "inline", (IMMEDIATE) Crossing fingers... $ mr Meow. Meow. Meow. Meow. Meow. Yay, working again! Now I can try to do something _new_ with these changes: find immediate mode and compile mode words. And to _really_ do this right, I'll use the FORTH colon word ':' as my immediate/compile mode separator. Here's my new "input buffer" string: db 'meow meow : meow meow meow exit', 0 For now the definition of ':' will _just_ set the mode: DEFWORD colon mov dword [mode], COMPILE ENDWORD colon, ":", (IMMEDIATE) And I've got two different definitions of 'meow' all ready to go. They're both called "meow" in the dictionary, but one of them has an IMMEDIATE flag and the other has the COMPILE flag to specify which mode they should match. The only difference is that they print different strings. If all goes well, the "input buffer" string I set above should print two immediate meows and then compile three compile meows and an exit and then run that... $ mr Immediate Meow! Immediate Meow! Meow. Meow. Meow. Wow! So I guess I've done two of the four TODOs I set at the start of this log above: [ ] colon word store name somewhere [x] find should also match mode flag (use &) [ ] semicolon should write tail [x] immediate mode should find and exec words...somehow The colon word isn't storing the word name and there's no semicolon yet, so I'm not adding the new words to the dictionary yet, but I also made progress in other areas. I'll start a new log now with the other two TODOs. See you in log05.txt!