Howdy! So I've got immediate and compile modes basically working. Now I've got two TODOs carrying over from the last log and I'll add some new ones. I'm gonna put this in the order I think they should probably be done, too: [ ] turn return_addr into a stack so i can call words from words [ ] make a "print" word and use it [ ] colon word store name somewhere [ ] semicolon should write tail From there, I think the roadmap will be fairly clear and largely driven by whatever I think will be useful or fun to grow the language. * Print dictionary/word tails * How to handle variables and strings * Interactive REPL * ANSI colors * Print current stack (automatically in REPL?) * Write out any word as an ELF executable? * Other debugging features? * Some simple benchmarks - size vs speed This whole thing is just an experiment in simplicity by taking a concept to extremes (in a good way) so I want to keep it tiny and comprehensible. Too many features will detract from that, so I'll need to pick them carefully. Okay, I'll start with turning the single return address I'm currently using for immediate word calls into a proper return stack. Here's the BSS reservations: return_stack: resb 512 return_ptr: resb 4 And I've modified the CALLWORD and ENDWORD macros to push and pop the return stack appropriately. I think it's worth noting that I'm extremely not concerned about performance in immediate mode because I'm expecting that to be pretty much always be invoked by the user. The compiled code doesn't use this return stack because nothing ever gets called - it'll just be a continuous flow of instructions! Anyway, let's see what I've got when I try the new return stack: Program received signal SIGSEGV, Segmentation fault. get_next_token () at meow5.asm:73 73 mov dword [eax], %%return_to ; CALLWORD Oh, wait! I forgot to initialize the return_ptr so it starts off pointing at the start of return_stack! So of course it's gonna segfault - I'm trying to mov to address 0! Program received signal SIGSEGV, Segmentation fault. 0x0804a689 in return_ptr () Oops! And that one is because I was trying to jump to the address in return_ptr, not to where it _pointed to_! Program received signal SIGSEGV, Segmentation fault. 0x0804b000 in ?? () Yuck, that one will be in the compiled code. I think I'll call it a night. Next night: Nope! It was in immediate word execution and it was pretty silly. I'm doing a bit of a hack to perform immediate word execution - passing a register to the CALLWORD macro instead of an address literal. Turns out, the register I was using for this was eax. And, naturally, that's the register I used to implement the new call stack stuff in CALLWORD! Changing this: CALLWORD eax To this: CALLWORD ebx And let's see if we can at least do what we could do before I made my changes: $ mr Immediate Meow! Immediate Meow! Meow. Meow. Meow. Nice. So then I can move on to new functionality. Let's try a nested call. I'll make some simple test words: DEFWORD foo CALLWORD bar push 0 ; for exit CALLWORD exit ENDWORD foo, "foo", (IMMEDIATE) DEFWORD bar CALLWORD meow ENDWORD bar, "bar", (IMMEDIATE) And calling them with this input string: db 'meow bar foo', 0 Should result in three immediate meows and exit: $ mr Immediate Meow! Immediate Meow! Immediate Meow! Wow, nice. So I can allow words to call other words now, which will let me clean up some of the most redundant bits of code (like string printing). I think next will be a 'print' word that prints null-terminated strings. I'll have 'print' call a 'strlen' word to calculate a null-terminated string to get its length as well. Since this is the first time I'll have a real word that calls another word in Meow5, it occurs to me that this version of 'print' can *only* be an immediate mode word because my compiler won't be able to inline the 'strlen' machine code inside of print. Thus, only words composed entirely out of other words OR completely self-contained can be COMPILE mode words??? So at some point, I'll need to re-define 'print' by "compiling" it properly in the Meow5 fashion (inlining 'strlen') so it can be compiled into other words! Ah, jeez, and then I've got a whole chicken-and-egg problem where calling a "compiled" word needs to happen after it's been compiled - so I can't use it until it's been compiled. Which means I've got to be real careful about the "bootstrapping" order of my definitions. Well, that's what this whole wacky experiment is all about and by keeping it small, I can re-write as needed. MOVING ON, let's get this immediate mode-only print working!!! Okay, got it working. Here's 'strlen' and 'print' stripped of comments to show how simple they are and how nice and neat my token_not_found printing has become: DEFWORD strlen mov eax, [esp] mov ecx, 0 .find_null: cmp byte [eax + ecx], 0 je .done inc ecx jmp .find_null .done: push ecx ENDWORD strlen, "strlen", (IMMEDIATE & COMPILE) DEFWORD print CALLWORD strlen mov ebx, STDOUT pop edx pop ecx mov eax, SYS_WRITE int 0x80 ENDWORD print, "print", (IMMEDIATE) ... .token_not_found: push not_found_str1 CALLWORD print push token_buffer CALLWORD print push not_found_str2 CALLWORD print CALLWORD exit Having a print word to call reduced token_not_found from over 20 lines of repetitious code to this. But for the reasons I explain above, I can't use 'print' in my compiled 'meow' word or any other compiled words. Which left me in a bit of a quandry. How do I "inline" the 'strlen' code into 'print' in the meow5 interpreter when it's being assembled (or when it starts up). I _could_ define all compound words as strings and actually run the interpreter on them at startup... db ": print strlen print_with_len ;" ...but I feel like that's too inefficient even for this toy language. Or I could limit myself to a CALLWORD depth of one. Which would just require that I always call 'strlen' before print: .token_not_found: push not_found_str1 CALLWORD strlen CALLWORD print push token_buffer CALLWORD strlen CALLWORD print push not_found_str2 CALLWORD strlen CALLWORD print CALLWORD exit But yuck. And I feel like I'm likely to regret having that limitation later when I'm adding more advanced functionality? So how do I inline in NASM itself? Well, I've had all day to think about it and the answer was kind of staring me right in the face: macros! New TODO: [ ] Create word code macros so words can be inlined while in assembly. I can put the 'strlen' code in a macro and then use that to define a 'strlen' word _and_ embed it directly into 'print' as if I'd typed it. Then I did the same thing with 'print' itself (I wonder if there's a limit to the depth of these macros containing macros in NASM?) and then I could use it in the 'meow' words also. With the actual strlen and print code removed, here's what all four words look like now: %macro strlen_code 0 %endmacro DEFWORD strlen strlen_code ENDWORD strlen, "strlen", (IMMEDIATE & COMPILE) %macro print_code 0 strlen_code %endmacro DEFWORD print ; (straddr) print () print_code ENDWORD print, "print", (IMMEDIATE & COMPILE) DEFWORD imm_meow push imm_meow_str print_code ENDWORD imm_meow, "meow", (IMMEDIATE) DEFWORD meow push meow_str print_code ENDWORD meow, "meow", (COMPILE) And does it still work? db 'meow : meow meow meow exit', 0 Should print one immediate meow, three compiled meows and an exit: $ mr Immediate Meow! Meow. Meow. Meow. You bet it does. Using macros this way to inline code solves a lot of problems. I'm even wondering if I can remove the CALLWORD macros and the return aparatus, etc? Or wait, I still need to return when calling these in immediate mode in the interpreter, right? For now, I want to get the colon and semicolon words working so I can make the _real_ meow5 word as fortold by the prophets. Many nights later: I got sick and had to skip some evenings. Compound that with the fact that cascade of changes needed to let semicolon (';') compile the tail of a new word and I've got a bit of a challenge getting these notes in order. At the moment, I've done the following changes: 1. The tiny three-instruction 'return' code that gets appended to the end of all word definitions (so they can be called directly as functions) has been turned into a macro so it can be included in the ENDWORD macro and as a word of its own so it can be inlined by semicolon. 2. The 'inline' word's machine code is now also in a macro so it can be included elsewhere (such as semicolon). So yeah, 'inline' gets inlined! 3. Same with 'get_token' which is now needed by colon (':') to save the name of a new word definition: : meow5 meow meow meow meow meow ; Semicolon compiles the token "meow5" into the new word's tail so 'find' can find it in the dictionary. 4. For now, I have 'copystr' including 'strlen' (I use null-terminated strings), but the usage in tail makes me think these should be separated to make them more generalized. More about that in a moment. (NOTE: No, copystr doesn't actually contain 'strlen', I was wrong about that. 'print' is the only word that contains 'strlen'.) 5. 'print' is used in a number of places, so that's been turned into a macro as well. Actually, it looks like I already noted that would be happening above. But I hadn't yet settled on ALL CAPS for the code macros. Here's the entire 'print' definition right now: ; Prints a null-terminated string by address on stack. %macro PRINT_CODE 0 STRLEN_CODE ; (after: straddr, len) mov ebx, STDOUT ; write destination file pop edx ; strlen pop ecx ; start address mov eax, SYS_WRITE ; syscall int 0x80 ; interrupt to linux! %endmacro DEFWORD print ; (straddr) print () PRINT_CODE ENDWORD print, "print", (IMMEDIATE | COMPILE) 6. 'semicolon' is coming along nicely, but I was having trouble testing it until I realized I had completely forgotten to implement my new word flag: RUNCOMP. The RUNCOMP flag is my way of saying "this word runs immediately, even in COMPILE mode." It's exactly the same as Forth's "immediate" words, but I think my naming makes the meaning a little more explicit? At least, that's the intention. 7. I've gotten much more comfortable using the stack to store local values rather than trying to juggle registers or use global variables. My first real stack workout has been in 'semicolon', where I have ended up using it just a tiny amount, but I've already started to have minor revelations about how the stack helps me manage memory, but nothing I can put into words very well just yet. So given that update, my current TODOs to get "completely 100% legit word compilation from a string of input" working are in this order: [ ] Create new word: 'get_flags' will put 32 bits of flags from a word's tail on the stack. [ ] Create new word: 'is_runcomp' will leave 0 on the stack if RUNCOMP flag is NOT set in flags. [ ] Get 'semicolon' just working for now, which currently means having two duplicate calls to 'strlen', which is annoying, but if I change too much more, I'm going to have an impossible time debugging all of these changes at once! [ ] AFTER semicolon works, pull 'strlen' out of 'copystr' and maaayyybeee out of 'print' too, though I'm on the fence about that one. [ ] And beyond that, I'm thinking 'copystr' may not even need to exist, because it's a specialized case of copying memory...and I've already got an instance of that in 'inline' - so maybe the two of them should share that functionality in something like a 'copymem' word? (NOTE: After getting semicolon working, I realized that I was being super premature about trying to factor 'strlen' out of 'copystr' and avoiding looping through the word name string twice in semicolon. I mean, the word name is only 32 bytes MAXIMUM right now. Also, copystr doesn't even contain strlen - I had remembered that incorrectly. So scratch those last two TODOs!) Okay, now I've written the two new words and am calling them to check the word: ; We're in compile mode... CALLWORD get_flags CALLWORD is_runcomp pop eax ; get result cmp eax, 0 ; if NOT equal, word was RUNCOMP jne .exec_word ; yup, RUNCOMP CALLWORD inline ; nope, "compile" it. But I've got something wrong, because obviously this shouldn't be happening: $ mr Immediate Meow! Could not find word "meow5" I could certainly figure this out with GDB, but that's getting increasingly painful now that the interpreter is becoming more complex. So I think it's high time for more "print debugging" in the interpreter itself. First off, I've had a TODO to add the current mode to my word not found error message. So I'll start with that: $ mr Immediate Meow! Could not find word "meow5" while looking in COMPILE mode. Oh! I wasn't expecting that. We shouldn't be in COMPILE mode after semicolon (;). Breakpoint 1, semicolon () at meow5.asm:331 331 mov eax, [here] 332 push eax ; push end of machine code to stack push () at meow5.asm:73 73 mov eax, [return_ptr] ; current return stack pos 74 sub dword [return_ptr], 4 ; "pop" return stack 75 jmp [eax] ; go to return addr! Huh? It's calling 'return' right at the beginning of semicolon? What is...oh, jeez, I had RETURN_CODE of just "return" - I wanted the label as the address to inline, but I accidentally included the return function. Okay, that was just a dumb mistake made by a sleepy person. Wrong: push RETURN_CODE ; push what to inline on stack Fixed: push return ; push what to inline on stack Moving on, now I have no doubt I'll be stepping through semicolon for a while. Even though I worked it out on paper, there are so many opportunities for mistakes. Here we go... Program received signal SIGSEGV, Segmentation fault. semicolon () at meow5.asm:127 127 rep movsb ; copy [esi]...[esi+ecx] into [edi] That looks like 'inline' - so that'll be my use of the INLINE_CODE macro in semicolon. Let's see... (gdb) break semicolon Breakpoint 1 at 0x8049397: file meow5.asm, line 331. Starting program: /home/dave/meow5/meow5 Immediate Meow! Breakpoint 1, semicolon () at meow5.asm:331 331 mov eax, [here] 332 push eax ; push end of machine code to stack semicolon () at meow5.asm:339 339 push return ; push what to inline on stack semicolon () at meow5.asm:121 121 pop esi ; param1: tail of word to inline semicolon () at meow5.asm:122 122 mov edi, [here] ; destination We're in INLINE_CODE now. Let's see what we're currently pointing to: 123 mov eax, [esi + T_CODE_LEN] ; get len of code (gdb) p $eax $1 = -668105976 Uh...that's definitely not the correct length for the 'return' word's machine code... Oh, another bone-headed mistake. Perhaps you saw it above when I tried to fix it and were screaming at me? Sorry, I couldn't hear you. Still wrong: push return ; push what to inline on stack Fixed for real this time: push tail_return ; push what to inline on stack Yeah, I wasn't thinking straight. The label "return" points to the beginning of the word's machine code. The label "tail_return" points to the tail...which is what the 'inline' word is expecting! Alright, with that fixed: $ mr Immediate Meow! Could not find word "meow5" while looking in IMMEDIATE mode. Yeah, still broken, but in a much better way. I can see that semicolon has at least returned us to IMMEDIATE mode. Probably just a simple bug in semicolon compiling the new word's tail... ...yup! Just an ebx where I should have had an eax. The last word address wasn't being updated for the dictionary linked list. Crossing fingers... $ mr Immediate Meow! Meow. Meow. Meow. Gasp! It works! In some ways, this is 100% of the original project goal. Oh, wait, almost. That's just 3 meows. Ha ha. One second... input_buffer: db ': meow5 meow meow meow meow meow ; ' db 'meow5 ' db 'newline ' db 'exit',0 And drum-roll please... $ mr Meow. Meow. Meow. Meow. Meow. Yahoo!!! That completes the TODO list I'd built up: [x] Turn return_addr into a stack so i can call words from words (oops! turns out that doesn't make sense when words are inlined...) [x] make a "print" word and use it [x] colon word store name somewhere [x] semicolon should write tail [x] Create word code macros so words can be inlined while in assembly. [x] Create new word: 'get_flags' will put 32 bits of flags from a word's tail on the stack. [x] Create new word: 'is_runcomp' will leave 0 on the stack if RUNCOMP flag is NOT set in flags. [x] Get 'semicolon' working.