I left myself a nice easy one to start this log: [x] Make all words take params from the stack, not from pre-defined registers. Which ought to be simple: just push the values I need before calling the function. Then have the function pop the values into the registers and off we go. +----------------------------------------------------+ | NOTE: I'm still using call/ret to use the | | 'find' and 'inline' words when the program | | initially runs. I've got a bit of a | | chicken-and-egg problem here because without a | | return, these won't seamlessly move on to the | | next instructions when they're done and I | | can't find or inline them because THEY are | | 'find' and 'inline'! | | | | I feel like that will be solved when I've got | | more of the interpreter or REPL in place. If | | not, I've got a puzzle on my hands. For the | | moment, things are just a bit...messy. | +----------------------------------------------------+ Well, I thought that was going to be easy. I mean, it is pretty easy. But it has a few snags I hadn't yet considered. It turns out, using the same stack for call/ret return address storage AND for passing values between functions in a truly concatenative manner gets real complicated real quick. And since I was using call/ret temporarily anyway, I have zero desire to do anything fancy to make it work. So I'm going to basically do my own return by storing a return address and jumping to it at the end of both 'inline' and 'find'. I'm making a new variable in BSS to hold my return address. I only need one, not a stack, because I'm not making any nested calls. temp_return_addr: resb 4 I'll put in a mockup of the code to get the assembled instruction lengths right (I hope) so I can figure out the address we just jump back to as a "return". (I'm pretty sure I can't just store the instruction pointer register because that'll be a point before the "call" jump and then I'll have an infinite loop, right?) I'll use NASM's listing feature for that. It comes out super wide (well, compared to the 60 columns I give myself on my little split screen setup!), so I'll see if I can reformat it enough to fit here: 152 00DA 68[0600] push temp_meow_name 153 00DD 66C706[0904]- mov dword [temp_return_addr], $ 153 00E2 [DD000000] 154 00E6 EB8A jmp find 155 00E8 6650 push eax The listing is so fun to look at and I find it almost fetishisticly beautiful. I mean, I've had all of these _questions_ about how all of this actually works and here, if you can read them, are all of the _answers_. I mean, I know the CPU still has secrets down below even this machine code layer. But for the application programmer, this is _it_. This is the bedrock upon which we lay all of our hopes and dreams. In the hex column on the left are the real instructions, no longer hidden by mnemonics or symbols. Anyway, where was I? Oh, yeah, so '$' is NASM for "the address at the beginning of this line". Which is very handy. And that's exactly what's gonna be put into temp_return_addr: 0904 is little-endian for temp_return_addr at 0409 (which I could see further up my listing) DD000000 is the address returned by $ (again, in little-endian) And assuming the assembled code won't change, it looks like I want my return address to be an additional... E8 - DD ...bytes. Which, uh, I'll ask my cat to subtract for me. Hmm. No, he purred, but was not forthcoming with the answer. Okay, how about dc? $ dc 16 i e8 dd - p dc: 'e' (0145) unimplemented E8 DD - p 11 Okay, so dc hissed at me once for not entering the hex values in upper case. So score one point for my cat. But then it gave me the correct answer after that, so score one point for dc. Looks like this match is even. So I wanna add 11 bytes to my return addresses. Here's the new listing: 153 00DD 66C706[0904]- mov dword [temp_return_addr], ($ + 11) 153 00E2 [E8000000] 154 00E6 EB8A jmp find 155 00E8 6650 push eax Looks right to me, we want to jump ("ret") back to 00E8 after the jump ("call") to find. Of course, this seems super fragile, but it's also super temporary. Let's just see if it works... Okay, dang it, a segfault. My changes have required another change and now the addresses are a little different, but the 11 bytes should still be the same: 154 000000DF 66C706[0904]- mov dword [temp_return_addr], ($ + 11) 154 000000E4 [EA000000] 155 000000E8 EB88 jmp find 156 000000EA 6650 push eax Let's try it now: (gdb) break find.found_it Breakpoint 1, find.found_it () at meow5.asm:116 116 mov eax, edx ; pointer to tail of dictionary word (gdb) p/a $edx $1 = 0x8049030 So far so good. Now the return jump? (gdb) s 117 jmp [temp_return_addr] (gdb) p/a (int)temp_return_addr $2 = 0x80490df And just where might that be, exactly? 0x080490d4 <+5>: c7 05 19 a4 04 08 df 90 04 08 movl $0x80490df, 0x804a419 0x080490de <+15>: eb 88 jmp 0x8049068 0x080490e0 <+17>: 50 push %eax 0x080490e1 <+18>: e8 57 ff ff ff call 0x804903d Hmmm...looks off by 1. 0x80490df points to the second byte of the jmp find instruction... Program received signal SIGSEGV, Segmentation fault. 0x080490df in inline_a_meow () at meow5.asm:155 155 jmp find ; answer will be in eax Yeah. So... 12 bytes? And to think, I waxed all poetic about the NASM listing. I don't know how to explain the byte discrepancy. Let's see if this works: (gdb) break find.found_it Breakpoint 1, find.found_it () at meow5.asm:116 116 mov eax, edx ; pointer to tail of dictionary word (gdb) s 117 jmp [temp_return_addr] (gdb) s inline_a_meow () at meow5.asm:156 156 push eax ; put it on the stack for inline Yes! But that was evidently even _more_ fragile than I'd expected. So I'll just bit the bullet and hold my nose and use some temporary labels. It's still quite compact, so I'll just paste it here: push temp_meow_name ; the name string to find mov dword [temp_return_addr], t1 jmp find ; answer will be in eax t1: push eax ; put it on the stack for inline mov dword [temp_return_addr], t2 jmp inline t2: dec byte [meow_counter] jnz inline_a_meow ; inline exit push temp_exit_name ; the name string to find mov dword [temp_return_addr], t3 jmp find ; answer will be in eax t3: push eax ; put it on the stack for inline mov dword [temp_return_addr], t4 jmp inline t4: ; Run! push 0 ; push exit code to stack for exit jmp data_segment ; jump to the "compiled" program Does it work? dave@cygnus~/meow5$ mr Meow. Meow. Meow. Meow. Meow. Yes! One last thing, now - 'find' is still leaving its answer in the eax register. If I have it push the answer to the stack instead, 'inline' will pop it and have what it needs - no need for that "push eax" beween the two functions/words (at labels t1 and t3 above). Now find.not_found and find.found_it push their return values on the stack: .not_found: push 0 ; return 0 to indicate not found jmp [temp_return_addr] .found_it: push edx ; return pointer to tail of dictionary word jmp [temp_return_addr] And the calls simply flow one after the other without any explicit data passing: jmp find t1: mov dword [temp_return_addr], t2 jmp inline t2: ... And does _that_ work? dave@cygnus~/meow5$ mr Meow. Meow. Meow. Meow. Meow. Yes, and now I can check that little box at the top of this log. We're doing pure stack-based concatenative programming now. Next step: [x] Parse the string "meow meow meow meow meow exit" as a program (pretend we're already in "compile mode" and we're gathering word tokens and compiling them) and execute it. It begins! Here's the string in the .data segment: input_buffer_start: db 'meow meow meow meow meow exit', 0 input_buffer_end: And here's the .bss segment "variables": token_buffer: resb 32 ; For get_token input_buffer_pos: resb 4 ; Save position of read tokens Yup, just 32 chars for token names (well, 31 because I'm null-terminating the string). Hey, it's my language. Ha ha, I can always bump this up later. But 31 is actually quite long, you know? abcdefghijklmnopqrstuvwxyz01234 I've created a word called 'get_token' which will do the job of both 'WORD' and 'KEY' in Forth. And I was just about to 'call' it to test it, but I can't bear to put in another manual temporary label So, it's macro time! %macro CALLWORD 1 mov dword [return_addr], %%return_to jmp %1 %%return_to: %endmacro And it should be super easy to use. First, I'll test my temporary "manual" 'meow' and 'exit' inlines to make sure it works. They're about to go away, but they'll make a good test. Look at how clean the 'exit' one is: push temp_exit_name ; the name string to find CALLWORD find CALLWORD inline But does it work? dave@cygnus~/meow5$ mr Meow. Meow. Meow. Meow. Meow. First try! No way. I mean, of _course_ it worked first try and I never doubted it would. Okay, now let's get into this get_token function: (gdb) break get_next_token Breakpoint 1 at 0x804912e: file meow5.asm, line 189. (gdb) r Starting program: /home/dave/meow5/meow5 Breakpoint 1, get_next_token () at meow5.asm:189 189 mov dword [return_addr], %%return_to ; CALLWORD 190 jmp %1 ; CALLWORD 150 mov ebx, [input_buffer_pos] ; set input read addr 151 mov edx, token_buffer ; set output write addr 152 mov ecx, 0 ; position index 154 mov al, [ebx + ecx] ; input addr + position index 155 cmp al, 0 ; end of input? (gdb) p/c $al $2 = 109 'm' Nice! So the 'm' from the first 'meow' has been collected so far. Now the rest of the token... (gdb) break 155 Breakpoint 2 at 0x80490c7: file meow5.asm, line 155. (gdb) c 155 cmp al, 0 ; end of input? (gdb) p/c $al $3 = 101 'e' ... $4 = 111 'o' $5 = 119 'w' $6 = 32 ' ' We have 'meow' and the space should be our token separator. 155 cmp al, 0 ; end of input? 156 je .end_of_input ; yes 157 cmp al, ' ' ; token separator? (space) 158 je .return_token ; yes 170 add [input_buffer_pos], ecx ; save input position 171 mov [edx + ecx], byte 0 ; terminate str null Looks good. Did we collect 4 characters as expected? (gdb) p $ecx $7 = 4 Yup. Then get_token will "return" the token string address so 'find' can use it to find the 'meow' word: 172 push DWORD token_buffer ; return str address 173 jmp [return_addr] 219 cmp DWORD [esp], 0 ; check return without popping 220 je run_it ; all out of tokens! 189 mov dword [return_addr], %%return_to ; CALLWORD 190 jmp %1 ; CALLWORD 96 pop ebp ; first param from stack! find () at meow5.asm:99 99 mov edx, [last] Okay, the execution looks right. And did we pass the address correctly on the stack? (gdb) p $ebp $9 = (void *) 0x804a43d Yup! And does it contain the expected 'meow' token? (gdb) x/s $ebp 0x804a43d : "meow" Nice! I'm going to assume 'find' and 'inline' will work correctly. Let's see if we can get the next token from the input string: (gdb) c Continuing. Breakpoint 2, get_token.get_char () at meow5.asm:155 155 cmp al, 0 ; end of input? Alright, we're back in get_token. This should be the first character of the second 'meow' token: (gdb) p/c $al $10 = 32 ' ' Uh oh. That doesn't look right. I'll continue anyway... (gdb) c Continuing. Program received signal SIGSEGV, Segmentation fault. inline () at meow5.asm:76 76 mov ecx, [esi + 4] ; get len into ecx Yeah, that makes sense. 'find' will have failed to find the '' token and then 'inline' crashes when trying to read from address 0 (the null pointer return value from 'find'). The best way to handle this is probably to ignore any leading spaces - that will not only be useful later, it will take care of this current character problem. In a higher-level language, I might choose to do this with nested logic, like so: if (char === ' ') if (token.len > 0) 'eat' space (move to next input char) else return the token end end But in assembly, this all gets flattened. It's a surprisingly interesting exercise to formulate the logic in terms of jumps. (At least at first. I'm sure the novelty wears off after a while.) Anyway, here's my solution: cmp al, ' ' ; token separator? (space) jne .add_char ; nope! get char cmp ecx, 0 ; yup! do we have a token yet? je .eat_space ; no jmp .return_token ; yes, return it .eat_space: inc ebx ; 'eat' space by advancing input jmp .get_char I'll make sure that works in GDB. I changed the input string to: db ' meow meow meow meow meow exit', 0 with a leading space and two spaces before the second meow token. That'll make it easy to test: 155 cmp al, 0 ; end of input? (gdb) p/c $al $1 = 32 ' ' 157 cmp al, ' ' ; token separator? (space) 158 jne .add_char ; nope! get char 162 cmp ecx, 0 163 je .eat_space 171 inc ebx 172 jmp .get_char Yup! That line 171 is my "eat the leading space" action and now we should get the 'm' in "meow" and store it: 154 mov al, [ebx + ecx] ; input addr + position index 155 cmp al, 0 ; end of input? 156 je .end_of_input ; yes (gdb) p/c $al $2 = 109 'm' 157 cmp al, ' ' ; token separator? (space) 158 jne .add_char ; nope! get char 175 mov [edx + ecx], al ; write character Yeah. This is looking good. You know what? I'm just gonna go for it: dave@cygnus~/meow5$ mr Meow. Meow. Meow. Meow. Meow. Yes! So that's another item checked off! This is really coming along. At nearly 500 lines, this log is complete. I'll see you in the next one, log04.txt. :-)