Welcome back! In the last log, I got string literals
    working with the 'quote' word.

    One thing lead to another and I ended up solving a lot
    of different problems so that now I can define words
    not just from "primitives" (machine code words written
    in assembler), but also words composed from *other*
    non-primitive words.

    So yeah, you can only make programs that print strings,
    but they are real programs.

    Okay, so now the fun TODOs!

        [ ] Pretty-print meta info about word!
        [ ] Loop through dictionary, list all names
        [ ] Loop through dictionary, pretty-print all

    (And I'm not being sarcastic or something - I'm really
    looking forward to these.)

    First, I need general number handling! I need to parse
    numbers from input and I need to be able to print
    numbers as strings.

        [ ] New word: str2num (ascii to integer)
        [ ] New word: num2str (integer to ascii)

    I'd also like to have the ability to intuitively append
    strings. Ideally:

        "There are " 5 " chickens."
        "Your name is '" username "'?"

    (I also don't have variables yet, so the username one is
    even more hypothetical. But you get the idea.)

    My new 'str2num' word seems to work. I realized I could
    use the exit status of the epplication since it is
    already popping a number from the stack (which I'd
    completely forgotten about, ha ha).

    I updated my `mr` (meow5 run) alias to print the exit
    status:

        alias mr=`./build.sh run ; echo "Exit status: $?"'

    The current parse/print number radix is set in a new
    "variable" currently set to decimal on startup:

        mov dword [radix], 10

    And created these words to manage the radix:

        radix (takes number from stack)
        dec  - sets radix to 10
        hex  - sets radix to 16
        oct  - sets radix to 8
        bin  - sets radix to 2

    Now I'll try str2num out with a decimal number:

        42 exit

$ mr
Exit status: 42

    And I'll mess with it by giving it a hex number without
    updating the radix to hex:

        ff exit

$ mr
Could not find word "ff" while looking in IMMEDIATE mode.
Exit status: 1

    Now I'll compact the results a bit to not take up too
    much room here:

hex ff exit         <-- hexadecimal
Exit status: 255


oct 100 exit        <-- octal
Exit status: 64

bin 101010 exit     <-- binary
Exit status: 42         (uh...seems right)
bin 11111111 exit
Exit status: 255        (yup, that's definitely right)

    Alright! Now this is getting fun. It'll be even better
    when it's taking input from STDIN to be fully
    interactive.

    Number printing is interesting. So I could just have it
    print numbers when requested, like Forth's DOT (.) word.
    But to print some mixed strings and numbers would look
    like this:

        "You own " print 16 printnum " snakes." print

    And like I wrote above in this log, I'd rather have
    strings and numbers automatically appended like this:

        "You own " 16 " snakes." print

    But I'm not sure if that's a good idea or not because
    I'm going to either have to further complicate my
    interpreter (it's already doing look-ahead for quotes so
    I don't have to have a space like in Forth: " foo") or
    some up with something else.

    One thing I've thought about is having interpolated
    strings. As long as I have some unambiguous symbol, I
    could have it pop numbers off the stack and append them
    to the string as I go:

        16 "You own $ snakes." print

    I planed to have escapes anyway, so \$ for a literal '$'
    is no problem.

    Okay, I like that.

    So I need to write num2str now and then add number
    interpolation to my 'quote' word. This is gonna be cool!

    Next night: Wrote num2str. I've got a division error.
    Here's how I'm testing:

        ; test num2str
        push 42           ; num
        push dword [free] ; store here
        CALLWORD num2str
        pop eax        ; chars written

    First, let's double-check the input:

(gdb) break num2str
528	    pop ebp ; address of string destination
num2str () at meow5.asm:529
529	    pop eax ; number
num2str () at meow5.asm:530
530	    mov ecx, 0 ; counter of digit characters
531	    mov ebx, [var_radix]
532	    mov edx, 0
(gdb) p/a $ebp
$1 = 0x804b114 <data_area>  <-- addr for string storage
(gdb) p $eax
$2 = 42                     <-- num to convert
(gdb) p $ebx
$3 = 10                     <-- radix

    That's all good:

        string storage addr: 0x804b114
             num to convert: 42
                      radix: 10

    Now we'll start converting. I divide the number by the
    radix. The remainder is the digit that was less than the
    radix (so 0-9 for dec, 0-f for hex, etc.). The result of
    the division is the rest of the number.

(gdb) s
num2str.divide_next () at meow5.asm:534
534	    idiv ebx     ; eax / ebx = eax, remainder in edx
(gdb) p $eax
$6 = 4                      <-- quotient
(gdb) p $ebx
$7 = 10                     <-- radix (double-checking)
(gdb) p $edx
$8 = 2                      <-- remainder (digit to convert)

    Yup, that looks good. We've got the first digit (working
    lowest to highest) to convert to a character (2) and the
    rest of the number with the first digit gone (4).

    I convert to the character equivalent and store that on
    the stack temporarily. This is because I convert
    starting from the least significant digit, but we want
    the string to have the number in the expected order,
    starting with the greatest significant digit.

(gdb) s
535	    cmp edx, 9   ; digit bigger than 9? (radix allows a-z)
536	    jg .toalpha  ; yes, convert to 'a'-'z'
537	    add edx, '0' ; no, convert to '0'-'9'
538	    jmp .store_char
num2str.store_char () at meow5.asm:542
542	    push edx ; put on stack (pop later to reverse order)
543	    inc ecx
544	    cmp eax, 0        ; are we done converting?
545	    jne .divide_next  ; no, loop

    Now it's on to the next division:

Program received signal SIGFPE, Arithmetic exception.
num2str.divide_next () at meow5.asm:534
534	    idiv ebx     ; eax / ebx = eax, remainder in edx

    There it is. Well, I'm not dividing by 0, so that's
    good. But then I remember idiv is signed division, and
    I'm not doing signed integers just yet. Maybe that's it?

    Ah, a quick look at a reference:

        "32-bit division with IDIV requires EAX be
        sign-extended into EDX."

    Yeah, I don't want to do signed division anyway because
    I don't have the ability to parse or print negative
    numbers (yet?) so I should really be using DIV anyway:

Program received signal SIGFPE, Arithmetic exception.

    Nope! I know this isn't a divide-by-zero problem...

    Oh, I needed to read a bit more:

        "DIV Always divides the 64 bits value accross
        EDX:EAX by a value."

    Oh, right, I need to clear EDX to 0 in my division loop
    because it currently holds the character to convert. It
    makes total sense that the answer is overflowing. I was
    dividing by a huge 64-bit number the second time around!

    So I just need to clear edx before my division:

533	    mov edx, 0   ; div actually divides edx:eax / ebx!
534	    div ebx      ; eax / ebx = eax, remainder in edx
535	    cmp edx, 9   ; digit bigger than 9? (radix allows a-z)
(gdb) p $eax
$1 = 0
(gdb) p $edx
$2 = 4

    That's got it! The final digit (remainder after
    division) is 4 and there's nothing left to divide after
    that (quotient is 0).

    Once it sees the rest of the number is now 0, I store
    the string at the address provided and return the number
    of characters written (digits in radix) on the stack:

554	    push eax  ; return num chars written
end_num2str () at meow5.asm:149
149	    mov eax, [return_addr] ; RETURN
(gdb) p $eax
$1 = 2
(gdb) x/s $ebp
0x804b114 <data_area>:	"42"

    Correct! Two characters were written, and the string is
    "42", my value in decimal!

    Since I can now, I'll make the test print real nice and
    use the exit value to show the number of digits written
    (to make sure it's correct):

        ; test num2str
        CALLWORD decimal
        push dword 42     ; num
        push dword [free] ; store here
        CALLWORD num2str
        PRINTSTR "Answer: "
        push dword [free] ; print from here
        CALLWORD print
        CALLWORD newline
        CALLWORD exit     ; stack still has chars written

$ mr
Answer: 42
Exit status: 2

    Correct. Let's see some others:

        CALLWORD oct
        push dword 64

Answer: 100
Exit status: 3

        CALLWORD hex
        push dword 3735928559

Answer: deadbeef
Exit status: 8

        CALLWORD bin
        push dword 7

Answer: 111
Exit status: 3

        CALLWORD bin
        push dword 257

Answer: 100000001
Exit status: 9

    Looks good! Now I'm kinda wishing the interpreter was
    interactive because that would have been way more fun
    than changing the assembly test for each run. But
    that'll be coming up soon enough.

    After sleeping on it, I'm really excited about the idea
    of string interpolation using '$' as a placeholder. So
    I'll make that the next todo:

        [ ] Add "$" placeholders so the 'quote' word so it
            can interpolate numbers from the stack into the
            string.

    I suspect the hard part will be remembering to push the
    stack values in the opposite order that they appear in
    the string...so we'll see what that's like in practice.

    Okay, I've got it working, but only just barely. It
    feels incredibly fragile. In particular, the interpreter
    has become way too complicated for my taste. Juggling
    registers willy-nilly and using the stack when I run out
    of places to put things is just sloppy.

    It does work. Cleaned up example:

        42 "The answer is $." print
        42 hex "The answer is 0x$ in hex." print
        42 bin "The answer is $ in computer." print
        42 oct "The answer is $ in octal." print

    Output:

The answer is 42.
The answer is 0x2a in hex.
The answer is 101010 in computer.
The answer is 52 in octal.

    Which is really cool. But I need to get my mess under
    control.

    For one thing, I've been using the registers very
    inconsistantly because I don't have any plan regarding
    where to put stuff.

    This article has helped a ton:

        https://www.swansontec.com/sregisters.html

    I'm not sure I'm going to re-write EVERYTHING to
    follow these guidelines, but new stuff definitely will
    and I'm going to immediately switch the 'quote' word to
    use the esi and edi registers!

    Using esi and edi in the 'quote' word helped make the
    usage much clearer and simplified some of the address
    manipulation considerably (the source and destination
    pointers aren't always moving in sync because escapes
    and placeholders in the source string may expand to a
    different number of characters in the destination
    string.

    Another thing that makes the interpreter so fragile is
    the way I have to juggle registers and the stack around
    to check for a valid numeric literal when an input token
    doesn't match any of the exisitng words.

    So what I'm thinking is that just like quote, I'll test
    for tokens that start with a number and handle them
    _before_ trying to find them in the dictionary.

    Consequently, I won't be able to start any word or
    variable names with a number. (Like Forth allows.)
    In exchange, it should simplify the interpreter a fair
    amount in some messy areas.

        [ ] Break out 'number' into separate word that does
            pre-check like 'quote'.

    This will also give it a nice symetry with the way I'm
    handling string literals (my first big departure from
    pur Forth syntax).

    Three sleepy nights later: Well, it works...but I've now
    exposed another problem.

    This is what the top of my interpreter loop looks like:

        get_next_token:
            CALLWORD eat_spaces ; skip whitespace
            CALLWORD quote      ; handle any string literals
            CALLWORD number     ; handle any number literals
            CALLWORD get_token

    It's nice and neat and readable. The problem is that it
    doesn't work unless you have an optional quote followed
    by an optional number followed by a non-string,
    non-quote token.

    Two strings in a row or a number followed by a string or
    various other combos won't work.

    Here's what needs to happen:

        interpreter loop:
            eat spaces
            if quote
                make string literal
                restart interpreter loop
            if number
                make number literal
                restart interpreter loop
            if token
                find/execute/compile
                restart interpreter loop
            else
                error

    So as much as I liked having 'quote' and 'number' handle
    all aspects of those respective inputs, it's just making
    everying too interdependant. I need to break them up
    into even smaller words and give more control back to
    the interpreter where it belongs.

    So more refactoring awaits me before I can do the fun
    stuff. Ah well, this is where the "slow and steady"
    approach wins - I know I'll eventually push through this
    kinda boring housekeeping stuff.

    Next night: Progress made on that refactor. Then there's
    a segfault. So that's what I'll work on tomorrow night.

    Next night: Looks like the segfault is happening in old
    code (specifically inline, but because of wrong
    calculations in semicolon ";"). Since the old code won't
    just start failing for no reason, this has to be due to
    bad input from all the changes I've made in the outer
    interpreter loop.

    Hard to track down with GDB with breakpoints and watch
    statements. So I added some print debugging and sure
    enough:

COLON here: 0804a290
inlining print
SEMICOLON there: 0804b2e6

    That's the problem. 'colon' is trying to save the
    machine code start address for 'meow' on the stack and
    'semicolon' should be getting that same value, but it's
    not. It's while executing this, by the way:

        : meow "Meow." print ;

    The botched semicolon tail writing is why trying to use
    'meow' in 'meow5' fails - the calculations are wildly
    wrong (it thinks the machine code length is -4144).

    So I need to figure out where I'm leaving something on
    the stack. But how?

    Oh, I've got an idea, That wrong value from the stack
    looks like an address. I'll fire up GDB and see what's at
    that address. Maybe that will help.

(gdb) x 0x0804b2e6
0x804b2e6:	0x776f654d

    Hmmm. Looks like a string?

(gdb) x/s 0x0804b2e6
0x804b2e6:	"Meow."

    Aha! Interesting. It's the address from the string
    literal before the print. What's that doing there? In
    compile mode, 'quote' should be compiling that address
    into the new word's machine code memory, not on the
    stack.

    Oh, right, I knew that would come back to haunt me. At
    some point, my run/compile logic got lost (could be a
    dozen commits ago by now). I'm currently compiling the
    instruction to push the string's address *and* I'm also
    actually pushing the string's address every single time.
    So this has actually been broken for a while and my
    refactoring has (almost?) nothing to do with it.

    This is why I need a set of proper regression tests
    ASAP. Well, I'll get there soon. Gotta be able to take
    input before I can do that!

    Anyway, time to add (back) some conditional jumps in
    'quote'!

The answer is 42.
The answer is 0x2a in hex.
The answer is 101010 in computer.
The answer is 52 in octal.
Meow.Meow.Meow.Meow.Meow.

    Much better.

    And I was reminded that I had been in the middle of
    adding escape sequences to string literals! Let's try
    them out:

        42 "I paid \$$ for beans\\cheese.\nOkay?\n" print

$ mr
I paid $42 for beans\cheese.
Okay?

    Wow, that was super satisfying.

    Okay, I think *now* I'm finally ready to make those
    introspective word-printing words!

    Next night: Uh, so I didn't think far enough ahead. I
    made some really nice string making and printing
    facilities, but I can't use them easily from assembly.
    Nor can I define something as complex as a "dictionary"
    printing word (to use the Forth term for all the defined
    words.)

    Hmmm... this is a quandary.

    Okay, it wasn't so bad. I just needed to write an
    additional word 'printnum' (with macro PRINTNUM_CODE) to
    help it out.

    After that, this is all it takes to write an 'inspect'
    that takes a word tail address and prints the word name
    and bytes of machine code (from the tail metadata):

        DEFWORD inspect
            pop esi ; get tail addr
            lea eax, [esi + T_NAME]
            push eax
            PRINT_CODE 
            PRINTSTR ": "
            mov eax, [esi + T_CODE_LEN]
            push esi ; preserve tail addr
            ; param 1: num to be stringified
            push eax
            PRINTNUM_CODE
            PRINTSTR " bytes "
            NEWLINE_CODE
        ENDWORD inspect, 'inspect', (IMMEDIATE)


    Here it is printing the word 'find':

        "find" find inspect

find: 58 bytes

    Neat!

    Oh, and I guess I can show my new 'printnum' in
    standalone action:

        bin 11101111 hex printnum

ef

    Nice.

  +--------------------------------------------------------+
  |                                                        |
  |  Just so it's clear I'm not trying to trick anybody,   |
  |  the interpreter input is still hard-coded into        |
  |  memory, so that calling line really looks like this   |
  |  in assembly:                                          |
  |                                                        |
  |      db 'bin 11101111 hex printnum '                   |
  |                                                        |
  |  I'm just trying to make the log readable, so I leave  |
  |  off the 'db' and quotes.                              |
  |                                                        |
  |  I'll take input on STDIN very soon (next!).           |
  |                                                        |
  +--------------------------------------------------------+

    And now I can find out how big a "meow" printing
    function is in machine code (minus the "Meow." string
    itself, because that's stored elsewhere. And also the
    combined five meows:

        "meow" find inspect
        "meow5" find inspect

meow: 38 bytes
meow5: 190 bytes

    There we go. So even repeating the entire printing
    routine five times is still under 200 bytes of machine
    code. When's the last time you shed a tear over 200
    bytes?

    Okay, now I just need a loop to call inspect for every
    defined word and I'm all set.

    I'm not sure I want to use the term "dictionary" from
    Forth (or "word", for that matter). Hmm, I guess I can
    avoid the issue by just calling it 'inspect_all'...

    Almost there on inspect_all.  It seems something has
    left the token str addr on the stack and it's getting in
    the way of popping the link addr and ruining everything.
    At least, that's what it looks like.

    Next night: So what I woke up thinking in the morning is
    that if I'm not maintaining my stack correctly between
    words, there are three ways I could debug this:

    1. Painfully, with GDB stepping through...a lot.

    2. Write a DEBUG_STACK macro that non-destructively
       prints the values on the stack from anywhere within
       the assembly.

    3. Write a script (perhaps in awk?) that checks the
       push/pop balance by following all of the called or
       inlined code and incrementing a counter for each push
       and decrementing for each pop.

    Obviously I'm trying to avoid Option 1.

    Option 3 sounds neat, but I'm not sure how to *display*
    that information so that it helps me track down the
    problem - some words push more than they pop and
    vice-versa. It'd still be a lot of manual checking.

    Option 2 seems generally useful anyway, so I'm going to
    give that a shot and hope that peppering it about the
    program will help me figure out what's going on.

    Hmmm...well, printing the stack at the Meow5 language
    level is no problem, but printing it at the assembly
    level is a big problem. I've got a chicken-and-egg
    problem where I need to preserve the registers it uses
    AND preserve the stack.

    I could preserve the registers in a purpose-built bit of
    memory.

    Well, I think I did that right. Here's the whole ugly
    thing:

        section .bss
            ds_eax: resb 4
            ds_ebx: resb 4
            ds_ecx: resb 4
            ds_edx: resb 4
            ds_esi: resb 4
            ds_edi: resb 4
        section .text
        %macro DEBUG_STACK 0
            mov [ds_eax], eax
            mov [ds_ebx], ebx
            mov [ds_ecx], ecx
            mov [ds_edx], edx
            mov [ds_esi], esi
            mov [ds_edi], edi
            PRINTSTR "Stack: "
            PRINTSTACK_CODE
            NEWLINE_CODE
            mov eax, [ds_eax]
            mov ebx, [ds_ebx]
            mov ecx, [ds_ecx]
            mov edx, [ds_edx]
            mov esi, [ds_esi]
            mov edi, [ds_edi]
        %endmacro

    But talk about chicken-and-egg problem. Or is it more
    of a "who watches the watchmen?" thing? Or maybe there's
    an even better expression for it, but my problem is that
    I think I'm acutally calling whatever is doing bad stack
    things WHILE trying to print the stack... :-(

    Okay, it's worse than even that. I don't know how I've
    gotten away with it this long, but I've been using a
    bunch of words mid-assembly to debug stuff...and a lot
    of them aren't safe for that (like 'newline') because
    they don't preserve the registers. 

    So it's possible that by trying to observe what's going
    on for the 'inspect_all' feature, I've been ruining it!

    I have to remember that most of my words are only safe
    if I'm treating them as if they're being called from the
    language - transfering values on the stack and not
    giving a hoot what happens to the registers in between!

    Okay, I think I got it. No idea how this got here, but
    I had an extra pop for no reason I could understand in
    'printnum'! I have no idea how it got there, but that
    would definitely mess things up:

        pop esi ; get preserved <-----??????

    I think the lesson is to test all new words more
    thoroughly before moving on!

    Okay, can we _finally_ see the output of 'inspect_all'?

$ mr
Meow. Meow. Meow. Meow. Meow. 
meow5: 190 bytes IMMEDIATE COMPILE 
meow: 38 bytes IMMEDIATE COMPILE 
inspect_all: 359 bytes IMMEDIATE 
inspect: 340 bytes IMMEDIATE 
ps: 177 bytes IMMEDIATE COMPILE 
printmode: 100 bytes IMMEDIATE COMPILE 
printnum: 116 bytes IMMEDIATE COMPILE 
number: 306 bytes IMMEDIATE COMPILE 
decimal: 10 bytes IMMEDIATE COMPILE 
bin: 10 bytes IMMEDIATE COMPILE 
oct: 10 bytes IMMEDIATE COMPILE 
hex: 10 bytes IMMEDIATE COMPILE 
radix: 6 bytes IMMEDIATE COMPILE 
str2num: 95 bytes IMMEDIATE COMPILE 
quote: 247 bytes IMMEDIATE COMPILE 
num2str: 58 bytes IMMEDIATE COMPILE 
;: 150 bytes COMPILE RUNCOMP 
return: 7 bytes IMMEDIATE 
:: 137 bytes IMMEDIATE 
copystr: 18 bytes IMMEDIATE COMPILE 
get_token: 55 bytes IMMEDIATE 
eat_spaces: 38 bytes IMMEDIATE COMPILE 
find: 58 bytes IMMEDIATE 
is_runcomp: 5 bytes IMMEDIATE COMPILE 
get_flags: 7 bytes IMMEDIATE COMPILE 
inline: 25 bytes IMMEDIATE 
print: 33 bytes IMMEDIATE COMPILE 
newline: 26 bytes IMMEDIATE COMPILE 
strlen: 16 bytes IMMEDIATE COMPILE 
exit: 8 bytes IMMEDIATE COMPILE 
Goodbye.
Exit status: 0

    Yay! At last! Every word in the system with the number
    of bytes of machine code and mode(s).

    And with the help of the shell, we can sort by size.

$ ./meow5 |  awk '
        /^.*: [0-9]+/ {t=t+$2; print $2, $1}
        END{print "Total bytes:", t}
  ' | sort -n

5 is_runcomp:
6 radix:
7 get_flags:
7 return:
8 exit:
10 bin:
10 decimal:
10 hex:
10 oct:
16 strlen:
18 copystr:
25 inline:
26 newline:
33 print:
38 eat_spaces:
38 meow:
55 get_token:
58 find:
58 num2str:
95 str2num:
100 printmode:
116 printnum:
137 ::
150 ;:
177 ps:
190 meow5:
247 quote:
306 number:
340 inspect:
359 inspect_all:

Total bytes: 2655

    (Command line formatted for semi-readability and total
    bytes separated out for clarity.)

    So the smallest word is 'is_runcomp', which tests for
    the RUNCOMP flag in a mode and pushes the result.

    The largest word is not a surprise to me, 'inspect_all',
    which contains 'inspect' and puts a loop around it. In
    turn, 'inspect' includes a bunch of string and number
    printing code.

    So let's see how I did:

        [x] Pretty-print meta info about word!
        [ ] Loop through dictionary, list all names
        [x] Loop through dictionary, pretty-print all
        [x] New word: str2num (ascii to integer)
        [x] New word: num2str (integer to ascii)
        [x] Add "$" placeholders so the 'quote' word so it
            can interpolate numbers from the stack into the
            string.
        [x] Break out 'number' into separate word that does
            pre-check like 'quote'.

    I forgot about a loop that just prints all the word
    names. And 'number' works a little differently now than
    I'd envisioned. But I'm gonna call this part done for
    now and close out this log file.