1 Howdy! So I've got immediate and compile modes basically
2 working. Now I've got two TODOs carrying over from the
3 last log and I'll add some new ones. I'm gonna put this
4 in the order I think they should probably be done, too:
5
6 [ ] turn return_addr into a stack so i can call
7 words from words
8 [ ] make a "print" word and use it
9 [ ] colon word store name somewhere
10 [ ] semicolon should write tail
11
12 From there, I think the roadmap will be fairly clear and
13 largely driven by whatever I think will be useful or fun
14 to grow the language.
15
16 * Print dictionary/word tails
17 * How to handle variables and strings
18 * Interactive REPL
19 * ANSI colors
20 * Print current stack (automatically in REPL?)
21 * Write out any word as an ELF executable?
22 * Other debugging features?
23 * Some simple benchmarks - size vs speed
24
25 This whole thing is just an experiment in simplicity by
26 taking a concept to extremes (in a good way) so I want
27 to keep it tiny and comprehensible. Too many features
28 will detract from that, so I'll need to pick them
29 carefully.
30
31 Okay, I'll start with turning the single return address
32 I'm currently using for immediate word calls into a
33 proper return stack. Here's the BSS reservations:
34
35 return_stack: resb 512
36 return_ptr: resb 4
37
38 And I've modified the CALLWORD and ENDWORD macros to
39 push and pop the return stack appropriately.
40
41 I think it's worth noting that I'm extremely not
42 concerned about performance in immediate mode because
43 I'm expecting that to pretty much always be invoked
44 by the user. The compiled code doesn't use this return
45 stack because nothing ever gets called - it'll just be a
46 continuous flow of instructions!
47
48 Anyway, let's see what I've got when I try the new
49 return stack:
50
51 Program received signal SIGSEGV, Segmentation fault.
52 get_next_token () at meow5.asm:73
53 73 mov dword [eax], %%return_to ; CALLWORD
54
55 Oh, wait! I forgot to initialize the return_ptr so it
56 starts off pointing at the start of return_stack! So of
57 course it's gonna segfault - I'm trying to mov to
58 address 0!
59
60 Program received signal SIGSEGV, Segmentation fault.
61 0x0804a689 in return_ptr ()
62
63 Oops! And that one is because I was trying to jump to
64 the address in return_ptr, not to where it _pointed to_!
65
66 Program received signal SIGSEGV, Segmentation fault.
67 0x0804b000 in ?? ()
68
69 Yuck, that one will be in the compiled code.
70
71 I think I'll call it a night.
72
73 Next night: Nope! It was in immediate word execution and
74 it was pretty silly. I'm doing a bit of a hack to
75 perform immediate word execution - passing a register to
76 the CALLWORD macro instead of an address literal. Turns
77 out, the register I was using for this was eax. And,
78 naturally, that's the register I used to implement the
79 new call stack stuff in CALLWORD!
80
81 Changing this:
82
83 CALLWORD eax
84
85 To this:
86
87 CALLWORD ebx
88
89 And let's see if we can at least do what we could do
90 before I made my changes:
91
92 $ mr
93 Immediate Meow!
94 Immediate Meow!
95 Meow.
96 Meow.
97 Meow.
98
99 Nice. So then I can move on to new functionality. Let's
100 try a nested call. I'll make some simple test words:
101
102 DEFWORD foo
103 CALLWORD bar
104 push 0 ; for exit
105 CALLWORD exit
106 ENDWORD foo, "foo", (IMMEDIATE)
107
108 DEFWORD bar
109 CALLWORD meow
110 ENDWORD bar, "bar", (IMMEDIATE)
111
112 And calling them with this input string:
113
114 db 'meow bar foo', 0
115
116 Should result in three immediate meows and exit:
117
118 $ mr
119 Immediate Meow!
120 Immediate Meow!
121 Immediate Meow!
122
123 Wow, nice. So I can allow words to call other words now,
124 which will let me clean up some of the most redundant
125 bits of code (like string printing).
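
(Sketch: pulling the return stack pieces together. This is a minimal
stand-alone version under some assumptions, not the actual meow5 source.
It assumes 32-bit Linux, uses RETURN_CODE as a stand-in for the return
half of ENDWORD, and the 'inner' and 'outer' words are made up for the
demo. It folds in the three fixes from above: return_ptr gets
initialized, the return jumps through the stored address, and a register
call target is kept out of eax because the macro uses eax as scratch.)

    %define SYS_EXIT 1

    ; "Call" a word: push a return address, then jump to the target.
    %macro CALLWORD 1
        add dword [return_ptr], 4       ; make room on the return stack
        mov eax, [return_ptr]           ; eax = new top of return stack
        mov dword [eax], %%return_to    ; store where to come back to
        jmp %1                          ; target: a label, or a register
                                        ; other than eax
    %%return_to:
    %endmacro

    ; "Return" from a word: pop the return address and jump to it.
    %macro RETURN_CODE 0
        mov eax, [return_ptr]           ; current return stack pos
        sub dword [return_ptr], 4       ; "pop" return stack
        jmp [eax]                       ; jump to the stored address
    %endmacro

    section .bss
    return_stack: resb 512
    return_ptr:   resb 4

    section .text
    global _start

    inner:                              ; hypothetical leaf word
        inc edx                         ; count how often we got here
        RETURN_CODE

    outer:                              ; hypothetical word calling a word
        CALLWORD inner                  ; target given as a label
        mov ebx, inner
        CALLWORD ebx                    ; target given in a register
        RETURN_CODE

    _start:
        mov dword [return_ptr], return_stack    ; the forgotten init!
        xor edx, edx
        CALLWORD outer
        mov ebx, edx                    ; exit status = number of calls
        mov eax, SYS_EXIT
        int 0x80

Built with something like "nasm -f elf32" and "ld -m elf_i386", this
should exit with status 2: one increment per call to 'inner', with
control returning cleanly through both levels of the return stack.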
126
127 I think next will be a 'print' word that prints
128 null-terminated strings.
129
130 I'll have 'print' call a 'strlen' word to calculate the
131 length of the null-terminated string as well.
132
133 Since this is the first time I'll have a real word that
134 calls another word in Meow5, it occurs to me that this
135 version of 'print' can *only* be an immediate mode word
136 because my compiler won't be able to inline the 'strlen'
137 machine code inside of print. Thus, only words composed
138 entirely out of other words OR completely self-contained
139 can be COMPILE mode words???
140
141 So at some point, I'll need to re-define 'print' by
142 "compiling" it properly in the Meow5 fashion (inlining
143 'strlen') so it can be compiled into other words!
144
145 Ah, jeez, and then I've got a whole chicken-and-egg
146 problem where calling a "compiled" word needs to happen
147 after it's been compiled - so I can't use it until it's
148 been compiled. Which means I've got to be real careful
149 about the "bootstrapping" order of my definitions.
150
151 Well, that's what this whole wacky experiment is all
152 about and by keeping it small, I can re-write as needed.
153
154 MOVING ON, let's get this immediate mode-only print
155 working!!!
156
157 Okay, got it working. Here's 'strlen' and 'print'
158 stripped of comments to show how simple they are and how
159 nice and neat my token_not_found printing has become:
160
161 DEFWORD strlen
162 mov eax, [esp]
163 mov ecx, 0
164 .find_null:
165 cmp byte [eax + ecx], 0
166 je .done
167 inc ecx
168 jmp .find_null
169 .done:
170 push ecx
171 ENDWORD strlen, "strlen", (IMMEDIATE | COMPILE)
172
173 DEFWORD print
174 CALLWORD strlen
175 mov ebx, STDOUT
176 pop edx
177 pop ecx
178 mov eax, SYS_WRITE
179 int 0x80
180 ENDWORD print, "print", (IMMEDIATE)
181
182 ...
183
184 .token_not_found:
185 push not_found_str1
186 CALLWORD print
187 push token_buffer
188 CALLWORD print
189 push not_found_str2
190 CALLWORD print
191 CALLWORD exit
192
193 Having a print word to call reduced token_not_found from
194 over 20 lines of repetitious code to this.
195
196 But for the reasons I explain above, I can't use 'print'
197 in my compiled 'meow' word or any other compiled words.
198 Which left me in a bit of a quandary.
199
200 How do I "inline" the 'strlen' code into 'print' in the
201 meow5 interpreter when it's being assembled (or when it
202 starts up)?
203
204 I _could_ define all compound words as strings and
205 actually run the interpreter on them at startup...
206
207 db ": print strlen print_with_len ;"
208
209 ...but I feel like that's too inefficient even for this
210 toy language.
211
212 Or I could limit myself to a CALLWORD depth of one.
213 Which would just require that I always call 'strlen'
214 before print:
215
216 .token_not_found:
217 push not_found_str1
218 CALLWORD strlen
219 CALLWORD print
220 push token_buffer
221 CALLWORD strlen
222 CALLWORD print
223 push not_found_str2
224 CALLWORD strlen
225 CALLWORD print
226 CALLWORD exit
227
228 But yuck. And I feel like I'm likely to regret having
229 that limitation later when I'm adding more advanced
230 functionality?
231
232 So how do I inline in NASM itself?
233
234 Well, I've had all day to think about it and the answer
235 was kind of staring me right in the face: macros!
236
237 New TODO:
238
239 [ ] Create word code macros so words can be inlined
240 while in assembly.
241
242 I can put the 'strlen' code in a macro and then use that
243 to define a 'strlen' word _and_ embed it directly into
244 'print' as if I'd typed it.
245
246 Then I did the same thing with 'print' itself (I wonder
247 if there's a limit to the depth of these macros
248 containing macros in NASM?) and then I could use it in
249 the 'meow' words also.
250
251 With the actual strlen and print code removed, here's
252 what all four words look like now:
253
254 %macro strlen_code 0
255 <code>
256 %endmacro
257 DEFWORD strlen
258 strlen_code
259 ENDWORD strlen, "strlen", (IMMEDIATE | COMPILE)
260
261 %macro print_code 0
262 strlen_code
263 <code>
264 %endmacro
265 DEFWORD print ; (straddr) print ()
266 print_code
267 ENDWORD print, "print", (IMMEDIATE | COMPILE)
268
269 DEFWORD imm_meow
270 push imm_meow_str
271 print_code
272 ENDWORD imm_meow, "meow", (IMMEDIATE)
273
274 DEFWORD meow
275 push meow_str
276 print_code
277 ENDWORD meow, "meow", (COMPILE)
278
279 And does it still work?
280
281 db 'meow : meow meow meow exit', 0
282
283 Should print one immediate meow, three compiled meows
284 and an exit:
285
286 $ mr
287 Immediate Meow!
288 Meow.
289 Meow.
290 Meow.
291
292 You bet it does.
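
(Sketch: one gotcha with inlining via macros is that any label inside a
code macro gets pasted in at every expansion. NASM's macro-local labels,
the ones with a %% prefix, get a unique name per expansion, so the same
code macro can be expanded more than once, even inside a single word.
Below is a minimal stand-alone version assuming 32-bit Linux. The %%
labels and the constant values are assumptions for the demo and not
necessarily what the real meow5 source does.)

    %define SYS_EXIT  1
    %define SYS_WRITE 4
    %define STDOUT    1

    %macro strlen_code 0            ; (straddr) -- (straddr len)
        mov eax, [esp]              ; peek at the string address
        mov ecx, 0                  ; length counter
    %%find_null:                    ; macro-local: unique per expansion
        cmp byte [eax + ecx], 0
        je %%done
        inc ecx
        jmp %%find_null
    %%done:
        push ecx                    ; leave the length on the stack
    %endmacro

    %macro print_code 0             ; (straddr) -- ()
        strlen_code                 ; a macro expanded inside a macro
        mov ebx, STDOUT
        pop edx                     ; strlen
        pop ecx                     ; start address
        mov eax, SYS_WRITE
        int 0x80
    %endmacro

    section .data
    meow_str: db "Meow.", 10, 0

    section .text
    global _start
    _start:
        push meow_str
        print_code                  ; first expansion
        push meow_str
        print_code                  ; second expansion, fresh %% labels
        mov ebx, 0
        mov eax, SYS_EXIT
        int 0x80

This should print "Meow." twice. With plain labels in place of the %%
ones, the second expansion would define the same labels again and NASM
would refuse to assemble it.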
293
294 Using macros this way to inline code solves a lot of
295 problems. I'm even wondering if I can remove the
296 CALLWORD macros and the return apparatus, etc?
297
298 Or wait, I still need to return when calling these in
299 immediate mode in the interpreter, right?
300
301 For now, I want to get the colon and semicolon words
302 working so I can make the _real_ meow5 word as foretold
303 by the prophets.
304
305 Many nights later: I got sick and had to skip some
306 evenings. Compound that with the cascade of changes
307 needed to let semicolon (';') compile the tail of a new
308 word, and I've got a bit of a challenge getting
309 these notes in order.
310
311 At the moment, I've done the following changes:
312
313 1. The tiny three-instruction 'return' code that gets
314 appended to the end of all word definitions (so they
315 can be called directly as functions) has been turned
316 into a macro so it can be included in the ENDWORD
317 macro and as a word of its own so it can be inlined
318 by semicolon.
319
320 2. The 'inline' word's machine code is now also in a
321 macro so it can be included elsewhere (such as
322 semicolon). So yeah, 'inline' gets inlined!
323
324 3. Same with 'get_token' which is now needed by colon
325 (':') to save the name of a new word definition:
326
327 : meow5 meow meow meow meow meow ;
328
329 Semicolon compiles the token "meow5" into the new
330 word's tail so 'find' can find it in the dictionary.
331
332 4. For now, I have 'copystr' including 'strlen' (I use
333 null-terminated strings), but the usage in tail
334 makes me think these should be separated to make
335 them more generalized. More about that in a moment.
336
337 (NOTE: No, copystr doesn't actually contain 'strlen',
338 I was wrong about that. 'print' is the only word that
339 contains 'strlen'.)
340
341 5. 'print' is used in a number of places, so that's
342 been turned into a macro as well. Actually, it looks
343 like I already noted that would be happening above.
344 But I hadn't yet settled on ALL CAPS for the code
345 macros. Here's the entire 'print' definition right
346 now:
347
348 ; Prints a null-terminated string by address on stack.
349 %macro PRINT_CODE 0
350 STRLEN_CODE ; (after: straddr, len)
351 mov ebx, STDOUT ; write destination file
352 pop edx ; strlen
353 pop ecx ; start address
354 mov eax, SYS_WRITE ; syscall
355 int 0x80 ; interrupt to linux!
356 %endmacro
357 DEFWORD print ; (straddr) print ()
358 PRINT_CODE
359 ENDWORD print, "print", (IMMEDIATE | COMPILE)
360
361 6. 'semicolon' is coming along nicely, but I was having
362 trouble testing it until I realized I had completely
363 forgotten to implement my new word flag: RUNCOMP.
364
365 The RUNCOMP flag is my way of saying "this word runs
366 immediately, even in COMPILE mode."
367
368 It's exactly the same as Forth's "immediate" words,
369 but I think my naming makes the meaning a little more
370 explicit? At least, that's the intention.
371
372 7. I've gotten much more comfortable using the stack to
373 store local values rather than trying to juggle
374 registers or use global variables. My first real
375 stack workout has been in 'semicolon', where I have
376 ended up using it just a tiny amount. Even so, I've
377 already started to have minor revelations about how
378 the stack helps me manage memory, though nothing I
379 can put into words very well just yet.
380
381 So given that update, my current TODOs to get
382 "completely 100% legit word compilation from a string of
383 input" working are in this order:
384
385 [ ] Create new word: 'get_flags' will put 32 bits of
386 flags from a word's tail on the stack.
387 [ ] Create new word: 'is_runcomp' will leave 0 on
388 the stack if RUNCOMP flag is NOT set in flags.
389 [ ] Get 'semicolon' just working for now, which
390 currently means having two duplicate calls to
391 'strlen', which is annoying, but if I change too
392 much more, I'm going to have an impossible time
393 debugging all of these changes at once!
394 [ ] AFTER semicolon works, pull 'strlen' out of
395 'copystr' and maaayyybeee out of 'print' too,
396 though I'm on the fence about that one.
397 [ ] And beyond that, I'm thinking 'copystr' may not
398 even need to exist, because it's a specialized
399 case of copying memory...and I've already got an
400 instance of that in 'inline' - so maybe the two of
401 them should share that functionality in something
402 like a 'copymem' word?
403
404 (NOTE: After getting semicolon working, I realized that
405 I was being super premature about trying to factor
406 'strlen' out of 'copystr' and avoiding looping through
407 the word name string twice in semicolon. I mean, the
408 word name is only 32 bytes MAXIMUM right now. Also,
409 copystr doesn't even contain strlen - I had remembered
410 that incorrectly. So scratch those last two TODOs!)
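
(Sketch: a flag test along the lines of the first two TODOs might look
something like this. The bit values and the IS_RUNCOMP_CODE macro name
are made up for illustration, and the whole thing assumes 32-bit Linux.
It also shows why flags get combined with | (OR): with one bit per flag,
the & (AND) of two different flags is always 0.)

    %define SYS_EXIT  1
    %define IMMEDIATE 00000001b     ; hypothetical bit assignments
    %define COMPILE   00000010b
    %define RUNCOMP   00000100b

    ; (flags) -- (0 if RUNCOMP is NOT set, nonzero if it is)
    %macro IS_RUNCOMP_CODE 0
        pop eax                     ; the 32 bits of flags
        and eax, RUNCOMP            ; mask off everything but RUNCOMP
        push eax
    %endmacro

    section .text
    global _start
    _start:
        push dword (COMPILE | RUNCOMP)  ; flags a word like ';' might carry
        IS_RUNCOMP_CODE
        pop ebx                     ; nonzero here: the word is RUNCOMP
        mov eax, SYS_EXIT
        int 0x80

This should exit with status 4 (the RUNCOMP bit itself). Pushing just
COMPILE instead should give an exit status of 0.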
411
412 Okay, now I've written the two new words and am calling
413 them to check the word:
414
415 ; We're in compile mode...
416 CALLWORD get_flags
417 CALLWORD is_runcomp
418 pop eax ; get result
419 cmp eax, 0 ; if NOT equal, word was RUNCOMP
420 jne .exec_word ; yup, RUNCOMP
421 CALLWORD inline ; nope, "compile" it.
422
423 But I've got something wrong, because obviously this
424 shouldn't be happening:
425
426 $ mr
427 Immediate Meow!
428 Could not find word "meow5"
429
430 I could certainly figure this out with GDB, but that's
431 getting increasingly painful now that the interpreter is
432 becoming more complex.
433
434 So I think it's high time for more "print debugging" in
435 the interpreter itself.
436
437 First off, I've had a TODO to add the current mode to my
438 word not found error message. So I'll start with that:
439
440 $ mr
441 Immediate Meow!
442 Could not find word "meow5" while looking in COMPILE mode.
443
444 Oh! I wasn't expecting that. We shouldn't be in COMPILE
445 mode after semicolon (;).
446
447 Breakpoint 1, semicolon () at meow5.asm:331
448 331 mov eax, [here]
449 332 push eax ; push end of machine code to stack
450 push () at meow5.asm:73
451 73 mov eax, [return_ptr] ; current return stack pos
452 74 sub dword [return_ptr], 4 ; "pop" return stack
453 75 jmp [eax] ; go to return addr!
454
455 Huh? It's calling 'return' right at the beginning of
456 semicolon? What is...oh, jeez, I had RETURN_CODE instead
457 of just "return" - I wanted the label as the address to
458 inline, but I accidentally included the return code itself.
459 Okay, that was just a dumb mistake made by a sleepy
460 person.
461
462 Wrong:
463
464 push RETURN_CODE ; push what to inline on stack
465
466 Fixed:
467
468 push return ; push what to inline on stack
469
470 Moving on, now I have no doubt I'll be stepping through
471 semicolon for a while. Even though I worked it out on
472 paper, there are so many opportunities for mistakes.
473
474 Here we go...
475
476 Program received signal SIGSEGV, Segmentation fault.
477 semicolon () at meow5.asm:127
478 127 rep movsb ; copy [esi]...[esi+ecx] into [edi]
479
480 That looks like 'inline' - so that'll be my use of the
481 INLINE_CODE macro in semicolon. Let's see...
482
483 (gdb) break semicolon
484 Breakpoint 1 at 0x8049397: file meow5.asm, line 331.
485 Starting program: /home/dave/meow5/meow5
486 Immediate Meow!
487 Breakpoint 1, semicolon () at meow5.asm:331
488 331 mov eax, [here]
489 332 push eax ; push end of machine code to stack
490 semicolon () at meow5.asm:339
491 339 push return ; push what to inline on stack
492 semicolon () at meow5.asm:121
493 121 pop esi ; param1: tail of word to inline
494 semicolon () at meow5.asm:122
495 122 mov edi, [here] ; destination
496
497 We're in INLINE_CODE now. Let's see what we're currently
498 pointing to:
499 123 mov eax, [esi + T_CODE_LEN] ; get len of code
500 (gdb) p $eax
501 $1 = -668105976
502
503 Uh...that's definitely not the correct length for the
504 'return' word's machine code...
505
506 Oh, another bone-headed mistake. Perhaps you saw it
507 above when I tried to fix it and were screaming at me?
508 Sorry, I couldn't hear you.
509
510 Still wrong:
511
512 push return ; push what to inline on stack
513
514 Fixed for real this time:
515
516 push tail_return ; push what to inline on stack
517
518 Yeah, I wasn't thinking straight. The label "return"
519 points to the beginning of the word's machine code. The
520 label "tail_return" points to the tail...which is what
521 the 'inline' word is expecting!
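
(Sketch: why the two labels matter. The tail layout below is an
assumption made up for this demo: a link, then the code length at
T_CODE_LEN, then the distance back to the start of the code at a
hypothetical T_CODE_OFFSET. The 'demo' word and 'free_space' buffer are
invented too, and it assumes 32-bit Linux. The copy itself follows the
same pop esi / mov edi, [here] / rep movsb shape seen in the GDB output
above.)

    %define SYS_EXIT      1
    %define T_CODE_LEN    4         ; assumed offset within the tail
    %define T_CODE_OFFSET 8         ; assumed offset within the tail

    section .data
    here: dd free_space             ; next free byte for compiled code

    section .bss
    free_space: resb 64

    section .text
    global _start

    demo:                           ; this label = start of machine code
        inc ebx
        inc ebx
    tail_demo:                      ; this label = start of the tail
        dd 0                        ; link to previous tail (none here)
        dd tail_demo - demo         ; length of the machine code
        dd tail_demo - demo         ; distance back to the code start

    _start:
        push tail_demo              ; pushing 'demo' instead would make
                                    ; the code below read machine code
                                    ; bytes as if they were a length
        pop esi                     ; tail of word to inline
        mov edi, [here]             ; destination
        mov ecx, [esi + T_CODE_LEN] ; number of bytes to copy
        sub esi, [esi + T_CODE_OFFSET]  ; esi = start of machine code
        cld                         ; copy forwards
        rep movsb                   ; copy ecx bytes from [esi] to [edi]
        mov [here], edi             ; advance 'here' past the copy
        mov ebx, edi
        sub ebx, free_space         ; exit status = bytes inlined
        mov eax, SYS_EXIT
        int 0x80

The exit status should equal the length of the demo word's machine code.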
522
523 Alright, with that fixed:
524
525 $ mr
526 Immediate Meow!
527 Could not find word "meow5" while looking in IMMEDIATE mode.
528
529 Yeah, still broken, but in a much better way. I can see
530 that semicolon has at least returned us to IMMEDIATE
531 mode.
532
533 Probably just a simple bug in semicolon compiling the
534 new word's tail...
535
536 ...yup! Just an ebx where I should have had an eax. The
537 last word address wasn't being updated for the
538 dictionary linked list.
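
(Sketch: what "updating the last word address" means for a linked list
of tails. Everything here is hypothetical and only for illustration: the
'last' variable name, the link living at offset 0 of a tail, and a link
of 0 marking the end of the list. It assumes 32-bit Linux.)

    %define SYS_EXIT 1
    %define T_LINK   0              ; assumed: link is first in the tail

    section .data
    tail_a: dd 0                    ; oldest word, link 0 = end of list
    tail_b: dd tail_a               ; links back to tail_a
    last:   dd tail_b               ; most recently defined word

    section .bss
    tail_new: resb 4                ; tail of a freshly compiled word

    section .text
    global _start
    _start:
        ; What semicolon has to do when it finishes a new word:
        mov eax, tail_new
        mov ebx, [last]
        mov [eax + T_LINK], ebx     ; new tail links to the old last word
        mov [last], eax             ; skip this and the new word can
                                    ; never be found
        ; Walk the list the way a 'find' would, counting words:
        xor ebx, ebx
        mov eax, [last]
    .walk:
        test eax, eax               ; link of 0 means end of dictionary
        jz .done
        inc ebx
        mov eax, [eax + T_LINK]     ; follow the link to the older word
        jmp .walk
    .done:
        mov eax, SYS_EXIT
        int 0x80                    ; exit status = words in the list

With the update to 'last' in place this should exit with status 3.
Without it, the walk only ever sees the two older words, which is the
same kind of symptom as the "Could not find word" error above.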
539
540 Crossing fingers...
541
542 $ mr
543 Immediate Meow!
544 Meow.
545 Meow.
546 Meow.
547
548 Gasp! It works! In some ways, this is 100% of the
549 original project goal. Oh, wait, almost. That's just 3
550 meows. Ha ha. One second...
551
552 input_buffer:
553 db ': meow5 meow meow meow meow meow ; '
554 db 'meow5 '
555 db 'newline '
556 db 'exit',0
557
558 And drum-roll please...
559
560 $ mr
561 Meow. Meow. Meow. Meow. Meow.
562
563 Yahoo!!!
564
565 That completes the TODO list I'd built up:
566
567
568
569 [x] Turn return_addr into a stack so i can call words
570 from words (oops! turns out that doesn't make
571 sense when words are inlined...)
572 [x] make a "print" word and use it
573 [x] colon word store name somewhere
574 [x] semicolon should write tail
575 [x] Create word code macros so words can be inlined
576 while in assembly.
577 [x] Create new word: 'get_flags' will put 32 bits of
578 flags from a word's tail on the stack.
579 [x] Create new word: 'is_runcomp' will leave 0 on
580 the stack if RUNCOMP flag is NOT set in flags.
581 [x] Get 'semicolon' working.