1 The last update was very exciting. Now I'm actually
2 reading a single word's definition from a string,
3 inlining all of the code into memory, and executing it.
4
5 To put it in concrete terms, this 'meow5' definition:
6
7 "meow meow meow meow meow exit"
8
9 Was turned into this in memory:
10
11 <meow word machine code>
12 <meow word machine code>
13 <meow word machine code>
14 ... repeats two more times ...
15 <exit word machine code>
16
17 The 'exit' word even pops the exit status code from the
18 stack. Between that and all of the meowing, we're
19 getting extremely "conCATenative" here. Sorry.
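As a rough picture of what "inlining" means here, the Python
sketch below models the idea outside of assembly. It is not
meow5's code: the byte strings standing in for each word's
machine code are placeholders, and `compile_definition` is a
hypothetical name chosen for illustration.

```python
# Placeholder "machine code" for each word. The real bytes live in
# the meow5 binary and are not reproduced here.
WORDS = {
    "meow": b"<meow>",
    "exit": b"<exit>",
}

def compile_definition(source, words):
    """Concatenate the code of every word named in `source`."""
    memory = bytearray()
    for token in source.split():
        memory += words[token]  # inline: copy the code, no call/return
    return bytes(memory)

compiled = compile_definition("meow meow meow meow meow exit", WORDS)
print(compiled)
```

The result is one flat run of bytes, five copies of the
'meow' placeholder followed by the 'exit' placeholder, with
nothing in between.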
20
21 So I need to figure out what step comes next. I need to:
22
23 1. Get user input from STDIN
24 2. Figure out how immediate mode will work
25 (currently, I start in compile mode and when
26 that's done, I execute whatever was compiled!)
27 3. Create the colon ':' and semicolon ';' words to
28 toggle compile mode (and create word definitions!)
29
30 I would also like to have introspection and diagnostics
31 and visualizations as early in this project as possible!
32 But for now, I'm gonna stay the course towards an
33 absolutely minimal proof of concept. I want to be able
34 to type this:
35
36 : meow5 meow meow meow meow meow exit ;
37 meow5
38
39 And see (something like) this:
40
41 Meow.
42 Meow.
43 Meow.
44 Meow.
45 Meow.
46 BYE!
47 $
48
49 So how about #2 and/or #3 from the list above - how
50 simple can the colon command be?
51
52 So I've updated the input string:
53
54 db 'meow : meow5 meow meow meow meow meow ;
55 meow5 exit', 0
56
57 (ignore the newline)
58 Which reads as:
59
60 1. call meow right now in "immediate" mode
61 2. : switches to compile mode and
62 3. store "meow5" as name
63 4. inline 5 meow words
64 5. ; writes tail (including saved name) and
65 6. switches back to immediate mode
66 7. call new meow5 word
67 8. exit
68
69 I have also created a mode var and added imm/comp flags
70 to tails. TODO:
71
72 [ ] colon word store name somewhere
73 [ ] find should also match mode flag (use &)
74 [ ] semicolon should write tail
75 [ ] immediate mode should find and exec words...somehow
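The eight steps for the new input string can be sketched as
a toy interpreter. This is a Python model under loud
assumptions, not the assembly: each word is represented by
the list of strings it prints ("Meow." and "BYE!" are taken
from the desired output earlier in this log), the mode var
is a boolean, and "writing the tail" is reduced to adding an
entry to a dict. `interpret` is a hypothetical name.

```python
def interpret(source):
    """Run `source` and return everything it would print, as a list."""
    # Each word is modeled as the list of strings it prints when run.
    dictionary = {"meow": ["Meow."], "exit": ["BYE!"]}
    output = []
    compiling = False              # the "mode var": False = immediate
    new_name, new_body = None, []
    tokens = iter(source.split())
    for token in tokens:
        if token == ":":
            compiling = True           # switch to compile mode
            new_name = next(tokens)    # the next token names the new word
            new_body = []
        elif token == ";":
            dictionary[new_name] = new_body  # word becomes findable
            compiling = False          # back to immediate mode
        elif compiling:
            new_body += dictionary[token]    # inline the word's body
        else:
            output += dictionary[token]      # immediate: run it right now
            if token == "exit":
                break
    return output

result = interpret("meow : meow5 meow meow meow meow meow ; meow5 exit")
print("\n".join(result))
```

In this model the first 'meow' runs immediately, the five
inside the definition are only copied, and they run later
when 'meow5' is called.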
76
77 Next two nights: Hmm...okay, so adding more words that
78 will execute as they're entered ("immediate" words) is
79 forcing me to deal with how they should return execution
80 to whatever called them.
81
82 To recap:
83
84 * Compiled code in meow5 will be concatenated
85 together, so there is no such thing as "return"
86 _within_ a compiled word - execution truly just
87 flows from the end of one word to the beginning of
88 the next.
89
90 * Many words (':' or 'colon' is an example) must be
91 able to operate outside of a compiled word because
92 they are needed to do the compiling!
93
94 * Some words can execute _both_ ways in a single
95 definition. 'exit' is my only example currently -
96 it's simple because no part of the program needs
97 to execute after it's done, of course.
98
99 * A select few words will even need to be executed
100 from within the meow5 binary itself (in assembly)
101 to make the initial functionality of the
102 interpreter available. 'find' and 'inline' are two
103 such fundamental words.
104
105 * I've slowly been converting all of the traditional
106 procedure calls in this prototype into simple
107 jumps and manually keeping track of a single level
108 of return address.
109
110 Now the ':' command forces me to implement a return
111 stack for immediate execution, at the very least,
112 because it will need to call, for instance, 'get_token',
113 to get the name of the word being defined:
114
115 : meow5 ... ;
116
117 Here 'meow5' is the name of the new word.
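Why a single saved return address stops being enough once
':' has to call 'get_token' can be shown with a tiny model.
This is a Python sketch with made-up operations (CALL, RET,
HALT) and a made-up `run` function, not meow5 code: the
main program calls 'colon', which in turn calls
'get_token'.

```python
def run(program, use_stack, max_steps=20):
    """Return True if the toy program reaches HALT within max_steps."""
    pc = ("main", 0)   # (word name, index of the next operation)
    slot = None        # the single saved return address
    stack = []         # the alternative: a return stack
    for _ in range(max_steps):
        word, index = pc
        op = program[word][index]
        if op == "HALT":
            return True
        if op == "RET":
            pc = stack.pop() if use_stack else slot
        else:                          # op is ("CALL", target)
            resume = (word, index + 1)
            if use_stack:
                stack.append(resume)
            else:
                slot = resume          # overwrites any earlier address
            pc = (op[1], 0)
    return False                       # gave up: stuck in a loop

PROGRAM = {
    "main": [("CALL", "colon"), "HALT"],
    "colon": [("CALL", "get_token"), "RET"],
    "get_token": ["RET"],
}

print(run(PROGRAM, use_stack=False))
print(run(PROGRAM, use_stack=True))
```

With one slot, the call to 'get_token' overwrites the
address 'colon' needed to get back to 'main', so 'colon'
keeps returning into itself. With a stack, both returns land
where they should.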
118
119 Anyway, after sleeping on it, I think I'll solve this by
120 having macros to start and end a word in assembly. In
121 addition to taking care of the housekeeping duties of
122 creating the tail metadata, they'll also set up return
123 jumping and stack poppin'. The length of the word in the
124 tail will NOT include the return stuff so it won't be
125 included when the word is inlined.
126
127 Anyway, it makes sense in my head.
128
129 The basic word-making macros are easy enough:
130
131 %macro DEFWORD 1 ; takes name of word to make
132 %1:
133 %endmacro
134
135 %macro ENDWORD 3
136 end_%1:
137 ; todo: immediate "return" goes here
138 tail_%1:
139 dd LAST_WORD_TAIL ; linked list
140 %define LAST_WORD_TAIL tail_%1
141 dd (tail_%1 - %1) ; length of word
142 dd %3 ; flags
143 db %2, 0 ; name as string
144 %endmacro
145
146 I tested this and I'll spare you the GDB walkthrough. It
147 works and I was able to execute this word from my input
148 string.
149
150 DEFWORD foo
151 mov eax, 42
152 ENDWORD foo, "foo", IMMEDIATE
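The tail that ENDWORD emits at this stage can be pictured as
a packed record: a link, a length, the flags, then a
NUL-terminated name. The Python sketch below models that
layout. Assumptions: the numeric value of IMMEDIATE, using 0
as the link for the first word, and a code length of 5
(the usual 32-bit encoding of `mov eax, 42`). The function
names are hypothetical.

```python
import struct

IMMEDIATE = 1  # assumed value; the log does not show the real constant

def make_tail(prev_tail, code_len, flags, name):
    """Pack a tail: three little-endian dwords, the name, then a NUL."""
    header = struct.pack("<III", prev_tail, code_len, flags)
    return header + name.encode("ascii") + b"\x00"

def parse_tail(tail):
    """Unpack a tail back into (prev_tail, code_len, flags, name)."""
    prev_tail, code_len, flags = struct.unpack_from("<III", tail, 0)
    name_end = tail.index(b"\x00", 12)  # name starts after three dwords
    return prev_tail, code_len, flags, tail[12:name_end].decode("ascii")

foo_tail = make_tail(prev_tail=0, code_len=5, flags=IMMEDIATE, name="foo")
print(foo_tail.hex())
print(parse_tail(foo_tail))
```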
153
154 So I'll test a call/return action with this foo, then
155 convert them all.
156
157 It worked. Now converting...
158
159 Worked out some bugs.
160
161 Silly little mistakes.
162
163 Here's the thing: it's getting pretty annoying to have
164 to bust out GDB, guess where to set a break point, step
165 through the code, try to remember the C-dominated syntax
166 to print stuff, etc., only to find out that I forgot to
167 add a line or I put the wrong thing in a string data
168 declaration.
169
170 Don't get me wrong, I'm grateful for GDB. It's been a
171 good tool and I know I should probably re-learn some of
172 its customization options.
173
174 But what I really want is better debugging in my program
175 itself.
176
177 So I've added "word not found" handling in the main
178 routine, so it goes like this:
179
180 get_next_token:
181 CALLWORD get_token
182 if all done, jump to .run_it
183 CALLWORD find
184 if not found, jump to .token_not_found
185 CALLWORD inline
186 jmp get_next_token
187
188 .run_it:
189 jmp data_segment
190
191 .token_not_found:
192 print first part of error message
193 print token name
194 print last part of error message
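The same control flow can be modeled in Python. This is
only a sketch: the dictionary is a plain dict with
placeholder bytes, the function names are hypothetical, and
the model returns the compiled bytes where the real
'.run_it' jumps into them.

```python
def get_token(buffer, pos):
    """Return (token, new_pos); token is None at the terminating NUL."""
    while buffer[pos] == 0x20:            # skip spaces
        pos += 1
    if buffer[pos] == 0:                  # NUL: end of input
        return None, pos
    start = pos
    while buffer[pos] not in (0x20, 0):   # scan to next space or NUL
        pos += 1
    return buffer[start:pos].decode("ascii"), pos

def compile_buffer(buffer, dictionary):
    """Return (compiled_bytes, None) or (None, error_message)."""
    compiled = bytearray()
    pos = 0
    while True:
        token, pos = get_token(buffer, pos)
        if token is None:
            return bytes(compiled), None                     # .run_it
        code = dictionary.get(token)                         # find
        if code is None:
            return None, 'Could not find word "%s"' % token  # not found
        compiled += code                                     # inline

WORDS = {"meow": b"<meow>", "exit": b"<exit>"}  # placeholder code bytes
ok = compile_buffer(b"meow meow exit\x00", WORDS)
bad = compile_buffer(b"honk meow meow meow meow meow exit\x00", WORDS)
print(ok)
print(bad)
```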
195
196 I'll test it out:
197
198 input_buffer_start:
199 db 'honk meow meow meow meow meow exit', 0
200
201 $ mr
202 Could not find word "honk"
203
204 Excellent, that'll save me untold minutes of debugging
205 right there.
206
207 Now let's see if I've converted everything to my new
208 macros DEFWORD ... ENDWORD properly:
209
210 $ mr
211 Meow!
212 Meow!
213 Meow!
214 Meow!
215 Meow!
216 Meow!
217 Meow!
218 ...
219
220 Oh no! I've got an infinite loop somehow.
221
222 Even though I'm putting in some of the "infrastructure"
223 for it, I'm not doing any immediate mode execution yet,
224 so it's nothing like that.
225
226 Nothing for it but to debug with GDB...
227
228 (gdb) break get_next_token.run_it
229 Breakpoint 1 at 0x80491c2: file meow5.asm, line 272.
230 ...
231 273 jmp data_segment ; jump to the "compiled" program
232 0x0804a054 in data_segment ()
233 (gdb)
234 Single stepping until exit from function data_segment,
235 which has no line number information.
236
237 Oh, right. There's no debugger info for the machine code
238 I've inlined into memory and executed.
239
240 All the more reason to have debugging tools built into
241 my program itself. But I don't have those yet, so at
242 least GDB can give me a disassembly:
243
244 (gdb) disas &data_segment,&here
245 Dump of assembler code from 0x804a054 to 0x804a454:
246 0x0804a054 <data_segment+0>: mov $0x1,%ebx
247 => 0x0804a059: mov $0x804a006,%ecx
248 0x0804a05e: mov $0x6,%edx
249 0x0804a063: mov $0x4,%eax
250 0x0804a068: int $0x80
251 0x0804a06a: jmp *0x804a459
252 0x0804a070: mov $0x1,%ebx
253 0x0804a075: mov $0x804a006,%ecx
254 0x0804a07a: mov $0x6,%edx
255 0x0804a07f: mov $0x4,%eax
256 0x0804a084: int $0x80
257 0x0804a086: jmp *0x804a459
258
259 ... repeats three more times...
260
261 0x0804a0e0: pop %ebx
262 0x0804a0e1: mov $0x1,%eax
263 0x0804a0e6: int $0x80
264 0x0804a0e8: jmp *0x804a459
265 0x0804a0ee: add %al,(%eax)
266 0x0804a0f0: add %al,(%eax)
267
268 So the nice thing about 5 "meows" in a row is that the
269 repetition is really easy to spot.
270
271 The weird thing is that they all end with a jump back to
272 the exact same place near the beginning (but not exactly
273 at the beginning) of the inlined code.
274
275 Where is that jump coming from?
276
277 Oh, ha ha, I found it almost immediately. It's the
278 "return" that I put in my ENDWORD macro. That's not
279 supposed to be inlined with the "compiled" version of
280 words and it's due to a silly mistake.
281
282 The last line here:
283
284 end_%1:
285 jmp [return_addr]
286 tail_%1:
287 dd LAST_WORD_TAIL
288 dd (tail_%1 - %1)
289
290 Should be:
291
292 dd (end_%1 - %1)
293
294 So the jmp [return_addr] doesn't get inlined!
295
296 I'll fix that.
297
298 And now?
299
300 (gdb) disas &data_segment,&here
301 Dump of assembler code from 0x804a054 to 0x804a454:
302 0x0804a054 <data_segment+0>: push %es
303 0x0804a055: mov 0x6ba0804,%al
304 0x0804a05a: add %al,(%eax)
305 0x0804a05c: add %bh,0x4(%eax)
306 0x0804a062: int $0x80
307 0x0804a064: jmp *0x804a459
308 0x0804a06a: push %es
309 0x0804a06b: mov 0x6ba0804,%al
310 0x0804a070: add %al,(%eax)
311 0x0804a072: add %bh,0x4(%eax)
312 0x0804a078: int $0x80
313 0x0804a07a: jmp *0x804a459
314 0x0804a080: push %es
315 ...
316
317 What on earth? That ain't right.
318
319 Next night: ohhhh...crud. Yeah, the problem is due to
320 the "return" code at the end of each word. My
321 dirt-simple inline is going to need an additional
322 length: there's a distance from the tail to the
323 beginning of the machine code and a separate length of
324 the machine code. (They used to be the same thing.)
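Both bugs can be reproduced on paper with the bytes of one
'meow'. The Python sketch below is a model, not the real
inline: the hex strings are transcribed from the first GDB
disassembly above, assuming the standard 32-bit x86
encodings, and `copied_bytes` is a hypothetical helper that
copies `length` bytes starting `offset` bytes before the
tail.

```python
MEOW_BODY = bytes.fromhex(
    "bb01000000"  # mov ebx, 1
    "b906a00408"  # mov ecx, 0x804a006
    "ba06000000"  # mov edx, 6
    "b804000000"  # mov eax, 4
    "cd80"        # int 0x80
)
RETURN_STUB = bytes.fromhex("ff2559a40408")  # jmp [0x804a459]

WORD = MEOW_BODY + RETURN_STUB  # bytes between 'meow:' and 'tail_meow:'
TAIL = len(WORD)                # position of the tail, relative to 'meow:'

def copied_bytes(offset, length):
    """Bytes a copy would grab, starting `offset` bytes before the tail."""
    start = TAIL - offset
    return WORD[start:start + length]

body_len = len(MEOW_BODY)  # end_meow - meow
word_len = len(WORD)       # tail_meow - meow

# One number used for both jobs, measured to the tail: copies the jmp too.
first_bug = copied_bytes(offset=word_len, length=word_len)
# One number used for both jobs, measured to end_meow: starts too late.
second_bug = copied_bytes(offset=body_len, length=body_len)
# Two separate numbers: distance back to the code, and length of the code.
fixed = copied_bytes(offset=word_len, length=body_len)

print(first_bug.hex())
print(second_bug.hex())
print(fixed.hex())
```

In this model `second_bug` begins with the bytes 06 a0,
which lines up with the `push %es` and `mov ...,%al` that
GDB showed in the second disassembly.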
325
326 The DEFWORD and ENDWORD macros produce this for "meow":
327
328 meow:
329 ...
330 end_meow:
331 jmp [return_addr]
332 tail_meow:
333 ...
334 dd (end_meow - meow)
335 dd (tail_meow - meow) <-- need to add this
336
337 And any other code that reads the tail (I guess that's
338 just 'find' right now?) will also need to be updated. I
339 wonder if I should be storing these "tail offsets" in
340 NASM macros as constants so I don't have to hunt them
341 down if they change in the future?
342
343 Yeah, I'll do that too. In addition to making changes
344 painless, it will make my intent clearer in the code
345 than bare offset numbers and a comment ever could.
346
347 ; Memory offsets for each item in tail:
348 %define T_CODE_LEN 4
349 %define T_CODE_OFFSET 8
350 %define T_FLAGS 12
351 %define T_NAME 16
352
353 Inline is re-worked to use the length and offset of the
354 machine code in relation to the tail address:
355
356 DEFWORD inline
357 pop esi ; param1: tail of word to inline
358 mov edi, [here] ; destination
359 mov eax, [esi + T_CODE_LEN] ; get len of code
360 mov ebx, [esi + T_CODE_OFFSET] ; get offset back to code start
361 sub esi, ebx ; set start of code for movsb
362 mov ecx, eax ; set len of code for movsb
363 rep movsb ; copy [esi]...[esi+ecx] into [edi]
364 add [here], eax ; save current position
365 ENDWORD inline, "inline", (IMMEDIATE)
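For readers who do not think in registers, here is the same
copy modeled in Python over a flat bytearray. It is a
sketch, not a translation to rely on: the T_ offsets are
copied from the defines above, but the demo word (4 bytes of
"code", a 3-byte stand-in for the return jump, flag value 1,
name "demo") is made up, and so are the helper names.

```python
import struct

# Offsets copied from the %define lines in this log.
T_CODE_LEN = 4
T_CODE_OFFSET = 8
T_FLAGS = 12
T_NAME = 16

def read_u32(memory, addr):
    """Read one little-endian dword from memory."""
    return struct.unpack_from("<I", memory, addr)[0]

def inline(memory, tail, here):
    """Copy a word's code to `here`; return the advanced `here`."""
    length = read_u32(memory, tail + T_CODE_LEN)
    offset = read_u32(memory, tail + T_CODE_OFFSET)
    start = tail - offset                                      # sub esi, ebx
    memory[here:here + length] = memory[start:start + length]  # rep movsb
    return here + length                                       # add [here], eax

# Hypothetical word at address 0: code, return stub, then its tail.
code, stub = b"CODE", b"RET"
tail_addr = len(code) + len(stub)
tail = struct.pack("<IIII", 0, len(code), tail_addr, 1) + b"demo\x00"
memory = bytearray(code + stub + tail) + bytearray(16)  # 16 free bytes
here = tail_addr + len(tail)

here_after_one = inline(memory, tail_addr, here)
here_after_two = inline(memory, tail_addr, here_after_one)
print(bytes(memory[here:here_after_two]))
```

Inlining the demo word twice leaves two back-to-back copies
of its code and none of the stub.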
366
367 Crossing fingers...
368
369 $ mr
370 Meow.
371 Meow.
372 Meow.
373 Meow.
374 Meow.
375
376 Yay, working again!
377
378 Now I can try to do something _new_ with these changes:
379 find immediate mode and compile mode words.
380
381 And to _really_ do this right, I'll use the FORTH colon
382 word ':' as my immediate/compile mode separator.
383
384 Here's my new "input buffer" string:
385
386 db 'meow meow : meow meow meow exit', 0
387
388 For now the definition of ':' will _just_ set the mode:
389
390 DEFWORD colon
391 mov dword [mode], COMPILE
392 ENDWORD colon, ":", (IMMEDIATE)
393
394 And I've got two different definitions of 'meow' all
395 ready to go. They're both called "meow" in the
396 dictionary, but one of them has an IMMEDIATE flag and
397 the other has the COMPILE flag to specify which mode
398 they should match. The only difference is that they
399 print different strings.
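A lookup that matches on both name and mode might look like
the Python model below. Everything concrete in it is an
assumption made for illustration: the bit values of the two
flags, the `Word` record, and giving 'exit' both flags
(this log only says 'exit' can execute both ways).

```python
from typing import NamedTuple, Optional

IMMEDIATE = 0b01  # assumed bit values; the log only names the flags
COMPILE = 0b10

class Word(NamedTuple):
    name: str
    flags: int
    prints: str
    prev: Optional["Word"]  # link to the previously defined word

def find(last, name, mode):
    """Walk the linked list from the newest word back to the oldest."""
    word = last
    while word is not None:
        if word.name == name and (word.flags & mode):
            return word
        word = word.prev
    return None

w1 = Word("meow", IMMEDIATE, "Immediate Meow!", None)
w2 = Word("meow", COMPILE, "Meow.", w1)
w3 = Word(":", IMMEDIATE, "", w2)
w4 = Word("exit", IMMEDIATE | COMPILE, "", w3)
last = w4

print(find(last, "meow", IMMEDIATE).prints)
print(find(last, "meow", COMPILE).prints)
print(find(last, ":", COMPILE))
```

The bitwise AND lets one entry serve both modes when both
bits are set, while the two 'meow' entries stay separate.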
400
401 If all goes well, the "input buffer" string I set above
402 should print two immediate meows and then compile three
403 compile meows and an exit and then run that...
404
405 $ mr
406 Immediate Meow!
407 Immediate Meow!
408 Meow.
409 Meow.
410 Meow.
411
412 Wow!
413
414 So I guess I've done two of the four TODOs I set at the
415 start of this log above:
416
417 [ ] colon word store name somewhere
418 [x] find should also match mode flag (use &)
419 [ ] semicolon should write tail
420 [x] immediate mode should find and exec words...somehow
421
422 The colon word isn't storing the word name and there's
423 no semicolon yet, so I'm not adding the new words to the
424 dictionary yet, but I also made progress in other areas.
425
426 I'll start a new log now with the other two TODOs.
427
428 See you in log05.txt!