1 So log01.txt concluded with a nice little demonstration
2 of programmatically inlining machine code at runtime to
3 "compile" a program and run it.
4
5 The next step is to start to turn this into an actual
6 language by creating headers for words (I've decided
7 I'll use the Forth term "word" to refer to the functions
8 we create in this language).
9
10 [x] Look up word length from header so it doesn't
11 have to be manually created and sent to the
12 inline function.
13
14 [x] Look up word by stored ASCII name in header at
15 runtime. That'll be exciting. I'll practically
16 have a programming language at that point.
17
18 I think I'll use a linked list of words like many
19 traditional Forths, since that's what I learned how to
20 implement in my JONESFORTH port, nasmjf.
21
22 Note: I added design-notes.txt to this repo because I
23 have been having some ongoing thoughts about how
24 to implement this program as a whole, but they're
25 not things I can act upon right away and I don't
26 want to have to come back here searching in these
27 logs to find them (or worse, forget about them
28 entirely!)
29
30 Okay, now I've got #1 from the above list working. Instead
31 of a "header", I've got "tails" at the end of my words.
32 Ha ha, cats have tails. So this just keeps getting
33 better.
34
35 I did it that way because then it becomes trivial to get
36 the length of the machine code. Here's the definition of
37 the exit word now, with its tail:
38
39 exit:
40 mov ebx, 0 ; exit with happy 0
41 mov eax, SYS_EXIT
42 int 0x80
43 exit_tail:
44 dd 0 ; null link is end of linked list
45 dd (exit_tail - exit) ; len of machine code
46 db "exit", 0 ; name, null-terminated
47
48 So now I don't have to give the length of the word's
49 machine code to inline anymore, just the tail address.
50 inline gets the stored length and does all the rest!
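
To make the linked list concrete, here is a minimal sketch
of two words chained through their tails. The second word,
'nop_word', is made up for illustration, 'last' is assumed
to hold the address of the newest tail (as it does later in
this log), and the build line is an assumption about the
toolchain, not a copy of build.sh:

```nasm
; tails_sketch.asm - two words linked through their tails
; build (assumed):
;   nasm -f elf32 tails_sketch.asm -o tails_sketch.o
;   ld -m elf_i386 tails_sketch.o -o tails_sketch
%define SYS_EXIT 1

section .text
global _start

exit:
    mov ebx, 0              ; exit with happy 0
    mov eax, SYS_EXIT
    int 0x80
exit_tail:
    dd 0                    ; null link: first word, end of list
    dd (exit_tail - exit)   ; len of machine code
    db "exit", 0            ; name, null-terminated

nop_word:                   ; hypothetical second word
    nop
nop_word_tail:
    dd exit_tail            ; link to the previous word's tail
    dd (nop_word_tail - nop_word)
    db "nop", 0

_start:
    mov edx, [last]         ; newest tail (nop_word_tail)
    mov edx, [edx]          ; follow its link to exit_tail
    mov ecx, [edx + 4]      ; len of exit's machine code
    sub edx, ecx            ; tail minus len = start of code
    jmp edx                 ; run 'exit' in place

section .data
last: dd nop_word_tail      ; assumed: points at newest tail
```

Because the length sits at a fixed offset from the tail,
the start of the code is always "tail minus length", with
no separate header pointer needed.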
51
52 Here's the new inline:
53
54 ; inline function!
55 ; input: esi - tail of the word to inline
56 inline:
57 mov edi, [here] ; destination
58 mov ecx, [esi + 4] ; get len into ecx
59 sub esi, ecx ; sub len from esi (start of code)
60 rep movsb ; movsb copies from esi to esi+ecx into edi
61 add edi, ecx ; update here pointer...
62 mov [here], edi ; ...and store it
63 ret
64
65 Still not too complicated. And I think this might even
66 be its final form?
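
One detail worth a sketch: 'rep movsb' already leaves edi
pointing just past the copied bytes and counts ecx down to
0, so (assuming the direction flag is clear, as it is at
program start on Linux) the 'add edi, ecx' step adds 0. A
trimmed version might look like this. The 'nop3' word and
'buffer' are hypothetical stand-ins, and the copied bytes
are only counted here, not executed:

```nasm
; inline_sketch.asm - inline without the redundant add
; build (assumed):
;   nasm -f elf32 inline_sketch.asm -o inline_sketch.o
;   ld -m elf_i386 inline_sketch.o -o inline_sketch
%define SYS_EXIT 1

section .text
global _start

nop3:                       ; hypothetical 3-byte word
    nop
    nop
    nop
nop3_tail:
    dd 0                    ; null link
    dd (nop3_tail - nop3)   ; len of machine code (3)
    db "nop3", 0

; input: esi - tail of the word to inline
inline:
    mov edi, [here]         ; destination
    mov ecx, [esi + 4]      ; get len into ecx
    sub esi, ecx            ; back up to start of code
    rep movsb               ; copy ecx bytes, advancing esi and edi
    mov [here], edi         ; edi is already the new 'here'
    ret

_start:
    mov esi, nop3_tail
    call inline
    mov esi, nop3_tail
    call inline
    mov ebx, [here]
    sub ebx, buffer         ; bytes inlined so far
    mov eax, SYS_EXIT       ; exit status = byte count
    int 0x80

section .data
here: dd buffer             ; next free byte in the buffer

section .bss
buffer: resb 64
```

Running it and checking the exit status (echo $?) should
report 6: two inlined copies of a 3-byte word.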
67
68 Let's see if this works...
69
70 Program received signal SIGSEGV, Segmentation fault.
71 inline () at meow5.asm:67
72 67 rep movsb
73
74 Darn it.
75
76 Oh, wait! It was inlining the meows just fine, it was
77 doing exit that failed. I simply hadn't updated it to
78 point to the tail yet. Simple mistake:
79
80 ; inline exit
81 mov esi, exit <---- oops!
82 call inline
83
84 needs to be:
85
86 ; inline exit
87 mov esi, exit_tail
88 call inline
89
90 How about now...
91
92 $ mrun
93 Meow.
94 Meow.
95 Meow.
96 Meow.
97 Meow.
98
99 Awesome! Guess I can start making it find words by ASCII
100 name in the tails, searching by linked list. Very
101 exciting progress tonight!
102
103 I've got two more todos:
104
105 [x] Add tails to anything that should be a word
106
107 [ ] Make all words take params from the stack, not
108 from pre-defined registers. Yes, we're losing
109 some speed by going to main memory, but I have
110 a feeling the stack is surely in CPU cache most
111 of the time? I should look that up someday...
112
113 So I'm going to call the word that looks up other words
114 by string name 'find', just like in Forth. (Well, except
115 there it's FIND, of course.) It works by searching
116 through the linked list of words.
117
118 Two nights later: I've written the 'find' word and added
119 tails to all of my words so far. But I've got a
120 segfault:
121
122 dave@cygnus~/meow5$ mr
123 ./build.sh: line 33: 1966 Segmentation fault ./$F
124
125 So it's GDB time:
126
127 dave@cygnus~/meow5$ mb
128 Reading symbols from meow5...
129 ...
130 143 push temp_meow_name ; the name string to find
131 144 call find ; answer will be in eax
132 81 pop eax ; first param from stack!
133 84 mov ecx, [last]
134 86 cmp ecx, 0 ; a null pointer (0) is end of list
135 87 je .not_found
136 93 lea edx, [ecx + 8] ; set dictionary name pointer
137 94 mov ebx, eax ; (re)set name to find pointer
138
139 Okay, so here's where I'm comparing the search string to
140 be found against the first (well, last) word's name in
141 the linked list ("dictionary"). So let's see if I got
142 the name from the dictionary entry's "tail" correctly.
143
144 Oh, and here's my comment block from 'find' explaining
145 the register use:
146
147 ; input:
148 ; stack -> eax
149 ; register use:
150 ; eax - start of null-terminated name to find
151 ; ebx - name to find byte pointer
152 ; ecx - dictionary list pointer
153 ; edx - dictionary name byte pointer
154
155 The first thing in the tail should be a link to the next
156 word in the dictionary. The ecx register should point to
157 that link:
158
159 (gdb) x/a $ecx
160 0x804908d <find_tail>: 0x8049052 <inline_tail>
161
162 Yup! That's right. The next word is 'inline'.
163
164 The next thing is the length of the word's machine code:
165
166 (gdb) x/dw $ecx+4
167 0x8049091: 39
168
169 39 bytes seems reasonable. Okay, the next thing should be
170 the null-terminated string of the word name:
171
172 (gdb) x/s $ecx+8
173 0x8049095: "find"
174
175 Yes!
176
177 And have I correctly pointed to the first byte of this
178 string in the edx register?
179
180 (gdb) x/s $edx
181 0x8049095: "find"
182
183 Wow, also yes!
184
185 Okay, so the next thing to confirm is that I have the
186 address of the string to match in register eax:
187
188 (gdb) x/a $eax
189 0x80490c1 <inline_a_meow+10>: 0x74e8308b
190
191 Oops! That's not right. That's an address somewhere in my
192 loop that inlines meow five times...
193
194 I see it now!
195
196 143 push temp_meow_name ; the name string to find
197 144 call find ; answer will be in eax
198 81 pop eax ; first param from stack!
199
200 I forgot that 'call' will push the return address onto
201 the stack, so my 'pop' grabbed that instead of my parameter.
202
203 I need to use the stack pointer and an offset to get the
204 value...
205
206 I use arrays as stacks all the time in higher-level
207 languages, so a PUSH and POP are second nature to me.
208 But I must confess that in an assembly language context,
209 I get super confused by terms like "top", "bottom" and
210 "low" and "high".
211
212 So I prefer to make all of this SUPER CONCRETE. Here's
213 my own personal explanation:
214
215 push eax ; containing 0xAAA
216 push ebx ; containing 0xBBB
217 push ecx ; containing 0xCCC
218 push edx ; containing 0xDDD
219 pop edx
220 pop ecx
221
222 The Stack:
223 ----------
224 0xAAA <-- esp + 4
225 0xBBB <-- esp
226 0xCCC <-- esp - 4
227 0xDDD <-- esp - 8
228
229 Heck, I'm gonna verify that for myself right now with
230 all of you watching:
231
232 (gdb) s
233 125 mov eax, 0xAAA
234 126 mov ebx, 0xBBB
235 127 mov ecx, 0xCCC
236 128 mov edx, 0xDDD
237 129 push eax
238 130 push ebx
239 131 push ecx
240 132 push edx
241 133 pop edx
242 134 pop ecx
243 (gdb) x $esp + 4
244 0xffffd77c: 0x00000aaa
245 (gdb) x $esp
246 0xffffd778: 0x00000bbb
247 (gdb) x $esp - 4
248 0xffffd774: 0x00000ccc
249 (gdb) x $esp - 8
250 0xffffd770: 0x00000ddd
251
252 Whew! At least I've got that much right. :-)
253
254 So my fix is this:
255
256 mov eax, [esp + 4] ; first param from stack!
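
Here is that fix as a tiny standalone sketch. The word
'get_param' is hypothetical, and having the caller drop the
parameter with 'add esp, 4' is one common convention, not
necessarily what meow5 does:

```nasm
; param_sketch.asm - reading a stack param past the return address
; build (assumed):
;   nasm -f elf32 param_sketch.asm -o param_sketch.o
;   ld -m elf_i386 param_sketch.o -o param_sketch
%define SYS_EXIT 1

section .text
global _start

; input:  one dword pushed by the caller
; output: eax - that dword
get_param:
    ; [esp]     = return address pushed by 'call'
    ; [esp + 4] = the parameter pushed before the call
    mov eax, [esp + 4]      ; first param from stack!
    ret

_start:
    push 42                 ; the parameter
    call get_param
    add esp, 4              ; caller removes the parameter
    mov ebx, eax            ; exit status = returned value
    mov eax, SYS_EXIT
    int 0x80
```

The exit status should come back as 42. Note that reading
with an offset leaves the parameter on the stack, so
something has to remove it eventually.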
257
258 And now let's see what we've got in eax:
259
260 (gdb) x/a $eax
261 0x804a006 <temp_meow_name>: 0x776f656d
262
263 Perfect. And $ebx should be the same to begin with:
264
265 (gdb) x/a $ebx
266 0x804a006 <temp_meow_name>: 0x776f656d
267
268 Yup. Good so far.
269
270 ...wait. This next line isn't right:
271
272 96 cmp edx, ebx
273
274 What am I doing? I'm comparing the two addresses here,
275 not the characters they point to. Even worse, I can't
276 compare two pointed-to *values* directly: x86 'cmp' won't
277 take two memory operands. I need to actually load at least
278 one of the two *values* into a register first!
279
280 Sheesh. Lemme fix this up. Okay, so here's the new
281 register use, which I'm trying to make as conventional
282 as I know how...
283
284 ; register use:
285 ; al - to-find name character being checked
286 ; ebx - start of dict word's name string
287 ; ecx - byte offset counter (each string character)
288 ; edx - dictionary list pointer
289 ; ebp - start of to-find name string
290
291 And the code has changed quite a bit, so I'm gonna step
292 through it again:
293
294 (gdb) s
295 146 push temp_meow_name ; the name string to find
296 147 call find ; answer will be in eax
297 find () at meow5.asm:80
298 80 mov ebp, [esp + 4] ; first param from stack!
299 83 mov edx, [last]
300 find.test_word () at meow5.asm:85
301 85 cmp edx, 0 ; a null pointer (0) is end of list
302 86 je .not_found
303 92 lea ebx, [edx + 8] ; set dict. word name pointer
304 93 mov ecx, 0 ; reset byte offset counter
305
306 Okay, first the ebx register should now point to the
307 current dictionary word's name that we're gonna test:
308
309 (gdb) x/s $ebx
310 0x804909f: "find"
311
312 Good.
313
314 And the ebp register should point to the to-find name:
315
316 (gdb) x/s $ebp
317 0x804a006 <temp_meow_name>: "meow"
318
319 Good.
320
321 find.compare_names_loop () at meow5.asm:95
322 95 mov al, [ebp + ecx] ; get next to-find name byte
323 96 cmp al, [ebx + ecx] ; compare with next dict word byte
324
325 Now the character in byte register al should be the first
326 one from the to-find name "meow":
327
328 (gdb) p/c $al
329 $2 = 109 'm'
330
331 Good.
332
333 And the character pointed to by ebx+ecx should be the
334 first one from the dict word "find":
335
336 (gdb) x/c $ebx+$ecx
337 0x804909f: 102 'f'
338
339 Good.
340
341 And since these don't match, the jump should take us to
342 the next word...
343
344 97 jne .try_next_word ; found a mismatch!
345 find.try_next_word () at meow5.asm:102
346 102 mov ecx, [ecx] ; follow the tail! (linked list)
347 Program received signal SIGSEGV, Segmentation fault.
348
349 Oh, right. Silly me. I'm storing the dictionary word
350 links in the edx register now, not ecx! I missed this
351 one...
352
353 Okay, how about now?
354
355 find.try_next_word () at meow5.asm:103
356 103 mov edx, [edx] ; follow the tail! (linked list)
357 (gdb) x/a $edx
358 0x8049097 <find_tail>: 0x8049052 <inline_tail>
359 (gdb) s
360 104 jmp .test_word
361
362 That's better. Let's see if we're testing "meow" vs
363 "inline" now (well, 'm' vs 'i'):
364
365 (gdb) p/c $al
366 $1 = 109 'm'
367 (gdb) x/c $ebx+$ecx
368 0x804905a: 105 'i'
369
370 Good!
371
372 And the next word should be "meow", so 'm' vs 'm':
373
374 (gdb) p/c $al
375 $2 = 109 'm'
376 (gdb) x/c $ebx+$ecx
377 0x8049037: 109 'm'
378 98 jne .try_next_word ; found a mismatch!
379 99 cmp al, 0 ; both hit 0 terminator at same time
380 100 je .found_it
381 find.try_next_word () at meow5.asm:103
382 103 mov edx, [edx] ; follow the tail! (linked list)
383
384 What?
385
386 Oh. <facepalm> It just dropped through. I forgot the
387
388 jmp .compare_names_loop
389
390 at the end of my loop...
391
392 I'll spare you the second go where I had an infinite loop
393 because I had *also* forgotten to increment the ecx
394 register to check the next letter in the strings...
395
396 Okay, and now?
397
398 Reading symbols from meow5...
399 (gdb) break 97
400 Breakpoint 1 at 0x8049081: file meow5.asm, line 97.
401 1: /c $al = <error: No registers.>
402 (gdb) r
403 Starting program: /home/dave/meow5/meow5
404 Breakpoint 1, find.compare_names_loop () at meow5.asm:97
405 97 cmp al, [ebx + ecx] ; compare with next dict word byte
406 (gdb) display /c *($ebx + $ecx)
407 (gdb) display /c $al
408 1: /c $al = 109 'm'
409 2: /c *($ebx + $ecx) = 102 'f'
410 (gdb) c
411 Continuing.
412 Breakpoint 1, find.compare_names_loop () at meow5.asm:97
413 97 cmp al, [ebx + ecx] ; compare with next dict word byte
414 1: /c $al = 109 'm'
415 2: /c *($ebx + $ecx) = 105 'i'
416 ...
417 1: /c $al = 109 'm'
418 2: /c *($ebx + $ecx) = 109 'm'
419 ...
420 1: /c $al = 101 'e'
421 2: /c *($ebx + $ecx) = 101 'e'
422 ...
423 1: /c $al = 111 'o'
424 2: /c *($ebx + $ecx) = 111 'o'
425 ...
426 1: /c $al = 119 'w'
427 2: /c *($ebx + $ecx) = 119 'w'
428 ...
429 1: /c $al = 0 '\000'
430 2: /c *($ebx + $ecx) = 0 '\000'
431 (gdb) c
432 Continuing.
433
434 Program received signal SIGSEGV, Segmentation fault.
435 inline_a_meow () at meow5.asm:152
436 152 mov esi, [eax] ; putting directly in reg for now
437
438 Yay! (Not the segfault, but the apparently correct matching
439 of the strings.)
440
441 Now let's see what's happening once we get a match,
442 because clearly eax is not getting returned with a valid
443 word tail address...
444
445 (gdb) break find.found_it
446 ...
447 Breakpoint 1, find.found_it () at meow5.asm:113
448 113 mov eax, ecx ; pointer to tail of dictionary word
449
450 Gah! I see it. Another ecx that should be an edx. I
451 could have sworn I searched for these...
452
453 Reading symbols from meow5...
454 (gdb) break find.found_it
455 Breakpoint 1 at 0x8049097: file meow5.asm, line 113.
456 (gdb) r
457 Starting program: /home/dave/meow5/meow5
458
459 Breakpoint 1, find.found_it () at meow5.asm:113
460 113 mov eax, edx ; pointer to tail of dictionary word
461 (gdb) p/a $edx
462 $1 = 0x804902f <meow_tail>
463
464 That's better. So yeah, we definitely found the meow
465 word by string. Very cool. Let's see what happens next...
466
467 (gdb) s
468 114 ret ; (using call/ret for now)
469 (gdb)
470 inline_a_meow () at meow5.asm:152
471 152 mov esi, [eax] ; putting directly in reg for now
472 (gdb)
473 153 call inline
474 (gdb)
475 inline () at meow5.asm:62
476 62 mov edi, [here] ; destination
477
478 Yes, very nice...
479
480 Breakpoint 1, find.found_it () at meow5.asm:113
481 Breakpoint 1, find.found_it () at meow5.asm:113
482 Breakpoint 1, find.found_it () at meow5.asm:113
483 Breakpoint 1, find.found_it () at meow5.asm:113
484
485 That's four more 'meow's getting inlined...
486
487 Breakpoint 1, find.found_it () at meow5.asm:113
488
489 That's the 'exit'...
490
491 113 mov eax, edx ; pointer to tail of dictionary word
492 (gdb) c
493 Continuing.
494
495 Program received signal SIGSEGV, Segmentation fault.
496 inline () at meow5.asm:63
497 63 mov ecx, [esi + 4] ; get len into ecx
498
499 Wait, how did esi get the wrong value?
500
501 Oh jeez, I have these brackets around eax here:
502
503 mov esi, [eax] ; putting directly in reg for now
504
505 But I want the address in eax, not the link it points to
506 (null for 'exit', hence the crash). Yet another easy fix:
507
508 mov esi, eax ; putting directly in reg for now
509
510 You know what? I feel like this should be good now.
511
512 Let's do this:
513
514 dave@cygnus~/meow5$ mr
515 Meow.
516 Meow.
517 Meow.
518 Meow.
519 Meow.
520
521 Yes!
522
523 I'm now able to find words by string name in the
524 dictionary and "compile" them into memory and run them.
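
Pulling together the pieces stepped through above, 'find'
ends up roughly like the sketch below. The labels and
register use follow the GDB listings in this log. The
stand-in words, the 0 returned for "not found", and the
little test harness in _start are assumptions made for
illustration:

```nasm
; find_sketch.asm - look up a word by name via the tails
; build (assumed):
;   nasm -f elf32 find_sketch.asm -o find_sketch.o
;   ld -m elf_i386 find_sketch.o -o find_sketch
%define SYS_EXIT 1

section .text
global _start

exit:                       ; stand-in body; only the tail matters
    nop
exit_tail:
    dd 0                    ; null link is end of linked list
    dd (exit_tail - exit)
    db "exit", 0

meow:                       ; stand-in body
    nop
meow_tail:
    dd exit_tail            ; link to previous word
    dd (meow_tail - meow)
    db "meow", 0

; input:  stack - address of null-terminated name to find
; output: eax - tail of matching word, or 0 (assumed) if none
; register use:
;   al  - to-find name character being checked
;   ebx - start of dict word's name string
;   ecx - byte offset counter (each string character)
;   edx - dictionary list pointer
;   ebp - start of to-find name string
find:
    mov ebp, [esp + 4]      ; first param, past the return address
    mov edx, [last]
.test_word:
    cmp edx, 0              ; a null pointer (0) is end of list
    je .not_found
    lea ebx, [edx + 8]      ; name follows the link and length
    mov ecx, 0              ; reset byte offset counter
.compare_names_loop:
    mov al, [ebp + ecx]     ; get next to-find name byte
    cmp al, [ebx + ecx]     ; compare with next dict word byte
    jne .try_next_word      ; found a mismatch!
    cmp al, 0               ; both hit 0 terminator at same time
    je .found_it
    inc ecx                 ; next character
    jmp .compare_names_loop
.try_next_word:
    mov edx, [edx]          ; follow the tail! (linked list)
    jmp .test_word
.not_found:
    mov eax, 0              ; assumed "not found" result
    ret
.found_it:
    mov eax, edx            ; pointer to tail of dictionary word
    ret

_start:
    push name_to_find
    call find
    add esp, 4              ; drop the parameter
    mov ebx, 1              ; assume failure
    cmp eax, exit_tail      ; did we get the right tail?
    jne .done
    mov ebx, 0              ; success
.done:
    mov eax, SYS_EXIT
    int 0x80

section .data
last: dd meow_tail          ; newest word in the dictionary
name_to_find: db "exit", 0
```

Searching for "exit" makes the loop reject "meow" on the
first character and follow one link before matching, so an
exit status of 0 means both paths worked.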
525
526 The only TODO "checkbox" I didn't check in this log was
527 this one:
528
529 [ ] Make all words take params from the stack, not
530 from pre-defined registers.
531
532 Which should be no problem. That'll be a nice easy way
533 to start the next log, so I'll see you in log03.txt
534 with that!