1 First log session to test what I've got so far. GNU Debugger recorded in
2 GNU screen for the full GNU experience.
3 I'll clean up a lot of the gdb prompts and stuff for clarity.
4
5 Reading symbols from nasmjf...
6 Breakpoint 1 at 0x804900e: file nasmjf.asm, line 80.
7 Breakpoint 1, _start () at nasmjf.asm:80
8 80 cld ; Clear the "direction flag" which means the string instructions (such
9 82 mov [var_S0], esp ; save the regular stack pointer (used for data) in FORTH var S0!
10 84 mov ebp, return_stack_top ; Initialise the return stack pointer
11
12 Trying a defined "function" in GDB to cut down on the typing. I always
13 have to cast the NASM labels to (int) since the debugging info has no
14 way of telling GDB what I'm storing there. "int" in this case just means
15 I've got a 4-byte (32 bit) value. GDB has a strong C heritage.
16 p - displays the VALUE of the label, which is an address
17 x - displays the memory at the address
18 p/x and x/x display values as hexadecimal
19 *(int) uses the address stored AT the memory referenced by the label
20 (again, strong C heritage in this syntax)
21 All three of these won't always be relevant, but it saves a lot of typing.
22
23 (gdb) define foo
24 Type commands for definition of "foo".
25 End with a line saying just "end".
26 >p/x (int)$arg0
27 >x/x (int)$arg0
28 >x/x *(int)$arg0
29 >end
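
For anyone who'd rather read Python than GDB, here's a rough model of what 'foo' prints. The dict standing in for memory and the function itself are mine (hypothetical); the three values are the ones GDB reports further down in this log.

```python
# Toy model of the GDB 'foo' helper: memory is a dict of address -> 32-bit value.
# The contents are copied from the GDB output later in this log.
memory = {
    0x804a010: 0x08049000,   # first cell of QUIT holds the address of docol
    0x08049000: 0x89fc6d8d,  # first four bytes of docol's machine code
}

def foo(value):
    """Return the three things 'foo' displays for a 32-bit value."""
    as_hex = value                    # p/x (int)$arg0
    at_address = memory[value]        # x/x (int)$arg0
    at_pointer = memory[at_address]   # x/x *(int)$arg0
    return as_hex, at_address, at_pointer

print([hex(v) for v in foo(0x804a010)])
# ['0x804a010', '0x8049000', '0x89fc6d8d']
```

The third lookup only makes sense when the label holds a pointer to a pointer, which is why all three lines aren't always relevant.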
30
31 Initial nonsense over. Now we use the main mechanism that drives the Forth
32 instructions: the NEXT macro is inlined at the end of every word and here
33 to bootstrap the action. cold_start contains the address of the "QUIT" word.
34 (quit is a silly name - it doesn't quit Forth, it "quits" TO the interpreter)
35 (side note: I'd like everything to be lowercase except assembly macros. But
36 after 'quit' and 'docol', I haven't been good about converting them. Will
37 probably do a couple rounds of cleanup at some point...)
38
39 NEXT loads the address of the next instruction and we jump to it, executing
40 the machine code there.
41
42 _start () at nasmjf.asm:88
43 88 mov esi, cold_start ; give next forth word to execute
44 27 lodsd ; NEXT: Load from memory into eax, inc esi to point to next word.
45 28 jmp [eax] ; Jump to whatever code we're now pointing at.
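
Here's a minimal Python sketch of what those two NEXT instructions do, assuming memory is just a dict of 4-byte cells. The address I gave cold_start is made up; the other two values come from the GDB output in this log.

```python
# Toy model of NEXT (lodsd / jmp [eax]) for indirect threaded code.
COLD_START = 0x804a000       # hypothetical address of the cold_start label
memory = {
    COLD_START: 0x804a010,   # cold_start holds the address of QUIT
    0x804a010: 0x08049000,   # QUIT's first cell holds the address of docol
}

def next_word(esi):
    """Return (eax, esi, jump_target) after one NEXT."""
    eax = memory[esi]         # lodsd: load the cell esi points at...
    esi += 4                  # ...and advance esi to the following cell
    target = memory[eax]      # jmp [eax]: one more level of indirection
    return eax, esi, target

eax, esi, target = next_word(COLD_START)
print(hex(eax), hex(target))
# 0x804a010 0x8049000
```

The double lookup is the "indirect" part: esi points at a cell, that cell points at a word, and the word's first cell points at machine code.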
46
47 Since QUIT is defined with the DEFWORD macro, it begins with a pointer to
48 the 'DOCOL' routine - which, in essence, sets up the rest of the Forth word
49 to be executed (QUIT, in this case) for another call to NEXT.
50
51 docol () at nasmjf.asm:40
52 40 lea ebp, [ebp-4] ; "load effective address" of next stack position
53 41 mov [ebp], %1 ; "push" the register value to the address at ebp
54 70 add eax, 4 ; eax points to docol (me!) in word definition. Go to next.
55
56 Here I use that 'foo' function to see if that's true about the eax register.
57 Note that the add 4 instruction has NOT yet executed. GDB always shows the
58 next instruction before you tell it to step forward to that instruction!
59
60 (gdb) foo $eax
61 $9 = 0x804a010
62 0x804a010: 0x08049000
63 0x8049000 <docol>: 0x89fc6d8d
64
65 Yup! It points to DOCOL all right. Now we step and add 4 to eax:
66
67 (gdb) s
68 71 mov esi, eax ; Put the next word pointer into esi
69 (gdb) foo $eax
70 $10 = 0x804a014
71 0x804a014: 0x0804a12c
72 0x804a12c: 0x08049218
73
74 Every single Forth word ends with NEXT, which executes the next word.
75 In this case, it's happening at the end of DOCOL (and DOCOL's job is
76 to get everything set up to have NEXT execute the rest of the word...)
77
78 (gdb) s
79 27 lodsd ; NEXT: Load from memory into eax, inc esi to point to next word.
80 28 jmp [eax] ; Jump to whatever code we're now pointing at.
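
To convince myself of the order of operations, here's a rough Python model of docol followed by NEXT. I'm assuming the %1 macro argument is esi here, and the starting esi and ebp values are invented; the QUIT cells are the ones GDB shows above.

```python
# Toy model of docol followed by NEXT, with memory as a dict of 4-byte cells.
memory = {
    0x804a014: 0x0804a12c,   # second cell of QUIT: address of the next word
    0x0804a12c: 0x08049218,  # that word's first cell: address of its code
}

def docol(eax, esi, ebp):
    """Save esi on the return stack, then point esi into the word body."""
    ebp -= 4                 # lea ebp, [ebp-4]
    memory[ebp] = esi        # mov [ebp], esi  ("push" where we came from)
    eax += 4                 # add eax, 4      (skip the cell holding docol)
    esi = eax                # mov esi, eax
    return eax, esi, ebp

def next_word(esi):
    eax = memory[esi]                    # lodsd
    return eax, esi + 4, memory[eax]     # new eax, new esi, jmp [eax] target

# esi and ebp start values are hypothetical.
eax, esi, ebp = docol(eax=0x804a010, esi=0x804a004, ebp=0x804c000)
eax, esi, target = next_word(esi)
print(hex(eax), hex(target))
# 0x804a12c 0x8049218
```

The saved esi on the return stack is what lets a word return to its caller later.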
81
82 Double-checking that the instructions in QUIT are what we'll be running
83 now...
84
85 (gdb) foo $eax
86 $12 = 0x804a12c
87 0x804a12c: 0x08049218
88 0x8049218 <code_R0>: 0x04c30868
89
90 Yes! The 'R0' constant is the first thing we run in QUIT! It's really wild
91 how constants in Forth are actually words with a single instruction that
92 pushes a value onto the stack! In this case, R0 is the top of the return
93 stack.
94
95 The push %5 line is from the DEFCONST macro, which, in turn, calls the
96 DEFCODE macro because consts are words. Then the NEXT macro continues to
97 the next word in QUIT...
98
99 code_R0 () at nasmjf.asm:568
100 568 push %5
101 code_R0 () at nasmjf.asm:27
102 27 lodsd ; NEXT: Load from memory into eax, inc esi to point to next word.
103 28 jmp [eax] ; Jump to whatever code we're now pointing at.
104
105 ...which happens to be RSPSTORE, which pops that value into the return stack pointer (ebp).
106
107 code_RSPSTORE () at nasmjf.asm:201
108 201 pop ebp
109 code_RSPSTORE () at nasmjf.asm:27
110 27 lodsd ; NEXT: Load from memory into eax, inc esi to point to next word.
111 28 jmp [eax] ; Jump to whatever code we're now pointing at.
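
Put together, those two words just reset the return stack. A small Python sketch of the pair, assuming a list for the data stack and a made-up address for return_stack_top:

```python
# Toy model of the first two words in QUIT: R0, then RSPSTORE.
# RETURN_STACK_TOP is a made-up address; the real value is wherever the
# return_stack_top label lands in the assembled program.
RETURN_STACK_TOP = 0x804c000

data_stack = []

def code_R0():
    data_stack.append(RETURN_STACK_TOP)   # push %5 (the constant's value)

def code_RSPSTORE():
    return data_stack.pop()               # pop ebp: the new return stack pointer

code_R0()
ebp = code_RSPSTORE()
print(hex(ebp), data_stack)
# 0x804c000 []
```

The data stack ends up exactly as it started; only ebp changes.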
112
113 ...and then QUIT runs INTERPRET, which reads words from STDIN. It
114 ...calls _WORD to get a word from input, which
115 ...calls _KEY to get a character ("key") of input
116
117 code_INTERPRET () at nasmjf.asm:209
118 209 call _WORD ; Returns %ecx = length, %edi = pointer to word.
119 _WORD.skip_non_words () at nasmjf.asm:310
120 310 call _KEY ; get next key, returned in %eax
121 _KEY () at nasmjf.asm:351
122
123 _KEY first checks to see if it needs input (currkey has reached
124 bufftop). On first run, they're both zero, so yeah, we need more
125 input.
126
127 Aside: again, "key" isn't how we would normally describe this in
128 a modern environment - it's the next "character" (and even that's
129 becoming a thing of the past now that Unicode is pretty much standard
130 everywhere...).
131
132 Anyway, comparing currkey (ebx = 0) and bufftop (0) sets the Zero
133 Flag (ZF) because the difference between them is zero, as we
134 can see in the 'info reg' display below:
135
136 351 mov ebx, [currkey]
137 352 cmp ebx, [bufftop]
138 353 jge .get_more_input
139 (gdb) info reg
140 ...
141 ebx 0x0 0
142 eflags 0x246 [ PF ZF IF ]
143 ...
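
For the record, here's how that 0x246 breaks down. This is a small Python sketch using the standard x86 flag bit positions (only a subset of the defined bits); the function names are mine.

```python
# Decode an x86 EFLAGS value into flag names (a subset of the defined bits).
FLAG_BITS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
             8: "TF", 9: "IF", 10: "DF", 11: "OF"}

def decode_eflags(value):
    return [name for bit, name in sorted(FLAG_BITS.items()) if value & (1 << bit)]

def jge_taken(value):
    """jge jumps when the sign flag equals the overflow flag."""
    sf = bool(value & (1 << 7))
    of = bool(value & (1 << 11))
    return sf == of

print(decode_eflags(0x246), jge_taken(0x246))
# ['PF', 'ZF', 'IF'] True
```

Bit 1 is also set in 0x246, but it's a reserved bit that always reads as 1, so GDB doesn't name it. SF and OF are both clear, so the jge is taken.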
144
145 We get more input by telling Linux to give us input from
146 STDIN into a fixed-size buffer:
147
148
149 _KEY.get_more_input () at nasmjf.asm:361
150 361 xor ebx,ebx ; 1st param: stdin
151 362 mov ecx,buffer ; 2nd param: buffer
152 363 mov [currkey],ecx
153 364 mov edx,buffer_size ; 3rd param: max length
154 365 mov eax,__NR_read ; syscall: read
155 366 int 0x80 ; syscall!
156
157 Now I type "foo<enter>":
158
159 foo
160
161 We check to make sure the input isn't zero-length. Typing <enter>
162 always gives us at least '\n', but read returns 0 at end-of-file
163 (Ctrl-D on an empty line), which is what this check catches.
164
165 367 test eax,eax ; If %eax <= 0, then exit.
166 368 jbe .eof
167 369 add ecx,eax ; buffer+%eax = bufftop
168 370 mov [bufftop],ecx
169
170 We can see how long the input string is. Yup, 4 bytes is
171 right: "foo\n".
172
173 (gdb) foo $eax
174 $15 = 0x4
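
The same read can be tried from Python with os.read, which wraps the same system call. In this sketch a pipe plays the role of stdin so it runs without a keyboard, and BUFFER_SIZE is a guess rather than the value in the program.

```python
import os

BUFFER_SIZE = 4096   # assumption: not necessarily what buffer_size is in nasmjf

def read_input(fd):
    """Return the bytes read; an empty result means end-of-file."""
    return os.read(fd, BUFFER_SIZE)

read_end, write_end = os.pipe()
os.write(write_end, b"foo\n")     # what I typed: f, o, o, <enter>
os.close(write_end)

data = read_input(read_end)
print(len(data), data)
# 4 b'foo\n'
eof = read_input(read_end)        # nothing left and the writer is closed
os.close(read_end)
```

The second read comes back empty, which is the zero-length case the assembly tests for after the syscall.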
175
176 Now we're back to _KEY, having gathered some input.
177 We repeat the check...
178
179 371 jmp _KEY
180 _KEY () at nasmjf.asm:351
181 351 mov ebx, [currkey]
182 352 cmp ebx, [bufftop]
183 353 jge .get_more_input
184
185 This time we have input (and bufftop is at a higher
186 address than currkey), so we continue on by grabbing
187 the current "key" (character):
188
189 354 xor eax, eax
190 355 mov al, [ebx] ; get next key from input buffer
191
192 If that worked, the al register now has the first
193 character of "foo\n". Yup, there's the "f"! (p/c means
194 print as a character. We can also p/s to print a C-style
195 string.)
196
197 (gdb) p/c $al
198 $19 = 102 'f'
199
200 Now we set currkey to the next character and return...
201
202 356 inc ebx
203 357 mov [currkey], ebx ; increment currkey
204 358 ret
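
Here's the whole of _KEY as a rough Python sketch. It uses indexes into a bytes object where the assembly uses addresses, and it raises EOFError where the real program would jump to .eof; the class and its names are my own invention.

```python
# Toy model of _KEY: hand out one byte at a time, refilling when the buffer runs dry.
class KeyReader:
    def __init__(self, refill):
        self.refill = refill      # callable returning bytes (b"" at end-of-file)
        self.buffer = b""
        self.currkey = 0          # index of the next unread byte
        self.bufftop = 0          # index one past the last valid byte

    def key(self):
        while self.currkey >= self.bufftop:      # cmp / jge .get_more_input
            self.buffer = self.refill()
            if not self.buffer:                  # read returned 0: .eof
                raise EOFError("no more input")
            self.currkey = 0
            self.bufftop = len(self.buffer)
        char = self.buffer[self.currkey]         # mov al, [ebx]
        self.currkey += 1                        # inc ebx / mov [currkey], ebx
        return char

chunks = iter([b"foo\n", b""])
reader = KeyReader(lambda: next(chunks))
print([chr(reader.key()) for _ in range(4)])
# ['f', 'o', 'o', '\n']
```

On the very first call both indexes are zero, so the loop refills before returning anything, just like the first pass through _KEY above.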
205
206 Back at _WORD, we check to see if we've hit a character
207 to skip. Forth is so syntactically simple, I just love it.
208
209 NOTE that the jbe instruction is "jump if below or equal" (an
210 unsigned comparison), so a space (0x20) or any character with a
211 smaller ASCII value will cause us to keep seeking in the
212 .skip_non_words loop. This is a clever way to skip spaces,
213 tabs, newlines, returns, form feeds, etc. I'll improve the
214 comments for these instructions in the actual program now.
215
216 _WORD.skip_non_words () at nasmjf.asm:311
217 311 cmp al,'\' ; start of a comment?
218 312 je .skip_comment ; if so, skip the comment
219 313 cmp al,' ' ; space?
220 314 jbe .skip_non_words ; if so, keep looking
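
One detail worth spelling out: jbe compares unsigned values, which matters for bytes of 0x80 and up. A quick Python sketch of the difference (the signed version is only there for contrast and isn't what the program does):

```python
def skipped_by_jbe(byte):
    """Unsigned test, like cmp al,' ' followed by jbe."""
    return byte <= 0x20

def skipped_if_signed(byte):
    """What a signed jump (jle) would do instead; shown only for contrast."""
    signed = byte - 256 if byte >= 0x80 else byte
    return signed <= 0x20

for ch in " \t\n\r\f":
    assert skipped_by_jbe(ord(ch))   # all the usual whitespace gets skipped

print(skipped_by_jbe(ord("f")), skipped_by_jbe(0xC3), skipped_if_signed(0xC3))
# False False True
```

With a signed jump, a byte like 0xC3 would look negative and get skipped as if it were whitespace.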
221
222 Nope, character looks good. So we add it to word_buffer
223 in memory. The stosb instruction implicitly copies what's
224 in the al register (the 'b' is for byte) to memory at
225 the address stored in the edi register.
226
227 Then edi is incremented so that the next time this happens,
228 the next byte will go to the next position, and so forth.
229 It turns out this is exactly what we guaranteed
230 when we cleared the direction flag at the very beginning.
231
232 317 mov edi,word_buffer ; put addr to word return buffer in edi
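
A small Python sketch of stosb, including what the direction flag changes. The buffer size is a guess, and edi here is an offset into a bytearray rather than a real address.

```python
def stosb(memory, edi, al, direction_flag=0):
    """Store al at memory[edi], then step edi forward (DF=0) or backward (DF=1)."""
    memory[edi] = al
    return edi - 1 if direction_flag else edi + 1

word_buffer = bytearray(32)   # assumption: the real word_buffer size may differ
edi = 0
for char in b"foo":
    edi = stosb(word_buffer, edi, char)
print(edi, bytes(word_buffer[:edi]))
# 3 b'foo'
```

With the direction flag set, the same loop would walk backward through memory, which is why the cld at startup matters.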
233
234 Now that we've established that we're past any whitespace
235 and are gathering the actual input, we're in .collect_word.
236 I'll snip the stepping through _KEY for 'o', 'o', and '\n'
237
238 _WORD.collect_word () at nasmjf.asm:319
239 319 stosb ; add character to return buffer
240 320 call _KEY ; get next key, returned in %al
241
242 After every call to _KEY, we check to see if we're done
243 collecting the word. The ja instruction is "jump if above"
244 (unsigned greater than), which is the
245 exact opposite of the jbe check above.
246 To put it straight: before we were looping WHILE the
247 character was whitespace, now we loop UNTIL the character
248 is whitespace.
249
250 321 cmp al,' ' ; is blank?
251 322 ja .collect_word ; if not, keep looping
252
253 Now _WORD returns the length and address of the collected word.
254
255 325 sub edi, word_buffer ; hmm, the len?
256 326 mov ecx, edi ; return it
257 327 mov edi, word_buffer ; return address of the word
258 328 ret
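
Here's all of _WORD as a rough Python sketch, built on a key() callable that returns one byte per call. The comment-skipping branch isn't stepped through in this log, so its behavior here (skip to the end of the line) is an assumption.

```python
def word(key):
    """Collect one whitespace-delimited word using key(), which returns one byte."""
    char = key()
    while True:
        if char == ord("\\"):            # .skip_comment (assumed: runs to end of line)
            while char != ord("\n"):
                char = key()
        elif char <= ord(" "):           # jbe .skip_non_words
            char = key()
        else:
            break
    word_buffer = bytearray()
    while True:
        word_buffer.append(char)         # stosb
        char = key()
        if not char > ord(" "):          # ja .collect_word keeps looping otherwise
            break
    return len(word_buffer), bytes(word_buffer)   # ecx, and what edi points at

keys = iter(b"  \\ a comment\n foo\n")
print(word(lambda: next(keys)))
# (3, b'foo')
```

The length is simply how far the write position moved from the start of the buffer, which is what the sub edi, word_buffer line computes.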
259
260 Then we return to INTERPRET from _WORD:
261
262 code_INTERPRET () at nasmjf.asm:212
263 212 xor eax,eax ; back from _WORD...zero eax
264 ...
265
266 Let's check the return values now:
267
268 (gdb) p $ecx
269 $1 = 3
270 (gdb) x/3c $edi
271 0x804a068 <word_buffer>: 102 'f' 111 'o' 111 'o'
272
273 Yay! There's the "foo" string that was input.
274 Even though I've already written some of the _FIND word, which tries to
275 match the input word, I think this has been quite enough
276 for one log. :-)