1 Now, having seen a correct response when nothing matches
2 the idiotic input string "foo", let's take a look at input
3 that does match. I'll start with a defined word. How about
4 "FIND", since we know that exists.
5
6 I'm not feeling particularly clever tonight. So I'm just
7 going to step through everything and cut down the log to
8 the good stuff.
9
10 The "FIND" below is me typing "FIND" when the interpreter
11 requests input.
12
13 (gdb) run
14 code_INTERPRET () at nasmjf.asm:209
15 _WORD.skip_non_words () at nasmjf.asm:308
16 _KEY () at nasmjf.asm:349
17 _KEY.get_more_input () at nasmjf.asm:359
18 ...
19 364 int 0x80 ; syscall!
20 (gdb)
21 FIND
22
23 Now let's see if FIND can find itself. :-)
24
25 code_INTERPRET () at nasmjf.asm:212
26 _FIND () at nasmjf.asm:485
27 485 push esi ; _FIND! Save esi, we'll use this reg for string comparison
28 488 mov edx,[var_LATEST] ; LATEST points to name header of the latest word in the diction
29
30 As before, we alternate between .test_word and .prev_word
31 for every word in the dictionary, starting with the latest
32 and using the stored pointers as a linked list until we
33 find a match or the beginning of the list.
34
35 _FIND.test_word () at nasmjf.asm:490
36 _FIND.prev_word () at nasmjf.asm:517
37 ...
38 490 test edx,edx ; NULL pointer? (end of the linked list)
39 491 je .not_found
40 496 xor eax,eax
41 497 mov al, [edx+4] ; al = flags+length field
42 498 and al,(F_HIDDEN|F_LENMASK) ; al = name length
43 499 cmp cl,al ; Length is the same?
44 500 jne .prev_word ; nope, try prev
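
To make the walk concrete, here is a rough Python model of the loop above. It is only a sketch: the flag values are assumed from JonesForth (they don't appear in this log), `add_word` and `candidates` are hypothetical helper names, and the header layout (4-byte link, flags+length byte, name, padding, codeword) is inferred from the offsets in the listing. Unlike the real _FIND, which stops at the first full match, it collects every entry that passes the length check.

```python
import struct

# Assumed flag values (taken from JonesForth, not shown in this log).
F_HIDDEN = 0x20
F_LENMASK = 0x1F

def add_word(mem, latest, name, flags=0):
    """Hypothetical helper: append a word header and return its address."""
    addr = len(mem)
    mem.extend(struct.pack("<I", latest))  # link to the previous word
    mem.append(flags | len(name))          # flags+length byte
    mem.extend(name)                       # name characters
    while len(mem) % 4:                    # pad to a 4-byte boundary
        mem.append(0)
    mem.extend(struct.pack("<I", 0))       # placeholder codeword
    return addr

def candidates(mem, latest, length):
    """Return addresses of entries that pass the length check, newest first."""
    found = []
    edx = latest
    while edx != 0:                                  # test edx,edx / je .not_found
        al = mem[edx + 4] & (F_HIDDEN | F_LENMASK)   # and al,(F_HIDDEN|F_LENMASK)
        if al == length:                             # cmp cl,al
            found.append(edx)
        (edx,) = struct.unpack_from("<I", mem, edx)  # .prev_word: follow the link
    return found

mem = bytearray(4)  # address 0 stays unused so it can mean NULL
latest = 0
dup_addr = latest = add_word(mem, latest, b"DUP")
find_addr = latest = add_word(mem, latest, b"FIND")
key_addr = latest = add_word(mem, latest, b"KEY")
swap_addr = latest = add_word(mem, latest, b"SWAP", flags=F_HIDDEN)

print(candidates(mem, latest, 4) == [find_addr])
```

Keeping F_HIDDEN in the mask means a hidden word's "length" never equals a real length, so the hidden SWAP is skipped without a separate flag test and only FIND is left to compare by name.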
45
46 Length of the word matched, now we'll check the actual
47 name string.
48
49 The key to understanding the comparison is knowing that
50 cmpsb implicitly uses the esi and edi registers as pointers
51 to the data to compare. The repe mnemonic stands for
52 "repeat while equal" and is a modifier for the cmpsb
53 instruction.
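
As a hedged illustration, the behavior of repe cmpsb can be modeled in a few lines of Python. The function name and the tuple it returns are made up for this sketch. It assumes the direction flag is clear (pointers count upward) and that the zero flag is already set on entry, which holds in _FIND because the preceding cmp cl,al found the lengths equal.

```python
def repe_cmpsb(mem, esi, edi, ecx):
    """Model of `repe cmpsb`; returns (esi, edi, ecx, zf) afterwards."""
    zf = True  # assumption: ZF set on entry; with ecx == 0 nothing is compared
    while ecx != 0:
        zf = mem[esi] == mem[edi]  # compare one byte, setting the zero flag
        esi += 1                   # both pointers always advance
        edi += 1
        ecx -= 1                   # the counter always decrements
        if not zf:                 # "repeat while equal": stop on a mismatch
            break
    return esi, edi, ecx, zf

mem = bytearray(b"FINDFINDFIVE")
print(repe_cmpsb(mem, 0, 4, 4))  # "FIND" against "FIND"
print(repe_cmpsb(mem, 0, 8, 4))  # "FIND" against "FIVE"
```

Both pointers end up past the bytes that were examined, which is why the surrounding code has to save and restore edi and ecx around the instruction.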
54
55 I've always been in the RISC camp (versus CISC), because
56 I love systems that compose from "simple" pieces. But I
57 have to admit that these "string" operations in x86
58 do make a lot of sense. After dwelling on this a bit
59 last night, I think I had a dream where I designed some
60 new hardware (like Ben Eater's 8-bit breadboard CPU or
61 was it an FPGA?) and I was coming up with a new
62 instruction set architecture (ISA) that was strictly
63 "complex" instructions like these string operations,
64 but I think it was also inspired by the array languages
65 like APL and J. Anyway, I've abandoned my simplistic
66 noob stance on RISC: there are lots of different kinds
67 of simple and RISC trades one kind for another.
68
69 Let's see a repe cmpsb in action...
70
71 503 push ecx ; Save the length
72 504 push edi ; Save the address (repe cmpsb will move this pointer)
73 _FIND.test_word () at nasmjf.asm:505
74 505 lea esi,[edx+5] ; Dictionary string we are checking against.
75 506 repe cmpsb ; Compare the strings.
76 507 pop edi
77 _FIND.test_word () at nasmjf.asm:508
78 508 pop ecx
79 _FIND.test_word () at nasmjf.asm:509
80 509 jne .prev_word ; nope, try prev
81 512 pop esi
82 _FIND.test_word () at nasmjf.asm:513
83 513 mov eax,edx
84 514 ret ; FOUND!
85
86 Yay! We've got a match on the word.
87
88 Now back in INTERPRET, we have to do another comparison to
89 check the return value. Since we did find a match, we can now act upon
90 the matched word.
91
92 code_INTERPRET () at nasmjf.asm:215
93 215 test eax,eax ; Found?
94 216 jz .try_literal
95 219 mov edi,eax ; edi = dictionary entry YES WE HAVE MATCHED A WORD!!!
96 220 mov al,[edi+4] ; Get name+flags.
97 221 push ax ; Just save it for now.
98 code_INTERPRET () at nasmjf.asm:222
99 222 call _TCFA ; Convert dictionary entry (in %edi) to codeword pointer.
100
101 So TCFA is the internal label for the Forth word
102 ">CFA", which I read as "To CFA"; Jones guesses that CFA
103 probably means "Code Field Address". Its job is
104 to take the given pointer to a word and return a
105 pointer to the word's code. Neat.
106
107 _TCFA () at nasmjf.asm:386
108 386 xor eax,eax
109 387 add edi,4 ; Skip link pointer.
110 388 mov al,[edi] ; Load flags+len into %al.
111 389 inc edi ; Skip flags+len byte.
112 390 and al,F_LENMASK ; Just the length, not the flags.
113 391 add edi,eax ; Skip the name.
114 392 add edi,3 ; The codeword is 4-byte aligned.
115 393 and edi,-3
116 394 ret
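
The same pointer arithmetic, sketched in Python. The F_LENMASK value is assumed from JonesForth and `tcfa` is just a name for this model. One deliberate difference: the listing shows `and edi,-3`, while the sketch masks with ~3 (equal to -4), which clears the two low bits and is what rounding to a 4-byte boundary needs. For FIND both masks happen to give the same address.

```python
F_LENMASK = 0x1F  # assumed value, as in JonesForth

def tcfa(mem, edi):
    """Model of _TCFA: dictionary entry address -> codeword address."""
    edi += 4                       # skip link pointer
    length = mem[edi] & F_LENMASK  # flags+len byte with the flags masked off
    edi += 1                       # skip flags+len byte
    edi += length                  # skip the name
    return (edi + 3) & ~3          # round up to the next multiple of 4

# A made-up entry for FIND at address 16: link, length byte, name, padding.
mem = bytearray(16) + bytes(4) + bytes([4]) + b"FIND" + bytes(3)
print(tcfa(mem, 16))
```

With a 4-character name the header ends at 16 + 4 + 1 + 4 = 25, so the codeword sits at the next aligned address.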
117
118 Then we return to INTERPRET again now that the edi
119 register contains the address of the matched word's
120 code (in this case, the code for FIND).
121
122 The word can be flagged immediate and/or we can be in executing state.
123
124 In this case, the word is NOT flagged immediate.
125
126 code_INTERPRET () at nasmjf.asm:223
127 223 pop ax
128 code_INTERPRET () at nasmjf.asm:224
129 224 and al,F_IMMED ; is IMMED flag set?
130 225 mov eax,edi
131 226 jnz .execute ; If IMMED, jump straight to executing.
132
133 We ARE in executing state.
134
135 227 jmp .check_state
136 code_INTERPRET.check_state () at nasmjf.asm:238
137 238 mov edx,[var_STATE]
138 239 test edx,edx
139 240 jz .execute ; Jump if executing.
140 code_INTERPRET.execute () at nasmjf.asm:253
141 253 mov ecx,[interpret_is_lit] ; Literal?
142 254 test ecx,ecx ; Literal?
143 255 jnz .do_literal
144
145 To execute the matched word, we simply jump
146 to the code address...
147
148 259 jmp [eax]
149
150 ...and now we're executing FIND, just as expected.
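
The branching in INTERPRET after a word is found boils down to a small decision, sketched here in Python. The F_IMMED value is assumed from JonesForth, the function name is made up, and the returned strings are only labels. The "compile" outcome is an assumption based on how JonesForth behaves; that branch is never taken in this log.

```python
F_IMMED = 0x80  # assumed value, as in JonesForth

def interpret_action(flags_len, state, is_literal):
    """Model of the decisions at lines 224-259 for a found word."""
    immediate = bool(flags_len & F_IMMED)  # and al,F_IMMED / jnz .execute
    if immediate or state == 0:            # .check_state: jz .execute
        if is_literal:                     # .execute: jnz .do_literal
            return ".do_literal"
        return "jmp [eax]"                 # run the word's code
    return "compile (assumed, not shown in this log)"

# FIND: length 4, no flags, STATE is 0 (executing), not a literal.
print(interpret_action(4, 0, False))
```

An immediate word runs even when STATE is nonzero, which is why the flag is tested before STATE is looked at.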
151
152 code_FIND () at nasmjf.asm:478
153 478 pop ecx ; length of word
154 479 pop edi ; buffer with word
155 480 call _FIND
156 _FIND () at nasmjf.asm:485
157 485 push esi ; _FIND! Save esi, we'll use this reg for string comparison
158 ...
159
160 I'm not even sure what FIND is looking for now
161 since I didn't bother examining any memory during
162 the rest of the run. I was just happy to see the
163 interpreter finding and executing the requested
164 word!
165
166 Eventually it got through the linked list and didn't
167 match anything.
168
169 _FIND.test_word () at nasmjf.asm:490
170 490 test edx,edx ; NULL pointer? (end of the linked list)
171 491 je .not_found
172 _FIND.not_found () at nasmjf.asm:521
173 521 pop esi
174 _FIND.not_found () at nasmjf.asm:522
175
176 And then Forth exited normally having run out
177 of code (the interpreter does not yet loop, so
178 it always exits after the first bit of input).
179
180 190 mov eax, 1 ; exit syscall
181 191 int 80h ; call kernel
182 [Inferior 1 (process 2531) exited normally]
183
184 Next will be handling numeric literals.