1 Warning, the examples with variables in this log are
2 all wrong. This update explains:
3
4 !!!!!!!!!!!!!!!!!!!! Update !!!!!!!!!!!!!!!!!!!!
5 ! In log19.txt, I realize that my variable !
6 ! handling is wrong. Variables should leave !
7 ! their addresses on the stack, not their !
8 ! values! We need FETCH to get the value from !
9 ! the address! !
10 !!!!!!!!!!!!!!!!!!!! Update !!!!!!!!!!!!!!!!!!!!
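
A minimal sketch of the corrected behavior described in the update box, written as a toy Python model rather than the real assembly. The address 0x1000, the value 42, and the names var_word and fetch are all made up for illustration.

```python
# Toy model (not the nasmjf code): a variable word pushes its ADDRESS,
# and FETCH replaces an address on the stack with the value stored there.
memory = {0x1000: 42}      # hypothetical address and value
stack = []

def var_word():
    """What a variable word should do: push the address of its storage."""
    stack.append(0x1000)

def fetch():
    """FETCH: pop an address, push the value stored at that address."""
    addr = stack.pop()
    stack.append(memory[addr])

var_word()
after_var = list(stack)    # only the address is on the stack
fetch()
after_fetch = list(stack)  # now the value is on the stack
print(after_var, after_fetch)
```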
11
12 Two new words add to the word-compiling abilities of
13 the interpreter:
14
15 IMMEDIATE sets the latest word to be "immediate"
16 HIDE takes the next "word" of input, looks it up
17 in the dictionary, and then sets that word
18 to be hidden (via the word HIDDEN)
19
20 HIDE seems the easiest to test, so we'll start with that:
21
22 : emit2 EMIT EMIT ;
23 66 65 emit2
24 AB
25 HIDE emit2
26 66 65 emit2
27 PARSE ERROR: 66 65 emit2
28
29 That worked: we get the parse error because emit2 has been
30 hidden and is no longer found in the dictionary.
31
32 The HIDDEN word used by HIDE actually toggles the hidden state,
33 so can we call HIDE again to unhide the word?
34
35 HIDE emit2
36
37 Program received signal SIGSEGV, Segmentation fault.
38 code_HIDDEN () at nasmjf.asm:635
39 635 xor [edi], word F_HIDDEN ; Toggle the HIDDEN bit in place.
40
41 Oh, ha ha, no, of course not. It's hidden, so HIDE can't
42 find it (and since there's absolutely no error checking,
43 we crash trying to toggle the bit in some random memory
44 location).
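
The failure above can be modeled with a toy dictionary in Python. This is only a sketch: the list-based dictionary, the find and hide helpers, and the F_HIDDEN value of 0x20 are assumptions for illustration, and the "not found" check is exactly the error handling the real HIDE lacks.

```python
F_HIDDEN = 0x20  # flag bit value is an assumption for this sketch

# Each entry: [name, flags]; newest word last.
dictionary = [["EMIT", 0], ["emit2", 0]]

def find(name):
    """Return the newest visible entry with this name, or None."""
    for entry in reversed(dictionary):
        if entry[0] == name and not (entry[1] & F_HIDDEN):
            return entry
    return None

def hide(name):
    """Toggle the hidden bit, but refuse when the lookup fails."""
    entry = find(name)
    if entry is None:
        return False   # the real HIDE has no such check and crashes here
    entry[1] ^= F_HIDDEN
    return True

first = hide("emit2")   # found, so the bit is toggled on
second = hide("emit2")  # hidden now, so the lookup fails
print(first, second)
```

Because the lookup skips hidden entries, the second call can never reach the entry it would need to un-hide.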
45
46 I guess we could use LATEST and HIDDEN to manually toggle
47 it back, but I can't be bothered tonight.
48
49 Onward to IMMEDIATE.
50
51 : ab 66 65 EMIT EMIT ;
52 IMMEDIATE
53 : foo 1000 . ;
54 foo
55 foo
56 55 EMIT
57 ;
58
59 ^C
60 Program received signal SIGINT, Interrupt.
61 _WORD.skip_non_words () at nasmjf.asm:339
62 339 call _KEY ; get next key, returned in %eax
63
64 Something went wrong. I had to Ctrl+C to end the
65 program. It was merrily taking input, but nothing
66 would execute, not even Ctrl+D to end the input
67 and exit.
68
69 Let's try that again and verify we're toggling the
70 right word...
71
72 (gdb) r
73 Starting program: /home/dave/nasmjf/nasmjf
74 (gdb) c
75 Continuing.
76 LATEST 4 + C@ .
77 6
78 LATEST 5 + C@ EMIT
79 L
80
81 Okay, just sanity checking LATEST - it points to
82 a word with 6 letters in the name and starts with
83 the letter "L" (it's LATEST itself).
84
85 I'll define my 'ab' word again, try it out (it should
86 print the letters "AB"), and check LATEST again...
87
88 : ab 66 65 EMIT EMIT ;
89 ab
90 AB
91 LATEST 4 + C@ .
92 2
93 LATEST 5 + C@ EMIT
94 a
95 IMMEDIATE
96 ab
97
98 Drat! Then it locked up again. So IMMEDIATE is
99 definitely not working right.
100
101 Next night: okay, let's see what's going on...
102
103 (gdb) break code_IMMEDIATE
104 Breakpoint 2 at 0x80494ec: file nasmjf.asm, line 1097.
105 (gdb) c
106 Continuing.
107 : ab 66 65 EMIT EMIT ;
108 ab
109 AB
110 IMMEDIATE
111
112 Breakpoint 2, code_IMMEDIATE () at nasmjf.asm:1097
113 (gdb) p/x (int)var_LATEST
114 $1 = 0x804e000
115 (gdb) x/10c (int)var_LATEST
116 0x804e000: ... 2 '\002' 97 'a' 98 'b' ...
117
118 So that's right - LATEST points at word 'ab'...
119
120 1098 add edi, 4 ; Point to name/flags byte.
121 1099 xor byte [edi], F_IMMED ; Toggle the IMMED bit.
122 (gdb) p/x $edi
123 $2 = 0x804a6b0
124
125 That's a dead giveaway: register edi should now hold
126 the value of LATEST + 4 (0x804e004). But it actually
127 holds the _address_ of the LATEST variable + 4!
128
129 (gdb) p/x (int)var_LATEST
130 $3 = 0x804e000
131
132 It still takes me a bit before I see it...
133
134 (gdb) disass 1099
135 No function contains specified address.
136 (gdb) disass code_IMMEDIATE
137 Dump of assembler code for function code_IMMEDIATE:
138 0x080494ec <+0>: mov edi,0x804a6ac <--- should be PTR
139 0x080494f1 <+5>: add edi,0x4
140 => 0x080494f4 <+8>: xor BYTE PTR [edi],0x80
141 0x080494f7 <+11>: lods eax,DWORD PTR ds:[esi]
142 0x080494f8 <+12>: jmp DWORD PTR [eax]
143 End of assembler dump.
144
145 I finally see it.
146
147 I have
148
149 mov edi, var_LATEST
150
151 where I should have
152
153 mov edi, [var_LATEST]
154
155 (So of course nothing worked after that. The XOR flipped
156 a bit 4 bytes past the LATEST variable itself, not in the
157 flags of 'ab', so later lookups could fail to match!)
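
The difference between the two instructions can be checked with the numbers from the gdb session. This is a Python sketch with a dict standing in for memory; the names VAR_LATEST_ADDR, edi_wrong, and edi_right are made up for illustration.

```python
# Values taken from the gdb session in this log.
VAR_LATEST_ADDR = 0x804A6AC            # where the LATEST variable lives
memory = {VAR_LATEST_ADDR: 0x804E000}  # its contents: the header of 'ab'

# mov edi, var_LATEST    -> edi gets the variable's address
edi_wrong = VAR_LATEST_ADDR + 4
# mov edi, [var_LATEST]  -> edi gets the variable's contents
edi_right = memory[VAR_LATEST_ADDR] + 4

print(hex(edi_wrong), hex(edi_right))
```

The first value matches what gdb printed for edi (0x804a6b0); the second is the name/flags byte that IMMEDIATE is supposed to modify.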
158
159 With that fixed, it should work...
160
161 (gdb) load
162 (gdb) r
163 Starting program: /home/dave/nasmjf/nasmjf
164 : ab 66 65 EMIT EMIT ;
165 ab
166 AB
167 IMMEDIATE
168
169 So now 'ab' should execute as soon as the interpreter
170 sees it, even in compile mode:
171
172 : five 5 . ab ;
173 AB
174 five
175 5
176
177 Yeah! The call to 'ab' executed at "compile time" rather
178 than "run time" for the new word 'five'. Using this, we
179 could add new language features to FORTH in FORTH.
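
The compile-time versus run-time distinction can be sketched with a toy compiler in Python. Nothing here mirrors the real nasmjf internals: the words table, define, immediate, and compile_word are hypothetical helpers, and output is captured in a string buffer instead of going to the terminal.

```python
import io

out = io.StringIO()
words = {}        # name -> (function, is_immediate)
latest = None     # name of the most recently defined word
stack = []

def define(name, fn, immediate=False):
    global latest
    words[name] = (fn, immediate)
    latest = name

def immediate():
    """Mark the latest word immediate (the real word toggles a flag bit)."""
    fn, _ = words[latest]
    words[latest] = (fn, True)

def compile_word(name, source):
    """Compile a space-separated body; immediate words run right now."""
    body = []
    for token in source.split():
        if token in words:
            fn, is_immediate = words[token]
            if is_immediate:
                fn()                  # executes at compile time
            else:
                body.append(fn)       # deferred until run time
        else:
            n = int(token)
            body.append(lambda n=n: stack.append(n))
    define(name, lambda: [step() for step in body])

define(".", lambda: out.write(str(stack.pop())))
define("ab", lambda: out.write("AB"))
immediate()                           # 'ab' is the latest word

compile_word("five", "5 . ab")
at_compile_time = out.getvalue()      # output produced while compiling
words["five"][0]()
at_run_time = out.getvalue()[len(at_compile_time):]
print(repr(at_compile_time), repr(at_run_time))
```

In this model 'ab' never ends up in the body of 'five' at all, which is why running 'five' prints only the number.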
180
181 Next, the TICK (single quote ') word gets the address
182 of a word. The word is supplied after the ' so that it
183 doesn't execute (this is the same trick LIT uses).
184
185 This implementation can only work at compile time because
186 the interpreter needs to compile the word that follows into
187 a 4-byte address for ' to be able to read and then hop
188 over that value. Just for fun, let's try to print the
189 address of the EMIT word outside of the compile state:
190
191 ' EMIT .
192
193 Program received signal SIGSEGV, Segmentation fault.
194
195 See?
196
197 Now let's use it the same way, but in a new compiled word:
198
199 : addrofemit ' EMIT . ;
200 addrofemit
201 134521260
202
203 Looks like it worked, but is that address correct?
204
205 (gdb) info addr EMIT
206 Symbol "EMIT" is at 0x804a1ac in a file compiled without debugging.
207 (gdb) p/d 0x804a1ac
208 $1 = 134521260
209
210 Yup!
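
The same decimal-to-hex check can be done outside gdb. A tiny Python sketch, using only the two numbers shown above (the variable names are made up):

```python
printed_by_forth = 134521260      # output of `addrofemit`
address_from_gdb = 0x804A1AC      # from `info addr EMIT`
same = printed_by_forth == address_from_gdb
print(hex(printed_by_forth), same)
```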
211
212 Then the next night, I've got a really exciting one,
213 0BRANCH.
214
215 But first, I'm trying to figure out how to even test
216 BRANCH, let alone its conditional big brother!
217
218 I even worked it out on paper the next morning, and I'm
219 still not seeing why this doesn't work:
220
221 : foo 65 EMIT BRANCH -12 ;
222 foo
223 A
224 Program received signal SIGSEGV, Segmentation fault.
225 code_BRANCH () at nasmjf.asm:27
226 27 lodsd ; NEXT: Load from memory into eax, inc esi to point to next word.
227
228 By my calculations, esi starts off pointing at the
229 offset number's instruction (-12), then we should be
230 branching back to "65":
231
232 0 "-12"
233 -4 BRANCH
234 -8 EMIT
235 -12 "65"
236
237 And I've had a bummer of a time trying to step through
238 it by breaking on BRANCH because that word is used
239 (correctly) as part of the interpreter loop.
240
241 So I'm going to copy BRANCH with the silly name BRUNCH
242 and see why it's not correct!
243
244 DEFCODE "BRUNCH",6,,BRUNCH
245 add esi, [esi]
246 NEXT
247
248 Should be pretty simple, right? It's just a one-liner!
249
250 (gdb) break code_BRUNCH
251 (gdb) c
252 Continuing.
253 : foo 65 EMIT BRUNCH -12 ;
254 foo
255 A
256 Breakpoint 2, code_BRUNCH () at nasmjf.asm:251
257 251 add esi, [esi] ; add the offset to the instruction pointer
258
259 Okay, now let's thoroughly examine this. We're going to
260 add the (negative) number stored where esi points to
261 esi itself. Where does esi point?
262
263 (gdb) p/x $esi
264 $3 = 0x804e01c
265 (gdb) x/x $esi
266 0x804e01c: 0x0804a0f0
267 (gdb) info sym *$esi
268 LIT in section .data of /home/dave/nasmjf/nasmjf
269 (gdb) x/b $esi+4
270 0x804e020: -12 '\364'
271
272 Yup, we can see that esi points to the address of LIT
273 followed by the value -12. As expected.
274
275 (gdb) s
276 27 lodsd ; NEXT: Load from memory into eax, inc esi to point to next word.
277
278 Now that that's run, where does esi point?
279
280 (gdb) info sym *$esi
281 Cannot access memory at address 0x1009810c
282 (gdb) p/x $esi
283 $4 = 0x1009810c
284
285 What? That address isn't right. It should be 12 less
286 than before, not...oh wait...
287
288 (gdb) disass code_BRUNCH
289 Dump of assembler code for function code_BRUNCH:
290 0x08049054 <+0>: add esi,DWORD PTR [esi]
291 => 0x08049056 <+2>: lods eax,DWORD PTR ds:[esi]
292 0x08049057 <+3>: jmp DWORD PTR [eax]
293 End of assembler dump.
294
295 Now I see it. We added the address of LIT, not
296 the -12 that follows it. No wonder I got a segfault.
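
The bogus address can be reproduced from the gdb values. This Python sketch models `add esi, [esi]` as a 32-bit wrapping add; the helper add32 and the variable names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b):
    """32-bit wrapping add, like `add esi, [esi]`."""
    return (a + b) & MASK32

esi = 0x804E01C
lit_codeword = 0x0804A0F0   # what actually sits at [esi]
offset = -12                # the intended contents of [esi]

actual = add32(esi, lit_codeword)
intended = add32(esi, offset)
print(hex(actual), hex(intended))
```

The first result is the unmapped address gdb reported (0x1009810c); the second is where esi would have landed if the offset had been stored directly after the branch word.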
297
298 So how do I get the value -12 right after BRANCH?
299
300 Next night: okay, so I reviewed the ported words
301 so far and I'm pretty sure COMMA (,) fits the bill.
302 It "compiles" the value on the stack to the current
303 position...
304
305 : foo 65 EMIT BRUNCH -12 , ;
306 foo
307 A
308 Breakpoint 3, code_BRUNCH () at nasmjf.asm:251
309 251 add esi, [esi] ; add the offset to the instruction pointer
310 (gdb) x/x **$esi
311 0x8049228 <code_LIT>: 0xffad50ad
312
313 ...no, dang it, that doesn't work either, and for
314 the same reason. Sure, ',' will store whatever's on
315 the stack, but we're still getting LIT -12 compiled
316 first when we're compiling.
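
A rough Python model of why both attempts fail. It assumes that every number seen in compile mode is compiled as the pair LIT, n, which matches what gdb showed for -12 above; the compile_body helper and the string cells are hypothetical stand-ins for real codeword addresses.

```python
def compile_body(source, known_words):
    """Compile-mode sketch: words become themselves, numbers become LIT n."""
    cells = []
    for token in source.split():
        if token in known_words:
            cells.append(token)
        else:
            cells.extend(["LIT", int(token)])
    return cells

known = {"EMIT", "BRUNCH", ","}
plain = compile_body("65 EMIT BRUNCH -12", known)
with_comma = compile_body("65 EMIT BRUNCH -12 ,", known)
print(plain)
print(with_comma)
```

In both lists the cell right after BRUNCH is LIT, not -12, so adding ',' only appends one more cell at the end. If 65 is compiled the same way, as this sketch assumes, the body also has one more cell than the paper layout earlier in this log.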
317
318 So I really don't see any easy way to test BRANCH,
319 let alone 0BRANCH with an arbitrary snippet of code
320 at this point. :-(
321
322 Next evening: I've also just ported LITSTRING and TELL,
323 two more primitives that appear hard to test because
324 I'm not sure how to compile literal values into memory
325 yet.
326
327 So, this would be a pretty big let-down way to end a
328 log file but...
329
330 IT APPEARS THAT I'VE PORTED ALL OF THE ASSEMBLY!
331
332 Yeah, so starting with the next log, I'm going to
333 start feeding jonesforth.f, which is the second half
334 of the language implementation, written in FORTH itself,
335 into my port and fix the inevitable bugs.
336
337 It's been about six months of slowly chipping away
338 at this port nearly every single evening. I can
339 barely believe this stage has arrived. This is so
340 cool. 8-)