1 The last log saw yet _another_ bug in my DEFVAR macro.
2 Sheesh. Hopefully I have variables working correctly
3 now?
4
5 As I mentioned last night, the next words look like a
6 real mixture of things. Let's jump into it:
7
8 WITHIN is a conditional that tests if a number is
9 between two other numbers. This highlights the advantage
10 of such a flexible language (and the simplicity of the
11 syntax). I'll format this slightly to make a bit of a
12 truth table out of it to demonstrate how WITHIN works:
13
14 1 2 3 WITHIN . 0
15 2 3 1 WITHIN . 0
16 3 1 2 WITHIN . 0
17 2 1 3 WITHIN . 1 <-- 2 is within 1 and 3
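
The table above skips the endpoints. A small sketch to probe them, assuming this port behaves like the comment in jonesforth.f, which says "c a b WITHIN" is true when a <= c and c < b (lower bound inclusive, upper bound exclusive):

```forth
1 1 3 WITHIN .   ( c equals the lower bound: expect 1 if the range is a <= c < b )
3 1 3 WITHIN .   ( c equals the upper bound: expect 0 if the upper bound is exclusive )
```

If both lines print what the comments predict, the range is half-open, which is also how the Forth standard describes WITHIN.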
18
19 DEPTH gives us the depth of the stack:
20
21 1 2
22 .S
23 2 1
24 DEPTH .
25 8
26
27 Huh? Oh, it's in bytes, not number of items!
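
To get a count of items instead of bytes, the byte depth can be divided by the cell size. A minimal sketch, assuming 4-byte cells as in this 32-bit port (ITEMS is a made-up name, not a word from jonesforth.f):

```forth
: ITEMS ( -- n ) DEPTH 4 / ;   ( ITEMS is a made-up name, 4 is the cell size in bytes )
1 2 ITEMS .
```

With only 1 and 2 on the stack, this should print 2.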
28
29 ALIGNED rounds a number up to the next multiple of 4,
30 which aligns addresses on 32-bit (4 byte) computers.
31 It's easy to test:
32
33 1 ALIGNED .
34 4
35 7 ALIGNED .
36 8
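
The rounding itself is a common bit trick: add 3, then clear the low two bits. A sketch under a made-up name (ROUNDUP4) so it doesn't shadow the real word, assuming INVERT and AND are available in the port:

```forth
: ROUNDUP4 ( n -- n' ) 3 + 3 INVERT AND ;   ( add 3, then clear the low two bits )
1 ROUNDUP4 .
7 ROUNDUP4 .
8 ROUNDUP4 .
```

In decimal this should print 4, 8 and 8. A value that is already a multiple of 4 is left unchanged.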
37
38 ALIGN performs ALIGNED on HERE. Here's the definition.
39
40 : ALIGN HERE @ ALIGNED HERE ! ;
41
42 I have a sense for when this would be used while
43 compiling, but don't really feel like trying to make up
44 a test for it. I'm content to see how it's used later
45 and test *that*.
46
47 And now, the moment we've all been waiting for: strings!
48
49 First, we need a new primitive to store the bytes of the
50 string. The word ',' (COMMA) does this for word-sized
51 (four bytes, the size of a 32-bit address) data. As with
52 the other byte-sized words, the name is prefixed with the
53 letter 'C'.
54
55 So let's compare 'C,':
56
57 : C,
58 HERE @ C! ( store the character in the compiled image )
59 1 HERE +! ( increment HERE pointer by 1 byte )
60 ;
61
62 ...with the assembly definition of ',':
63
64 mov edi, [var_HERE]
65 stosd ; puts the value in eax at edi, increments edi
66 mov [var_HERE], edi
67
68 Same thing, only stosd increments edi (HERE) by 4 rather
69 than 1.
70
71 I've learned to just wait and see these primitives in
72 action. Trying to use _some_ of them on their own is
73 challenging and surprisingly unrewarding.
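
That said, C, and ALIGN happen to make a small test together. This is only a sketch: it assumes HERE starts on a 4-byte boundary and that nothing else moves HERE in between.

```forth
HERE @ .     ( starting address )
65 C,        ( store one byte, the letter A, which moves HERE forward by 1 )
HERE @ .
ALIGN        ( round HERE up to a multiple of 4 )
HERE @ .
```

Under those assumptions the three addresses should come out as N, N+1 and N+4.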
74
75 Next, S" is a word that stores the characters which
76 follow as a string until it hits the end quote: ".
77
78 Note that there has to be a space after S" or it
79 wouldn't be matched as the correct word. However, the
80 final quote is NOT a word, it's just the special
81 character S" is looking for.
82
83 Finally, S" pushes the address of the start of the
84 string onto the stack, followed by the length of the
85 string.
86
87 Okay, I think I've got all of that:
88
89 S" Hello World"
90 .
91 11
92
93 There we go. "Hello World" is 11 characters long.
94
95 DUP
96 HEX
97 .
98 804EADC
99 @ EMIT
100 H
101
102 It has been stored in memory at a particular address,
103 which means we can retrieve the string from there.
104 (I've EMITted the 'H' from Hello.)
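
Since S" leaves an address and a length, the pair can go straight to TELL, and C@ can pick out a single byte. A sketch, assuming the port has the TELL and C@ primitives from jonesforth.S:

```forth
S" Hello World" TELL           ( print the whole string from the address and length pair )
S" Hello World" DROP C@ EMIT   ( drop the length, fetch one byte, print it )
```

The first line should print Hello World and the second should print H. The '@' used above fetches a full 4-byte cell, and it only works there because EMIT prints just the low byte. C@ fetches exactly one byte.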
105
106 In compile mode, it's stored in the word being compiled,
107 along with the LITSTRING word we defined in assembly.
108
109 But where in memory is this stored when in immediate
110 mode? Jones explains that this implementation stores it
111 in the same place where we compile words. So I guess we
112 can do this:
113
114 HERE @ .
115 804EADC
116 HERE @ @ EMIT
117 H
118 HERE @ 1 + @ EMIT
119 e
120
121 And since HERE clearly hasn't been moved to a point
122 after the string, that means it is temporary. It will be
123 overwritten as soon as we define a new word!
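
By contrast, a string compiled into a word should survive later definitions, because it is stored inside that word's body. A sketch with made-up word names (GREET and OTHER), assuming TELL is available:

```forth
: GREET S" Hello World" TELL ;   ( the string is compiled into GREET itself )
: OTHER 1 2 + ;                  ( defining another word does not disturb it )
GREET
```

GREET should still print Hello World after OTHER has been defined.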
124
125 A close relative of S" (in fact, it uses S" in compiling
126 mode) is the ." word, which is Forth's print string
127 word. I'm excited to have this:
128
129 ." Hello World!"
130 Hello World!
131
132 I've updated the README now that I can do this. :-)
133
134 And the good stuff keeps coming. The next words allow us
135 to define our own constants and variables. Not only
136 that, there's a handy explanation for each which would
137 have saved me some trouble before had I bothered to look
138 ahead (or, you know, actually learned the language I was
139 going to implement).
140
141 10 CONSTANT TEN
142 VARIABLE FOO
143
144 When TEN is executed, it leaves the integer 10 on the stack
145 When FOO is executed, it leaves the address of FOO on the stack
146
147 Let's try a constant:
148
149 42 CONSTANT answer
150 ." The answer is " answer .
151 The answer is 42
152
153 And a variable:
154
155 VARIABLE foo
156 9000 foo !
157 foo @ .
158 9000
159
160 The definition of CONSTANT is pretty easy. It's
161 basically the same as (and functionally equivalent to)
162 this:
163
164 : answer 42 ;
165 ." The answer is " answer .
166 The answer is 42
167
168 But a variable needs to allocate some memory and store
169 its address. Two simple utilities aid in this:
170
171 ALLOT - advances HERE by the amount on the stack and
172 leaves the previous HERE on the stack
173
174 CELLS - multiplies the number on the stack by the
175 natural address size of the machine (4 bytes
176 for our 32-bit implementation)
177
178 What's neat about CELLS is it shows how you can build up
179 words that read rather like a natural language:
180
181 HEX
182 HERE @ .
183 804EB30
184 5 CELLS ALLOT
185 .
186 804EB30
187 HERE @ .
188 804EB44
189
190 And that looks like 20 bytes...hey, wait a second. Now
191 that I can easily store values, I'll let Forth figure it
192 out:
193
194 HERE @ CONSTANT previous
195 5 CELLS ALLOT CONSTANT new
196 previous .
197 804EB78
198 new .
199 804EB98
200 new previous - .
201 20
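
The same two words are enough for a crude array. A sketch, assuming ALLOT leaves the old HERE on the stack as described above (scores is a made-up name):

```forth
DECIMAL
5 CELLS ALLOT CONSTANT scores   ( scores is a made-up name for the start of the 5 cells )
42 scores 2 CELLS + !           ( store 42 in the third cell )
scores 2 CELLS + @ .            ( fetch it back )
```

This should print 42. The DECIMAL at the top is there only because HEX was still active from the earlier test.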
202
203 VALUE is like VARIABLE, except that the result is a word
204 which leaves its value on the stack like a constant
205 instead of its address. Unlike a constant, it can be
206 updated by another word, TO.
207
208 10 VALUE foo ( create foo, set to 10 )
209 20 TO foo ( update foo to 20 )
210
211 Sounds good:
212
213 10 VALUE foo
214 foo .
215 10
216 20 TO foo
217 PARSE ERROR: 20 TO
218
219 Huh? Oh! Right between these two word definitions is
220 where I have it stop reading jonesforth.f on load.
221
222 Guess it's time to figure out the next bug in my port.
223
224 I'll try reading all lines again:
225
226 %assign __lines_of_jf_to_read 10000
227
228
229 PARSE ERROR: ( look it up in the dictionary )
230 >DFA
231 PARSE ERROR: ( look it up in the dictionary )
232 >DFA
233
234 Program received signal SIGSEGV, Segmentation fault.
235 _COMMA () at nasmjf.asm:688
236 688 stosd ; puts the value in eax at edi, increments edi
237 (gdb)
238
239 Okay, so the PARSE ERROR message prints out the word
240 that caused the trouble, here ">DFA", after a buffer's
241 worth of context.
242
243 Ha ha, how silly. I simply missed that word in my port.
244 It's a simple definition since we already have >CFA,
245 which returns the codeword for a word pointer. >DFA just
246 has to advance 4 bytes to the "data" (so-called threaded
247 word addresses) after the codeword.
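
Expressed in Forth rather than in the port's assembly macros (which aren't shown here), the idea amounts to something like this sketch. It assumes >CFA, 4+ and LATEST behave as they do in the original JonesForth:

```forth
: >DFA >CFA 4+ ;            ( word pointer to codeword address, then skip the 4-byte codeword )
LATEST @ DUP >CFA . >DFA .  ( print both addresses for the most recently defined word )
```

The second address printed should be 4 greater than the first.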
248
249 Okay, that's defined. Now can we run all of
250 jonesforth.f?
251
252 Program received signal SIGSEGV, Segmentation fault.
253 _COMMA () at nasmjf.asm:697
254 697 stosd ; puts the value in eax at edi, increments edi
255 (gdb)
256
257 Drat! Nope, still segfaulting. And in COMMA (',') again.
258
259 I wonder if I've overflowed some memory limitation? How
260 to check that...hmmm... Well, comma stores where HERE
261 points, and that's in memory reserved with Linux's brk
262 syscall. How about I bump that from 0x16000 to 0x64000
263 bytes:
264
265 add eax, 0x64000 ; add our desired number of bytes to break addr
266
267 Nope, exact same error:
268
269 _COMMA () at nasmjf.asm:697
270
271 And same thing if I add another zero to the number. So
272 much for an easy answer. So I guess, ideally, I would
273 break when COMMA is trying to stosd at an address in edi
274 that is outside the reserved FORTH data area. But first
275 I need to know what that area is.
276
277 I'm going to add some custom FORTH variables to capture
278 this so it'll be easy to examine. I'll test in gdb
279 first:
280
281 Breakpoint 2, _start () at nasmjf.asm:103
282 103 xor ebx, ebx
283 104 mov eax, __NR_brk ; syscall brk
284 105 int 0x80
285 106 mov [var_HERE], eax ; eax has start addr of data segment
286 (gdb) p/x $eax
287 $4 = 0x804e000
288 107 mov [var_CSTART], eax ; store info: start address of data segment
289 108 add eax, 0x16000 ; add our desired number of bytes to break addr
290 (gdb) p/x (int)var_HERE
291 $5 = 0x804e000
292 (gdb) p/x (int)var_CSTART
293 $6 = 0x804e000
294
295 So far so good, CSTART contains the start address of the
296 data area.
297
298 109 mov ebx, eax ; reserve memory by setting this new break addr
299 (gdb) p/x $eax
300 $7 = 0x80b2000
301 110 mov [var_CEND], eax ; store info: end address of data segment
302 111 mov eax, __NR_brk ; syscall brk again
303 (gdb) p/x $eax
304 $8 = 0x80b2000
305 112 int 0x80
306 (gdb) p/x $eax
307 117 mov ecx, 0 ; LOADJF read only flag for open
308 (gdb) p/x $eax
309 $10 = 0x80b2000
310 (gdb) p/x (int)var_CEND
311 $11 = 0x80b2000
312
313 That looks right. CEND contains the end address of the
314 data segment. Did I get my requested 0x64000 bytes?
315
316 Let's use the new FORTH vars to find out:
317
318 (gdb) c
319 Continuing.
320 HEX CSTART @ . CEND @ . CEND @ CSTART @ - .
321 804E000 80B2000 64000
322
323 Looking good. Now to catch the bad address being used in
324 COMMA. I added two compares and an ".oops" label:
325
326 _COMMA:
327 mov edi, [var_HERE]
328 cmp edi, [var_CSTART]
329 jl .oops
330 cmp edi, [var_CEND]
331 jg .oops
332 stosd
333 mov [var_HERE], edi
334 ret
335 .oops:
336 nop
337
338 (gdb) break _COMMA.oops
339 Breakpoint 2 at 0x8049343: file nasmjf.asm, line 707.
340 (gdb) c
341 Continuing.
342
343 Breakpoint 2, _COMMA.oops () at nasmjf.asm:707
344 707 nop
345 (gdb) p var_HERE
346 'var_HERE' has unknown type; cast it to its declared type
347 (gdb) p (int)var_HERE
348 $1 = 61368
349 (gdb) p (int)var_CSTART
350 $2 = 134537216
351 (gdb) p (int)var_CEND
352 $3 = 134627328
353
354 Okay, so HERE has been set to an invalid address
355 somehow. I wish backtraces worked. Then I'd be able to
356 see which word this came from.
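
With CSTART and CEND available as Forth variables, the same range check can also be run from the Forth side using WITHIN from the start of this log. A sketch only: HERE-OK? is a made-up name, and it assumes WITHIN is inclusive at the lower bound and exclusive at the upper one.

```forth
: HERE-OK? ( -- flag ) HERE @ CSTART @ CEND @ WITHIN ;   ( HERE-OK? is a made-up name )
HERE-OK? .
```

It should print 1 while HERE is inside the reserved area. Sprinkling the check between definitions in jonesforth.f is one way to narrow down where HERE goes bad.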
357
358 I'm tempted to divide and conquer...and it looks like
359 if I stop execution of jonesforth.f right before the
360 definition of SEE, it doesn't segfault.
361
362 So I'll continue testing 'til there and then tackle the
363 problem head-on.
364
365 Anyway, where was I? Oh yeah, VALUE!
366
367 To quote myself:
368
369 "VALUE is like VARIABLE, except that the result is a word
370 which leaves its value on the stack like a constant
371 instead of its address. Unlike a constant, it can be
372 updated by another word, TO."
373
374 10 VALUE foo ( create foo, set to 10 )
375 20 TO foo ( update foo to 20 )
376
377 Sounds good:
378
379 10 VALUE foo
380 foo .
381 10
382 20 TO foo
383 foo .
384 20
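
TO should also work inside a compiled word, which is where a VALUE becomes more convenient than a VARIABLE. A sketch with made-up names (counter and BUMP), assuming TO handles compile mode as it does in jonesforth.f:

```forth
DECIMAL
10 VALUE counter                  ( counter is a made-up name )
: BUMP counter 1+ TO counter ;    ( read the value, add one, store it back )
BUMP BUMP counter .
```

After two calls to BUMP this should print 12.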
385
386 That's better. I'll continue with the word testing in
387 the next log.