So the error in TCFA (which gets the code address for
a dictionary entry) has been fixed.

Now let's see why the COLON definition is crashing with
a segfault.

I type my usual test definition, ": FIVE 5 ;".

    80 cld ; Clear the "direction flag" which means the string
    (gdb) c
    Continuing.
    : FIVE 5 ;

(We'll skip everything that we now know works from log06.txt.)

First, INTERPRET checks STATE to see whether we're executing
or compiling (we're executing).

    code_INTERPRET.check_state () at nasmjf.asm:239
    239 mov edx,[var_STATE]
    240 test edx,edx
    241 jz .execute ; Jump if executing.

Then it checks whether the token is a literal value
or a word (it's a word, ":", a.k.a. COLON).

    code_INTERPRET.execute () at nasmjf.asm:254
    254 mov ecx,[interpret_is_lit] ; Literal?
    255 test ecx,ecx ; Literal?
    256 jnz .do_literal

Now we jump to the code pointed to at the beginning
of COLON, which is DOCOL. (Increasingly, as we define
more words with other words rather than pure machine
language, they'll start with DOCOL.)

Here I double-check that we're about to jump to the
first pointer in COLON and that it points to DOCOL.

    260 jmp [eax]
    (gdb) info symbol $eax
    COLON in section .data of /home/dave/nasmjf/nasmjf
    (gdb) info symbol *$eax
    DOCOL in section .text of /home/dave/nasmjf/nasmjf

Here's DOCOL. I may have stepped through this before,
but it's worth looking at again since it's crucial
to understanding this type of Forth implementation.

Given the glacial pace at which I'm porting this, I need
lots of reminders!

This cheatsheet is currently in a comment at the top
of my jonesforth.asm:

    esi - next forth word address to execute
    ebp - return stack for forth word addresses

In the source, the PUSHRSP and POPRSP macros usually handle
the ebp register, which we're using as the return stack
pointer (RSP).

DOCOL's first two lines both come from the PUSHRSP macro
(you can see that they have lower line numbers).

That pushes esi (the macro's %1 argument here) onto the
return stack at ebp, so we can get back to the caller later.

Then DOCOL points esi at the next word pointer
(the one in COLON right after DOCOL itself).

    DOCOL () at nasmjf.asm:40
    40 lea ebp, [ebp-4] ; "load effective address" of next stack position
    41 mov [ebp], %1 ; "push" the register value to the address at ebp
    70 add eax, 4 ; eax points to DOCOL (me!) in word definition. Go to next.
    71 mov esi, eax ; Put the next word pointer into esi

Let's see if that's right. Here's the entire definition
of COLON. We don't see DOCOL here because it's inserted
by the DEFWORD macro, but it comes right before FWORD.
(By the way, FWORD is just WORD, but I can't have a
symbol called "WORD" in NASM because it's a reserved
keyword.)

    DEFWORD ":",1,,COLON
        dd FWORD
        dd CREATE
        dd LIT, DOCOL, COMMA
        dd LATEST, FETCH, HIDDEN
        dd RBRAC
        dd EXIT

I think it's super-cool that Forth exposes all of the
primitives needed to create (or replace!) the COLON
compiler so you can use them in the interpreter
yourself. Truly a no-holds-barred language.

At any rate, the pointer in esi should be the next one
in COLON, and it should point to WORD (well, FWORD).

    (gdb) info symbol $esi
    COLON + 4 in section .data of /home/dave/nasmjf/nasmjf
    (gdb) info symbol *$esi
    FWORD in section .data of /home/dave/nasmjf/nasmjf

Great! And then the NEXT macro loads the address pointed
to by esi into eax, increments esi to the next word pointer,
and jumps to the address *pointed to* by the address now
in eax. HAVE YOU GOT THAT???

This is made even more confusing by the lodsd instruction.
The mnemonic stands for "load string doubleword". The idea
is that you can use it to load a "string" of values
by executing lodsd repeatedly (or lodsb for bytes, etc.).
What it actually does is load 4 bytes from the address in
esi into eax and then increment esi by 4. (It increments
rather than decrements because the direction flag was
cleared by the cld we saw earlier.)

(By the way, I've come to _loathe_ the terms "word", "double",
"long", etc. I'm okay with "byte" because it's come to
mean "8 bits" pretty universally in the year 2022. If
I were king, we would just use the byte count for these sizes,
like:
    b  = 1 byte  = 8 bits
    b2 = 2 bytes = 16 bits
    b4 = 4 bytes = 32 bits
    b8 = 8 bytes = 64 bits
and "lodsd" would become "lodsb4". Well, lods* could probably
use a better mnemonic. But you get the idea. Anyway, harping
on x86 is, like, a full-time job and it ain't gonna get this
Forth port done.)

    27 lodsd ; NEXT: Load from memory into eax, inc esi to point to next word.

And did it work?

    28 jmp [eax] ; Jump to whatever code we're now pointing at.
    (gdb) info symbol *$eax
    code_FWORD in section .text of /home/dave/nasmjf/nasmjf

Yup! It's jumping to WORD.

(As another aside, it occurs to me that "WORD" is
a really confusing name for this Forth word - it just
tokenizes a space-delimited string from input. Otherwise,
it doesn't have anything to do with Forth's concept
of "words" as executable code stored in a "dictionary".)

    code_FWORD () at nasmjf.asm:302
    302 call _WORD

Cool, so now I'll skip stepping through WORD/KEY as
we gather the string "FIVE" (the name of the word I'm
trying to define) from input.

    _WORD.skip_non_words () at nasmjf.asm:309
    309 call _KEY ; get next key, returned in %eax
    ...
    325 mov ecx, edi ; return it

I have to admit, I don't understand why I can't access
the memory at word_buffer.

    (gdb) x/s (int)word_buffer
    0x45564946: <error: Cannot access memory at address 0x45564946>

Wait a dang second, 45 56 49 46 isn't an address, it's
the actual characters 'E' 'V' 'I' 'F' (or "FIVE" stored
little-endian)! (int)word_buffer read the first four bytes
*at* word_buffer as an integer, and then x tried to use
that integer as an address.

Grrrr... another gripe: the way GDB treats symbols
just confuses me. I like how NASM does it: foo is
always an address, [foo] is always the value AT that
address. It's very consistent.

Next night: gosh darn it! I remembered. You gotta put
a '&' in front of "variables" to get the address...and
that includes when you're trying to use the 'examine'
('x') command to format and view memory using the variable
name.

    (gdb) p &word_buffer
    $1 = (<data variable, no debug info> *) 0x804a068 <word_buffer>
    (gdb) x/4c &word_buffer
    0x804a068 <word_buffer>: 70 'F' 73 'I' 86 'V' 69 'E'

At any rate, it looks good. WORD returns "FIVE" (as a base
address and a length, which FWORD pushes onto the stack).

    code_FWORD () at nasmjf.asm:303
    303 push edi ; push base address
    304 push ecx ; push length

And with any luck, now we'll be headed to the next word in
the COLON definition, CREATE.

    code_FWORD () at nasmjf.asm:27
    27 lodsd ; NEXT: Load from memory into eax, inc esi to point to next word.
    28 jmp [eax] ; Jump to whatever code we're now pointing at.
    (gdb) info symbol $eax
    CREATE in section .data of /home/dave/nasmjf/nasmjf
    (gdb) info symbol *$eax
    code_CREATE in section .text of /home/dave/nasmjf/nasmjf

Yay!

Now CREATE makes the header (dictionary link, name, flags)
portion of the word we're compiling.

In the next log, we'll see if CREATE works and then try to
track down which word is causing a segfault when COLON runs.