
nasmjf

A NASM assembler port of JONESFORTH
git clone http://ratfactor.com/repos/nasmjf/nasmjf.git

nasmjf/devlog/log01.txt


First log session to test what I've got so far. GNU Debugger recorded in
GNU screen for the full GNU experience.
I'll clean up a lot of the gdb prompts and stuff for clarity.

    Reading symbols from nasmjf...
    Breakpoint 1 at 0x804900e: file nasmjf.asm, line 80.
    Breakpoint 1, _start () at nasmjf.asm:80
    80      cld                         ; Clear the "direction flag" which means the string instructions (such
    82      mov [var_S0], esp           ; save the regular stack pointer (used for data) in FORTH var S0!
    84      mov ebp, return_stack_top   ; Initialise the return stack pointer

Trying a defined "function" in GDB to cut down on the typing. I always
have to cast the NASM labels to (int) since the debugging info has no
way of telling GDB what I'm storing there. "int" in this case just means
I've got a 4-byte (32 bit) value. GDB has a strong C heritage.
    p      - displays the VALUE of the label, which is an address
    x      - displays the memory at the address
    p/x and x/x display as hexadecimal
    *(int) uses the address stored AT the memory referenced by the label
           (again, strong C heritage in this syntax)
All three of these won't always be relevant, but it saves a lot of typing.

    (gdb) define foo
    Type commands for definition of "foo".
    End with a line saying just "end".
    >p/x (int)$arg0
    >x/x (int)$arg0
    >x/x *(int)$arg0
    >end

Initial nonsense over. Now we use the main mechanism that drives the Forth
instructions: the NEXT macro is inlined at the end of every word and here
to bootstrap the action. cold_start contains the address of the "QUIT" word.
(quit is a silly name - it doesn't quit Forth, it "quits" TO the interpreter)
(side note: I'd like everything to be lowercase except assembly macros. But
after 'quit' and 'docol', I haven't been good about converting them. Will
probably do a couple rounds of cleanup at some point...)

NEXT loads the address of the next instruction and we jump to it, executing
the machine code there.
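
The double indirection in NEXT (load a cell with lodsd, then jump through
the cell it points at) can be sketched outside of assembly. This is only a
toy model in Python, not code from nasmjf: the addresses, the memory and
code dictionaries, and the next_ function are all made up for illustration.

```python
# Toy model of NEXT in an indirect-threaded Forth.
# Every address and name here is hypothetical, chosen for readability.
CELL = 4  # 32-bit cells, matching the 4-byte values in the log

memory = {
    0x100: 0x200,  # "cold_start": holds the address of the word to run (QUIT)
    0x200: 0x300,  # first cell of that word: holds the address of machine code
}
code = {0x300: "docol"}  # stand-in for the machine code living at 0x300


def next_(esi):
    """Model the two instructions of NEXT: lodsd, then jmp [eax]."""
    eax = memory[esi]     # lodsd: load the cell that esi points at...
    esi += CELL           # ...and advance esi to the following cell
    target = memory[eax]  # jmp [eax]: jump through the cell eax points at
    return eax, esi, code[target]


eax, esi, running = next_(0x100)
print(hex(eax), hex(esi), running)  # prints: 0x200 0x104 docol
```

The three values line up with what the 'foo' helper prints: the value
itself, the memory at that address, and the memory at the address stored
there.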

    _start () at nasmjf.asm:88
    88      mov esi, cold_start   ; give next forth word to execute
    27      lodsd                 ; NEXT: Load from memory into eax, inc esi to point to next word.
    28      jmp [eax]             ; Jump to whatever code we're now pointing at.

Since QUIT is defined with the DEFWORD macro, it begins with a pointer to
the 'DOCOL' code - which, in essence, sets up the rest of the Forth word
to be executed (QUIT, in this case) for another call to NEXT.

    docol () at nasmjf.asm:40
    40      lea ebp, [ebp-4]      ; "load effective address" of next stack position
    41      mov [ebp], %1         ; "push" the register value to the address at ebp
    70      add eax, 4            ; eax points to docol (me!) in word definition. Go to next.

Here I use that 'foo' function to see if that's true about the eax register.
Note that the add 4 instruction has NOT yet executed. GDB always shows the
next instruction before you tell it to step forward to that instruction!

    (gdb) foo $eax
    $9 = 0x804a010
    0x804a010:      0x08049000
    0x8049000 <docol>:      0x89fc6d8d

Yup! It points to DOCOL all right. Now we step and add 4 to eax:

    (gdb) s
    71      mov esi, eax          ; Put the next word pointer into esi
    (gdb) foo $eax
    $10 = 0x804a014
    0x804a014:      0x0804a12c
    0x804a12c:      0x08049218

Every single Forth word ends with NEXT, which executes the next word.
In this case, it's happening at the end of DOCOL (and DOCOL's job is
to get everything set up to have NEXT execute the rest of the word...)

    (gdb) s
    27      lodsd                 ; NEXT: Load from memory into eax, inc esi to point to next word.
    28      jmp [eax]             ; Jump to whatever code we're now pointing at.

Double-checking that the instructions in QUIT are what we'll be running
now...

    (gdb) foo $eax
    $12 = 0x804a12c
    0x804a12c:      0x08049218
    0x8049218 <code_R0>:    0x04c30868

Yes! The 'R0' constant is the first thing we run in QUIT!
It's really wild how constants in Forth are actually words with a single
instruction that pushes a value onto the stack! In this case, R0 holds
the address of the top of the return stack.

The push %5 line is from the DEFCONST macro, which, in turn, calls the
DEFCODE macro because consts are words. Then the NEXT macro continues to
the next word in QUIT...

    code_R0 () at nasmjf.asm:568
    568     push %5
    code_R0 () at nasmjf.asm:27
    27      lodsd                 ; NEXT: Load from memory into eax, inc esi to point to next word.
    28      jmp [eax]             ; Jump to whatever code we're now pointing at.

...which happens to be RSPSTORE, which pops that value off the data stack
and into ebp, the return stack pointer.

    code_RSPSTORE () at nasmjf.asm:201
    201     pop ebp
    code_RSPSTORE () at nasmjf.asm:27
    27      lodsd                 ; NEXT: Load from memory into eax, inc esi to point to next word.
    28      jmp [eax]             ; Jump to whatever code we're now pointing at.

...and then QUIT runs INTERPRET, which takes words on STDIN and then
...calls _WORD to get a word from input which
...calls _KEY to get a character ("key") of input

    code_INTERPRET () at nasmjf.asm:209
    209     call _WORD            ; Returns %ecx = length, %edi = pointer to word.
    _WORD.skip_non_words () at nasmjf.asm:310
    310     call _KEY             ; get next key, returned in %eax
    _KEY () at nasmjf.asm:351

_KEY first checks to see if it needs input (currkey has reached
bufftop). On first run, they're both zero, so yeah, we need more
input.

Aside: again, "key" isn't how we would normally describe this in
a modern environment - it's the next "character" (and even that's
becoming a thing of the past now that Unicode is pretty much standard
everywhere...).

Anyway, comparing currkey (ebx = 0) and bufftop (0) sets the Zero
Flag (ZF) because the two values are equal (their difference is zero).
We can see this in the 'info reg' display below:

    351     mov ebx, [currkey]
    352     cmp ebx, [bufftop]
    353     jge .get_more_input
    (gdb) info reg
    ...
    ebx            0x0      0
    eflags         0x246    [ PF ZF IF ]
    ...

We get more input by telling Linux to give us input from
STDIN into a fixed-size buffer:

    _KEY.get_more_input () at nasmjf.asm:361
    361     xor ebx,ebx           ; 1st param: stdin
    362     mov ecx,buffer        ; 2nd param: buffer
    363     mov [currkey],ecx
    364     mov edx,buffer_size   ; 3rd param: max length
    365     mov eax,__NR_read     ; syscall: read
    366     int 0x80              ; syscall!

Now I type "foo<enter>":

    foo

We check to make sure the input isn't zero-length.
I don't think it would ever be - the <enter> key would
always give us at least '\n'?

    367     test eax,eax          ; If %eax <= 0, then exit.
    368     jbe .eof
    369     add ecx,eax           ; buffer+%eax = bufftop
    370     mov [bufftop],ecx

We can see how long the input string is. Yup, 4 bytes is
right: "foo\n".

    (gdb) foo $eax
    $15 = 0x4

Now we're back to _KEY, having gathered some input.
We repeat the check...

    371     jmp _KEY
    _KEY () at nasmjf.asm:351
    351     mov ebx, [currkey]
    352     cmp ebx, [bufftop]
    353     jge .get_more_input

This time we have input (and bufftop is at a higher
address than currkey), so we continue on by grabbing
the current "key" (character):

    354     xor eax, eax
    355     mov al, [ebx]         ; get next key from input buffer

If that worked, the al register now has the first
character of "foo\n". Yup, there's the "f"! (p/c means
print as a character. We can also p/s to print a C-style
string.)

    (gdb) p/c $al
    $19 = 102 'f'

Now we set currkey to the next character and return...

    356     inc ebx
    357     mov [currkey], ebx    ; increment currkey
    358     ret

Back at _WORD, we check to see if we've hit a character
to skip.
Forth is so syntactically simple, I just love it.

NOTE that the jbe instruction is "jump if below or equal" (an unsigned
less-than-or-equal), so any character at or below an ASCII space (0x20)
will cause us to keep seeking in the .skip_non_words loop. This is a
clever way to skip spaces, tabs, newlines, returns, form feeds, etc.
I'll improve the comments for these instructions in the actual program now.

    _WORD.skip_non_words () at nasmjf.asm:311
    311     cmp al,'\'            ; start of a comment?
    312     je .skip_comment      ; if so, skip the comment
    313     cmp al,' '            ; space?
    314     jbe .skip_non_words   ; if so, keep looking

Nope, character looks good. So we add it to word_buffer
in memory. The stosb instruction implicitly copies what's
in the al register (the 'b' is for byte) to memory at
the address stored in the edi register.

Then edi is incremented so that the next time this happens,
the next byte will go to the next position, and so forth.
It turns out, this is the sort of thing we guaranteed
when we cleared the direction flag at the very beginning.

    317     mov edi,word_buffer   ; put addr to word return buffer in edi

Now that we've established that we're past any whitespace
and are gathering the actual input, we're in .collect_word.
I'll snip the stepping through _KEY for 'o', 'o', and '\n'.

    _WORD.collect_word () at nasmjf.asm:319
    319     stosb                 ; add character to return buffer
    320     call _KEY             ; get next key, returned in %al

After every call to _KEY, we check to see if we're done
collecting the word. The ja instruction is "jump if above"
(an unsigned greater-than), which is the exact opposite of
the jbe check above.
To put it straight: before we were looping WHILE the
character was whitespace, now we loop UNTIL the character
is whitespace.

    321     cmp al,' '            ; is blank?
    322     ja .collect_word      ; if not, keep looping

Now _WORD returns the length and address of the collected word.

    325     sub edi, word_buffer  ; hmm, the len?
    326     mov ecx, edi          ; return it
    327     mov edi, word_buffer  ; return address of the word
    328     ret

Then we return to INTERPRET from _WORD:

    code_INTERPRET () at nasmjf.asm:212
    212     xor eax,eax           ; back from _WORD...zero eax
    ...

Let's check the return values now:

    (gdb) p $ecx
    $1 = 3
    (gdb) x/3c $edi
    0x804a068 <word_buffer>:        102 'f' 111 'o' 111 'o'

Yay! There's the "foo" string that was input.
Even though I've already written some of the _FIND word, which tries to
match the input word, I think this has been quite enough
for one log. :-)
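
For anyone who'd rather read the _KEY/_WORD dance in a high-level language,
here is a rough Python sketch of the two loops traced above. It is not a
translation of nasmjf.asm: the Reader class, its method names, and the
4096-byte buffer size are assumptions made up for illustration, and the
'\' comment skipping is left out entirely.

```python
import io

SPACE = ord(" ")  # 0x20: anything at or below this counts as "blank"


class Reader:
    """Hypothetical stand-in for the buffer/currkey/bufftop state of _KEY."""

    def __init__(self, stream, buffer_size=4096):  # buffer size is assumed
        self.stream = stream
        self.buffer_size = buffer_size
        self.buffer = b""
        self.currkey = 0  # index of the next unread byte in the buffer

    def key(self):
        """Return the next input byte, refilling the buffer when it runs dry."""
        if self.currkey >= len(self.buffer):  # like: jge .get_more_input
            self.buffer = self.stream.read(self.buffer_size)
            self.currkey = 0
            if not self.buffer:  # a zero-length read means end of input
                raise EOFError("no more input")
        ch = self.buffer[self.currkey]
        self.currkey += 1
        return ch

    def word(self):
        """Skip blanks, then collect bytes until the next blank."""
        ch = self.key()
        while ch <= SPACE:  # like: jbe .skip_non_words
            ch = self.key()
        collected = bytearray()
        while ch > SPACE:  # like: ja .collect_word
            collected.append(ch)  # like: stosb
            ch = self.key()
        return bytes(collected)


reader = Reader(io.BytesIO(b"foo\n"))
print(reader.word())  # prints: b'foo'
```

The length comes for free here (len() of the result is 3), where the
assembly has to compute it by subtracting word_buffer from edi.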