meow5

A stack-based pure inlining concatenative programming language written in NASM assembly
git clone http://ratfactor.com/repos/meow5/meow5.git

meow5/log10.txt

Hello! So it seems to me that there are two major paths
to decide between for the thing to add next:

    * Control structures (if/else, loops)
    * Write compiled programs to ELF executables

Both will be challenging. I'm leaning towards ELF at the
moment. We'll see what I think when I come back tomorrow
night.

Two nights later: Yup, gonna try to write an ELF
executable. This is gonna be cool!

First, I need to test writing a file. Then write the ELF
header, then the contents of a word.

I'll start by having 'make_elf' take a token (which I'll
use as the output filename for the executable and,
later, as the word to get the machine code from).

Then I'll write the string 'ELF' to that file. (Which is
very appropriate because bytes 2-4 of a _real_ ELF
header are that string.)

Next night: So I've got my test 'make_elf' and it is
supposed to be writing to whatever filename you want:

    make_elf foo

That should write the string 'ELF' to a file called
'foo', but it's not. So I've inserted a DEBUG to see
what the fd returned from 'open' is:

    $ mr
    make_elf foo
    new fd: fffffffe
    Goodbye.
    Exit status: 0

Yeah, that's definitely an error.

While looking for how to decode that error (the open(2)
man page explains the errors, but they're all C mnemonic
constants, of course), I came across this excellent
suggestion on SO: https://stackoverflow.com/a/68155464

Which was to use strace to decode the error for me!

    $ strace ./meow5
    execve("./meow5", ["./meow5"], 0x7fff2d4ec190 /* 60 vars */) = 0
    [ Process PID=2579 runs in 32 bit mode. ]
    read(0, make_elf foo
    "make_elf foo\n", 1024) = 13
    open("foo", O_WRONLY|0xc) = -1 ENOENT (No such file or directory)
    write(1, "new fd: ", 8new fd: ) = 8
    write(1, "fffffffe\n", 9fffffffe
    ) = 9
    write(-2, "ELF", 3) = -1 EBADF (Bad file descriptor)
    read(0, "", 1024) = 0
    write(1, "Goodbye.\n", 9Goodbye.
    ) = 9
    exit(0) = ?
    +++ exited with 0 +++

Huh, so something's wrong with my attempt to open the
output file with write-only, create, and truncate flags.

Here's what I'm sending:

    ; From open(2) man page:
    ;   A call to creat() is equivalent to calling open()
    ;   with flags equal to O_CREAT|O_WRONLY|O_TRUNC.
    ; I got the flags by searching all of /usr/include and
    ; finding /usr/include/asm-generic/fcntl.h
    ; That yielded (along with bizarre comment "not fcntl"):
    ;   #define O_CREAT  00000100
    ;   #define O_WRONLY 00000001
    ;   #define O_TRUNC  00001000
    ; Hence this flag value for 'open':
    mov ecx, 1101b

But from the strace above, it looks like it sees
O_WRONLY and...0xC - which is, indeed 1100...

Sounds like I've got a mystery for tomorrow night.

Two nights later: I bet somebody out there is
screaming. Ha ha. Those numbers are in octal, not binary
(despite looking for all the world like bit flags).

So I fixed that one night. Then I had to learn how to
set the mode (permissions), which was, like, freakishly
hard to find online. All the 'open' examples I found
were opening existing files. But since CREAT is an
option, obviously there was a way to do it...

The search "32 x86 assembly linux syscall table" is the
blessed way to ask the major search engines.

The answer is: the mode bits (in the usual unix octal
owner/group/all format) go in register edx. So:

    ; ebx contains null-terminated word name (see above)
    mov ecx, (0100o | 0001o | 1000o) ; open flags
    mov edx, 666o ; mode (permissions)
    mov eax, SYS_OPEN
    int 80h ; now eax will contain the new file desc.

And when I went to test it, I was sleepy and forgot that
since I was running the binary from strace, it wasn't
gonna re-build from source like my shell aliases 'mr',
'mb', 'mt' do, so I couldn't figure out why it wasn't
working...

...until I woke up in the middle of the night with the
realization.

Anyway, next morning, here goes:

    $ strace ./meow5
    execve("./meow5", ["./meow5"], 0x7fff56d5ec40 /* 60 vars */) = 0
    [ Process PID=1377 runs in 32 bit mode. ]
    read(0, make_elf foo
    "make_elf foo\n", 1024) = 13
    open("foo", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    write(1, "new fd: ", 8new fd: ) = 8
    write(1, "00000003\n", 900000003
    ) = 9
    write(3, "ELF", 3) = 3
    read(0, "", 1024) = 0
    write(1, "Goodbye.\n", 9Goodbye.
    ) = 9
    exit(0) = ?
    +++ exited with 0 +++

Awesome, we can see the flags being correctly decoded
and the mode/permission param:

    open("foo", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

So I've learned that strace rules for this sort of thing!

But did it work?

    $ cat foo
    ELF

Yahoo! Ha ha, I have written a string to a new file.
Jeez, that was way harder than I expected.

But now I can actually try writing an ELF header. I'm
excited.

-------------------------------------------------------

11 nights later: It's the holiday season, which is a lot
of exhausting activity (if you're a parent) under the
best of circumstances and this was an unusually hard one
for the family. So what I could easily have done in a
single night ended up stretching out for many nights.
But I finally finished the header portion in the .data
section and am writing it with the 'make_elf' word (I am
*not* writing the word yet).

Let's see what it does so far:

    $ mr
    make_elf exit
    new fd: 00000003
    Goodbye.
    Exit status: 0

The "new fd" message is a DEBUG statement I apparently
left in there to make sure I was opening the file
correctly.

If I've done everything correctly, this will have
written a file named "exit" with a more-or-less correct
ELF header.
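
For reference, the 52-byte ELF file header can be
sketched outside of assembly. This is a minimal Python
sketch, not meow5's actual .data layout: the function
and constant names are made up for illustration, the
field values are the ones readelf reports in this log,
and the default program header offset of 52 assumes the
program header sits directly after the file header.

```python
import struct

# Illustrative names (not meow5's labels); values follow the readelf
# output shown in this log.
ELF_VA = 0x08048000   # virtual address the file is loaded at
EHDR_SIZE = 52        # "Size of this header"
PHDR_SIZE = 32        # "Size of program headers"


def elf32_header(entry, phoff=EHDR_SIZE):
    """Pack a 52-byte ELF32 file header for a static i386 executable."""
    # e_ident: magic, 32-bit class, little endian, version 1, zero padding
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    return ident + struct.pack(
        "<HHIIIIIHHHHHH",
        2,          # e_type: EXEC
        3,          # e_machine: Intel 80386
        1,          # e_version
        entry,      # e_entry
        phoff,      # e_phoff: assumed to follow the file header directly
        0,          # e_shoff: no section headers
        0,          # e_flags
        EHDR_SIZE,  # e_ehsize
        PHDR_SIZE,  # e_phentsize
        1,          # e_phnum
        0,          # e_shentsize
        0,          # e_shnum
        0,          # e_shstrndx
    )


header = elf32_header(ELF_VA)
print(len(header), header[:16].hex(" "))
```

The first 16 bytes printed are the same "Magic" bytes
that readelf reports for the file.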

Let's see what 'file' thinks of it:

    $ file exit
    exit: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), can't read elf program headers at 184, no section header

Not bad! The program headers error might be due to a bug
in my headers or just the fact that I'm not writing the
program to the file yet.

Let's see what 'readelf' says:

    $ readelf -a exit
    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Intel 80386
      Version:                           0x1
      Entry point address:               0x8048000
      Start of program headers:          184 (bytes into file)
      Start of section headers:          0 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         1
      Size of section headers:           0 (bytes)
      Number of section headers:         0
      Section header string table index: 0

    ...

    readelf: exit: Error: Reading 32 bytes extends past end of
    file for program headers

Yeah, so it looks like my program header offset might be
wrong. But otherwise, the decoding looks correct!

Next night: Okay, I don't see anything wrong with my
header data (program header offset), so I'm gonna try
just writing out a program (word) and see what
happens.

I'm overwriting the program size portion of the program
header in data and then writing the header, *then*
writing the actual program after that. Every time I call
'make_elf' my elf_header data will contain the last
word's size that was written.

Anyway, here goes:

    $ mr
    make_elf exit
    prog bytes: 00000008
    new fd: 00000003
    Goodbye.

My 'exit' word is 8 bytes, that sounds right.

What does file say?

    $ file exit
    exit: ELF 32-bit LSB executable, Intel 80386, version 1
    (SYSV), statically linked, no section header

Ooh! No more errors there!

And readelf?

    $ readelf exit
    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Intel 80386
      Version:                           0x1
      Entry point address:               0x8048000
      Start of program headers:          52 (bytes into file)
      Start of section headers:          0 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         1
      Size of section headers:           0 (bytes)
      Number of section headers:         0
      Section header string table index: 0

    ...

    Program Headers:
      Type  Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      LOAD  0x000000 0x08048100 0x00000000 0x00008 0x00008 R E 0

    ...

Cool! That looks good. My program takes up 8 bytes in
memory TOTAL. It doesn't allocate ANY memory for a stack
or data or anything, which is correct.

Next morning (I fell asleep): Now for the moment of
truth, does the program properly exit?

    $ ./exit
    bash: ./exit: Permission denied

LOL. Yeah, it literally doesn't have execute permission:

    -rw-r--r-- 1 dave users 92 Dec 30 09:04 exit

Weird. That's not the permissions I thought I was
setting via the edx register for the sys 'open' call:

    mov edx, 555o ; mode (permissions)

Well, I'll figure that out in a bit. Right now I just
wanna see if I can run this thing.

    $ chmod +x exit
    $ ./exit
    Segmentation fault

Oops, nope. Let's see what GDB says about this
program.

Looks like I have to break via explicit address since
there's no debugging symbols...

    $ gdb exit
    ...
    (gdb) break *0x08048100
    Breakpoint 1 at 0x8048100
    (gdb) run
    ...
    Segmentation fault.

Argh. Shouldn't it have halted at the first instruction?
Hmm...

I'm thinking maybe my program section doesn't have
execution permissions or something, in which case it
might die before it can even look at the first
instruction?

Anyway, now I know what I'm gonna start looking at next
time.

Next night: no, the flags (pretty sure they're R=read,
E=execute) look right for a text/executable segment. And
at any rate, as near as I can tell (and meow5 wouldn't
work the way it does if it weren't true), Linux ignores
the flags anyway!

Instead, I had mis-typed the entry point address in the
main header vs the program header. Now I've made them
the same:

    $ readelf -a exit
    ...
      Entry point address:               0x8048100
    ...
    Program Headers:
      Type  Offset   VirtAddr   ...
      LOAD  0x000000 0x08048100 ...

Kinda weird that there's a leading 0 on one, but not the
other, right? But I don't see any harm per se. Also, the
meow5 executable shows the same thing (though it
executes starting in the second segment and I don't
claim to entirely understand the program segment
addressing yet, so I may well be missing something
important. I need to read that chapter of the ELF
document properly...)

Anyway, does it work now?

    $ ./exit
    Segmentation fault

Bah.

Okay, let's see if I can figure out some stuff with GDB.

    (gdb) file exit
    Reading symbols from exit...
    (No debugging symbols found in exit)
    (gdb) info file
    Symbols from "/home/dave/meow5/exit".

Hmmm. I thought 'info file' would at least show the
entry point, but no luck there.
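
Since GDB isn't volunteering the entry point, another
way to get it is to read it straight out of the file
bytes. Here's a small Python sketch of that (not part of
meow5). The function name is hypothetical and the sample
bytes are synthetic, built from the values in the
readelf output above rather than read from the real
'exit' file.

```python
import struct


def elf32_entry_and_load(data):
    """Return (e_entry, p_offset, p_vaddr) using the first program header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_entry and e_phoff sit at byte offsets 24 and 28 of an ELF32 header.
    e_entry, e_phoff = struct.unpack_from("<II", data, 24)
    # An ELF32 program header starts with p_type, p_offset, p_vaddr.
    _p_type, p_offset, p_vaddr = struct.unpack_from("<III", data, e_phoff)
    return e_entry, p_offset, p_vaddr


# Synthetic sample: file header fields up to e_phoff, zero padding for the
# remaining 20 header bytes, then one LOAD program header.
sample = (
    b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    + struct.pack("<HHIII", 2, 3, 1, 0x08048100, 52)
    + bytes(20)
    + struct.pack("<IIIIIIII", 1, 0, 0x08048100, 0, 8, 8, 5, 0)
)

entry, offset, vaddr = elf32_entry_and_load(sample)
print(hex(entry), hex(offset), hex(vaddr))
```

To try it on a real file, pass in the result of
open("exit", "rb").read() instead of the sample.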

    (gdb) break *0x08048100
    Breakpoint 1 at 0x8048100
    (gdb) run
    Starting program: /home/dave/meow5/exit
    During startup program terminated with signal SIGSEGV,
    Segmentation fault.

Another mystery. Well, my meow5 executable starts each
LOAD segment at even 1000 byte marks - which I guess has
something to do with page sizes? (Again, I need to read
that ELF document chapter, and I will, but I just wanna
see this working!)

So I updated my addresses to 0x08048000 at an even 1000
(in hex). I double-checked them with 'readelf -hl exit',
which I'll spare you from here.

But running it:

    $ ./exit
    Segmentation fault

Argh.

I'll take a look with GDB:

    (gdb) file exit
    Reading symbols from exit...
    (No debugging symbols found in exit)
    (gdb) r
    Starting program: /home/dave/meow5/exit

    Program received signal SIGSEGV, Segmentation fault.
    0x08048047 in ?? ()

Wait a second! That *is* progress. Now it's showing me
the address of the crash. I wasn't getting that before.
And it looks like it's crashing 0x47 bytes into memory
(which is way larger than my exit code). So it could be
that my program just isn't executing correctly...

So I'll set a breakpoint at the entry point (with GDB's
'*' address syntax) and see if I can figure out how to
view what's running.

    (gdb) break *0x08048000
    Breakpoint 1 at 0x8048000
    (gdb) r
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /home/dave/meow5/exit

    Breakpoint 1, 0x08048000 in ?? ()

Cool! I've finally paused the darn thing.

    (gdb) disass *0x08048000
    No function contains specified address.

I guess without symbols, 'disassemble' won't cooperate?
Can I at least step?

    (gdb) s
    Cannot find bounds of current function

Oh, right. I know this one.
There's a separate 'stepi' to step through the program
at the instruction level since there are no 'lines' to
step through!

    (gdb) stepi
    0x08048047 in ?? ()

Huh? Why am I now at that '...8047' address?

Turns out there's an 'i' format that will display
whatever memory you want as an instruction. So, after
the fact, here's that first instruction we just ran:

    (gdb) x/i 0x08048000
       0x8048000:  jg     0x8048047

Ha ha, well, that certainly explains what's happening.
But how did that get there? Here's the bytes of that
machine code:

    (gdb) x/x 0x8048000
    0x8048000:  0x464c457f

Since it's so tiny, I'm just gonna hex dump exit
entirely to see where that is:

    00000000: 7f45 4c46 0101 0100 0000 0000 0000 0000  .ELF............
    00000010: 0200 0300 0100 0000 0080 0408 3400 0000  ............4...
    00000020: 0000 0000 0000 0000 3400 2000 0100 0000  ........4. .....
    00000030: 0000 0000 0100 0000 0000 0000 0080 0408  ................
    00000040: 0000 0000 0800 0000 0800 0000 0500 0000  ................
    00000050: 0000 0000 5bb8 0100 0000 cd80            ....[.......

Ha ha, I see it right away (though little-endian always
makes it harder because the bytes are reversed).

The memory we're trying to execute is the 'ELF' magic
string from the header!

Okay, apparently I really need to read that chapter
about program segments and how they're loaded into
memory now.

But I gotta say, I really don't regret getting this
wrong to begin with. Now I have a concrete example of
what's happening and the information in that chapter is
going to make *so* much more sense to me. Sometimes
getting it right the first time "by the book" doesn't
teach me nearly as much as getting it wrong on my own
and *then* learning how to do it properly. It just
sticks better.
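
The GDB output above can be double-checked with a bit of
arithmetic. This is a small Python sketch (not part of
meow5) that takes the first four bytes of the hex dump,
shows them as the little-endian word GDB printed, and
decodes the first two as the x86 "jg rel8" instruction.
The variable names are just for illustration.

```python
import struct

ELF_VA = 0x08048000                      # load address used in the log
first_bytes = bytes.fromhex("7f454c46")  # first four bytes of the hex dump

# GDB's x/x shows these four bytes as one little-endian 32-bit word.
(word,) = struct.unpack("<I", first_bytes)
print(hex(word))  # 0x464c457f

# On x86, opcode 0x7f is "jg rel8": a two-byte instruction whose target is
# the address of the next instruction plus a signed 8-bit displacement.
opcode = first_bytes[0]
(disp,) = struct.unpack("b", first_bytes[1:2])  # 'E' is 0x45
target = ELF_VA + 2 + disp
print(hex(opcode), hex(target))  # 0x7f 0x8048047
```

So the 0x7f magic byte plus the letter 'E' happen to
decode as a jump to exactly the address where the
program crashed.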

Some number of nights later: First of all, the file
creation permissions here _were_ working. I've also
updated them to 755:

    mov edx, 755o ; mode (permissions)

Which shows up correctly:

    $ ls -l exit
    -rwxr-xr-x 1 dave users 92 Jan 3 22:01 exit

And as for my executable trying to run the ELF header
itself...ha ha, well, I did read Part 2: "Program
Loading and Dynamic Linking" of the System V ELF spec
and the answer was so simple, it was downright silly.

When you specify that the ELF executable wants to load
the file segment into (one of) the program's virtual
memory segments (which is what my single "LOAD" type
program header is requesting), it will load the ELF
header itself, followed by whatever data (or machine
code, in this case) follows the header.

So you always need to account for the ELF header when
determining the execution entry point address.

In other words, where I was pointing to the very first
byte of my requested virtual address:

    dd 0x08048000 ; entry - Execution start address

I needed to offset it by the elf header size:

    dd elf_va + elf_size ; entry - execution start address

Oh, right, and I also made a NASM macro to contain that
address so I wouldn't have the bare value in multiple
places:

    %assign elf_va 0x08048000 ; elf virt mem start address

Okay, crossing my fingers and toes...

    $ mr
    make_elf exit
    prog bytes: 00000008
    new fd: 00000003
    Goodbye.
    Exit status: 0
    $ ./exit
    $

Gasp! It worked! My executable exited cleanly! That can
only happen if the exit syscall was called correctly.

But a *real* test would be to call the exit syscall with
a unique value so we can *see* it doing something.

Do I dare hope?
I'm going to try making a new word with a constant value
and "calling" the 'exit' word and see if I can write
that out as a new ELF executable:

    $ mr
    : foo 42 exit ;
    make_elf foo
    prog bytes: 0000000d
    new fd: 00000003
    Goodbye.
    Exit status: 0

Indeed, that wrote a 97 byte ELF file containing 0xD
(13) bytes of machine code:

    $ ls -l foo
    -rwxr-xr-x 1 dave users 97 Jan 3 22:25 foo

But does it work?!

Drum roll...

    $ ./foo
    $ echo $?
    42
    $

Ha ha! No way!

It totally works.

Initial ELF creation is a success!

I think I'll figure out how to handle memory in my ELF
output next. It would be amazing to be able to write a
stand-alone executable that prints "Meow. Meow. Meow..."

See you in the next log!
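
The whole file layout from this log (file header,
program header, then machine code, with the entry point
at elf_va + elf_size) can be sketched in Python. This is
an illustration under assumptions, not how meow5 does
it: the function names are invented, elf_size is assumed
to cover both headers (52 + 32 = 84 bytes, which matches
the 92-byte 'exit' file holding 8 bytes of code), and
the 5-byte "push dword 42" encoding for 'foo' is a guess
that happens to add up to the 13 bytes reported above.

```python
import os
import struct

ELF_VA = 0x08048000               # same value as the log's elf_va
EHDR_SIZE, PHDR_SIZE = 52, 32
ELF_SIZE = EHDR_SIZE + PHDR_SIZE  # assumed meaning of elf_size (84 bytes)


def build_elf(code):
    """Return the bytes of a tiny static i386 ELF executable wrapping code."""
    # The headers get mapped into memory along with the code, so the entry
    # point has to skip past them.
    entry = ELF_VA + ELF_SIZE
    ehdr = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack(
        "<HHIIIIIHHHHHH",
        2, 3, 1, entry, EHDR_SIZE, 0, 0, EHDR_SIZE, PHDR_SIZE, 1, 0, 0, 0)
    phdr = struct.pack(
        "<IIIIIIII",
        1,           # p_type: LOAD
        0,           # p_offset: map the file starting at its first byte
        ELF_VA,      # p_vaddr
        0,           # p_paddr
        len(code),   # p_filesz (the log stores the word's size here)
        len(code),   # p_memsz
        5,           # p_flags: R + X
        0)           # p_align
    return ehdr + phdr + code


def write_executable(path, data):
    """Create path with the same open flags and mode the log ends up using."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o755)
    with os.fdopen(fd, "wb") as f:
        f.write(data)


# 'exit' word bytes from the hex dump: pop ebx; mov eax, 1; int 0x80
EXIT_CODE = bytes.fromhex("5bb801000000cd80")
# Assumed encoding of the literal 42: push dword 42 (5 bytes)
FOO_CODE = bytes.fromhex("682a000000") + EXIT_CODE

image = build_elf(FOO_CODE)
print(len(image), hex(ELF_VA + ELF_SIZE))
```

Calling write_executable("foo", image) would write the
file out. Like the hex dump in this log, the sketch puts
only the code size in p_filesz. A stricter header would
probably count the 84 header bytes there too, since the
segment is mapped from file offset 0.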