Hello! This page has my original slides for a spoken talk.
What you probably want instead is the written page:
Forth: The programming language that writes itself: The Web Page.
The page you're viewing has 3,000 words.
That page has 33,000 words. :-)
Charles H. Moore and the pursuit of simplicity.
A talk by Dave Gauer, 2023.
Made with minslides, which I made for this talk.
Use the J and K keys to go forward and backward.
You can also just scroll down like a regular web page.
Toggle notes/annotations with the N key.
Hello! These are my talk notes. The slides start out with fairly complete text and stand on their own, but they get more terse and cryptic later. I also use these notes to store the reference links and additional quotes for individual slides.
I might also flesh the notes out to include a sort-of transcript of the spoken part of the talk. Stay tuned.
Something that I will have in the actual talk: Raise your hand (and keep it raised) if you've:
When I was a wee programmer, I would sit around the virtual Usenet campfires listening
to the tall tales and legends of the elders.
I learned about magical languages with lots of (((((parentheses))))).
I listened, wide-eyed, to true tech tales like The Story of Mel.
And I heard tell of a programming language so flexible that you could change the values of integers.
They said that language was called Forth and it was created by a mad wizard called Chuck Moore who could write any program in a couple screens of code.
Years went by and I wrote a lot of PHP. I lost friends to the Internet Explorer 6 wars.
But I never forgot about the legend of Forth.
The blog series Programming in the Twenty-First Century by game developer James Hague gave me the final push.
Forth was a recurring theme and it just sounded so darned interesting.
So I went on an adventure and I came back and I think I have some answers.
(Oh, and I confirmed the legend. I can make any integer equal anything I want. Stick around 'til the end to see that Forth magic trick.)
I thought this was what Forth was all about:
3 4 + 7
$ bc
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
(3 * 4) + (5 * 6)
42
$ dc
3 4 * 5 6 * + p
42
3 4 * 5 6 * + . 42
That's true, but then I learned some more...
Op   The Stack
--   ---------
3    3
4    3 4
*    12
5    12 5
6    12 5 6
*    12 30
+    42
.
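The stack trace above can be reproduced with a few lines of Python. This is only a sketch: `eval_postfix` and its tiny operator table are names made up for illustration, and it handles nothing beyond integers and three operators. It prints each token next to the stack as it stands after that token.

```python
def eval_postfix(source):
    """Evaluate a whitespace-separated postfix expression; return the final stack."""
    stack = []
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    for token in source.split():
        if token in ops:
            b = stack.pop()   # top of stack is the right-hand operand
            a = stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(int(token))   # anything else is assumed to be an integer
        # Show the token and the stack after it has been processed.
        print(f"{token:<4}{' '.join(map(str, stack))}")
    return stack


result = eval_postfix("3 4 * 5 6 * +")
```

Note that nothing in the loop needs to look ahead or build a tree; each token is handled the moment it is read.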
Using "the stack" in i386 assembly:
mov eax, 3
mov ebx, 4
imul eax, ebx   ; two-operand multiply is imul (mul takes a single operand)
push eax
mov eax, 5
mov ebx, 6
imul eax, ebx
pop ebx
add eax, ebx
; eax=42
Forth:
CAKE DUP HAVE EAT
Lack of explicit names for intermediate values.
If I ask you to add these numbers:
2 6 1 3 7
Do you feel a need to give a name to each sum pair...or even the running total?
That's true, but then I learned some more...
Ah, this must be it because it sounds fancy.
Contrast with applicative language:
eat(bake(prove(mix(ingredients))))
Concatenative language:
ingredients mix prove bake eat
The canonical example of a concatenative language is Joy
Manfred von Thun inspired by Backus's 1977 ACM Turing Award lecture:
Joy is kind of like starting with a Lisp
Minus function parameters
Minus variables
Minus traditional control structures
All functions are unary (an arity of 1)
Stack in, stack out
A program is a list of data and functions
Joy's superpower: combinators
Higher-order functions like map:
inc = function(n){ return n + 1; };
bigger = list.map(inc);
Concatenatively:
list [inc] map
To Mock a Mockingbird
Identity
(I x) = x
K and S
(K x y) = x
(S x y z) = (x z (y z))
Combinators: A Centennial View
A function using a combinator to reverse lists in Joy:
reverselist == [] swap infra
infra takes a list as one param and a program as a second. It uses the list as a temporary stack and executes the program on it.
The swap here is just to get the param order correct for infra, it has nothing to do with reversing the input list.
Combinators let us factor out explicit loops.
foo.map(bar)
equals
temp = [];
for(i=0; i<foo.length; i++){
    temp[i] = bar(foo[i]);
}
And in Joy, combinators can factor out all sorts of logic
Even different flavors of recursion are made feasible by the uniformly unary functions.
Here's a factorial definition:
fac == [null] [succ] [dup pred] [*] linrec
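To see what linrec is factoring out, here is a rough Python model of that factorial. It is a sketch under assumptions, not Joy's implementation: each quoted program is modeled as a Python function that mutates a list used as the stack, and the test is assumed to run on a copy so it does not consume the value it inspects. All the function names here are illustrative.

```python
def linrec(stack, test, then, rec1, rec2):
    """Linear recursion: if test, run then; else rec1, recurse, rec2."""
    probe = list(stack)      # run the test on a copy so the real stack is untouched
    test(probe)
    if probe.pop():
        then(stack)
    else:
        rec1(stack)
        linrec(stack, test, then, rec1, rec2)
        rec2(stack)


def null(s):
    s.append(s.pop() == 0)   # replace top with "is it zero?"


def succ(s):
    s.append(s.pop() + 1)


def pred(s):
    s.append(s.pop() - 1)


def dup(s):
    s.append(s[-1])


def mul(s):
    b = s.pop()
    a = s.pop()
    s.append(a * b)


def dup_pred(s):             # stands in for the quotation [dup pred]
    dup(s)
    pred(s)


def fac(n):
    stack = [n]
    linrec(stack, null, succ, dup_pred, mul)   # [null] [succ] [dup pred] [*] linrec
    return stack.pop()
```

The recursion itself lives entirely inside `linrec`; `fac` only supplies the four small pieces.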
Forth has higher order functions too with word "execution tokens" (function pointers).
EXECUTE
You can very compactly define words such as MAP, FOLD, and REDUCE.
Forth is definitely concatenative.
However,
2 3 +
Forth: "Push 2, then 3 on the stack; pop both and add them; push the result, 5, on the stack."
Joy: "The composition of the functions 2, 3, and + is identical to the function 5."
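The two readings can be put side by side in Python. In this sketch (all names are illustrative, and stacks are modeled as immutable tuples) every word is a function from stack to stack, and a program is just the left-to-right composition of its words.

```python
from functools import reduce


def push(n):
    """Return a stack -> stack function that pushes the literal n."""
    return lambda stack: stack + (n,)


def add(stack):
    """Replace the top two items with their sum."""
    *rest, a, b = stack
    return (*rest, a + b)


def compose(*words):
    """Concatenating words composes them, applied left to right."""
    return lambda stack: reduce(lambda s, word: word(s), words, stack)


two_three_plus = compose(push(2), push(3), add)   # the program "2 3 +"
five = push(5)                                    # the program "5"
```

Applied to any starting stack, `two_three_plus` and `five` give the same result, which is the sense in which Joy calls them identical.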
All these aspects of Forth:
Are true.
They are all perfectly valid and complementary ways of describing the language Forth:
CAKE EAT
Yes, postfix notation was definitely in the air (Zuse's Z3 1938).
And yes, stacks were known in the time of Forth's origins, but generally limited (2-4 items in registers).
But Joy (and the term "concatenative programming") come from the 1980s.
We need to go back to the 1950s.
Fortran on punchcards
"Compiling took 30 minutes...you got one shot per day"
-- Chuck Moore, Forth - The Early Years
SAO and MIT are really close to each other. So the story isn't as confusing as it first seems in Forth - The Early Years. Is Chuck at SAO or MIT or Stanford? The answer, geographically, is yes.
As for the computer: https://en.wikipedia.org/wiki/IBM_704
"In its day, the 704 was an exceptionally reliable machine. Being a vacuum-tube machine, however, the IBM 704 had very poor reliability by today's standards. On average, the machine failed around every 8 hours, which limited the program size that the first Fortran compilers could successfully translate because the machine would fail before a successful compilation of a large program."
Made an interactive interpreter.
On a computer with nothing we would recognize as a terminal.
Control an astronomical calculation program using statement numbers.
Specifically: "Moore's programming career began in the late 1950s at the Smithsonian Astrophysical Observatory with programs to compute ephemerides, orbital elements, satellite station positions, etc. His source code filled two card trays. To minimize recompiling this large program, he developed a simple interpreter to read cards controlling the program. This enabled him to compose different equations for several satellites without recompiling..."
Free-form input was more efficient (smaller and faster) and more reliable than Fortran's column-formatted input, which resulted in re-runs due to misaligned columns.
Arguably, this really is the origin of the thing that will be named Forth.
And the statement numbers would be equivalent to the words:
WORD NUMBER INTERPRET ABORT
Free-form input was unusual at the time, but a super handy alternative to recompiling every time you want to change the numbers.
CURVE written in Stanford's own Algol implementation.
A much more sophisticated interpreter.
Now has a data stack and stack manipulation operators:
+ - * IF ELSE DUP DROP SWAP
"With the TTY came paper-tape and some of the most un-friendly software imaginable - hours of editing and punching and loading and assembling and printing and loading and testing and repeating."
-- Chuck Moore, Forth - The Early Years
"Punched tape was used as a way of storing messages for teletypewriters. Operators typed in the message to the paper tape, and then sent the message at the maximum line speed from the tape. This permitted the operator to prepare the message "off-line" at the operator's best typing speed, and permitted the operator to correct any error prior to transmission. An experienced operator could prepare a message at 135 words per minute (WPM) or more for short periods." https://en.wikipedia.org/wiki/Punched_tape
And terminals! Terminal input and output:
KEY EMIT CR SPACE DIGIT
Also, an in-Forth editor (kinda IDE, kinda OS).
IBM 1130 minicomputer at a big textiles manufacturer.
16-bit, 8K RAM.
Backup was via punch/reader.
With disks, now we can have file names!
Limited to 5 characters.
Moore's "fourth generation" system becomes FORTH.
Now have a return call stack, allowing nested definitions
: double dup + ;
: quad double double ;
Univac 1108
Same textile company.
Written in assembler and could call COBOL modules because that's what the corporate suits wanted in 1970.
Moore hates complexity
National Radio Astronomy Observatory
- Computer control software for radio telescopes.
At this time, there are talks of patenting Forth.
Moore believes ideas shouldn't be patented.
He also rejects the standardization of Forth.
"All of my fears of the standard and none of the advantages of the standard have come to pass. Any spirit of innovation has been thoroughly quelched.
Underground Forths are still needed.
I said I thought the standard should be a publication standard but they wanted an execution standard."
-- Chuck Moore, 1997
Ported to the IBM 360/50
And to the Honeywell 316
And to the Honeywell DDP-116
And to the DEC PDP-11 (yes, that PDP-11)
All of this porting possible because of indirect threaded code.
Wait, so if Chuck hates complexity so much, why is threaded code so hard to understand? It seems very complex? Well, read on!
For a good discussion of Forth internals: https://www.bradrodriguez.com/papers/moving1.htm
Threaded code is not related to concurrency, i.e. "multi-threaded programming".
It's code that consists almost completely of calls to subroutines.
Could be machine code or interpreted.
Direct calls:
call 0x0804000
call eax
or
jmp 0x0804000
jmp eax
Indirect calls:
call [eax]
or
jmp [eax]
Calling indirect threaded code:
mov eax, esi
call [eax]
Storing threaded code:
<addr pointing to code>
<addr pointing to code>
<addr pointing to code>
<addr pointing to code>
Storing indirect threaded code:
<addr pointing to addr pointing to code>
<addr pointing to addr pointing to code>
<addr pointing to addr pointing to code>
<addr pointing to addr pointing to code>
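The difference between the two storage layouts is one extra lookup. Here is a toy Python model of it. Everything in it is an assumption made for illustration: "memory" is a plain list, an address is an index into that list, and "machine code" is a Python function sitting in a cell.

```python
log = []
memory = [None] * 16

# "Machine code" routines live at addresses 0 and 1.
memory[0] = lambda: log.append("dup")
memory[1] = lambda: log.append("plus")

# Threaded code: each cell holds the address of code.
memory[4:6] = [0, 1]

# Indirect threaded code: each cell holds the address of a cell
# that in turn holds the address of code.
memory[8] = 0               # codeword cell for "dup"  -> code at address 0
memory[9] = 1               # codeword cell for "plus" -> code at address 1
memory[12:14] = [8, 9]


def run_direct(start, count):
    for ip in range(start, start + count):
        code_addr = memory[ip]              # one lookup
        memory[code_addr]()


def run_indirect(start, count):
    for ip in range(start, start + count):
        codeword_addr = memory[ip]          # first lookup
        code_addr = memory[codeword_addr]   # second lookup
        memory[code_addr]()


run_direct(4, 2)
run_indirect(12, 2)
```

Both runs execute the same two routines in the same order; the indirect version just goes through one more cell to find them.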
Threaded code was much more common in the days of yore.
It is very dense, compact on disk and in memory.
That's Forth's origin story.
This gives us the why.
Forth is postfix because that's the most natural order for a computer: get the values, operate on them;
Stack oriented because that's an incredibly minimalistic and compact and convenient way to store values without having to create variables (and name them!);
Concatenative because building a language that can operate as a string of words is incredibly flexible and can adapt to any (?) programming style without any help from the language itself. Especially with combinators;
Interpreted because that is interactive and allows changes on the fly. Again, because Forth is all about absolute maximum adaptability;
Highly adaptable because you can bootstrap a Forth implementation from a handful of words implemented in assembly and then write the rest in Forth;
Extremely compact because computers at the time had extremely limited memory. And the ubiquitous low-power computers of the future may also demand this. The microcontrollers that control just about everything certainly do.
But somewhere along the way, I came across these quotes...
"To understand Forth, you have to implement a Forth."
-- Somebody on the Internet
"Take a look at JonesForth."
-- Everybody on the Internet
My NASM port of JonesForth: nasmjf
Opening the third eye by (re)implementing Forth.
JonesForth ascii art:
nasmjf ascii art:
But that's just the tip of the iceberg!
nasmjf inner/outer interpreter diagram:
Getting from one code word to another uses a bit of assembly pasted at the end of each, in a chunk called the NEXT macro. Here it is from nasmjf:
%macro NEXT 0
    lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.
    jmp [eax] ; Jump to whatever code we're now pointing at.
%endmacro
Getting from one colon word to another uses a bit of assembly referenced at the end of each: the EXIT code word. Here it is from nasmjf:
DEFCODE "EXIT",EXIT,0
    POPRSP esi ; pop return stack into esi
    NEXT
My comment in nasmjf attempting to explain the execution of indirect threaded code as a nested sequence of NEXT and EXIT and QUIT:
; QUIT (INTERPRET)
;     * regular word
;         DOCOL
;         NEXT
;         * regular word
;             DOCOL (codeword
;             NEXT
;             * code word
;                 <machine code>
;             NEXT
;             * code word
;                 <machine code>
;             NEXT
;         EXIT
;         NEXT
;     EXIT
;     NEXT
; QUIT (BRANCH -8 back to INTERPRET for more)
Absolutely nothing else drives the flow of an indirect threaded Forth application. It's addresses stored in registers and one or two lines of assembly at the end of each word that manipulate the return stack as needed and jump to the next instruction.
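That flow can be imitated in Python with an explicit instruction pointer and return stack. This is a loose model, not a port of nasmjf: colon words are assumed to be Python lists ending in an `EXIT` marker, code words are plain functions, and the comments only point at which part plays the role of NEXT, DOCOL, and EXIT.

```python
data_stack = []
return_stack = []


def dup():
    data_stack.append(data_stack[-1])


def plus():
    data_stack.append(data_stack.pop() + data_stack.pop())


EXIT = "EXIT"   # marker standing in for the EXIT code word

# Colon words are lists of other words, each ending with EXIT.
DOUBLE = [dup, plus, EXIT]        # : double dup + ;
QUAD = [DOUBLE, DOUBLE, EXIT]     # : quad double double ;


def execute(word):
    """Run one word using an explicit instruction pointer and return stack."""
    if callable(word):
        word()                    # a code word just runs
        return
    body, ip = word, 0            # like DOCOL: start at the first cell
    while True:
        cell = body[ip]           # like NEXT: fetch the next cell...
        ip += 1                   # ...and advance the instruction pointer
        if cell is EXIT:
            if not return_stack:
                return            # nothing to return to: back to the caller
            body, ip = return_stack.pop()     # like EXIT: resume the caller
        elif callable(cell):
            cell()                # code word: run the "machine code"
        else:
            return_stack.append((body, ip))   # like DOCOL: remember where to resume
            body, ip = cell, 0


data_stack.append(5)
execute(QUAD)
```

The loop never recurses; nesting is handled entirely by pushing and popping resume points on `return_stack`.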
Don't you see how simple it is?
I hope this is recognizable...
In case this meme is otherwise unknown or lost to the ages, it's
a screencap of the character Charlie Kelly from the show
It's Always Sunny in Philadelphia.
https://en.wikipedia.org/wiki/Charlie_Kelly_(It%27s_Always_Sunny_in_Philadelphia)
Forth is complex when taken as a whole. But it is made of tiny
pieces, each of which is very simple. The concept was created
over a period of years on very constrained systems.
Each part created only as needed.
Simple:
DEFCODE "SWAP",SWAP,0
    pop eax
    pop ebx
    push eax
    push ebx
    NEXT
Simpler:
DEFCODE "DUP",DUP,0
    mov eax, [esp]
    push eax
    NEXT
Simplest:
DEFCODE "DROP",DROP,0
    pop eax
    NEXT
8 7    8 7
SWAP   7 8
DROP   7
DUP    7 7
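The same three words, modeled on a Python list where the end of the list is the top of the stack. This is a sketch for following along, with the assembly steps noted in comments; the `after_*` names are just there to capture each intermediate stack.

```python
def swap(stack):
    a = stack.pop()          # pop eax
    b = stack.pop()          # pop ebx
    stack.append(a)          # push eax
    stack.append(b)          # push ebx


def dup(stack):
    stack.append(stack[-1])  # copy the top item without popping it


def drop(stack):
    stack.pop()              # discard the top item


stack = [8, 7]
swap(stack)
after_swap = list(stack)
drop(stack)
after_drop = list(stack)
dup(stack)
after_dup = list(stack)
```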
nasmjf has 130 code words. Mostly for efficiency.
sectorforth has 10.
: SDD SWAP DROP DUP ;
8 7    8 7
SDD    7 7
: SDD SWAP DROP DUP ;
Colon (:) fetches the word name and sets "compile mode".
Semicolon (;) completes the word's entry in the dictionary and unsets "compile mode".
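Here is roughly what that looks like as a toy interpreter in Python. It is a simplification under several assumptions: the dictionary is a Python dict rather than a linked list, a compiled word is stored as a list of token strings, and there are no immediate words. The function names are made up for this sketch.

```python
def swap(s):
    s.extend([s.pop(), s.pop()])


def drop(s):
    s.pop()


def dup(s):
    s.append(s[-1])


def execute(token, stack, words):
    """Run one token: a dictionary word if found, otherwise an integer literal."""
    if token not in words:
        stack.append(int(token))
        return
    definition = words[token]
    if callable(definition):
        definition(stack)                   # built-in "code word"
    else:
        for inner in definition:            # colon word: run each stored token
            execute(inner, stack, words)


def interpret(source, stack, words):
    compiling = False
    name, body = None, []
    tokens = iter(source.split())
    for token in tokens:
        if token == ":":
            name = next(tokens)             # colon fetches the word name...
            body = []
            compiling = True                # ...and sets compile mode
        elif token == ";":
            words[name] = list(body)        # semicolon completes the entry...
            compiling = False               # ...and unsets compile mode
        elif compiling:
            body.append(token)              # in compile mode, store instead of run
        else:
            execute(token, stack, words)


words = {"SWAP": swap, "DROP": drop, "DUP": dup}
stack = []
interpret(": SDD SWAP DROP DUP ; 8 7 SDD", stack, words)
```

The whole "parser" is `source.split()`; the only thing the two special tokens do is flip a flag.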
Almost no syntax = simple interpreter
8 7 SWAP DUP DROP
: SDD SWAP DROP DUP ;
8 7 SDD
Almost no syntax = extreme extensibility.
The definition of IF...THEN from jonesforth.f:
: IF IMMEDIATE
    ' 0BRANCH ,
    HERE @
    0 ,
;
: THEN IMMEDIATE
    DUP
    HERE @ SWAP -
    SWAP !
;
Almost no syntax = extreme extensibility!!!!!
The definition of ( ) nested comments from jonesforth.f:
: ( IMMEDIATE
    1
    BEGIN
        KEY
        DUP '(' = IF
            DROP
            1+
        ELSE
            ')' = IF
                1-
            THEN
        THEN
    DUP 0= UNTIL
    DROP
;
( From now on we can use ( ... ) for comments. ...
The dictionary uses a linked list and word matching is done from the most recently defined word, so:
It's not just the language itself that is unusually flexible; the usage of Forth allows for really surprising flexibility.
Example paraphrased from Thinking Forth. Say we create a variable to hold a number of apples:
VARIABLE APPLES
20 APPLES !
APPLES ? 20
Forth variables put addresses on the stack.
I loved this example when I read the book.
I have a physical copy of Thinking Forth, but Forth, Inc. has made it available for free online: https://www.forth.com/wp-content/uploads/2018/11/thinking-forth-color.pdf
We pepper our program with this APPLES variable.
Then we are told that we must now keep track of two different kinds of apples: red and green. What to do?
A new variable will store the current type of apples.
VARIABLE COLOR
A new variable and word will deal with red apples. The word sets the type of apple by storing the address of REDS in COLOR.
VARIABLE REDS
: RED REDS COLOR ! ;
Same for green.
VARIABLE GREENS
: GREEN GREENS COLOR ! ;
And change APPLES from a variable to a word that gets the current count by color:
: APPLES COLOR @ ;
Now we have to re-write any use of APPLES, right?
Wrong! The use of APPLES is identical. The syntax hasn't changed one bit for any existing code. We just need to make sure we've set the right color.
20 RED APPLES !
30 GREEN APPLES !
GREEN APPLES ? 30
APPLES ? 30
RED APPLES ? 20
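The trick works because a Forth variable only ever puts an address on the stack, so APPLES can quietly become "whatever address COLOR currently holds". A small Python model of the same session, assuming a list as memory and list indexes as addresses (the helper names `variable`, `store`, `fetch`, and `count` are invented for this sketch):

```python
memory = []                      # cell storage; an address is an index
stack = []


def variable():
    """Allocate one cell and return a word that pushes its address."""
    memory.append(0)
    addr = len(memory) - 1
    return lambda: stack.append(addr)


def store():                     # Forth's !  ( value addr -- )
    addr = stack.pop()
    memory[addr] = stack.pop()


def fetch():                     # Forth's @  ( addr -- value )
    stack.append(memory[stack.pop()])


COLOR = variable()
REDS = variable()
GREENS = variable()


def RED():                       # : RED REDS COLOR ! ;
    REDS()
    COLOR()
    store()


def GREEN():                     # : GREEN GREENS COLOR ! ;
    GREENS()
    COLOR()
    store()


def APPLES():                    # : APPLES COLOR @ ;
    COLOR()
    fetch()


def count():                     # stands in for "APPLES ?"
    APPLES()
    fetch()
    return stack.pop()


stack.append(20)                 # 20 RED APPLES !
RED()
APPLES()
store()

stack.append(30)                 # 30 GREEN APPLES !
GREEN()
APPLES()
store()

GREEN()
green_count = count()            # GREEN APPLES ?
RED()
red_count = count()              # RED APPLES ?
```

Code that says `APPLES` followed by a store or fetch never needs to know whether it is talking to a variable or to a word that computes an address.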
"I didn't create Forth, I discovered it."
-- Chuck, apocryphally
Making nasmjf gave me so many ideas, I had to try some experiments.
An exercise in extreme concatenative programming where all code is concatenated (inlined).
: meow "Meow. " print ;
meow
Meow.
: meow5 meow meow meow meow meow ;
meow5
Meow. Meow. Meow. Meow. Meow.
Despite attempting to make something radically different, it's remarkable how many times Forth's solution was the path of least resistance.
"Aha! That's why."
Example: string quoting
" Hello World."
Meow5 has this quoting:
"Hello World."
But the effects are cascading...and limit flexibility
"Aha! That's why."
This is why I titled this "The programming language that writes itself"
Hand-written 1 KB binary
I took this image straight from the repo. I just LOVE this so much. https://github.com/nineties/planckforth
Forth is an idea that has taken form in countless applications, many of them custom and home-grown.
Operating system: Forth.
OS and library of routines in 8 KB of ROM.
"Ten times faster than [interpreted] BASIC" and less than half the memory requirements.
https://en.wikipedia.org/wiki/Jupiter_Ace
https://vintagecomputer.ca/jupiter-ace-4000/
"The most distinguishing feature among many computers of the era was that it offered Forth as a base language and not the usual BASIC that was the popular flavor of the day."
Operating system: Forth.
OS and office suite in 256 KB of ROM.
Innovative interface by Jef Raskin.
https://en.wikipedia.org/wiki/Canon_Cat
https://vintagecomputer.ca/canon-cat/
"While the Canon Cat is a more powerful word processor than most, it's most exciting characteristic is the utilization of a new (for the time) user interface known as the Humane Interface."
Also, you have to check out the picture of Jef with a little model of the Cat on his Wikipedia page: https://en.wikipedia.org/wiki/Jef_Raskin
The 2003 list by NASA's James Rash is too long to easily list.
"There is always great concern about software reliability, especially with flight software."
Control of 50-foot long, six-joint arm for Space Shuttle simulator. Extensive math routines convert two three-axis joystick commands into required joint velocities in six different co-ordinate systems. Entire system developed by one programmer in five weeks.
Multitasking operating system, Forth language compiler, and libraries for UT69R000 radiation-hardened microprocessor used in Space Shuttle instrumentation.
First mission to send a lander to a comet!
Forth used for the Rosetta Ion and Electron Sensor instrument, using Harris RTX2010 Forth microprocessor.
The Philae lander is powered by two 8MHz Harris RTX2010 16-bit stack processors...used by the Philae CDMS to control all aspects of the lander.
https://www.cpushack.com/2014/11/12/here-comes-philae-powered-by-an-rtx2010/
"Why was the RTX2010 chosen? Simply put the RTX2010 is the lowest power budget processor available that is radiation hardened, and powerful enough to handle the complex landing procedure. Philae runs on batteries for the first phase of its mission (later it will switch to solar/back up batteries) so the power budget is critical. The RTX2010 is a Forth based stack processor which allows for very efficient coding, again useful for a low power budget."
This is one of the final images taken by Philae as it made the soft descent to the surface of comet 67p.
https://www.nasa.gov/feature/jpl/final-descent-image-from-rosetta-spacecraft
Awesome talk by Jack Rusher
"...Space probes written in Lisp and Forth have been debugged while off world... If they had proven their programs correct by construction, shipped them into space, and then found out their spec was wrong, they would have just had some dead junk on Mars. But what these guys had was the ability to fix things while they are running on space probes... In addition, the spec is always wrong!"
-- Jack Rusher, Stop Writing Dead Programs, 2022
Check out Jack's awesome transcript here, which was super helpful when I wanted to re-find this quote. https://jackrusher.com/strange-loop-2022/
"Had I had more time, I would have done an entire series of slides on FORTH. It's a tiny language that combines interactive development, expressive metaprogramming, and tremendous machine sympathy. I've shipped embedded systems, bootloaders, and other close-to-the-metal software in FORTH."
I finally found a story about Forth being updated in space, if not the story:
"Also in 1993 I used MCL to help generate a code patch for the Gallileo magnetometer. The magnetometer had an RCA1802 processor, 2k each of RAM and ROM, and was programmed in Forth using a development system that ran on a long-since-decommissioned Apple II. The instrument had developed a bad memory byte right in the middle of the code. The code needed to be patched to not use this bad byte. The magnetometer team had originally estimated that resurrecting the development environment and generating the code patch would take so long that they were not even going to attempt it. Using Lisp I wrote from scratch a Forth development environment for the instrument (including a simulator for the hardware) and used it to generate the patch. The whole project took just under 3 months of part-time work."
When we defeat the alien kill-bots and reprogram them, it will
surely be with a Forth of some sort.
Obviously the graphic is unreadably tiny. For the full-size original and the gForth program used to create it, check out:
Charles H. Moore founded Forth, Inc in 1973. Has been porting Forth to various systems since.
Image from the great quote collection (are you tired of seeing this link?)
colorForth
Screenshot from this Quora answer by Tikhon Jelvis: https://www.quora.com/What-is-your-undergraduate-research-experience-like
"An idiosyncratic programming environment, the colors simplify Forth's semantics, speed compiling, and are said to aid Moore's own poor eyesight: colorForth uses different colors in its source code (replacing some of the punctuation in standard Forth) to determine how different words are treated."
"The language comes with its own tiny (63K) operating system. Practically everything is stored as source code and compiled when needed. The current colorForth environment is limited to running on Pentium grade PCs with limited support for lowest-common-denominator motherboards, AGP video, disk, and network hardware." https://en.wikipedia.org/wiki/ColorForth
Neat interview mentioning colorForth (amongst other things!) here: https://www.red-gate.com/simple-talk/opinion/geek-of-the-week/chuck-moore-geek-of-the-week/
Fighting the good fight against software complexity since the 1950s.
"I am utterly frustrated with the software I have to deal with. Windows is beyond comprehension! UNIX is no better. DOS is no better. There is no reason for an OS. It is a non-thing. Maybe it was needed at one time."
-- Chuck Moore, 1997
"If they are starting from the OS they have made the first mistake. The OS isn't going to fit on a floppy disk and boot in ten seconds."
-- Chuck Moore, 1999
"Instead of being rewritten, software has features added. And becomes more complex. So complex that no one dares change it, or improve it, for fear of unintended consequences. But adding to it seems relatively safe. We need dedicated programmers who commit their careers to single applications. Rewriting them over and over until they're perfect."
-- Chuck Moore, 2009
His real love seems to be hardware.
Remember that Harris RTX2010? That's basically his design.
Has been designing hardware since 1983, starting with the Novix NC4000 gate array and dev board.
(An improved processor sold to Harris to become the RTX* chips.)
Has been designing chips ever since.
With his own VLSI software, "OKAD", written in 500 lines of Forth, of course.
"Very large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining millions or billions of MOS transistors onto a single chip. VLSI began in the 1970s when MOS integrated circuit (Metal Oxide Semiconductor) chips were developed and then widely adopted, enabling complex semiconductor and telecommunication technologies. The microprocessor and memory chips are VLSI devices." https://en.wikipedia.org/wiki/Very_Large_Scale_Integration
In 500 lines of colorForth these tools provide everything required to design a chip. They are derived from an earlier version called OKAD that successfully generated many versions of Forth microprocessor chips. https://colorforth.github.io/vlsi.html
A whole book about the NC4000 chip: http://forth.org/OffeteStore/4001-footstepsFinal.pdf
List of chips (and links to more): http://forth.org/cores.html
This question and answer(s) are excellent for understanding what OKAD is: https://retrocomputing.stackexchange.com/questions/25506/did-forths-inventor-charles-moore-really-write-a-cad-program-in-only-5-lines-of
GreenArrays. "Programming a 144-computer chip to minimize power" (2013)
144 asynchronous computers on a chip. Idle cores use 100 nW. Active ones use 4 mW, run at 666 MIPS, then return to idle. All computers running flat out: 550 mW (half a Watt).
youtube.com: "Programming a 144-computer chip to minimize power" - Chuck Moore (2013)
https://www.greenarraychips.com/
"COMPLETE SYSTEMS: We refer to our chips as Multi-Computer Systems because they are, in fact, complete systems. Supply one of our chips with power and a reset signal, and it is up and running. All of our chips can load their software at high speed using a single wire that can be daisy chained for multiple chips; if desired, most can be bootstrapped by a simple SPI flash memory."
"Contrast this with a Multi-Core CPU, which is not a computing system until other devices such as crystals, memory controllers, memories, and bus controllers have been added. All of these things consume energy, occupy space, cost money, add complexity, and create bottlenecks."
"NO CLOCKS: Most computing devices have one or more clocks that synchronize all operations. When a conventional computer is powered up and waiting to respond quickly to stimuli, clock generation and distribution are consuming energy at a huge rate by our standards, yet accomplishing nothing." https://www.greenarraychips.com/home/documents/greg/PB002-100822-GA-Arch.pdf
"If you talk about molecular computers that are circulating in your bloodstream, they aren't going to have very much power and they aren't going to have very much memory and they aren't going to be able to use much energy."
-- Chuck Moore, Programming a 144-computer chip to minimize power, 2013
Totally, like, my opinion:
The complete domination of x86 PCs in practically all areas of computing until very recently is an accident of history. There is absolutely no reason to believe that this is how the majority of computing will continue to be done in the decades to come.
Even relatively low-energy devices like our smart phones require massive computing power to do most tasks, especially anything having to do with a web browser.
There are physical realities about how much computing power (and memory) you can have per unit of energy. There are plenty of applications where low energy computing is a hard requirement and I believe these applications will only increase.
Also, virtualization changes things, too. We went from single applications taking over the whole computer to running a bunch of things on the same computer, to virtualized computers or containers where we're back to applications thinking they're the only thing on the computer again. VMs tend to be constrained by both power and memory.
I think Forth-likes have a strong future as we look towards:
Now, behold a new definition of the integer 4:
: 4 12 ;
Which results in:
." The value of 4 is " 4 . CR
The value of 4 is 12
Tada!
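Why does this work? The dictionary is searched, newest entry first, before a token is ever tried as a number. A Python sketch of that lookup order follows. It is heavily simplified and the names are illustrative: entries form a linked list through `link`, and a word's "body" is assumed to be just the value it leaves on the stack.

```python
class Entry:
    """One dictionary entry, linked to the entry defined before it."""

    def __init__(self, name, body, link):
        self.name = name
        self.body = body
        self.link = link


latest = None   # head of the linked list: the most recently defined word


def define(name, body):
    global latest
    latest = Entry(name, body, latest)


def find(name):
    """Search from the most recent definition back toward the oldest."""
    entry = latest
    while entry is not None:
        if entry.name == name:
            return entry
        entry = entry.link
    return None


def evaluate(token):
    entry = find(token)
    if entry is not None:
        return entry.body        # a defined word wins...
    return int(token)            # ...and only then is the token tried as a number


before = evaluate("4")
define("4", 12)                  # : 4 12 ;
after = evaluate("4")
```

Nothing about the number 4 changed, of course. The token `4` just stopped reaching the number parser.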