Forth: The programming language that writes itself: The Web Page

Charles H. Moore and the pursuit of simplicity.

drawing of chuck moore with his real head - with a scroll that says The Web Page Edition

Created: 2023-02-26 Updated: 2023-05-27

Note: This page is my personal journey to discover Forth and put it in context with a fair amount of computing history. This is adapted from my slides for a talk which are quite terse. I've done everything in my power to make this page scale up and down for various screen sizes. I welcome suggestions and corrections. Here's my contact page.

Estimated time to read: Forever.

(At least scroll down and enjoy my drawings since your browser has already loaded them!)

Work in progress: I have a lot of commentary and additional material to add. I'm currently writing a little bit on this almost every morning when I first get up. Also note that you are looking at a first draft straight out of the keyboard. I'll probably be cleaning up a lot in a second or third pass.

Look for a marker like this one below to see my current progress.

The Legend

When I was a wee programmer, I would sit around the virtual Usenet campfires listening to the tall tales and legends of the elders. usenet campfires on a desert scene: comp.lang.forth comp.lang.lisp and alt.religion.kibology

In the 1990s, Usenet newsgroups (wikipedia.org) were where it was at. For example, Linus Torvalds's initial announcement of Linux was posted to comp.os.minix in 1991.

The comp.* (wikipedia.org) groups and particularly comp.lang.* were great places to learn about and discuss programming. By the time I got there in the late 90s, Perl was a pretty hot topic, especially as it took a dominant role in the early Web as the CGI (wikipedia.org) language.

There were programming resources on the Web. (Nothing like what's available now!) But to actually learn to program, I bought books (and still do). And Usenet was where the community and folklore lived.

The "Easter egg" in this drawing is alt.religion.kibology, which should get a chuckle from old timers. (The rest of you can look it up.)

I learned about magical languages with lots of (((((parentheses))))). third eye open to the y combinator

Sharp-eyed Lisp-lovers and other mutants will perhaps recognize this thing as the Y combinator expressed with lambdas.

The only time I understood this was when I completed the book The Little Schemer by Friedman and Felleisen, which walks you through creating it for yourself.

I listened, wide-eyed, to true tech tales like The Story of Mel (foldoc.org). Royal McBee RPC-4000 computer drawing

Mel was real and the Royal McBee RPC-4000 was real. Look at that teletype (aka "teleprinter"). If typewriters and "Royal" together make a little bell in your head go "bing" (as your mental carriage hits the end of the page, of course), then you're right: Royal McBee was a merger between the Royal Typewriter Company (wikipedia.org) and McBee, a manufacturer of accounting machines.

For a while, Royal was owned by the Italian typewriter company, Olivetti, who also made some really interesting computers (wikipedia.org).

And I heard tell of a programming language so flexible that you could change the values of integers.

chuck moore as an adorable wizard

They said that language was called Forth and it was created by a mad wizard called Chuck Moore who could write any program in a couple screens of code.

Years went by and I wrote a lot of PHP and JavaScript. I watched the Web evolve (and sometimes de-evolve).

But I never forgot about the legend of Forth.

The blog series "Programming in the Twenty-First Century" by game developer James Hague gave me the final push.

He made Forth a recurring theme and it just sounded so darned interesting.

So I went on an adventure and I came back and I think I have some answers. a tired warrior returns from forth mountain

(Oh, and I confirmed the legend. I can make any integer equal anything I want. Stick around 'til the end to see that Forth magic trick.)


Postfix (RPN) notation

hp-35 calculator with rpn syntax

At first, I thought this was what Forth was all about:

3 4 +
7
    

Now begins my quest to understand Forth.

RPN notation is one of the most visually obvious aspects of Forth, but it's not what Forth is about or the reason Forth exists. As we'll see, the situation is reversed. What Forth is "about" is the "reason" it uses postfix notation, not the other way around.

In fact, as you'll see, my quest is mostly a series of incorrect assumptions I made by looking at the language without the context of history.

By the way, the HP-35 calculator (wikipedia.org) pictured here is really interesting. In the early 1970s, HP had powerful desktop calculators. Actually, what they had were really programmable computers, but they still called them calculators (wikipedia.org) for sales reasons.

Putting all of that power into a "shirt pocket" calculator was an astounding accomplishment at the time.

HP-35 calculators have been in space. They killed off the slide rule.

By the way, legend has it that the size of the HP-35 was based on the dimensions of Bill Hewlett's actual shirt pocket.

I mention all of this only because the HPs are famous for using RPN syntax, something that would otherwise be even more obscure than it already is.

Noob:

$ bc
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006,
2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
(3 * 4) + (5 * 6)
42
    

Pro:

$ dc
3 4 * 5 6 * + p
42
    

I'm being cheeky here. Users of bc ("basic calculator" or "bench calculator") are not "noobs" by any measure. But it is arguably far geekier to use the much older dc ("desk calculator") program.

Besides using RPN syntax, dc (wikipedia.org) is completely programmable. It also happens to be one of the very first Unix programs and pre-dates the C programming language!

Traditionally, bc was actually just a handy infix syntax front-end for the real calculator, dc. Thus the gentle ribbing.

Anyway, the point here is that RPN syntax lets you express nested expressions without requiring parentheses to get the order of operations you want.

In this example, we input 3, then 4. * multiplies them. Now we have the result, 12, available. Next, we input 5 and 6 and multiply them with another * to also store the answer 30. The final + adds both stored answers (12 and 30) and stores the resulting 42. We can't actually see this result until we "print" it with p.

As is said of "ed, the standard text editor" (gnu.org), dc doesn't waste your VALUABLE time (or teletype paper) with output you don't need!

Forth pro:

3 4 * 5 6 * + .
42
    

As you can see, someone sitting at a Forth interpreter can perform this calculation exactly the same as with the dc calculator (or an HP-35, for that matter).

So I guess we know what Forth is all about, right? If we know how to enter things in postfix notation, we "get" Forth.

Well, that's true. Forth absolutely uses postfix notation. But then I learned some more...

Stack-based

drawing of three stacks illustrating push swap and dup operations

This is probably the second most obvious thing about the Forth programming language. It uses a stack.

A stack is a data structure often explained with a "stack of plates" analogy. You PUSH a plate on the stack and you POP a plate off the stack.

Above, we have an illustration of two other common stack operations: SWAP slides a plate out (very carefully) from the second position and puts it on top. DUP takes the top plate and replicates it using kitchen magic and puts the replica on the top of the stack (an equal amount of matter is removed somewhere else in the Universe, but we try not to worry too much about that).

All four of these words also happen to be Forth words that do those actions with values in memory. (What a coincidence!) Putting them in ALL CAPS is how we indicate that Forth is old and these words are old.

Historical note: In the old days, people and computers just WENT ABOUT SHOUTING AT EACH OTHER ALL THE TIME IN ALL CAPS AND IT WAS FINE. But when a computer asks, "SHALL WE PLAY A GAME?" in all caps, you say, "NO". WarGames, 1983 (wikipedia.org)
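Here's a tiny sketch you can try at a Forth prompt (a hypothetical session of mine, not from the talk): .S is the standard stack-printing word from the programming-tools wordset, and I'm showing output roughly the way Gforth prints it, with the top of the stack on the right.

1 2 .S    \ <2> 1 2
SWAP .S   \ <2> 2 1
DUP .S    \ <3> 2 1 1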

Op   The Stack
--   ---------
3     3
4     3  4
*     12
5     12 5
6     12 5  6
*     12 30
+     42
.
    

Let's revisit our math problem from earlier. Rather than being concerned with the syntax or notation, we're now interested in what these operations are doing with our data stack.

As you can see, entering a number puts it on the stack. The math operators here all take two values from the stack, do something with them, and put a new value back on the stack.

The '.' (DOT) operator is different since it only takes one value (to print it) and does not put anything back on the stack.

Now let's see something non-numerical...

Forth:

CAKE DUP HAVE EAT 
    

Getting the joke here will require knowing this English idiom (wikipedia.org).

Actually, this isn't just a silly example. Forth's use of the stack can lead to extremely natural looking use of nouns and verbs.

There can, indeed, be some object named CAKE that we have placed on the stack (probably a memory reference) which can be DUPed, and then HAVEd and EATen.

It's up to the Forth developer to make these harmonious word choices.

Lack of explicit names for intermediate values.

If I ask you to add these numbers:

2 6 1 3 7
    

Do you feel a need to give a name to each sum pair...or even the running total?

(Hopefully your answer is "no" or the rhetorical question doesn't work.)

But it's funny how our programming languages often require us to explicitly name intermediate results so that we can refer to them. But on paper, we would never give these values names - we just want to work with them.

Imagine, if you will, a factory assembly line of some sort in which each person working the line is a hateful pedant who refuses to work on the part in front of them until you name it - and who demands a new name each time the part has been worked on. Furthermore, they refuse to let you re-use a name you've already made up for a different person.

A lot of imperative languages are like that assembly line. As your values go down the line, you've got to come up with nonsense names like my_part, my_part_step2, my_part_step2(copy), and my_part_step2(copy)-new...

Well, working without names (sometimes called tacit or point-free programming) is sometimes a far more natural, and certainly less irritating, way to compute.
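Here's that little sum in Forth (a sketch of mine, not from the original slides): each + folds the top two numbers into a running total that just lives on the stack, and nothing ever gets a name.

2 6 1 3 7 + + + + .   \ prints 19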

So, Forth is stack-based.

That's true, but then I learned some more...

Concatenative programming

a confused cat working on an old pc

Ah, this must be it because it sounds fancy.

On this journey of Forth discovery, you'll inevitably run into the term "concatenative programming".

An awesome resource for this is the website concatenative.org. It contains a list of concatenative languages (including Forth, of course).

The Factor programming language website also has an excellent page defining concatenative programming: Factor documentation: Concatenative Languages.

I understand the explanations on these websites now, but it took me a while to get there. Your journey may be shorter or longer. Probably shorter.

Let's see if I can stumble through it...

Contrast with applicative language:

eat(bake(prove(mix(ingredients))))
    

Concatenative language:

ingredients mix prove bake eat
    

An applicative language has you apply a function to a value, which returns another value. Using familiar Algol-like syntax, that means passing arguments to functions within a set of parentheses.

Unlike the math examples, where the infix notation looks more natural to most of us than the postfix notation, the concatenative example of this baking program looks more natural (at least in a language sense) than the inside-out function application example.

(Of course, if you're a programmer used to years of something like C or Java or JavaScript, the inside-out parenthetical form will probably seem pretty natural too. Well, guess what? You're the weird ones now! How does it feel?)

The point here is that concatenative style has us "composing" functions (which you can think of as verbs) simply by putting them in sequence. Each function will be called in that sequence. The values that are provided or produced at each step are passed along to be consumed as needed.
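Forth works the same way. Here's a minimal sketch (SQUARED and CUBED are hypothetical words of mine; DUP and * are standard): defining a new word is literally just concatenating existing words, and that concatenation is function composition.

: SQUARED ( n -- n*n )   DUP * ;
: CUBED   ( n -- n*n*n ) DUP SQUARED * ;   \ composed by concatenation
3 CUBED .   \ prints 27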

But let's pursue this "concatenative language" concept past the surface...

The canonical example of a concatenative language is Joy.

Joy

Manfred von Thun inspired by Backus's 1977 ACM Turing Award lecture: top of the john backus paper Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs

Can Programming Be Liberated from the von Neumann Style? (PDF) (worrydream.com) This paper is dense with notation and I haven't personally attempted to wade through it yet. I'm sure it contains many profound ideas. I do know just enough to believe I understand this paragraph from the paper's abstract:

"An alternative functional style of programming is founded on the use of combining forms for creating programs. Functional programs deal with structured data, are often nonrepetitive and nonrecursive, are hierarchically constructed, do not name their arguments, and do not require the complex machinery of procedure declarations to become generally applicable. Combining forms can use high level programs to build still higher level ones in a style not possible in conventional languages."

Joy is where the "concatenative" description comes from in the first place, so it makes sense to use it to explain what concatenative programming is all about. von Thun specifically credits Billy Tanksley with coining the term "concatenative notation".

Joy is kind of like starting with a Lisp

...without variables

...and without traditional control structures

...and all functions are unary (have an "arity" of 1).

Specifically, all functions take one stack as input and return one stack as output. The stack is not named, it is implied.

A program is simply a list of functions that is read from left to right.

Joy's superpower: combinators.

I created the above list describing Joy from the following von Thun interview answer:

"Joy then evolved from this in an entirely haphazard way: First I restricted the binary relations to unary functions, and this of of course was a dramatic change. Second, to allow the usual arithmetic operations with their two arguments, I needed a place from which the arguments were to come and where the result was to be put - and the obvious place was a stack with a few shuffling combinators, originally the four inspired by Quine. Third, it became obvious that all these combinators could be replaced by unary functions, with only function composition remaining. Finally the very different distinctively Joy combinators emerged, which take one or more quoted programs from the stack and execute them in a specific way. Along the way of course, lists had already been seen as just special cases of quoted programs. This meant that programs could be constructed using list operations and then passed on to a Joy combinator."

From A Conversation with Manfred von Thun (nsl.com), which is a really great read.

As you can see, combinators are crucial in Joy. Let's talk about those for a moment.

Combinators

Combinators are any "higher-order" functions like map.

And by "higher-order", we just mean functions that take other functions as input and do things with them.

You can even have functions that take functions that take functions and so on to do powerful things. But you'll need to meditate on them every time you have to re-read that part of your code.

JavaScript:

inc = function(n){ return n + 1; };

bigger = [1, 2, 3, 4].map(inc);

Result: [2,3,4,5]
    

JavaScript using an "arrow function":

bigger = [1, 2, 3, 4].map(n => n + 1);

Result: [2,3,4,5]
    

In the first example, we have familiar Algol-like syntax with all the brackets and parentheses. Hopefully map() is familiar. But if not, just know that it involves two things: the first, in this case, is implicit - the Array object itself (defined literally as [1, 2, 3, 4]), whose own map() method we call. The second is a function (inc in this case), which will be applied to every member of the list.

The output of map() is a new list containing the result of each application.

The second example with the arrow function syntax works exactly the same way, but more compactly.

Notice how both JavaScript examples have variables such as the parameter n and the result bigger. "Traditional" programming languages often make us name values before we can work with them.

The same thing, but concatenatively in Joy:

[1 2 3 4] [1 +] map

Result: [2 3 4 5]
    

The syntax here may require a little explanation. The [] brackets are Joy's quote mechanism. Quotations are a lot like lists, but they can contain programs as well as data.

In this case, the first quotation is the number list.

The second quotation is a program, 1 +.

As in the JavaScript examples, map takes two parameters. The first is the function (well, program) to apply, and the second is the list to apply it to.

(It's kind of confusing to talk about "first" and "second," though, because that's the opposite of the order in which we supply those arguments on the stack...)

Note the lack of variables bigger or n in the Joy example. This is a hallmark of concatenative programming - we don't need to name every intermediate value. They just exist!

This "point-free" or "implicit" or "tacit" style can be a blessing... or curse. Unlike computers, human brains have a hard time juggling too many things on the stack. There seems to be a happy medium.

Note: I had a bit of a time actually running Joy to test out these examples. Thankfully, I eventually ran into Joypy (github.com), a Joy interpreter written in Python that was simply a matter of downloading and running (assuming you already have a Python interpreter, of course).

git clone https://github.com/calroc/joypy.git
cd joypy
python -m joy
...
joy? [1 2 3] [1 +] map
        

If you have the slightest interest in Joy, I highly recommend reading or skimming this tutorial by Manfred von Thun himself. Even without an interpreter to run the examples, it's a delight:

An informal tutorial on Joy

But we've barely touched the conceptual power of combinators with our map examples. Let's briefly go just a little deeper.

cover of the book

The book shown here is my personal copy of To Mock a Mockingbird by mathematician and puzzle-maker Raymond Smullyan. It uses puzzles involving birds to solve logic problems and classify some well-known combinators.

You can no more name all of the combinators than you can name all of the possible functions or cardinal numbers, but some have been identified as having special properties.

It turns out that higher-order functions, aka "combinators", are all you need to perform any computation. They can replace logical operators and even variables!

TODO: I once found a tool online that takes an expression and replaces all variables in the expression with combinators. What did I do with that link?

In this, they're just like lambda calculus and Turing machines: each is all you need to perform any computation it is possible to compute in the sense of the Church-Turing thesis (wikipedia.org) or "computability thesis".

It turns out that computability is a fundamental feature of the Universe.

This particular flavor of universal computation is called combinatory logic (wikipedia.org).

Identity

(I x) = x
    

The simplest of all combinators is I, the identity combinator. This can also be written in lambda calculus as I = λx.x (or via a variety of other syntaxes with more or fewer parentheses).

The way to read (I x) = x is: "I applied to some value x results in...x."

(Instead of "value x", you could say "object x" in a very abstract sense of the word "object" that could be a scalar value, or list, or function, or anything.)

K and S

(K x y) = x

(S x y z) = (x z (y z))
    

The syntax above might not work the way you think it does and the I combinator was too simple to give me a chance to explain: The K combinator's definition, (K x y) = x reads, "K applied to x yields a combinator which, when applied to y, always evaluates to x."

(In other words, K makes a combinator that always throws away any input and just returns x.)

(Programmers familiar with the concept of currying will see that this is like partial application of a function, where a new function is "pre-baked" with the argument x.)

K is super easy to write in a language like JavaScript, which is also a nice choice because you can play with it right in the browser console like I just did:

K = function(x){
  return function(y){
    return x;
  }
}

K("hello")("bye")

> "hello" 
        

(See how the result of K("hello") is a function that returns "hello" no matter what you give it as input?)

But the real point is this: S and K are very special. Together with I, they form the "SKI calculus" and just these three combinators are all you need to perform any computation in the known universe.

Actually, it's even crazier than that. You don't even need I because it can be created with just S and K. (One classic construction: I = S K K. Try it: (S K K x) = (K x (K x)) = x.)

That's right, S and K alone are "Turing complete," as we programmers like to say.

By the way, understanding the S combinator has been left as an exercise for the reader. After that, you can start working on the Y combinator (remember that arcane lambda calculus artifact projected over that head with the third eye way up near the beginning of this page?)

cover of the book

This book is my personal copy of Combinators: A Centennial View, by Stephen Wolfram.

It starts with a terse introduction to the SKI combinator calculus and then launches into page after page of visualizations of S and K combinators being fed into each other. Some of them terminate and some don't. Some of them produce interesting patterns of surprising complexity.

I think it is absolutely wild to see someone just play with a subject like Wolfram does in this book.

Combinators is also Wolfram's ode to the discoverer of combinatory logic, Moses Schönfinkel (wikipedia.org) who, like so many of the giants in the field of computer science, did his work on paper decades before the first digital electronic computers beeped their first boops.

Okay, so they're a crazy way to compute. But are they worth using in "real" programs?

Combinators let us factor out explicit loops.

foo.map(bar)
    
equals
temp = [];
for(i=0; i<foo.length; i++){
    temp[i] = bar(foo[i]);
}
    

Yeah! map() is a great example. Once you start using simple combinators like this to abstract away the boilerplate logic of yet another loop over a list of items, it's hard to go back.

My personal history with exploring higher order functions in a production setting is through the Ramda (ramdajs.com) JavaScript library, which I discovered from the talk Hey Underscore, You're Doing It Wrong! by Brian Lonsdorf, which is fantastic.

Mind you, it's very easy to go overboard with this stuff and write something far less readable by your average human than some simple procedural code. (I've done this, ha ha.) But in limited doses, it's super powerful and compact.

"Factor out" (abstract away) unwanted details!

Joy uses combinators to factor out all sorts of logic.

Even different flavors of recursion can be completely handled for you by combinators in Joy thanks to the uniformly unary functions.

Here's a factorial definition:

factorial == [null] [succ] [dup pred] [*] linrec
    

Computing the factorial of a number is often used as a recursion example since it builds a cumulative result (you multiply n by the factorial of n - 1).

In this Joy example, linrec is a "linear recursion" combinator and it takes 4 quotation parameters: a predicate ([null]: is it zero?), a base case ([succ]: turn 0 into 1), a step to prepare the recursive call ([dup pred]), and a step to combine the results afterward ([*]).

Here is a JavaScript equivalent (also recursive):

function factorial(n) {
    if (n <= 1) {
        return 1;
    }

    return n * factorial(n - 1);
}
        

Note that the Joy example is not only shorter and free of variable names - it has also abstracted away the mechanics of recursion. All we're left with is the logic specific to the factorial problem itself.
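And to foreshadow where we're headed, here's one way to write the same thing in Forth (a sketch of mine, using the standard RECURSE word, which calls the word currently being defined). Like the JavaScript, Forth leaves the mechanics of recursion explicit:

: FACTORIAL ( n -- n! )
  DUP 2 < IF  DROP 1  ELSE  DUP 1- RECURSE *  THEN ;
5 FACTORIAL .   \ prints 120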

Okay, so we've gone pretty deep into this concatenative programming and combinator thing. How does this actually relate to Forth?

Forth has higher-order functions too, via word "execution tokens" (function pointers).

EXECUTE
    

You can very compactly define words such as MAP, FOLD, and REDUCE.

First, let's see how EXECUTE works. The syntax will be alien to non-Forth programmers. But the concept will be no problem for anyone used to using first class functions (where functions can be passed around as values).

Let's take this step-by-step:

: hello ." Hello" ;
        

This is Forth for: compile a word (function) called hello that prints the string "Hello".

(We'll learn how compiling words works later. For now, please just gracefully accept what you're seeing.)

 
VARIABLE hello-xt
        

This creates a new variable called hello-xt which will store the "execution token" for the hello word.

 
' hello hello-xt !
        

This uses the ' (aka "TICK") word to get the address of the hello word and put it on the stack. Then we put the address of the variable hello-xt on the stack. The ! (aka "STORE") word stores the provided value (the address of hello) at the provided address (the address of hello-xt).

(Or, simply, "store the address of hello in the variable hello-xt.")

 
hello-xt @ EXECUTE
Hello
        

The @ (aka "FETCH") word gets the address we stored and, finally, EXECUTE runs the word at that address. Behold, the "Hello" string!

For some compact higher-order function definitions in Forth, check out this Gist by Adolfo Perez Alvarez (github.com).
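To make that concrete, here's a minimal, in-place MAP of my own devising (not taken from that Gist) - a sketch in roughly ANS Forth that applies an execution token to every cell of an array:

: MAP ( addr len xt -- )
  ROT ROT CELLS OVER + SWAP   ( xt end-addr start-addr )
  DO  I @ OVER EXECUTE I !  1 CELLS +LOOP
  DROP ;

CREATE NUMS 1 , 2 , 3 , 4 ,
: INC ( n -- n+1 ) 1 + ;
NUMS 4 ' INC MAP
NUMS @ . NUMS CELL+ @ .   \ prints: 2 3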

Forth is definitely concatenative.

(And definitely capable of defining and using combinators, for that matter.)

However,

2 3 +
    

Forth: "Push 2, then 3 on the stack; pop both and add them; push the result, 5, on the stack."

Joy: "The composition of the functions 2, 3, and + is identical to the function 5."

What I mean by this is that while both languages share a common syntax, and both produce the same result for this expression, there is a fundamental difference between how the two languages "think about" the expression.

Forth's only concern (as a language) is to process these three tokens and act upon them according to a couple really simple rules. (If the token is in the dictionary, execute it. If it's a number, put it on the stack.)

To Joy, it may be the same mechanical process under the hood, but the language itself sees these tokens as part of a "mathematical" expression. It's a much more abstract outlook.

The point I'm making is that Forth may accommodate the abstract point of view, if the developer chooses to take it. But Forth is not based on abstract concatenative computing principles or combinatory logic.

Forth really is just about the "nuts and bolts". You bring your own theories with you.

(Understanding concatenative programming is super cool, but it doesn't actually help us understand the "true nature" of Forth.)

All these aspects of Forth:

Are true.

They are all perfectly valid and complementary ways of describing the language Forth:

CAKE EAT
    

This two-word example is all it takes to demonstrate all three aspects: it's postfix (the verb comes after the noun), it's stack-based (CAKE goes on the stack and EAT takes it off), and it's concatenative (the program is just words composed in sequence).

So these are all perfectly valid ways to think about Forth.

But to understand the answers to the why questions like, "Why was Forth created?" and, "Why does it have this syntax?" I think we need to look at when. Let's try to figure this out from a historical context.

drawing of konrad zuse's z3 computer

Yes, postfix notation was definitely in the air (Zuse's Z3, designed in 1938 and completed in 1941).

And yes, stacks were known in the time of Forth's origins, but generally limited (2-4 items in registers).

But Joy (and the term "concatenative programming") came along decades later.

So there you have it. It's reasonable to assume that Forth's RPN syntax and use of stacks are "historically accurate." (And in the case of stacks, even that is a bit muddy since it seems stacks as *memory structures* in the earliest days weren't exactly as we think of them today in a world of huge, cheap RAM.)

But the "concatenative" concept wasn't floating around until later. So that's not a good way to examine Forth's history. It's probably not a good way to find out answers to the why questions.

The drawing of a computer labeled "Z3" on the right is of the Z3 computer designed by engineer and computer scientist Konrad Zuse. This is widely considered to be the first programmable digital computer. It used electro-mechanical relays (as telegraphs did, and as some modern nuclear reactor safety systems still do, since relays are less susceptible to radiation than semiconductors).

The Z3 could do addition in less than a second and multiplication in three seconds. It had 64 words of 22 bits each and worked with the equivalent of modern floating-point numbers.

As mentioned above, it can be said to use RPN, though there are only two registers and nine instructions with opcodes encoded in eight bits. The computer is programmable via punched paper tape (you can see the tape device to the right of the control console).

It is also a stack machine. Again, this is with a mere two registers, which get juggled in a particular sequence as you load values. Fun fact: The control unit used special control wheels to encode microsequences. If the microsequence wasn't programmed correctly, it could short-circuit the machine and destroy the hardware!

I got most of this information from this excellent paper by Raul Rojas: Konrad Zuse's Legacy: The Architecture of the Z1 and Z3 (PDF) (ed-thelen.org).

The point here is that the simple mechanics of RPN and stack-based operation are very natural for digital computing machines going back to the very beginning.

But that doesn't tell us much about why Forth is the way it is.

I still didn't get it.

We need to learn more.

chuck moore as marty in a drawing of the back to the future poster

We need to go back to the 1950s.

If this image doesn't make any sense, it's from the iconic movie poster by Drew Struzan for Back to the Future (1985) (wikipedia.org).

Smithsonian Astrophysical Observatory and MIT 1958

chuck moore operating an IBM 704

Fortran on punchcards.

"Compiling took 30 minutes...you got one shot per day"

-- Chuck Moore, Forth - The Early Years

In Forth - The Early Years (PDF) (worrydream.com), Chuck Moore recites a fairly terse history of Forth, from the earliest pre-Forths to the creation of the language standard.

He mentions the Smithsonian Astrophysical Observatory (SAO) and the Massachusetts Institute of Technology (MIT) in roughly the same time period, and it's a bit difficult to be entirely sure which is which.

But if you look at a map, SAO is at Harvard University. Harvard and MIT are about a mile apart in Cambridge, Massachusetts. Basically a singular point if you zoom out a bit.

The computer in question is the IBM 704 (wikipedia.org).

It was one of those room-filling vacuum-tube computers with tape drives the size of refrigerators.

It was a fully programmable "modern" computer with magnetic-core memory, a 36-bit instruction set, multiple registers, and 36-bit words. There were switches for each register on the control console, but programs could be written to and read from paper punch cards.

"In its day, the 704 was an exceptionally reliable machine. Being a vacuum-tube machine, however, the IBM 704 had very poor reliability by today's standards. On average, the machine failed around every 8 hours, which limited the program size that the first Fortran compilers could successfully translate because the machine would fail before a successful compilation of a large program."

It's difficult to imagine now, but changing parameters for a program, re-compiling it, and running it again could take a day (assuming you didn't make any mistakes).

So Chuck eliminated that irritation with an extremely clever solution.

Moore made an interactive interpreter.

On a computer with nothing we would recognize today as an interactive terminal.

To control an astronomical calculation by "programming the program". fortran on a punchcard

Here's a quote from The Evolution of Forth (forth.com):

"Moore's programming career began in the late 1950s at the Smithsonian Astrophysical Observatory with programs to compute ephemerides, orbital elements, satellite station positions, etc. His source code filled two card trays. To minimize recompiling this large program, he developed a simple interpreter to read cards controlling the program. This enabled him to compose different equations for several satellites without recompiling..."

Free-form input was more efficient (smaller and faster). It also turned out, somewhat ironically, to be more reliable for human input than Fortran, which required formatted columns. Any mis-aligned columns in Fortran punchcard input would require a re-run of the program.

It's important to note that these programs did not use actual keywords per se. They were simply statement numbers encoded on the punchcard.

Arguably, this really is the origin of the thing that will be named Forth.

And the statement numbers, had they been written as words, would be equivalent to these Forth words:

WORD NUMBER INTERPRET ABORT
    

Free-form input was unusual at the time. Obviously a super nice alternative to recompiling every time you want to change the numbers!

So, at last, we have discovered the true origin of the Forth language. Moore's interpreter was used interactively to program a program. It was created to reduce waste and tedium.

Stanford 1961

drawing of chuck at the stanford burroughs b5500 system

Now we head from Massachusetts to California where Moore found himself at Stanford University working with Stanford's Burroughs B5500.

(Moore had received his BA in Physics and started graduate school at Stanford.)

Let's talk about the computer first, then the specific project Moore worked on.

The B5500 (or "B 5500" - the official manual puts a space between the B and number) was a solid-state computer. It was part of the "second-generation" of computers. These computers had discrete transistors on circuit boards. By contrast, the first generation used vacuum tubes (like the aforementioned IBM 704) and the third generation used integrated circuits.

Burroughs, the maker of the Large Systems line, was a transistor computer pioneer.


Moore's CURVE written in Stanford's own Algol implementation.

A much more sophisticated interpreter.

Now has a data stack and stack manipulation operators:

+ - * IF ELSE DUP DROP SWAP
    

Specifically, Moore worked on the Stanford Linear Accelerator as a programmer. His focus was on steering the beam of the electron accelerator.

The CURVE program was much more "programmable" than the program(s) he created at SAO above. He took those ideas and expanded them to include the equivalent of the stack operators shown (as a way of passing parameters) and even the ability to define new procedures.

This is a much more capable interpreter.

(I also think it's interesting to compare with another interpreter created specifically to be embedded in larger programs for controlling them, Tcl (wikipedia.org). John Ousterhout created Tcl out of frustration with ad-hoc, half-baked solutions 27 years later in 1988 at Berkeley. Tcl and Forth have similar levels of syntactical purity and flexibility. Both let the user define things like new control structures!)

A powerful, extensible interpreter is a huge time-saver (over re-compiling the program) and allows the user of the program, who might or might not also be the original programmer, to add to the program's functionality on the fly.

Truly, now we have the beginnings of a fully-fledged programming language. It's not named Forth yet, but we're getting closer.

Freelancing 1965

a teletype 33 with paper tape reader and writer

"With the TTY came paper-tape and some of the most un-friendly software imaginable - hours of editing and punching and loading and assembling and printing and loading and testing and repeating."

-- Chuck Moore, Forth - The Early Years

First, let's talk about what "TTY" means in 1965. Teleprinters (wikipedia.org) or "teletypewriters" or just "teletypes" were all printer devices. The Greek "tele-" prefix is used here in the pure sense of "far" or "at a distance". These machines trace a direct lineage from telegraphs and Morse code.

In the late 1800s, the concept of a typewriter which operated over telegraph lines had been explored, but the transmission code, paper tape, and typewriter system devised by Donald Murray (oztypewriter.blogspot.com) was the winner. (It can even be argued that Murray's choice of QWERTY keyboard is what cemented it as the standard around the world!)

The Baudot code (from which we also get the term "baud") was modified by Murray into something that very much resembles what we still use today. Murray introduced the concept of control characters, including the very clearly typewriter-derived CR (carriage return) and LF (line feed).

Teletype machines started as point-to-point text communications tools (like the telegraph), but they were later used over switched networks like the world-wide Telex system which used pulse dialing to automatically route a connection through the network.

The Teletype Model 33 (wikipedia.org) I drew here was one of the most popular teletypes used with computers from 1963 (it shares a birth year with the ASCII standard!) to the mid-1970s when video terminals finally came down in price enough to push printer teletypes aside. Teletype made the Model 33 until 1981, which is much later than I would have guessed!

As for paper-tape (wikipedia.org), I'll just quote Wikipedia directly:

"Punched tape was used as a way of storing messages for teletypewriters. Operators typed in the message to the paper tape, and then sent the message at the maximum line speed from the tape. This permitted the operator to prepare the message "off-line" at the operator's best typing speed, and permitted the operator to correct any error prior to transmission. An experienced operator could prepare a message at 135 words per minute (WPM) or more for short periods."

Again, Donald Murray didn't invent the concept of perforated paper tape for data storage, but his system used it for the encoding of transmitted messages from the keyboard and that seems to have a pretty direct lineage to the usage here. (I welcome correction from any historians.)

Just to be clear: In all of the above computers we've talked about, the output is displayed on (and often entered with) paper.

Terminal input and output!

KEY EMIT CR SPACE DIGIT
    

Also, an editor within the interpreter itself (kind of like having an "IDE", kind of like a complete OS of its own).
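(In modern Forth terms - and this is my gloss, not Moore's - KEY waits for one character of input and EMIT prints one character of output, while CR prints a newline and SPACE a blank. A tiny sketch:

: ECHO-KEY ( -- ) KEY EMIT CR ;   \ wait for a keypress, echo it, newline

That's the whole terminal I/O story in two words and some glue.)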

In the mid-1960s, "mini-computers" came out. They were still huge by today's standards, but no longer required a large room of their own.

The other big change was direct interactive use of a computer via teletype.

The invention of timesharing (stanford.edu) was a huge shift away from the "batch processing" style of computing that had come before.

Fun fact: A "second generation" time-sharing operating system called Multics (multicians.org) was the spiritual ancestor of Unix and the source of its joke name, which Brian Kernighan coined: "One of whatever Multics was many of".

As it regards Moore's evolving pre-Forth language, this completely interactive use (as opposed to input via punch card) added some important interactive editing abilities.

This would have been right around the time that the original LISP REPL (Read-eval-print loop) (wikipedia.org) was created in 1964 on a PDP-1. Moore was very much on top of the latest trends in interactive computer usage.

By the way, if you want to see an awesome demonstration of interactive computer usage on paper, check out this demonstration by Bob Spence: APL demonstration 1975 (youtube.com).

Bob's demo is extremely pleasant to watch and brilliantly presented in split screen. Notice how paper output lets you mark up stuff with a pen - pretty nice feature!

Bob Spence (wikipedia.org) is best known for his own contributions, including a number of early clever computer interaction ideas that are worth re-examining today.

APL (wikipedia.org) is a whole other rabbit hole which has interesting intersections with the point-free and higher-order function programming we've encountered earlier.

1968

IBM 1130 minicomputer at a big textiles manufacturer. drawing of chuck at an IBM 1130 minicomputer

16 bit, 8 KB RAM.

Backup was via punch/reader.

With disks, now we can have file names!

File names limited to 5 characters...

Moore's "fourth generation" system becomes FORTH.

This is the big one! And yup, this really is the origin of the name, "Forth". Funny how temporary things tend to stick and last forever, isn't it?

The IBM 1130 (wikipedia.org) is one of those new-fangled "minicomputers" we've talked about. The CPU weighed less than a car!

And it was affordable! The base model was as low as $32,000. Compare that to $20,000, the median price for a house in the U.S. in 1965.

As noted, the unit Chuck Moore worked on had a disk drive, which would have bumped up the price an additional $9,000, or a couple brand-new 1965 cars in that driveway. But imagine having disk drive cartridges with 512 KB of storage at your disposal. What would you do with all that space?

We're still interacting with the computer (mostly) via paper, but this was a much more "modern" machine in terms of what we're used to now. It brought the idea of interactive computing to "the masses" because it was so much smaller, cheaper, and more reliable than the sorts of computers that had come before.

Quoting The Evolution of Forth (forth.com):

"Newly married and seeking a small town environment, Moore joined Mohasco Industries in Amsterdam, NY, in 1968. Here he developed computer graphics programs for an IBM 1130 minicomputer with a 2250 graphic display. This computer had a 16-bit CPU, 8k RAM, his first disk, keyboard, printer, card reader/punch (used as disk backup!), and Fortran compiler. He added a cross-assembler to his program to generate code for the 2250, as well as a primitive editor and source-management tools. This system could draw animated 3-D images, at a time when IBM's software for that configuration drew only static 2-D images. For fun, he also wrote a version of Spacewar, an early video game, and converted his Algol Chess program into the new language, now (for the first time) called FORTH. He was impressed by how much simpler it became."

The IBM 1130 was hugely influential to a bunch of early big-name programmers in addition to Chuck Moore.

Now have a return call stack, allowing nested definitions

: DOUBLE DUP + ;
: QUAD DOUBLE DOUBLE ;
    

At this point, we can truly call this the Forth programming language - not just because of the name, but because nested word definitions are made possible with a return address stack.

There is now a dictionary of named words which can be called interactively or recursively in the definitions of other words.

In the example above, we've defined a word called DOUBLE which duplicates the number on the top of the stack and adds the two numbers together.

A second word called QUAD uses the previous definition by calling DOUBLE twice, quadrupling the number in a rather amusing way.
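Trying them out at the prompt (a hypothetical session; the trailing . prints the result):

3 DOUBLE .   \ prints 6
3 QUAD .     \ prints 12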

Without a return stack, we have no way of telling the computer how to "get back" to the place in QUAD where we left off after DOUBLE is done.

(Don't worry, I'll explain how the syntax works in a moment.)

drawing of chuck at a univac 1108 console

1970

Univac 1108

Same textile company.

Written in assembler and could call COBOL modules because that's what the corporate suits wanted in 1970.

Moore hates complexity

The UNIVAC 1108 is a great example of the really cool "retro-futuristic" design in these old machines. Just look at the sweeping angles in my drawing of the console.

When these computers cost more than a house, it makes perfect sense that they were constructed into beautiful furniture and made to look like space ships.

You have to wonder: Did the sci-fi art of the time drive the design of these computers or did the computers and industrial design of the time inform the art? Or, more likely, did they both feed off of each other in the classic cycle of, "life imitates art imitates life?"

That's a teletypewriter built into the desk of the console. I presume the tractor-feed paper would have spooled off into a container behind the sleek facade?

Anyway, the UNIVAC 1108 is an even more modern computer. Like its contemporaries, it's moving into using integrated circuits for everything, including the register storage. (The 1108 had 128 registers and must have been interesting to program!)

As was also the trend at the time, the CPU was constructed of discrete cards connected together by a wire-wrapped (wikipedia.org) backplane.

By the way, wire-wrap connections are extremely high quality. Wire is wrapped with great force around a post, making a gas-tight connection that will not corrode (corrosion can occur outside the connection, of course). A little bit of the insulation gets wrapped in the last turns, which provides flexibility and strain relief.

Cheap PCBs have mostly replaced this technique, but it seems like a really excellent way to prototype circuits (or make finished products, for that matter) that ought to be considered more by hobbyists. I'm doing my part by having purchased the tools (a simple wrap tool and a spool of the wire). I intend to try it out on the next electronic thing I build.

Anyway, the Univac was a still more powerful and more modern computer and Moore certainly took advantage of it.

You don't have to read between the lines to see Moore's obvious distaste of COBOL (wikipedia.org), the COmmon Business-Oriented Language. What's impressive is that he managed to still use Forth while also using the required COBOL modules.

When this project was abandoned by the employer, Moore was upset by the whole situation, particularly the way business software was increasing in complexity.

He also wrote a book (unpublished) at this time called Programming a Problem-Oriented Language, which I've read. It's a bit stream-of-consciousness, but captures his thought processes at that time in terms of software construction in general and making "a Forth" particularly. It's very candid and short on pomp. Moore is what you might call a "straight shooter".

Musing: You have to wonder, is "Problem-Oriented" in opposition to "Business-Oriented" a la COBOL? I suspect so. I personally have the same peeve about the term "business logic" in programming. I know what people mean by it, but it still rubs me the wrong way. How would they like it if I called it "hobby logic" or "fun logic"? :-)

NRAO - Early 1970s

National Radio Astronomy Observatory - Computer control software for radio telescopes. drawing of radio telescope dishes from NRAO

Here is the official website of the National Radio Astronomy Observatory (nrao.edu). But for a better summary, the Wikipedia entry (wikipedia.org) is the way to go. Be sure to scroll down to the incredible image and description from 1988 of the collapsed 300ft radio telescope:

"The telescope stood at 240ft in height, wieghed 600-tons, had a 2-min arc accuracy, and had a surface accuracy of ~1 inch. The collapse in 1988 was found to be due to unanticipated stresses which cracked a hidden, yet weight and stress-supporting steel connector plate, in the support structure of the massive telescope. A cascade failure of the structure occurred at 9:43pm causing the entire telescope to implode."

The 300ft dish had been the world's largest radio telescope when it went active in 1962 at the NRAO site in West Virginia.

My drawing above is of the Very Large Array (wikipedia.org) in New Mexico. NRAO is also a partner in a huge international array in Chile.

By using radio interferometry, arrays of telescopes can be treated as essentially one huge telescope with the diameter of the array (obviously missing the sensitivity a dish of that size would have).

But the scope for which Moore wrote software was a single 36ft (11 meter) dish at Kitt Peak in Arizona called The 36-Foot Telescope. It was constructed in 1967 and continued working until it was replaced with a slightly larger (and more accurate) dish in 2013.

The 36ft scope was used for millimeter-wavelength molecular astronomy.

(Note that Moore stayed at the NRAO headquarters in Virginia.)

NRAO used Fortran on its minicomputers, but based on the success of his previous work, Moore was begrudgingly given permission to use Forth.

Since I couldn't possibly do justice to summarizing it, here's Chuck's own words describing the software he wrote for the NRAO:

"There were two modes of observing, continuum and spectral-line. Spectral-line was the most fun, for I could display spectra as they were collected and fit line-shapes with least-squares."

"It did advance the state-of-the-art in on-line data reduction. Astronomers used it to discover and map inter-stellar molecules just as that became hot research."

(Quote from "Forth - The Early Years".)

Here is a photo (nrao.edu) of the 36-foot telescope. And here is a photo of the control room in 1974 (nrao.edu) with what appears to be a PDP-11 in the background?

As you can see, the work itself was extremely interesting and cutting-edge. But how Moore went about it was also very interesting, which a series of computer drawings will demonstrate in a moment.

But on the Forth language front, there was another development...

At this time, there was talk of patenting Forth.

Moore believes ideas shouldn't be patented.

If the note about COBOL previously didn't already give you a sense that Chuck Moore is a bit of a rebel, this part should.

We take it for granted now that "free" or "open" software unencumbered by patents and restrictive corporate licenses is a good thing. But this was absolutely not a mainstream position in the early 1970s.

To put things in context, in the summer of 1970, Richard Stallman (wikipedia.org) was just out of high school and was writing his first programs in Fortran (which he hated) and then APL.

It wasn't until 1980 that Stallman finally had enough of the state of proprietary and legally encumbered software. Companies were increasingly using copyright to prevent modification, improvement, or duplication by the end user. Stallman started writing free clones of such programs. He announced the GNU project in 1983.

I believe Moore and Stallman have it right. There's absolutely nothing wrong with writing programs for money or selling software. But using the law to prevent people from truly owning that software (choosing how or where to run it) or even preventing them from writing their own similar software, if they are capable, is an abominable practice and should be countered at every step.

He also rejects the standardization of Forth.

"All of my fears of the standard and none of the advantages of the standard have come to pass. Any spirit of innovation has been thoroughly quelched.

"Underground Forths are still needed.

"I said I thought the standard should be a publication standard but they wanted an execution standard."

-- Chuck Moore, 1997

(Quote from the ANSI Forth section in this cool collection of Forth quotes (ultratechnology.com).)

I think that when you get to the heart of what Forth is all about, Moore's displeasure with the ANSI standardization suddenly makes tons of sense. In short, the whole point of Forth is to create your own toolkit. Having an all-inclusive language standard is great for making sure Forths are interchangeable. But it's antithetical to adapting the language to your specific needs.

Okay, that's the philosophical stuff out of the way. Now let's get back to the technological aspects.

While Moore was at NRAO, he also wrote software to point the telescope. Elizabeth Rather (Moore credits her as "Bess Rather" in his paper) was hired for support and they worked together on at least one port. The Forth system migrated across multiple machines at NRAO, which, as we'll see, highlights one of the technological strengths of the standard Forth implementation.

By the way, after her initial reaction of shock and horror, Rather embraced Forth. From The Evolution of Forth (forth.com):

"After about two months, Rather began to realize that something extraordinary was happening: despite the incredibly primitive nature of the on-line computers, despite the weirdness of the language, despite the lack of any local experts or resources, she could accomplish more in the few hours she spent on the Forth computers once a week than the entire rest of the week when she had virtually unlimited access to several large mainframes."

Rather went on to write the first Forth manual in 1972 and write papers about it for the NRAO and other astronomical organizations.

(Later, Elizabeth "Bess" Rather (wikipedia.org) became the co-founder of FORTH, Inc with Chuck and remained one of the leading experts and promoters of the Forth language until her retirement in 2006!) screenshot of the top half of the first page of the article

There's also a great overview paper of the whole NRAO system by Moore and Rather in a 1973 Proceedings of the IEEE: The FORTH Program for Spectral Line Observing (PDF) (iae.nl).

It includes a high-level description of the system with some user input examples (in Forth) and a neat diagram on the first page.

Anyway, I mentioned porting Forth to a bunch of different computers...

Ported to the IBM 360/50 drawing of chuck using an ibm 360/50 computer

Moore mentions first having ported his Forth system to the IBM 360/50 (wikipedia.org).

The System/360 (or S/360) line was an extremely successful computer series. IBM was the first company to use microcode to make a compatible instruction set across all computers in the series.

Until then, if you bought a cheaper computer to get started and then needed more computing power, you could buy a more powerful computer. But then you would have to re-write your programs in a new instruction set if you had written them in assembly. Or re-compile them, typically from punchcards, if they were written in a high-level language like Fortran. (Both computers would also need a compatible Fortran compiler!)

It's easy to see why being able to upgrade without changing your software would have been appealing, especially to businesses.

The cheaper 360 computers used microcode and the most expensive and powerful machines had hard-wired logic. NASA even had some one-off models of IBM 360 made just for them.

System/360 computers were made between 1964 and 1978. They were a big bet (5 billion dollars according to System 360: From Computers to Computer Systems (ibm.com)).

It wasn't just the instruction set that was compatible. The 360 computers also had standardized peripheral interfaces, which were compatible between machines.

Between the longevity of the 360 systems and the compatibility between them, there was a huge market for peripheral devices. IBM themselves made 54 different devices such as memory, printers, card readers, etc. The 360 also spawned a whole third-party peripheral industry, much like the IBM PC-compatible era that started in 1981 and continues to the desktop computer I'm typing on right now in 2023.

And to the Honeywell 316 drawing of chuck using a honeywell 316 computer

I drew Chuck behind the system in this one because I couldn't bring myself to obscure an inch of that glorious pedestal console.

You can see the brochure image I drew this from on the Wikipedia page for the Honeywell 316 (wikipedia.org).

Or if they change it for some misguided reason, the image should still be available here (wikimedia.org).

Just look at the space-age curves on that thing. It looks straight out of a Star Trek set. There's basically no chance the one Moore actually worked on had this console, though. Less than 20 of them were sold.

Beyond just its appearance, this particular console has a really wild history. The extravagant gift company, Neiman Marcus, actually offered the Honeywell H316 with this pedestal as a "kitchen computer". It cost $10,000 and would have come with a two-week course to learn how to input recipes and balance a checkbook using toggle switches and lights to indicate binary data! (As far as anyone knows, none of these were actually sold.)

The ad for the Honeywell Kitchen Computer was in full "Mad Men" mode and was, as was unfortunately typical for the time, extremely patronizing (but if you can look past that, extremely hilarious as well):

"If she can only cook as well as Honeywell can compute.

"Her souffles are supreme, her meal planning a challenge? She's what the Honeywell people had in mind when they devised our Kitchen Computer. She'll learn to program it with a cross-reference to her favorite recipes by N-M's own Helen Corbitt. Then by simply pushing a few buttons obtain a complete menu organized around the entree. And if she pales at reckoning her lunch tabs, she can program it to balance the family checkbook..."

You can see a tiny scan of the original ad with a woman admiring her new Honeywell Kitchen Computer that barely fits in her kitchen here (wikipedia.org).

But moving on from the pedestal...

We've seen a lot of Forth implementations, but the port to the H316 is considered to be the first complete, stand-alone implementation of Forth because it was actually programmed on the computer itself and was used to create other Forths (it's a metacompiler)!

And at this point, Moore has achieved a fully ascendant Forth.

But wait, there's moore...er, sorry, more!

As was (and still is) typical for a Chuck Moore endeavor, this telescope application pushed other new boundaries as well. The system actually ran across two computers (we're about to see the second one) and gave real-time access to multiple astronomers. Because it spread the load the way it did, there were no issues with concurrency (something we struggle with to this day).

This real-time control and analysis was basically a singular luxury available on no other system at the time. (Even the makers of these computers had created only the most primitive concurrency for these machines and it was nothing like this.)

As usual, Moore was right on the very bleeding edge of computing with his ultra-flexible Forth system.

...And ported to the Honeywell DDP-116 drawing of chuck using a honeywell DDP-116 computer

As mentioned above, the Forth system was ported to the DDP-116 (t-larchive.org) and used together with its "parent" system on the H316 featured above.

(The DDP-116 was originally manufactured by Computer Control Company in 1965, but CCC was sold to Honeywell in 1966 and became its Computer Controls division.)

The DDP-116 was a 16-bit computer (the first available for purchase), but still part of the "second generation" with individual transistors and components wire-wrapped together on huge circuit boards. (Check out the pictures on that t-larchive.org link for all sorts of excellent views of the insides and outsides of an example machine and peripheral devices!) It also happens to have been a pretty rare machine. It didn't sell in vast quantities like the IBM systems.

As you can see in the drawing, Chuck Moore began to grow in power as his system evolved and this manifested in additional arms! (Or maybe I started to get a little loopy while drawing old computers for these slides in the final evenings before I was due to give my talk. I'll let you decide what is real.)

...And ported to the DEC PDP-11 (yes, that PDP-11) drawing of chuck using a DEC PDP-11 computer

The PDP-11 was, by some measures, the most popular minicomputer ever.

It was a 16-bit machine and had an orthogonal instruction set (any instruction could be used with any combination of operands and addressing modes). Even I/O was memory-mapped, so the same instructions used to move values around in memory and registers could also be used to transfer data to and from devices.

This made it fun to program by reducing the cognitive load on the assembly language programmer. The ideas in the PDP-11 spread rapidly and are to be found in the most popular architectures in use today (like Intel x86).

(PDP-11 assembly language will look surprisingly familiar to modern programmers compared to what came before.)

The original machines were made starting in 1970 with wire-wrapped backplanes and discrete logic gates.

Later models introduced large-scale integration, a term we'll see later, so keep it in mind. Basically, this is the integration of multiple ICs (Integrated Circuits, themselves integrations of discrete transistors) into a single chip. What we think of as a CPU (Central Processing Unit) in current (2023) desktops, laptops, and cell phones is one of these single-chip integrations, where everything has been designed to fit on a single piece of silicon, a single "chip". A CPU is still an IC, but one that packs a whole room or cabinet's worth of logic gates into a microscopic etching.

The large-scale integrated versions of the PDP-11 were still being made twenty years later in 1990! There are PDP-11s still performing crucial tasks today (nuclear power plants being one of the most prominent examples).

It's hard to see in my drawing, but the PDP-11 front panel is one of the most iconic computer interfaces ever made. Hobbyists make working models, including ridiculously cute and awesome miniature versions. Here are two model versions - links both go to the original wikipedia.org files, where you can admire their full beauty:

pdp11 panel1 pdp11 panel2

It would be difficult to overstate the impact of this machine. Probably the most famous piece of software released on the PDP-11 was the first version of Unix (wikipedia.org) actually named "Unix".

It was also the birthplace of the C (wikipedia.org) programming language. Dennis Ritchie ported Ken Thompson's B language to the PDP-11 to take advantage of its abilities. Unix was then re-written in C starting with Version 4.

The Unix we know today and a large portion of the command line utilities were programmed on the PDP-11.

You'll also note that Chuck Moore has gained his fourth and final arm in this drawing. ("Fourth", get it? Also, the word "fourth" looks so weird to me right now.) This may or may not reflect actual events. Also, I'm not sure if Moore would have been using a video terminal at that time. It's possible. DEC's first video terminal was the VT05 (columbia.edu), which came out in 1970.

All of this porting was possible because of indirect threaded code. an abstract drawing of boxes and arrows representing threaded code in memory

First of all, "threaded code" in this usage is not related to concurrency, i.e. "multi-threaded programming".

It's code that is composed of subroutine addresses.

Threaded code can be machine code or interpreted.

Wait, aren't most programs composed of calls to subroutines?

That's true. The big difference is that threaded code doesn't actually contain the instructions to call the subroutines. It stores just the addresses.

Another routine advances a pointer over the address list and executes the subroutines.

Threaded code is a bit complicated to explain because to really understand how it works, you need to know how it is stored and executed.

And indirect threaded code is even more complicated and harder to explain than that!

"Hey, wait!" I hear you saying. "But if Chuck hates complexity so much, why did he use such a complex method for Forth?

There are good reasons. I'll first try to explain, briefly, how threaded code is stored.

Direct calls (not threaded):

jmp 0x0804000
jmp eax
    

This is the simplest type of "call" to store in a program. We simply have the jmp (jump) instruction followed by the address to jump to.

Note that I show both a hard-coded address (0x0804000) and a register (eax). Both of these are "direct" for our purposes.

Alternatively, we could use the more advanced call instruction, but this does more: it stores a return address on "the stack" before jumping to the specified address. A ret (return) instruction at the end of the called routine will use the address to resume the execution just after the "call site" where the call was first made.

(By the way, return addresses are stored on a stack because you can nest calls. Pushing addresses as you jump and popping them in reverse order as you return keeps things nice and neat. This "the stack" is not related to "the stack" in Forth which is sometimes better known as "the parameter stack".)

This is direct and it's not threaded.

Indirect calls (not threaded):

jmp [eax]
    

This example is just to explain what "indirect" means.

But for it to make any sense, you need to know that the square brackets around the register ([eax]) mean "the value at the memory address stored in register eax".

So jmp [eax] means "jump to the address stored at the address stored in register eax."

That's indirect.

But it's not threaded.

Storing threaded code:

<addr pointing to code>
<addr pointing to code>
<addr pointing to code>
<addr pointing to code>
    

Instead of containing the actual instructions to jump or call subroutines:

jmp 0x0804000
jmp 0x080A816
jmp 0x08C8800
jmp 0x08C8DD0
        

or

call 0x0804000
call 0x080A816
call 0x08C8800
call 0x08C8DD0
        

Threaded code stores just the list of addresses:

0x0804000
0x080A816
0x08C8800
0x08C8DD0
        

There are two obvious consequences of this:

  • The code gets smaller, since we're no longer repeating a jmp or call opcode for every single subroutine.
  • The CPU can't execute the list directly; we need a little routine (an "interpreter") to advance a pointer over the addresses and execute each one, as described above.

Storing indirect threaded code:

<addr pointing to addr pointing to code>
<addr pointing to addr pointing to code>
<addr pointing to addr pointing to code>
<addr pointing to addr pointing to code>
    

This is where it gets pretty crazy. So now we've got a second level of indirection. Why on Earth would we do this?

The answer is that this allows us to store a separate "interpreter" for different kinds of subroutines!

You see, the list above is true - these are addresses pointing to other addresses. But conceptually, what's really happening (in the case of an indirect threaded Forth) is that the initial list of addresses is pointing to Forth words (first address), which happen to start with the address of a routine (second address) that will be used to execute the rest of the word.
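
Maybe a concrete sketch will help. Assuming a hypothetical colon word DOUBLE (defined as : DOUBLE DUP + ;), a JonesForth-style dictionary entry looks conceptually like this (these are Forth comments, not real code):

\ DOUBLE's dictionary entry (conceptual):
\   header:    link to previous word, flags, name "DOUBLE"
\   codeword:  address of DOCOL       <- the "interpreter" for colon words
\   body:      address of DUP's codeword
\              address of +'s codeword
\              address of EXIT's codeword
    

The body is the list of addresses (the "thread"), and the codeword is the second address - the one that knows how to execute what follows. (We'll meet DOCOL again shortly.)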

Confused yet?

I personally would have happily nodded along and not really understood that explanation at all until much later in my journey. But you're probably faster than me at apprehending this stuff.

Let's keep going!

drawing of a minicomputer saying 'i have 16k of core!'

Threaded code was much more common in the days of yore.

It is very dense, compact on disk and in memory.

In addition to its ultra-compact storage in memory, threaded code would have been even more efficient on contemporary machines because calling subroutines wasn't always as simple as the call instruction found on most "modern" architectures.

Subroutine and procedure call support (clemson.edu) by Mark Smotherman explains:

"1963 - Burroughs B5000 - A stack-based computer with support for block-structured languages like Algol. Parameters and return address are stored on the stack, but subroutine entry is a fairly complex operation."

So the subroutine call savings (both in terms of memory and in terms of computational performance) were potentially much greater.

We'll revisit this topic from another angle next. But if you want to dive in right away (and see the origin of the boxes and arrows drawings at the top of this section), check out MOVING FORTH Part 1: Design Decisions in the Forth Kernel by Brad Rodriguez.

drawing of chuck moore as a superhero with a cape and bowtie

That's Forth's origin story.

This gives us the why.

Specifically,

Forth is postfix because that's the most natural order for a computer: get the values, operate on them;

Forth is stack oriented because that's an incredibly minimalistic and compact and convenient way to store values without having to create variables (and name them!);

Forth is concatenative because building a language that can operate as a string of words is incredibly flexible and can adapt to just about any programming style without any help from the language itself. (Especially when you throw in higher-order functions/words a.k.a. combinators.);

Forth is interpreted because that is interactive and allows changes on the fly. Again, because Forth is all about absolute maximum adaptability;

Forth is highly adaptable because you can bootstrap a Forth implementation from a handful of words implemented in assembly and then write the rest in Forth;

Forth is extremely compact because computers at the time had extremely limited memory. (The ubiquitous low-power computers of the future may also demand this. The microcontrollers that control just about everything around us certainly do.)

Now that we have everything in historical context, I think it's much clearer why Forth exists and why it takes the peculiar form that it does.

But I was still a long way off from a true understanding of how it all comes together and what makes Forth special as a programming language.

Somewhere along the way, I came across these quotes...

"To understand Forth, you have to implement a Forth."

-- Somebody on the Internet

And

"Take a look at JonesForth."

-- Everybody on the Internet

You'll also notice the phrasing here, "a Forth," not just "Forth."

The distinction is important.

I think we've already established that Chuck Moore, that brilliant rascal, believes that a Forth system is best when it is custom-tailored to the system and task at hand. So writing your own Forth or Forth-like is entirely "par for the course" in any would-be-Forther's quest to discover the True Meaning of the language and enter the mystical realm where All is Revealed.

Well, what was I to do?

Having no other clear course of study, I decided to heed the wisdom of the crowd.

Presenting...

JonesForth and "Assembly Nights"

My faithful asus eeepc 701 waiting romantically on the bed text reads 'Assembly Nights'

To really get to know it, I took Forth to bed with me.

I wrote Assembly Nights when I realized how much I was enjoying myself:

"Over the last three months, I've developed an unusual little nighttime routine..."

I prepared myself for dealing with the JonesForth source (i386 assembly language, specifically the GNU GAS assembler flavor) by learning some assembly and Linux ABI basics.

Then I spent roughly a year porting JonesForth into a complete working copy in NASM assembler.

I did a tiny bit almost every night. A lot of it was debugging in GDB.

my giant gold on gray logo for nasmjf

My NASM port of JonesForth: nasmjf

Opening the third eye by (re)implementing Forth.

Here's the nasmjf web page

In the process of writing the port, I learned how a traditional indirect threaded Forth works.

And I learned that it takes time to absorb such a twisty-turny method of code execution.

Especially if the assembly language tricks are new to you, as they were for me.

JonesForth ascii art: jonesforth ascii art explaining flow of threaded code

Among the first things you encounter when you open up jonesforth.S (the assembly language portion) are many ASCII art diagrams.

Richard W.M. Jones does an excellent job of walking you through the workings of the interpreter (and explaining the i386 instruction set features he uses).

JonesForth is 32-bit only and uses the Linux ABI directly.

If the diagram above seems a bit bewildering, I agree.

nasmjf ascii art: my nasmjf ascii art explaining flow of threaded code

After I was done with my port, I gave the whole diagram thing a try in my own assembly language source. This is one of several.

With the benefit of the distance of time, it is clear to me that these things only make sense once you already understand them to some degree. But the act of making them is extremely useful for solidifying your understanding.

But that's just the tip of the iceberg!

nasmjf inner/outer interpreter diagram: my nasmjf diagram showing outer and inner interpreter

The JonesForth ASCII art diagram and my interpretation in nasmjf are just part of the system. They're just showing how the "inner interpreter" works to execute Forth words.

You may recall from the section above about indirect threaded code that the second level of indirection allows different "interpreter" routines to execute different types of threaded routines. Well, that's what these diagrams show.

But when we say that Forth is an interpreted language, this is not what we're talking about. There's also the "outer interpreter" that the programmer interacts with.

And that is in some ways just as circuitous in its workings as the indirect threaded code.

In the vector image I made above for nasmjf, I attempted to map out the whole thing in my own words.

If you take anything from this image, it's that INTERPRET looks up words (functions) by name and calls them by executing the interpreter routine whose address is stored in the word (this is the indirect threading). In turn, there may be any number of interpreters, but the three main types used in JonesForth are:

But that only explains how code starts executing. How does this type of Forth know what to run after a word is complete?

Ah, for that we have...

Getting from one code word to another uses a bit of assembly pasted at the end of each one. This is the NEXT macro. Here it is from nasmjf:

%macro NEXT 0
    lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.
    jmp [eax] ; Jump to whatever code we're now pointing at.
%endmacro
    

Notice the term "code word". That's the Forth term for words written in pure assembly language.

Every code word has this macro at the end.

This implementation uses the i386 lodsd instruction to take care of two operations in one: move a "double word" (which is 32 bits on Intel chips for the really annoying reason that they kept the definition of "word" at 16 bits even as the platform moved to 32 and then 64-bit architecture. So "word" on Intel architectures is a completely meaningless thing that you just have to memorize as "16 bits" even though the word "word" is supposed to be the native data size of the architecture. And what's worse is that the tools for working with programs on Intel chips like GDB then refer to everything with the corresponding C names for everything, which naturally assumed that the architecture names would be based on reality. But they aren't. So terms like "double" and "long" are basically just absolutely worthless legacy garbage to memorize and useful only to C and Intel architecture veterans. And now this parenthetical rant is way out of hand) from memory into a register, and then update another register so that it points to the next "double" spot in memory.

Then it starts execution at the address that is stored at that address.

Got that?

Ha ha, that's yet another essential moving part.

But wait, there's more.

Getting from one colon word to another uses a bit of assembly compiled at the end of each definition: a code word called EXIT. Here it is from nasmjf:

DEFCODE "EXIT",EXIT,0
    POPRSP esi            ; pop return stack into esi
NEXT
    

For "regular" Forth words (those defined in the Forth language itself via the "COLON" compiler, which we'll talk about in a moment), we have the so-called EXIT macro. All this one does is handle the return stack. There will be a NEXT after that, so the process we described above will automatically start where we left off at the "call site" of the word we just finished executing.

That's another moving part.

My comment in nasmjf attempting to explain the execution of indirect threaded code as a nested sequence of NEXT and EXIT and QUIT:

; QUIT (INTERPRET)
;     * regular word
;         DOCOL
;         NEXT
;         * regular word
;             DOCOL (codeword)
;             NEXT
;             * code word
;                 <machine code>
;             NEXT
;             * code word
;                 <machine code>
;             NEXT
;         EXIT
;         NEXT
;    EXIT
;    NEXT
; QUIT (BRANCH -8 back to INTERPRET for more)
    

This nested view of the process is as close as I've ever been to explaining (to myself) what the entire execution flow looks like at a high level.

I'm sure every Forth implementer has their own mental model.

Absolutely nothing else drives the flow of an indirect threaded Forth application. It's addresses stored in registers, plus the line or two of assembly at the end of each word that manipulates the return stack as needed and jumps onward to the next word.

Don't you see how simple it is? drawing of chuck as crazy charlie explaining a theory with wild eyes and a wall covered in paper and strings

Historical note: The above "Crazy Chuck" drawing is a parody of the meme of Charlie Day's character from the episode "Sweet Dee Has a Heart Attack" of the show It's Always Sunny in Philadelphia:

"Every day Pepe's mail's getting sent back to me. Pepe Silvia, Pepe Silvia, I look in the mail, this whole box is Pepe Silvia!"

Perhaps you, citizen of the future, will not have recognized this parody, but at least now you can look it up.

Forth is complex when taken as a whole. But it is made of tiny pieces, each of which is very simple. The concept was created over a period of years on very constrained systems. Each part created only as needed. an abstract drawing of boxes and arrows representing threaded code in memory

This is where the historical context is, once again, very revealing:

As we've seen, Charles H. Moore did not create Forth all at once in a single lightning bolt of inspiration. It began as a simple command interpreter and executor and grew from there. It has always consisted of tiny little parts, working together.

Each of these tiny parts is extremely simple on its own.

And each was added over a period of time as the need arose.

I think that's the genius of Forth: That all of these little pieces can work together to make a running system and yet still remain independent. You can learn each of these in isolation. You can replace them in isolation.

The trade-off is: Ultimate flexibility and simplicity at the lowest level comes at the cost of ease of understanding at the highest level.

When most of us would have thought bigger, Moore thought smaller.

Code words

Let's go at this from a different direction. I've mentioned "code words" and "'regular' colon words".

In short: Code words are the machine language primitives supplied by the Forth implementation. Colon words are everything else. Colon words are written in the Forth language on top of the code word primitives. Colon words are either supplied by the Forth implementation or written by the end user (programmer).

Let's further demystify this with some extremely simple and extremely concrete examples of actual NASM assembly language source from nasmjf:

Small and simple:

DEFCODE "SWAP",SWAP,0
    pop eax
    pop ebx
    push eax
    push ebx
NEXT
    

Is that really SWAP? Yeah, that's really SWAP. We're just telling the CPU to pop the two most recent values from the stack and then push them back in opposite order.

(JonesForth uses the i386 call/return stack as a Forth parameter stack to make these operations easy.)

Smaller:

DEFCODE "DUP",DUP,0
    mov eax, [esp]
    push eax
NEXT
    

Down to just two instructions now. We move the value pointed at by the esp register into eax and then push it onto the stack.

To understand why this duplicates the top item on the stack, you need to know how the esp register is used. Here's the relevant comment from the JonesForth source:

"In this FORTH, we are using the normal stack pointer (%esp) for the parameter stack. We will use the i386's "other" stack pointer (%ebp, usually called the "frame pointer") for our return stack."

Which means that esp points to the current top of the stack. So pushing that value on the stack duplicates the top value.

Smallest:

DEFCODE "DROP",DROP,0
    pop eax
NEXT
    

Now we have a Forth word that is defined in this implementation as a simple i386 instruction! DROP just "removes" the top value from the stack. In this case, we pop it into the eax register to throw it away.

Now let's see these three words in action in a real Forth program that moves some real numbers around in memory...

Code words in action

8 7      8 7
SWAP     7 8
DROP     7
DUP      7 7
    

The code word primitives we've just defined are used by the rest of the Forth implementation to define colon words in the language itself. But you can (and will!) use them directly in your own Forth applications.

The above example shows what it might be like to use these three primitives ourselves.

Apart from pushing the two numbers on the stack initially, we've now seen the assembly language code for the entire program shown above. That makes this pretty "bare metal" stuff, right?

Here's the walk-through:

  • 8 7 pushes the two numbers onto the stack, leaving 7 on top.
  • SWAP exchanges the top two values, leaving 8 on top.
  • DROP discards the 8.
  • DUP duplicates the remaining 7, leaving two copies on the stack.

nasmjf has 130 code words. Mostly for efficiency.

And if you weren't already, perhaps you're now wondering how many Forth words need to be defined in machine code (either written in raw machine code, assembly, or a higher level language like C and up) in order to have a "bootstrappable" Forth system?

There are some theoretical minimums. But as you get down to an absurdly small number of instructions, your Forth code to implement the rest of the language requires absurdly large amounts of convolutions that test the limits of both programmer ergonomics and computational efficiency.

Check out this amazing article by Frank Sergeant called A 3-INSTRUCTION FORTH FOR EMBEDDED SYSTEMS WORK (utoh.org).

"How many instructions does it take to make a Forth for target development work? Does memory grow on trees? Does the cost of the development system come out of your own pocket? A 3- instruction Forth makes Forth affordable for target systems with very limited memory. It can be brought up quickly on strange new hardware. You don't have to do without Forth because of memory or time limitations. It only takes 66 bytes for the Motorola MC68HC11. Full source is provided."

You read that right: 66 bytes.

And later:

"The absolute minimum the target must do, it seems to me, is fetch a byte, store a byte, and call a subroutine. Everything else can be done in high-level Forth on the host."

sectorforth has 10 code words.

Cesar Blum's sectorforth (github.com) is:

"...a 16-bit x86 Forth that fits in a 512-byte boot sector. Inspiration to write sectorforth came from a 1996 Usenet thread (in particular, Bernd Paysan's first post on the thread)."

See? There's Usenet again.

The author's posting of the project to the Forth sub-reddit (reddit.com) has additional insight:

"I've always been fascinated by the idea of having a minimal kernel of primitives from which "everything" can be built. Before Forth, I had only seen that in the form of Lisp's "Maxwell equations of software", which is cool, but always left me a little disappointed because it is too abstract to build something that you can actually interact with - you can't break out of its esoteric nature...
"With Forth, however, you can start from almost nothing, and start adding things like ifs, loops, strings, etc., things that look more like your day-to-day programming. I find that there's a lot of beauty in that."

Note: The statement about Maxwell's equations surely refers to Alan Kay's famous quote about LISP:

"Yes, that was the big revelation to me when I was in graduate school - when I finally understood that the half page of code on the bottom of page 13 of the Lisp 1.5 manual was Lisp in itself. These were "Maxwell's Equations of Software!" This is the whole world of programming in a few lines that I can put my hand over."
(From A Conversation with Alan Kay (acm.org))

Okay, so we've talked about code words. Now let's see what colon words are all about...

Colon words are made of Forth!

: SDD SWAP DROP DUP ;
8 7      8 7
SDD      7 7
    

A colon word is so-named because its definition begins with the ":" character.

Colon words are defined entirely by code words, other colon words, or numeric values.

(You can also have strings in most Forth implementations, but those are just input that happens to follow a special word such as ." that knows how to handle the input!)

The example colon word definition above creates a new word called SDD that is a composition of the three code words we defined earlier.

Perhaps the word "composition" brings to mind the concatenative terminology we explored near the beginning of this quest?

You can see the new SDD word in action below the definition. The effect is, of course, identical to calling the three separate words SWAP DROP DUP in sequence.

In indirect threaded code terms, it might be helpful to remember that this colon word has been "compiled" into the addresses of the "inner interpreters" for each of the three code words.

How ":" works

: SDD SWAP DROP DUP ;
    

Colon (:) fetches the word name and sets "compile mode".

Semicolon (;) completes the word's entry in the dictionary and unsets "compile mode".

What may or may not be obvious at this point is that ":" is a Forth word.

It looks like the sort of thing we would call "syntax" in other programming languages. But it really isn't syntax.

It may be hard to come fully to grips with at first, but Forth's only syntax is the whitespace between tokens of input. (Tokens that are unfortunately tokenized by a word called "WORD", which is an incredibly confusing overload of the term.)
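
A quick sketch to drive the whitespace-only syntax home (this should work in just about any Forth, since even a name like 1+2 is just a space-delimited token):

: 1+2 3 ;   \ perfectly legal: "1+2" is a word name, not an expression
1+2 .       \ prints 3
    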

Also note the term "dictionary". That's been mentioned a couple times before, but not explained properly. Hold on a moment. We'll get to it.

Okay, so : switches the "outer interpreter" into compile mode and ; switches it back. But what does that mean?

"Compiling" in Forth means putting into memory:

At its simplest, compiling is just like executing, but we're storing addresses instead of jumping to them.

Actually, that's understating the elegance and simplicity of how this works, which is one of the most mind-blowing things in Forth.

Forth uses the same "outer" interpreter to compile code that it uses to execute code.

Which means that as you enter the definition of a word in Forth from your keyboard, the interpreter is compiling it on the fly into memory.
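
You can even watch it happen. Here's a little sketch, assuming JonesForth's convention that HERE is a variable holding the address of the next free byte of compilation memory:

HERE @ .      \ print the current compilation address
: NOTHING ;   \ compile a new (empty) word
HERE @ .      \ prints a higher address - the definition took up memory
    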

Now let's step through the example:

: SDD SWAP DROP DUP ;
        

In "normal mode", the interpreter is executing everything as we enter it.

When the interpreter encounters the ':' (colon) word, it looks it up in the dictionary and executes it. The definition of ':' turns on the "compile mode" switch.

Now when the interpreter hits the "SWAP" word, it looks up its address in the dictionary as usual, but now it stores the address in the next available memory slot (a very important built-in variable called HERE keeps track of this memory position).

The same thing happens for "DROP" and "DUP".

(That's all it takes to "compile" threaded code!)

Then a bunch of really interesting things happen when the interpreter gets to ";" (semicolon).

First, ";" is looked up and found in the dictionary.

Perhaps you are wondering why its address isn't also compiled into our new definition?

One of the flags stored in a word's dictionary entry (in its header) is called "immediate". When this flag is turned on, the word is always executed immediately even in compile mode.

The ";" word is an immediate word, so it executes.

(There are also tricks for compiling immediate words into other words, but you are not allowed to learn about these until you write your own Forth. Ha, not really. But it's pretty mind-bending stuff.)

The definition of ";" turns off compile mode. Then it does some housekeeping to complete the entry of the new SDD word in the dictionary. We're going to talk more about how the dictionary works in just a moment.

As soon as ";" returns control to the outer interpreter, we're now sitting in normal mode again and our new SDD word is available to be called directly or compiled into other words.
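
Here's a tiny sketch of an immediate word in action (the word names are made up, and I'm using JonesForth's placement of IMMEDIATE right after the name):

: LOUD IMMEDIATE ." compiling! " ;
: QUIET ." running " ;
: TEST QUIET LOUD QUIET ;   \ prints "compiling! " right here, mid-definition
TEST                        \ prints "running running "
    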

Almost no syntax = simple interpreter

8 7 SWAP DROP DUP

: SDD SWAP DROP DUP ; 8 7 SDD
    

I've alluded to this in several different ways above, but I'll make a bold claim: Forth has the simplest syntax and therefore the simplest parser, interpreter, and compiler ever used in a mainstream general-purpose programming language.

Lisp and Tcl are also famously minimalistic languages. People have written incredibly tiny implementations of each.

Mind you, those tiny implementations are the work of incredible programmers, but it's the sort of thing that can be done.

But Forth surely takes the cake. Even a certified non-genius like myself can write an entire Forth interpreter in a couple hundred assembly instructions. (See "Meow5" below.)

Tokenizing Forth can be done in a handful of instructions.

And as mentioned, once you've written a Forth interpreter, you're well on your way to a working Forth compiler.

It would be annoyingly redundant to walk through the two lines of Forth above because they are nearly identical. The only difference is that the first line simply executes each word as it is encountered (SWAP, DROP, DUP). The second line compiles those three words into a new word called SDD. The result of both lines is the same.

Almost no syntax = extreme extensibility

The definition of IF...THEN from jonesforth.f:

: IF IMMEDIATE ' 0BRANCH , HERE @ 0 , ;

: THEN IMMEDIATE DUP HERE @ SWAP - SWAP ! ;
    

Here's the mind-blowing part about Forth, and a solid reason to title this, "The programming language that writes itself."

This is a language control structure defined in the language! Forth is not the only language that can do this, but few languages invite the programmer to participate so thoroughly in the inner workings as often or as joyfully as Forth.

Figuring out how the IF/THEN definitions above work is left as an exercise for the reader, but here are the words you don't already know:

'       - gets the address of the word that follows, put on stack
0BRANCH - branch by the offset in the next cell if the top of the stack is 0
,       - 'compile' the current stack value to the memory at HERE
@       - fetch value from address on stack, put value on stack
!       - store to memory (stack contains address, then value)
        

(By the way, I'll go on the record to say this: The early parts of bootstrapping Forth in Forth (so, like, at least the top 25% of jonesforth.f) is significantly more mind-bending than low-level code word definitions written in assembly language. In fact, any time I needed to return to the assembly, it was like a comforting blanket of simplicity compared to the logic puzzle of those Forth-in-Forth primitives!)
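
I won't spoil the exercise, but here's roughly what using IF...THEN looks like once it exists (BIG? is a made-up word, and I'm assuming the usual printing word ."):

: BIG? 100 > IF ." That's big! " THEN ;
500 BIG?   \ prints "That's big! "
5 BIG?     \ prints nothing
    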

But, honestly, even seeing control structures implemented in the language cannot prepare you for seeing this next trick:

The definition of ( ) nested comments from jonesforth.f:

: ( IMMEDIATE
    1
    BEGIN
        KEY DUP '(' = IF DROP 1+
        ELSE ')' = IF 1- THEN
        THEN
    DUP 0= UNTIL
    DROP
;

(
    From now on we can use ( ... ) for comments.
...
    

Yeah, you read that right. Even comments are implemented in the language! And you can re-define them or add your own kind of comments!

Let that sink in for a moment.

Some of you are soiling yourselves in excitement right now.

Some of you are soiling yourselves in fear.

Don't be embarrassed. It's a natural reaction.
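
To make "add your own kind of comments" concrete: in the same spirit as ( above, here's a hypothetical sketch of line comments, assuming KEY returns the next input character and that newline is ASCII 10:

: \ IMMEDIATE
    BEGIN KEY 10 = UNTIL   ( eat input until the end of the line )
;
    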

The Dictionary

The Forth dictionary uses a linked list and word matching is done from the most recently defined word, so:

Forth's dictionary is the reason all of this extensibility is possible.

More recent word definitions override older ones with the same name - the interpreter stops at the first match.

However, uses of old definitions are not in any danger because they haven't stored the name of the word, they've stored the address (as threaded code, remember?).

You can even extend words by calling the original word from a new one with the same name!

Astute readers are perhaps wondering what happens if you attempt to make a recursive word. Well, by default, ':' (COLON) marks the word currently being compiled into the dictionary as hidden or disabled so that previous definitions can be called, as mentioned.

But there is also a word called RECURSE which inserts a call to the current word within itself. (You knew there would be some way to do that, right? And if not, you'd be able to make one yourself.)
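
Here's a hedged sketch of both tricks (made-up words, assuming the usual arithmetic and printing words):

: GREET ." Hello " ;
: GREET GREET ." again " ;   \ the inner GREET is the *old* definition
GREET                        \ prints "Hello again "

: FACT ( n -- n! ) DUP 1 > IF DUP 1- RECURSE * THEN ;
5 FACT .                     \ prints 120
    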

grayscale apple

It's not just the language itself that is unusually malleable; the usage of Forth allows for really surprising flexibility.

Here is an example lifted and paraphrased from Thinking Forth by Leo Brodie.

Say we create a variable to hold a number of apples:

VARIABLE APPLES
20 APPLES !
APPLES ? 20
	

Forth variables put addresses on the stack.

(Note: I have a physical copy of Thinking Forth because I love books, but Forth, Inc. has made it available for free online: Thinking Forth (PDF) (forth.com))

Let's walk through this Forth code line-by-line:

VARIABLE APPLES
        

Creates a new spot in free memory and a new word in the dictionary called APPLES that pushes that memory address on the stack when it is called.

20 APPLES !
        

Puts the value 20 on the stack, then the address for APPLES. The "!" (store) word takes an address and a value and stores the value at that address.

Conceptually, we've stored 20 in the APPLES variable (APPLES=20).

APPLES ?
        

This uses a new word we haven't seen yet, "?". It probably does not come as a surprise that it fetches a numeric value from an address and prints it (which pops the value off the stack again).

So, in this line, APPLES puts its address on the stack and "?" prints the value we stored there (20 in this case).

By the way, here's the definition of "?" in JonesForth:

: ? @ . ;

Look at how small that is! (I should also tell you that "." (dot) pops the top value off the stack and prints it as a number. Once you start writing your own Forth interpreter, as I'm sure you will the moment you finish reading this page, you will get immediately and permanently acquainted with ".".)

Okay, on with our example. We're about to be dealt a terrible blow...

We pepper our program with this APPLES variable.

Then we are told that we must now keep track of two different kinds of apples: red and green. What to do?

red apple green apple

This is exactly the sort of conundrum we see in real life software all the time. You prepared for all sorts of different quantities of apples, but it never occurred to anyone that we would need to track different types of apples.

Oh well, now we've got to figure this out. But do we have to completely re-write our application?

(Outside of this example, the answer might be "yes". Maybe this changes the whole "theory" of the program, in the Programming as Theory Building (ratfactor.com) sense. In which case, a re-write or big refactor of our apple counting program might be very appropriate. But for this example, we're assuming that we have thousands of lines of apple-handling functionality that will not need to change. For this example, we'll say that grouping the apples by kind is more of a surface detail.)

So obviously we can't store two values in one variable and expect all of the existing code to still work. So what could we possibly do?

A new variable will store the current type of apples.

VARIABLE COLOR
	

Remember, VARIABLE creates a memory space and a word called COLOR that puts the address of that memory on the stack when it is called.

red apple

A new variable and new word will deal with red apples. The word sets the type of apple by storing the address of REDS in COLOR.

VARIABLE REDS
: RED REDS COLOR ! ;
	

That's three additional words so far: COLOR, REDS, and RED.

Only one of these, RED, is recognizably a function. But really all three of them are.

Remember, "!" stores a value at an address. So RED stores the address of the REDS variable in the COLOR variable (COLOR=[address of REDS]).

green apple

Same for green.

VARIABLE GREENS
: GREEN GREENS COLOR ! ;
	

Now we have five new words: COLOR, REDS, RED, GREENS, and GREEN.

Just keep in mind that RED and GREEN both update an address stored in COLOR.

And change APPLES from a variable to a word that gets the current count by color:

: APPLES COLOR @ ;
	
grayscale apple red apple green apple

And just to make sure this part is crystal clear, COLOR @ fetches the address (REDS or GREENS) stored in COLOR.

Now we have to re-write any use of APPLES, right?

Wrong! The use of APPLES is identical. The syntax hasn't changed one bit for any existing code. We just need to make sure we've set the right color.

20 RED APPLES !
30 GREEN APPLES !

GREEN APPLES ? 30
APPLES ? 30

RED
APPLES ? 20
	

All of the existing code that uses APPLES will still work exactly the same way with absolutely no modifications. You might need to stare at the example above for a while. I sure did.

Wait, I hear a distant screaming.

How could this possibly be okay?! Shouldn't we be running away in horror right now at this unchecked chaos where anything is allowed? We can't just do that! This is not okay!

Well, maybe.

But I think one reason this actually is okay, on a conceptual level, is that APPLES did not really change what it originally did.

Coming from the normal programming language world, we would be tempted to say that APPLES was a "variable" before and it is a "function" now. But in Forth, a variable is a word that puts an address on the stack. A function is just...a word.

After our change, APPLES is still a word that puts an address on the stack. There is no conceptual change at the language level.

Because Forth provides the atomic units of computing at the language level, there is no abstraction to break when we make this change to APPLES.

Forth is a language where you make the abstractions. Forth sees no difference between a "variable" and a "function". A "variable" is just a convenience word you could have written for yourself. No abstraction means no broken abstraction.
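
In the same spirit, a "constant" is also just a word you could have written yourself. A trivial sketch:

: TEN 10 ;    \ a hand-rolled "constant": a word that pushes 10
TEN TEN + .   \ prints 20
    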

grayscale apple red apple green apple

Okay, so we've seen how Forth "writes itself" in the sense that a tiny handful of primitives can be used to bootstrap the rest of the language in the language itself.

And we've seen how the enormous flexibility of the primitives can allow nearly unbounded freedom.

Now let's explore the language from another angle.

"I didn't create Forth, I discovered it."

-- Chuck, apocryphally

(I have been unable to find a source for the quote above. It probably comes from an interview. I have seen third-hand usages of the quote. And it does sound like something Chuck Moore would say in typical Chuck Moore modesty.)

If Forth truly is a fundamental way to express computation, then it's almost like a mathematical expression of computation like Gödel and Herbrand's general recursive functions, Church's lambda calculus, Turing's theoretical machines, Post's canonical systems, and Schönfinkel and Curry's combinators.

I'll let philosophers angrily split hairs over what I just said above, but I think the principle stands: Forth is a fundamental type of programming language design.

Why do I believe this? Well...

Making nasmjf gave me so many ideas, I had to try some experiments.

my lenovo 11e thinkpad with assembly code waiting romantically on the bed with a candle. text reads 'Assembly Nights II'

Meow5

meow5 cat logo

An exercise in extreme concatenative programming where all code is concatenated (inlined).

We explored what it means to be a "concatenative" programming language at the beginning of my journey above. In short, in a concatenative language, data implicitly flows from one function to another like a factory assembly line. (Contrast with an "applicative" language where data is explicitly passed to functions in named parameters).

Like Forth, Meow5 happens to be concatenative because it uses the same "parameter stack" concept.

Unlike Forth or most other sane languages, Meow5 is a thought experiment taken too far. Specifically, the thought, "instead of threading function calls by storing their addresses, what if we just store a copy of the whole function?"

In compiler parlance, this is "inlining" (short for "inline expansion"). It is a common optimization technique for avoiding the overhead of a function call for small functions.

Here's an excellent summary: Inline expansion (wikipedia.org).

In the case of a tiny function like the word DROP, that would be a very good and sensible thing to do. Remember when we looked at the assembly language source of some small word definitions from my NASM port of JonesForth? DROP was just a single assembly instruction:

pop eax
        

It would be incredibly silly to have several jumps to and from a single-instruction word! (And, it comes as no surprise that "real" Forth implementations often inline small primitives such as DUP. Some even provide an INLINE word to allow the programmer to specify this explicitly.)

My question was: At what point is this no longer a good idea? Obviously at some point a function is too large to inline. But every code word (assembly language) definition in JonesForth was quite tiny by modern standards. With today's CPUs and their relatively enormous caches, how far could you take this inlining concept?

Plus, it would make compiling a stand-alone executable almost trivial because every word in a 100% inlined language would contain all of the machine code it needs.

And wouldn't the CPU just love seeing all of those instructions executing in one straight and continuous sequence with no jumps? If I were a CPU, I would love it.

(By the way, spoiler alert: I consider Meow5 to be a delightful little failure. The problem isn't inlining machine code - that works great, and, indeed, the exported ELF executables from Meow5 work exactly as I imagined. The problem is data, and most conspicuously, data in the form of strings. You either have to copy the string in each word that uses it or go through some complicated hoops to track which word uses what data (as well as its location in memory in the live interpreter and its destination in the stand-alone ELF memory layout). Either way, the purity and simplicity is lost, which was the whole point of the experiment. Also, it will come as no surprise that I later discovered that Forth implementations often have an INLINE word so you can instruct the compiler which words would make the most sense being copied rather than jumped to - which gives you the best of both worlds. The Meow5 project is a success because I learned a lot.)

: meow "Meow. " print ;
meow
Meow.

: meow5 meow meow meow meow meow ;
meow5
Meow. Meow. Meow. Meow. Meow.
	

In this canonical Meow5 program example, a new word is created called meow. It defines a string containing "Meow. " (The trailing space is just to make it look better when we print it in a sequence.) A second word called meow5 calls meow five times as you would expect and it does what you would expect. And yes, this is where the language name came from.

This looks exactly like the same thing in Forth. But the difference here is that meow5 actually consists of five copies of meow. Then, if you were to call meow5 from another word, then that would be another copy - making a total of ten copies of the original meow.

The question is, at what point does including a copy of every "ancestor" function all the way up the "call stack" spiral out of all sane proportions? And would attempting to create a recursive function in such a language cause an infinite loop and/or open a rift in the space-time continuum?

Like I said, this is a thought experiment taken too far.

Or think of it as an art project.

Anyway, the point is...

Despite attempting to "go my own way," it's remarkable how many times Forth's solution was the path of least resistance.

"Aha! That's why."

First of all, you'll notice I ended up using ":" and ";" to define new functions. Forth makes liberal use of symbols and abbreviations, which can make it pretty hard to read. But I have to admit, ": ... ;" has grown on me, so I adopted it in Meow5. That's probably the most visible thing.

And using a postfix notation is absolutely the path of least resistance for a stack-based language - everything comes in the order expected by the language. So an interpreter can execute the statements right as they come in.

(Side note: This is how the PostScript (wikipedia.org) printer and display language works. The printer may begin printing as soon as it receives the document because everything is defined in the order it is needed and never depends on later information. This can also be a disadvantage of PostScript for viewing documents on screens: You can't just render a page mid-document because styling and formatting controls must be read in their entirety from the start of the document in order to ensure they are completely correct.)

I was determined to make things easy for myself in the "lazy" sense, so I can say with some certainty that Forth is one of the most "minimum effort" languages you can imagine in terms of machine and implementor effort.

There was one place I decided to deviate from standard Forth that I knew would make implementation harder:

Forth's string quoting with a " and a space:

" Hello World."
    

This has always bothered me. Chuck Moore even mentions it in his unpublished book, Programming A Problem-Oriented Language (PDF) (forth.org) in the section titled 6.3 Character strings:

"What does a character string look like? Of all the ways you might choose, one is completely natural:
    "ABCDEF...XYZ"
            
A character string is enclosed in quotes. It can contain any character except a quote, specifically including spaces."

Right! Obviously. So by golly, that's what I would do in Meow5, like every sensible language!

Meow5 has this more natural quoting without a space after the initial ":

"Hello World."
    

But the effects are cascading...and limit flexibility

I think it was worth the extra effort to support. (I also included escape sequences, e.g. \n and \", while I was at it.) But it wasn't just an up-front cost.

If we keep reading the next paragraph in 6.3 Character strings:

"We get in trouble immediately! How do you recognize a character string? By the leading quote, of course. But do you modify your word subroutine to recognize that quote? If you do so you may never use a leading quote for any other purpose. Much better that the quote is a word by itself, treated like any other dictionary entry, for it can then be re-defined. But words are terminated by spaces, and I still resist making quote an exception. So let's type character strings:
    " ABCDEF . . . XYZ"
            

I ended up having to put exceptions for the " character in multiple places in the Meow5 interpreter, including my get_token function, which serves the same purpose as the "word subroutine" Moore mentions above.

And now any additional language feature which would add or modify the behavior of the interpreter will also have to work around or duplicate the special " character handling!

It seems I can either follow Moore's advice or re-discover it for myself. What I always love about re-discovering things for myself is that "aha!" moment when I realize why things are the way they are.

(Another perfect example of this was when I decided to have some of my functions leave the stack alone after using the top value. It seemed very wasteful to have to do a DUP before certain functions when I knew I would "always" want to keep the value on the stack anyway. Why not just peek at it and leave it where it was? Well, Moore recommends just popping everything so you don't have to remember. And you know what? He was absolutely right. I kept forgetting which words did or didn't pop the stack and it kept causing problems. I completely regretted it and ended up making them all pop like Moore advised. Another option would have been to have a naming scheme for non-popping words, like the way Ruby has a "!" suffix for methods that modify the object in place.)

And this is another reason to title this "The programming language that writes itself." So we've seen:

If you set out to make the simplest possible interpreter for a brand new CPU architecture, it's likely you'll make a Forth whether you want to or not.

Forth lets you define more Forth in Forth so you can Forth while you Forth. And the Forth editor is Forth and can be extended with Forth, so can Forth Forth in Forth Forth Forth Forth. (I'll let you figure out which of those are nouns, adjectives, or verbs and whether or not I have the right number of them.)

And if that weren't enough, Forths often contain assemblers so you can define additional code words in Forth. JonesForth has the stub of one at the end. Here's the comment introducing it:

(
    ASSEMBLER CODE --------------------------------------------

    This is just the outline of a simple assembler, allowing
    you to write FORTH primitives in assembly language.

    Assembly primitives begin ': NAME' in the normal way,
    but are ended with ;CODE.  ;CODE updates the header so that
    the codeword isn't DOCOL, but points instead to the
    assembled code (in the DFA part of the word).

    We provide a convenience macro NEXT (you guessed what it
    does).  However you don't need to use it because ;CODE will
    put a NEXT at the end of your word.

    The rest consists of some immediate words which expand
    into machine code appended to the definition of the word.
    Only a very tiny part of the i386 assembly space is covered,
    just enough to write a few assembler primitives below.
)
        
And then there's this:

PlanckForth

Hand-written 1Kb binary binary layout of planckforth as taken from the repo

This image comes from the PlanckForth repo (github.com). It's one of the most beautiful pieces of code I've ever seen. It's a complete ELF binary with a working Forth implementation that fits in less than 1Kb (there's even enough room for a description and copyright at the end).

And the best part isn't even the initial application or the way it's stored in an ASCII hex representation that can be turned into a working binary using xxd -r -c 8.

No, the best part is bootstrap.fs, written in line-noise-like operators and gradually becoming readable Forth after a couple hundred lines.

Thankfully, comments are one of the very first things implemented and it's like seeing bacteria spell out words in a petri dish:

h@l@h@!h@C+h!k1k0-h@$k:k0-h@k1k0-+$h@C+h!ih@!h@C+h!kefh@!h@C+h!l!
h@l@h@!h@C+h!k1k0-h@$k h@k1k0-+$h@C+h!ih@!h@C+h!kefh@!h@C+h!l!

h@l@ h@!h@C+h! k1k0-h@$ k\h@k1k0-+$ h@C+h!
    i       h@!h@C+h!
    kkf     h@!h@C+h!
    kLf     h@!h@C+h!
    k:k0-   h@!h@C+h!
    k=f     h@!h@C+h!
    kJf     h@!h@C+h!
    k0k5-C* h@!h@C+h!
    kef     h@!h@C+h!
l!

\ **Now we can use single-line comments!**

\ planckforth -
\ Copyright (C) 2021 nineties
...
        

There are almost as many Forth implementations as there are stars in the night sky.

Forth is an idea that has taken form in countless applications, many of them custom and home-grown.

If it goes "beep boop", someone has written a Forth for it.

Forth has seen plenty of notable uses. One of them, Open Firmware (wikipedia.org), deserves a little more detail. It comes, like many things, from Sun Microsystems.

"Being based upon an interactive programming language, Open Firmware can be used to efficiently test and bring up new hardware. It allows drivers to be written and tested interactively."

Open Firmware was featured on the Space Shuttle ESN, which ran on a radiation-hardened UT69R000 (cpushack.com) processor. See the paper, Developing plug-and-play spacecraft systems: NASA Goddard Space Flight Center's (GSFC) Essential Services Node (ESN) (PDF) (zenodo.org) which notes that:

"Open Firmware can debug hardware,software, plug-in drivers, and even the firmware itself. Open Firmware provides interactive tools for debugging systems."

But I think it's time to get back to drawings of computers, right?

Check these out:

Jupiter Ace, 1982

drawing of the jupiter ace home computer

Operating system: Forth.

OS and library of routines in 8 KB of ROM.

"Ten times faster than [interpreted] BASIC" and less than half the memory requirements."

(The quote above is from Popular Computing Weekly, 1982.)

The Jupiter Ace (wikipedia.org) was a British home computer of the early 1980s.

It has a fan-made website, the Jupiter ACE Archive, which has the page What is a Jupiter ACE? (jupiter-ace.co.uk):

"The major difference from the 'introductory computer' that was the ZX81, however, was that the Jupiter ACE's designers, from the outset, intended the machine to be for programmers: the machine came with Forth as its default programming language."

That website has tons of resources, including the clipping from Popular Computing Weekly quoted above.

(You also owe it to yourself to visit the "What is..." page linked above and then hover your mouse over the image of the ACE's circuit board. Every single IC, capacitor, and resistor is identified and explained in little tooltips!)

It's not every day you see a programming language listed as an operating system for a computer. But you may recall that as early as the "IBM 1130 minicomputer at a big textiles manufacturer" era, Moore already had an editor and file management features. And you can certainly write hardware drivers in Forth if you have the right code word primitives. So there's no limit to how low-level Forth can go.
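
A hypothetical sketch of what driver-level Forth might look like, assuming the usual HEX, CONSTANT, and ! words and a made-up address for a memory-mapped LED register:

HEX
F000 CONSTANT LED-PORT      \ made-up device register address
: LED-ON  1 LED-PORT ! ;    \ the whole "driver" is a store to that address
: LED-OFF 0 LED-PORT ! ;
DECIMAL
    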

(There's also no limit to how high-level Forth can go. The book Thinking Forth by Leo Brodie (same book from which we got the apples example above) is full of examples of applications written in very "English like" high-level words.)
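
And to illustrate that other direction with a toy example of my own (not one of Brodie's), high-level Forth reads almost like imperative English once the lower-level words exist:

\ hypothetical top-level words built from lower-level ones
: CLOSE-VALVES ( -- )  INLET CLOSE  OUTLET CLOSE ;
: SHUTDOWN     ( -- )  CLOSE-VALVES  STOP-PUMP  SILENCE-ALARM ;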

The ACE never sold very many units, but it is prized by collectors today.

The What is Forth? (jupiter-ace.co.uk) page has an excellent explanation of Forth in general, but especially as an all-encompassing computing system or "programming environment":

"Classic Forth systems use no operating system. Instead of storing code in files, they store it as source-code in disk blocks written to physical disk addresses. This is more convenient than it sounds, because the numbers come to be familiar. Also, Forth programmers come to be intimately familiar with their disks' data structures, just by editing the disk. Forth systems use a single word "BLOCK" to translate the number of a 1K block of disk space into the address of a buffer containing the data. The Forth system automatically manages the buffers."

Many of us fondly remember the boot-to-BASIC computers of the 1980s, but can you imagine growing up with the Jupiter ACE in your home and actually understanding it?

The ACE ran on the Zilog Z80 CPU, which was incredibly popular at the time for low-power computers and has had an amazingly long life. It was used in the higher-end TI graphing calculators such as the TI-85 I had in high school in 1996, on which I spent many a happy afternoon programming in TI-BASIC.

Canon Cat, 1987

drawing of the canon cat word processor home computer

Operating system: Forth.

OS, office suite, and programming environment in 256 KB of ROM.

Innovative interface by Jef Raskin.

Another computer with Forth as an operating system!

The Canon Cat (wikipedia.org) is a particularly fascinating machine for a number of different reasons, the primary of which is the keyboard-driven interface by UI pioneer Jef Raskin.

Raskin wrote a book titled The Humane Interface (wikipedia.org) with some provocative ideas that are probably very much worth re-visiting. For example, I like these two design rules:

  • Elimination of warning screens - modern software applications often ask the user "are you sure?" before some potentially harmful action; Raskin argues they are unhelpful because users tend to ignore them out of habit, and that having a universal undo eliminates the need for them.
  • Universal use of text - Raskin argues that graphic icons in software without any accompanying text are often cryptic to users.

The Cat was the hardware and software incarnation of Raskin's design philosophies.

Also, you have to check out the picture of Jef with a little model of the Cat on his Wikipedia page: Jef Raskin (wikipedia.org). Direct link to the image: here.

The Cat ran on a Motorola 68000 CPU, which was also used in the Apple Macintosh and was one of the first 32-bit processors (32-bit instruction set, registers, and non-segmented memory addressing).

Though the Cat appears to have made good use of Forth internally, its use by the end user doesn't seem to have been promoted heavily by the documentation or user interface. From Canon Cat: Enabling Forth (digibarn.com) by Dwight Elvey, we can see the process sounds a bit awkward:

"Highlight the string: Enable Forth Language.
Then do: front, answer
Then: shift, usefront, space
You are now in Forth.
You need to do: -1 wheel! savesetup re
Front the editor, use the setup to set the keyboard to ascii so that you can type the characters < and > with shift , and shift .
Do a usefront disk.
It will save to the disk so that it will be ready the next boot with just the: shift, usefront, space to restart Forth.
To undo the Forth mode: Forth? off 0 wheel! re"

(Note that USE FRONT is a dedicated key on the Canon Cat keyboard that lets you apply whatever function is printed on the front of another key on the keyboard. All of the Cat's interactions are performed through the keyboard.)

And if that process weren't enough to put you off, this warning seems particularly dire and, if anything, hilariously understated:

"Use care while in Forth mode as usefront shift : will format the disk (a good idea to make a backup or at least remove the disk while experimenting)."

But all of that makes it sound worse than it is. You can see Dwight Elvey boot up a Cat and demonstrate it (youtube.com). Getting to the Forth interface is really not a lengthy process. Just a couple of keystrokes. And the Cat is a more compact computer than I imagined from the pictures.

If you like industrial design or interesting computer interfaces, you owe it to yourself to check out the amazing pictures of Jef Raskin's team designing the Canon Cat (1985)! (digibarn.com)

If you want to see a bunch of pictures of a vintage Cat in amazing shape, check out Santo Nucifora's Canon Cat (vintagecomputer.ca).

Finally, I'll just repeat this Canon Cat fact so you can let it marinate in your head for a little bit:

OS, office suite, and programming environment in 256 KB of ROM.

Okay, speaking of hardware. Now it's time for...

title says Forth in Space and chuck is an astronaut on EVA who says May the Forth be with you.

Easily one of the most exciting uses of Forth is in space exploration. We've already seen how Chuck Moore was intimately involved in programming ground-based radio telescopes.

But Forth has also found its way into spacecraft.

NASA is famous for having stringent rules about software that runs on spacecraft costing hundreds of millions of dollars (upon which billion-dollar missions rely).

"May the Forth be with you," courtesy my daughter. :-)

NASA and the ESA

unreadable list of a ton of nasa projects using forth

The list of projects using Forth at NASA compiled by James Rash in 2003 is too long to easily list here.

The image on the right is intentionally too small to read. As you can see, it's a big list.

The original NASA link has died, but the page was archived by the Wayback Machine at archive.org. There's a nice copy hosted here as well: Forth in Space Applications (forth.com).

I haven't found a list like this for the ESA, but the Philae lander (featured below) would be one very prominent example.

By the way, Forth isn't featured here, but here's a fun overview of some CPUs used in various space missions: The CPUs of Spacecraft: Computers in Space (cpushack.com).

(And it's also not about Forth at all, but this is my Web page, so I can do what I want, and I'm not going to miss an opportunity to mention one of my favorite computing-in-space books: Digital Apollo: Human and Machine in Spaceflight (mit.edu) by David Mindell. It will change how you look at the Apollo missions, computers in general, and the role of astronauts in spacecraft!)

Space Shuttle Small Payload Accommodations Interface Module (SPAIM)

nasa mission patch for ssbuv

"There is always great concern about software reliability, especially with flight software."

From the paper Forth in Space: Interfacing SSBUV, a Scientific Instrument, to the Space Shuttle (PDF) by Robert T. Caffrey et al:

"There is always a great concern about software reliability, especi- ally with flight software. The effects of a software error in flight could be dramatic. We were able to produce reliable software by writing a Forth routine on the PC, downloading the software, and testing it interactively. We varied the inputs to a routine and checked the ability of the routine to operate correctly under all conditions. As a result, during the STS-45 Shuttle mission, the SPAIM flight software worked perfectly and without any prob- lems."

Forth systems can be multitasking, and this allowed the system to monitor itself. Each task had its own stack, and a watchdog task could, for example, check the health of another task by monitoring that task's stack. (Stack depth was found to be a good indication of task health. Malfunctions would often cause the stack depth to grow.)
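
Here's a sketch of what that kind of self-monitoring might look like. All of these names are my inventions (how one task inspects another varies between Forth multitaskers), though PAUSE really is the traditional word for yielding to the next task:

40 CONSTANT MAX-DEPTH          \ assumed healthy stack-depth limit
: HEALTHY? ( task -- flag )    \ hypothetical per-task depth check
    TASK-DEPTH MAX-DEPTH < ;
: WATCHDOG ( -- )
    BEGIN
        SENSOR-TASK HEALTHY? 0= IF ALARM THEN
        PAUSE                  \ yield to the other tasks
    AGAIN ;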

"The ability of the Forth development system to debug hardware and software interfaces, model missing hardware, simulate system malfunctions, and support system integration dramatically helped in the quick generation of error-free software. The interactive, integrated and multltasking features of the Forth system proved to be the key elements in the success of the SPAIM systems development. Several techniques such as stack depth monitoring, address monitoring, cycle time monitoring, and error flag monitoring provided system checks during both the system integration process and the actual Shuttle mission."

The interactive nature of the Forth system, as opposed to traditional debugging with a strictly compiled language, was considered to be hugely convenient and sped up programming, debugging, and testing.

The SPAIM system used a 16-bit Intel 87C196KC16 microcontroller, a MIL-SPEC member of the Intel MCS-96 (wikipedia.org) family, which started out as controllers for Ford engines in the 1970s. They continued to be made in various incarnations until 2007 and were often used in common devices such as hard drives, modems, and printers. Unlike many chips headed to space long-term, this one wasn't "rad-hard" (hardened against the effects of radiation).

NASA Robot Arm Simulator

robot arm in space shuttle

Control of 50-foot long, six-joint arm for Space Shuttle simulator. Extensive math routines convert two three-axis joystick commands into required joint velocities in six different co-ordinate systems. Entire system developed by one programmer in five weeks.

So, it turns out that you can't just use the same robot arm on the ground for simulations as the one that will go into space.

The Space Shuttle Robot Arm Simulator (forth.com) was a complex machine with some very specific requirements.

For example, contending with gravity changes the requirements to such a degree that it's a completely different robot!

"The GSFC arm, for example, is designed to carry up to a thousand pound payload at its tip. In order to do this it uses a high pressure (4000 psi) hydraulic system rather than electric motors as on the RMS."
and
"Because of the completely different nature of the joint controls, the original RMS software was not usable except as a source of algorithms."

You can see in my drawing that the arm lived in a simulated shuttle bay and was accompanied by an enormous model satellite. That satellite looks exactly like the Hubble Space Telescope to me, which checks out with the facts.

This sounds wildly complex to me:

"The RMSS contains fourteen separate processes: one for each joint, one for each joystick, one for the digital display panel, a simulation process, a trending process, and several supervisory processes."

But, as seems to be a trend with Forth-based space software, the work was impeccable:

"Simulation testing was so thorough that when the arm software was installed on site, not a single change was made to the executive control algorithms."

Coincidence?

Shuttle Mission Design and Operations System (SMDOS)

drawing of the shuttle launching

JPL's ground-based control software for shuttle SIR-A and SIR-B radar imaging instruments.

This entry started off as an excuse to draw a Space Shuttle. But it's actually a great example of how an interactive system (as opposed to a strictly pre-compiled system) can save a mission.

The paper: Forth as the Basis for an Integrated Operations Environment for a Space Shuttle Scientific Experiment (PDF) (forth.com) describes a number of hardware failures that had to be overcome.

"It was in the first day of data taking that we noticed the first problem..."

The SIR-B's transmitting antenna had shorted, resulting in the expected 1000 watts of power being reduced to a faint 100 watts.

"Since the returned echo was negligible as received by the SIR-B antenna it was decided to increase the gain of the receiver. The problem was in not understanding what had happened to cause the failure it was not immediately apparent what the appropriate gain should be..."
Forth-based, highly adaptable SMDOS to the rescue!
"No problem. With the advice of the radar engineers, the Forth module that was used to generate the display was quickly modified to produce a calibrated display. The gain of the receiver was increased until a perfect bell-shaped pattern again appeared on the display."
Then a second hardware failure:
"This was only the start of our problems. A satellite on board failed to deploy properly. The shuttle had to remain in high orbit until the problem was resolved before it could fire its engines to descend to the orbit that had been planned for the SIR-B data taking. "
Now the shuttle would not be in the planned orbit for data-taking. A second SMDOS adaptation fixed that.

Then a third hardware problem with another antenna:

"A bolt had sheared in the antenna's pointing mechenism and the KU band antenna was trashing around, threatening to destroy itself. It was necessary for an astronaut to exit the shuttle (EVA) in a spacesuit to pin the antenna down."

Now the shuttle had to rotate to point at a relay satellite to gather data (to tape!) and then rotate towards Earth to transmit the recorded data, and repeat.

"Of course this meant an entirely new data-taking strategy. Again the SMDOS computers were put to work displaying new plans for the stringent new conditions."

They lost a bunch of data, of course, but at least they were able to salvage 20% of it by rotating and capturing and rotating and transmitting. None of which would have been possible if they had not been able to modify the software on the spot.

Conclusion:

"When the antenna feed failed and we realized that the software had to adapt to that failure, it was relatively easy given the interactive Forth enviroment to change the required module to meet the new specifications. This is clearly beyond the capabilites of most languages."

Forth may be unique in allowing this complete freedom of modification in an interactive session. It's a double-edged sword if there ever was one. That sword, plus the postfix notation, has made Forth a hard sell in the corporate world.

Harris RTX2010 used in a ton of space applications. block diagram of harris chip

The RTX2010 (wikipedia.org) and its predecessor, the RTX2000, account for a good portion of the use of Forth in the space industry. They run Forth natively.

This is mostly because of the specific attractive properties of these processors - very low latency and the ability to quickly process the floating point mathematical operations needed for neat space stuff like navigation and thruster control.

Largely because of the stack-based design, the RTX2000 and RTX2010 have very compact machine code. Subroutine calls take only a single cycle, and returns are free! All branches take exactly one cycle as well.

They are also brilliantly minimalistic designs. The entire RTX2000 instruction set fits on a single page. See the first PDF link below:

Let's look at a specific spacecraft as an example...

Rosetta and Philae

drawing of rosetta approaching comet

First mission to send a lander to a comet!

The Rosetta Ion and Electron Sensor instrument used a Harris RTX2010.

The Philae lander used two Harris RTX2010s for complete system control (CDMS) and two more to control its landing system (ADS).

The ESA's Rosetta mission (esa.int) was an amazing mission for a bunch of reasons. It sent the first spacecraft to ever rendezvous with and then follow a comet around the Sun.

Rosetta traveled through space for a full decade (2004 to 2014) before meeting up with comet 67P/Churyumov–Gerasimenko, which is 4 km wide and orbits the Sun every six and a half years.

Rosetta orbited the comet for three months and then deployed another spacecraft called Philae to land on the comet.

Both craft contained heaps of instruments (11 on Rosetta and 9 on Philae) as well as high-resolution cameras. The whole mission (wikipedia.org) is worth reading about. There are some fantastic images and animations to be seen there and on the comet's own page (wikipedia.org). the philae lander brightly illuminated by the sun against the black backdrop of space

But for a moment, the star of the show was the Philae (wikipedia.org) lander.

Often described as being "the size of a washing machine," Philae dropped from Rosetta to the surface of 67p.

The picture at the right was taken from Rosetta's OSIRIS imager as Philae dropped slowly away from the orbiter.

Because the comet's gravitational pull is so small (huge boulders have been observed moving around on its surface), a pair of harpoons were meant to fire into the surface of the comet and hold the lander down. These did not deploy (possibly a mechanical failure) and a landing thruster also failed, so Philae ended up having a long, low-gravity tumble on the surface.

It's been speculated that the harpoon failure actually saved Philae from an even more exciting trip because studies of the surface found it to be harder than expected. It might have launched itself away rather than anchoring! (As it was, the lander bounced with a force that was just shy of escaping the comet's gravitational pull entirely. It rose a full kilometer above the surface before slowly returning for another two bounces before winding up at its final resting spot.)

As mentioned, Harris RTX2010 processors were used in at least one scientific instrument in the mission. (I've seen rumors of up to 8 different instruments, but it's difficult to confirm without sifting through a ton of material about each one.)

A pair of Harris RTX2010s controlled Philae's Active Descent System.

Check out Here comes Philae! Powered by an RTX2010 (cpushack.com):

"Why was the RTX2010 chosen? Simply put the RTX2010 is the lowest power budget processor available that is radiation hardened, and powerful enough to handle the complex landing procedure. Philae runs on batteries for the first phase of its mission (later it will switch to solar/back up batteries) so the power budget is critical. The RTX2010 is a Forth based stack processor which allows for very efficient coding, again useful for a low power budget."

Here is more information (with pictures!) about the physical design and components in the Philae control system in charge of the entire operation of the lander: Command and Data Management Subsystem (CDMS) of the Rosetta Lander (Philae) (sgf.hu).

"Harris RTX2010 processor has been selected for the DPU boards because it is the lowest power consuming, space qualified, radiation hardened, 16-bit processor with features to provide so complicated functions as the CDMS has to perform. It is a stack based, Forth language oriented processor with an exotic and challenging instruction set. CDMS is a real-time control and data acquisition system, and it has to process tasks in parallel. Therefore, a real-time, pre-emptive multitasking operating system has been developed to run application tasks executing the required functions in parallel."

And here is the lander's Active Descent System (ADS) QM User Manual (spyr.ch) which has way more detail about this computer system, including a number of details about the Forth software:

"After resetting the subsystem (power-on reset), the bootstrap sets up the Forth environment, copies the firmware from PROM to RAM and disables the PROM for further access.

After this, the main word Do-App is called from the Forth system immediately after setup. You can find the main word Do-App in the file app.fth (see part II). Do-App calls Init-App, which itself calls other initialisation words like Init-ADS. Then the application enters the main loop. In the main loop the following actions are performed:
  • reset the watchdog (watchdog is enabled for the QM)
  • put the data into the HK registers
  • get the data from the ADC handler
  • process CDMS requests"
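
Purely as a sketch, that description suggests a main word shaped something like the following. Only Do-App, Init-App, and Init-ADS are named in the manual; the words inside the loop are my inventions:

: DO-APP ( -- )
    INIT-APP             \ calls Init-ADS and friends
    BEGIN
        RESET-WATCHDOG   \ watchdog is enabled for the QM
        UPDATE-HK        \ put the data into the HK registers
        READ-ADC         \ get the data from the ADC handler
        HANDLE-CDMS      \ process CDMS requests
    AGAIN ;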

Despite the unfortunate landing (which put Philae in too much shadow to get as much solar energy as hoped and at an angle that made communication with Rosetta difficult), Philae was still robust enough to perform "80%" of its scientific mission, which is pretty amazing.

lander picture: the outline of deeply fractured rock and ice, the darkness of space in the background, and a lander foot brightly lit by the sun.

Just look at that. A picture from the surface of a comet.

For the full-size image, more info, and links to the other CIVA camera images, see CIVA camera 1 view (esa.int).

There is a ton of fascinating information about the landing and the ESA's detective work to figure out where Philae actually ended up:

As for the Rosetta orbiter spacecraft itself...

This is one of the final images taken by Rosetta as it made the "hard descent" (controlled crash landing) to the surface of comet 67P: photo of comet 67p taken by rosetta

The image and a description are here: Final Descent Images from Rosetta Spacecraft (nasa.gov).

"The decision to end the mission on the surface is a result of Rosetta and the comet heading out beyond the orbit of Jupiter again. Farther from the sun than Rosetta had ever journeyed before, there would be little power to operate the craft. Mission operators were also faced with an imminent month-long period when the sun is close to the line-of-sight between Earth and Rosetta, meaning communications with the craft would have become increasingly more difficult."

By the way, the ESA has a nice summary of the computer hardware used by the OSIRIS camera on Rosetta. Optical, Spectroscopic, and Infrared Remote Imaging System (esa.int).


Rabbit Hole Alert: This takes us away from Forth for a moment, but learning about the Virtuoso RTOS (real-time operating system) eventually leads to a deep, deep Wikipedia rabbit hole that takes you on a journey to the Inmos processors, Hoare's CSP, the occam programming language, the HeliOS parallel computing operating system, and the concept of the "transputer" microprocessors. Transputers use whole processors as building blocks for a parallel computer in the same way transistors are used as the building blocks for processors. (Thus, transputer = "transistor computer," you see?) They were mostly featured in supercomputers, but they also saw some industrial controller use and there was even an Atari Transputer Workstation, the ATW-800. (I've intentionally not linked to any of these things here because you'll disappear into that hole and never see the end of this document, which would be very sad. Also, I mention "transputers" again one more time below and you wouldn't want to miss that.)


The Rosetta orbiter and Philae lander now rest silently on the surface of 67p, where they will no doubt stay for billions of years or until future comet tourists pick them up and put them in a trophy room, whichever comes first.

Stop Writing Dead Programs

crop of jack rusher from the previous screenshot "...Space probes written in Lisp and Forth have been debugged while off world... If they had proven their programs correct by construction, shipped them into space, and then found out their spec was wrong, they would have just had some dead junk on Mars. But what these guys had was the ability to fix things while they are running on space probes... In addition, the spec is always wrong!"

-- Jack Rusher, Stop Writing Dead Programs (talk given at Strange Loop 2022)

Here's the talk: "Stop Writing Dead Programs" by Jack Rusher (Strange Loop 2022) (youtube.com). You've got 43 minutes. I'm timing you. Don't get distracted by other suggestions. Come back here. I'm waiting.

Or better yet, check out Jack's awesome transcript, which was super helpful when I wanted to re-find the above quote: Stop Writing Dead Programs (jackrusher.com).

In his transcript, he notes:

"Had I had more time, I would have done an entire series of slides on FORTH. It's a tiny language that combines interactive development, expressive metaprogramming, and tremendous machine sympathy. I've shipped embedded systems, bootloaders, and other close-to-the-metal software in FORTH."

I was extremely interested in hearing about Forth systems being updated in space, but had a heck of a time finding any.

I finally found one on a page that is otherwise largely dedicated to Lisp's use at the Jet Propulsion Labs: 1992-1993 - Miscellaneous stories (sourceforge.io) on the amazing, sprawling site for the Mecrisp-Stellaris Forth (which runs on various non-x86 CPUs):

"Also in 1993 I used MCL to help generate a code patch for the Gallileo magnetometer. The magnetometer had an RCA1802 processor, 2k each of RAM and ROM, and was programmed in Forth using a development system that ran on a long-since-decommissioned Apple II. The instrument had developed a bad memory byte right in the middle of the code. The code needed to be patched to not use this bad byte. The magnetometer team had originally estimated that resurrecting the development environment and generating the code patch would take so long that they were not even going to attempt it. Using Lisp I wrote from scratch a Forth development environment for the instrument (including a simulator for the hardware) and used it to generate the patch. The whole project took just under 3 months of part-time work."

When we defeat the alien kill-bots and reprogram them, it will surely be with a Forth of some sort. alien kill-bots being controlled by forth

In the background, one of the Invader machines lies crumpled and smoking in the ruins of one of Earth's great cities. Stomping towards us with its mechanical arms raised in victory, is another Invader. But this one is different. The tell-tale giveaway is the opening in its protective head dome.

At great cost, humans had managed to trap one of the Invaders long enough to concentrate all of their efforts in penetrating its dome defenses, while leaving the rest of the machine unharmed and operable.

Working feverishly against a doomsday clock, they burrowed deep into the electrical heart of the machine, identifying and classifying its alien functions until they understood it well enough to attempt an interface.

A bus protocol was decoded. Programming work began.

It went poorly. The aliens had unthinkably bizarre notions of generalized computing that defied all known patterns of software.

Everything had to be done with agonizing labor, stringing sequences of raw bus messages together in hopes of getting a correct response.

But then someone had the bright idea to bootstrap a Forth from the known instruction sequences. With this, they could write a bare-bones interpreter. And, at last, they could experiment quickly and safely.

Days later, an arm moved. Then they crushed a barrel with a gripper claw.

BARREL OBJECT-ID VISION TARGET
133 L-ARM-FWD 14 L-CLAW-OPEN
25 L-ARM-FWD 14 L-CLAW-CLOSE
        

Then they took a first four-legged step. Then 20 steps.

PREP-QUAD-LEGS
20 STRIDE-LOOP
        

As ravaged fighters looked on in amazement, Defender-1 burst from the old brick warehouse and toppled an Invader as it passed by on patrol.

The machines grappled for a moment and it looked as if Defender-1's clumsy movements would be no match for the alien, even from a superior position.

But all of the weapons systems had been decoded by then.

: KILL
    100 BEAM-LEVEL
    BOT OBJECT-ID VISION TARGET
    L-BEAM FIRE-FULL
    R-BEAM FIRE-FULL
;
        

Now the humans had the upper hand at last. Other Invader machines were disabled or captured. Defender-1 was joined by Defender-2 and then 3, 4, 5, and so on. Software was passed by sneaker-net and by shortwave packet radio.

City by city, Earth took back control. And along with victory, word of the One True Language spread across the land. Flags were raised in honor of its original discoverer Chuck Moore.

Where other abstractions had failed, the universal machine truth of Forth had succeeded.

Forth is an idea

unreadably tiny diagram of lineage of various Forth implementations

Obviously the graphic is unreadably tiny. For the full-size original and the gForth program used to create it, check out:

Forth Family Tree and Timeline (complang.tuwien.ac.at).

One of the hardest things about trying to learn "Forth" is realizing that there is no one Forth.

There are Forth standards, dating back to the original ANS Forth document and continuing with the Forth 2012 Standard and the ongoing Forth200x committee; see the official website (forth-standard.org).

There are many common words, certainly, but purpose-built Forths will have their own special vocabularies.

Also, it may be that for many of us, making Forths is actually more fun than using them. The ground is littered with hobby Forths. Each one is a little glittering jewel.

Work In Progress Marker: I've gotten this far.

What about Chuck?

Charles H. Moore founded Forth, Inc in 1973. He's continued to port Forth to various systems. drawing of chuck at a desk programming on a pc with a crt. equipment looks 1990s era

I drew this image of Chuck from a photo in this amazing quote collection, Moore Forth: Chuck Moore's Comments on Forth (ultratechnology.com) by Jeff Fox.

You'll notice I added some color to my drawing for the first time. That's because I'm pretty sure that what we're seeing on Chuck's monitor is...

colorForth screenshot of colorforth

Screenshot from this Quora answer by Tikhon Jelvis: https://www.quora.com/What-is-your-undergraduate-research-experience-like

"An idiosyncratic programming environment, the colors simplify Forth's semantics, speed compiling, and are said to aid Moore's own poor eyesight: colorForth uses different colors in its source code (replacing some of the punctuation in standard Forth) to determine how different words are treated."

"The language comes with its own tiny (63K) operating system. Practically everything is stored as source code and compiled when needed. The current colorForth environment is limited to running on Pentium grade PCs with limited support for lowest-common-denominator motherboards, AGP video, disk, and network hardware." https://en.wikipedia.org/wiki/ColorForth

Neat interview mentioning colorForth (amongst other things!) here: https://www.red-gate.com/simple-talk/opinion/geek-of-the-week/chuck-moore-geek-of-the-week/

Fighting the good fight against software complexity since the 1950s.

"I am utterly frustrated with the software I have to deal with. Windows is beyond comprehension! UNIX is no better. DOS is no better. There is no reason for an OS. It is a non-thing. Maybe it was needed at one time.

-- Chuck Moore, 1997

Once again, from the compiled quotes at ultratechnology.com

"If they are starting from the OS they have made the first mistake. The OS isn't going to fit on a floppy disk and boot in ten seconds."

-- Chuck Moore, 1999

Once again, from the compiled quotes at ultratechnology.com

"Instead of being rewritten, software has features added. And becomes more complex. So complex that no one dares change it, or improve it, for fear of unintended consequences. But adding to it seems relatively safe. We need dedicated programmers who commit their careers to single applications. Rewriting them over and over until they're perfect."

-- Chuck Moore, 2009

Once again, from the compiled quotes at ultratechnology.com

His real love seems to be hardware. Remember that Harris RTX2010? That's basically his design. chuck as a mad scientist chip creator

TODO: https://www.cpushack.com/2013/02/21/charles-moore-forth-stack-processors/

Has been designing hardware since 1983, starting with the Novix NC4000 gate array and dev board.

(An improved processor sold to Harris to become the RTX* chips.)

Has been designing chips ever since.

With his own VLSI software, "OKAD", written in 500 lines of Forth, of course.

"Very large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining millions or billions of MOS transistors onto a single chip. VLSI began in the 1970s when MOS integrated circuit (Metal Oxide Semiconductor) chips were developed and then widely adopted, enabling complex semiconductor and telecommunication technologies. The microprocessor and memory chips are VLSI devices." https://en.wikipedia.org/wiki/Very_Large_Scale_Integration

"In 500 lines of colorForth these tools provide everything required to design a chip. They are derived from an earlier version called OKAD that successfully generated many versions of Forth microprocessor chips." https://colorforth.github.io/vlsi.html

A whole book about the NC4000 chip: http://forth.org/OffeteStore/4001-footstepsFinal.pdf

List of chips (and links to more): http://forth.org/cores.html

This question and answer(s) are excellent for understanding what OKAD is: https://retrocomputing.stackexchange.com/questions/25506/did-forths-inventor-charles-moore-really-write-a-cad-program-in-only-5-lines-of

GreenArrays. "Programming a 144-computer chip to minimize power" (2013)

144 asynchronous computers on a chip. Idle cores use 100 nW. Active ones use 4 mW, run at 666 MIPS, then return to idle. All computers running flat out: 550 mW (half a watt). screenshot from Chuck's 2013 strange loop talk about 144 computer chip

youtube.com: "Programming a 144-computer chip to minimize power" - Chuck Moore (2013)

https://www.greenarraychips.com/

"COMPLETE SYSTEMS: We refer to our chips as Multi-Computer Systems because they are, in fact, complete systems. Supply one of our chips with power and a reset signal, and it is up and running. All of our chips can load their software at high speed using a single wire that can be daisy chained for multiple chips; if desired, most can be bootstrapped by a simple SPI flash memory."

"Contrast this with a Multi-Core CPU, which is not a computing system until other devices such as crystals, memory controllers, memories, and bus controllers have been added. All of these things consume energy, occupy space, cost money, add complexity, and create bottlenecks."

"NO CLOCKS: Most computing devices have one or more clocks that synchronize all operations. When a conventional computer is powered up and waiting to respond quickly to stimuli, clock generation and distribution are consuming energy at a huge rate by our standards, yet accomplishing nothing." https://www.greenarraychips.com/home/documents/greg/PB002-100822-GA-Arch.pdf

If you read the "Rabbit Hole Alert" under the picture of the surface of comet 67p above, then you saw the term "transputer". I think it's super interesting to compare and contrast the GreenArrays GA144 chips to the Inmos transputer chips.

(Another fun diversion into massively parallel computers is one of my favorites: Danny Hillis's Connection Machine (wikipedia.org) computers featuring a "12-dimensional hypercube" routing design. (Inner parenthetical: Hillis himself is a human rabbit hole of fun stuff. He's the author of one of my favorite non-fiction books, "The Pattern on the Stone," and co-founder of The Long Now Foundation (with another human rabbit hole of awesomeness, Stewart Brand), which is responsible for the design and creation of the 10,000-year giant mechanical "Clock of the Long Now". (I could link to all of those things, but you might never be seen again. (Also, boy am I glad that Vim highlights matching parentheses, even in non-programming contexts.))))

"If you talk about molecular computers that are circulating in your bloodstream, they aren't going to have very much power and they aren't going to have very much memory and they aren't going to be able to use much energy.

-- Chuck Moore, Programming a 144-computer chip to minimize power, 2013

Totally, like, my opinion:

The complete domination of x86 PCs in practically all areas of computing until very recently is an accident of history. There is absolutely no reason to believe that this is how the majority of computing will continue to be done in the decades to come.

Even relatively low-energy devices like our smart phones require massive computing power to do most tasks, especially anything having to do with a web browser.

There are physical realities about how much computing power (and memory) you can have per unit of energy. There are plenty of applications where low energy computing is a hard requirement and I believe these applications will only increase.

Also, virtualization changes things, too. We went from single applications taking over the whole computer to running a bunch of things on the same computer, to virtualized computers or containers where we're back to applications thinking they're the only thing on the computer again. VMs tend to be constrained by both power and memory.

I think Forth-likes have a strong future as we look towards this kind of constrained, low-energy computing.

The Legend Confirmed

chuck moore as an adorable wizard

Now, behold a new definition of the integer 4:

: 4 12 ;
    

Which results in:

." The value of 4 is " 4 . CR
The value of 4 is 12
    
Tada! (It works because the Forth interpreter searches the dictionary for each word before trying to parse it as a number, so our new definition of 4 shadows the numeral.)