Ratfactor's Judgement of Snobol4

Page created: 2024-10-27

Updated: 2025-06-03

Drawing of a little rat on a snowball with a heart drawn by Dave Gauer in Krita

SNOBOL. Perhaps you’ve heard of it? I often see it come up in conversations about alternatives to regular expressions. Snobol is a whole language based around the concept of string pattern matching.

(Note: Pattern matching is, by far, the most interesting part of Snobol. Feel free to jump down to the "Pattern Matching" section. Though I can’t promise it’ll make sense if you do.)

I’ve had it on my crowded mental "to-learn" list for ages but always found a way to put it off.

Learning Snobol4 in the 2020s

What you find when you start the journey to Snobol in the year 2024 is that Phil Budne of regressive.org has been maintaining a C implementation of Snobol called CSNOBOL4, "an open source port of Macro SNOBOL4." Budne also maintains an absolute treasure trove of other SNOBOL info here: https://www.regressive.org/snobol4/

Well, after the initial hurdle of figuring out how to write a simple "Hello World" program, I settled into learning it properly from a vintage source.

The green Snobol4 book by Griswold et al next to an MNT Pocket Reform laptop editing my Snobol4th interpreter

The book is The Snobol4 Programming Language by R.E. Griswold, J.F. Poage, and I.P. Polonsky.

What often strikes me about reading many of the older computer books is that the order of the content and the pacing clearly suggest that you’ll be reading the book away from a computer and you might get some time to try stuff out on a time-sharing mainframe or minicomputer system later. We moderns are so spoiled!

Anyway, I read the book and did the exercises. Great!

But I’m wise enough to not stop there. Past experience has taught me that I need a non-trivial project to really understand how to use a new language. (It always amazes me how I can fool myself into thinking I fully understand something after merely reading about it, you know?)

Here’s the ridiculous project I ended up building: the Snobol4th repo - "A Forth written in Snobol4".

I did reference the green book above while working on the project, but I found the Catspaw SNOBOL4 tutorial to be much more useful for looking up how to do things:

https://www.regressive.org/snobol4/docs/burks/tutorial/contents.htm

The completion of one project is just barely enough experience to speak intelligently about a programming language, but I think I learned enough to get a good feel for it. So, what do I make of Snobol?

Pluses and minuses of Snobol in a nutshell

To cut to the chase, I can confirm the rumors right off the bat: Snobol’s first-class support for string pattern matching is very good. Snobol patterns are expressive, fairly compact, and reasonably intuitive. When the moon and stars align and the existing primitives match your needs, it’s downright beautiful.

I can only imagine how magical Snobol must have felt in its heyday.

The biggest downside is the almost complete lack of support for structured programming. All control flow is basically labeled GOTOs. You can define functions, which take care of some housekeeping for you, but even those are shockingly awkward to write.

Other than the pattern matching, Snobol logic is like some weird, awkward high-level assembly. That’s perfectly reasonable considering this is a language that comes from the late 1960s when the programming language features we now take for granted were still hotly debated topics.

And yet. And yet, the conceptual simplicity of Snobol’s single type of statement is both a gift and a curse. It’s a gift to anyone who does not wish to be burdened with learning conceptually difficult control flow mechanisms. Anyone can understand GOTO. It’s simple.

But programming with GOTOs is a curse to those of us who have put the time and effort into learning the richer and more expressive control flow mechanisms of larger languages.

In the end, I’ll frame it in a positive way: Snobol is truly small and that can be a very wonderful thing. (See also Small Programs and Languages.)

Let’s dive in. This is not a Snobol4 tutorial. But I’m hoping that a handful of simple examples will be enough to demonstrate what makes this such an interesting language.

Implicit programming

It’s tempting to start with the pattern matching stuff, but I don’t think it will make proper sense until you know how a line of Snobol code is written.

A simple Hello World program demonstrates how weird Snobol is.

Check this out:

    output = "Hello world!"
end

That indent looks like a mistake, right? But I assure you that the indenting is not merely intentional. It is required for the example to work. And it turns out there is a good reason for that.

Let’s run it:

Hello world!

Nice.

So let’s talk about that indented first line. The indenting is required because anything in the first column is implicitly a label.

Every line in a Snobol program is made up of the same five parts, always in this order:

[label] [subject] [pattern] [replacement] [goto]

All of those parts are optional. Snobol can tell which ones you included because a replacement starts with a = and a goto starts with a :.

The end on the final line is a special label that, when reached, ends the program.

You have to have an end in your program or Snobol4 will refuse to run it.

Again, if you unindent the first line, Snobol will think output is a label and you’ll get an error. Let’s try it:

        OUTPUT = 'Hello world!'
                ^
hello_fail.sno:1: *** Erroneous subject

ERRORS DETECTED IN SOURCE PROGRAM

If output is interpreted as a label, then = is in the subject place, and that doesn’t make sense. So there you go.

Moving on, why does Hello world! print out when we run this program? Clearly, it has something to do with assigning to a variable called output. And that’s true, but output is no ordinary variable.

Any time you assign a string to output, it gets printed. Immediately!

It’s not a function. If you type output "Hello world!" without the equal sign (=), it won’t print anything. This ended up being the most common mistake I made over and over while I was working with Snobol. I couldn’t figure out why my output wasn’t showing up and it’s because I had typed output "Foo" rather than output = "Foo".

What’s really neat about assigning to a variable to print is that you can use it to debug things by having their assignments go directly to the terminal. This turns out to be especially useful for debugging patterns. I show an example of this in the section below titled Snobol Patterns are "programmable".

Fun aside: To complete the feeling that this is a language from the 1960s, Snobol4 also has a punch variable that works just like output except it prints to the punchcard device instead of your terminal or teletype. You do have your punchcard reader hooked up, right?

Another implicit operation is string concatenation. Any time you put a string (or something that evaluates to a string) next to another string, they’re concatenated.

    foo = 5
    output = "Coconuts: " foo
end

Prints:

Coconuts: 5

Can you guess what the following program does?

    output = "You typed '" input "'"
end

If you guessed that input is the complement to output and it gets a line from standard input, you’re right.

This program waits for you to enter your name and then greets you:

    output = "What is your name?"
    name = input
    output = "Hello " name "!"
end

Finally, comments are lines with a * in the label field:

* This is a comment

Control flow: Goto, success, failure, and functions

The last field in a Snobol statement is the "goto" field and it follows a colon :.

The goto can be unconditional. The following program always gets a line of input and then goes to the label bar.

Like end, bar goes in the label field.

    foo = input :(bar)
    output = "I'll be skipped!"
bar output = "You typed: " foo
end

Output:

horse
You typed: horse

An unconditional goto has its uses. But it’s much more interesting to have branching control flow based on a logical decision.

All control flow in Snobol depends on the success or failure of pattern matching. For each outcome, you can assign Success and Failure gotos.

The below example exits immediately if you don’t input anything (e.g. you explicitly type CTRL-D for EOF "End-Of-File" or pipe no input to this program):

    foo = input :f(end) s(bar)
    output = "I'll be skipped!"
bar output = "You typed: " foo
end

Here’s how this works:

The foo variable is the subject of our first statement.
The value of input is the replacement.
The value of input is whatever you type on your keyboard when you run the program.
Snobol replaces the contents of foo with the contents of input.
If the replacement fails because of lack of input, the statement fails.
In that case, we goto end.
Or, if the replacement is successful, we goto bar and print the results.
In either case, the "I’ll be skipped!" replacement never happens and we won’t ever see that message.

Successful Snobol statements have a value (which may be null). Failed statements do not have a value. Likewise, a lack of value is a failure. In the case of the program above, the failure to assign/replace a value into foo is the reason for the failure.

Functions

If you don’t have a desire to know how user-defined functions work in Snobol, feel free to skip this section and continue to pattern matching!

Functions in Snobol are basically just some gotos with some extra bookkeeping. (Well, I guess deep down, that’s true in any language, really, but it’s super true in Snobol.) I’ve seen a couple different ways of formatting these, but here’s how I ended up making all of my functions in Snobol4th.

* The 'foo' function:
    define('foo(bar)')  :(endfoo)
foo
    output = "I'll return the value bar+'F'."
    foo = bar 'F'
    :(return)
endfoo

The first line is a comment, but after that, I have a couple labels and gotos. Look at how many times I wrote the name "foo"! Every label in the Snobol program must be unique, so I fell into this pattern of naming for functions and stuck with it.

Let’s try out the function:

    output = foo('BAR')
end

Output:

I'll return the value bar+'F'.
BARF

Well, it works.

The reason we need to have the unconditional goto :(endfoo) after the define call is that Snobol barely has functions.

Without the endfoo label and goto, Snobol would keep executing the code in your function definition! define just does some internal setup and then returns control to you. It’s not actually establishing a structured block of code like you might expect. Again, the resemblance to assembly language control flow is quite strong!

Going to the return label is special. It returns control to the callsite and the foo function "returns" the value set in the foo variable.

So that’s Snobol functions. Extremely primitive, but they work and you’ll be glad to have them in a larger program.

Pattern matching

(Quick Aside: I’m hard on regular expressions on this page, so I should set the record straight. I like regular expressions. Furthermore, learning the core set of regexp operators has been one of the best investments of my entire programming career. They show up all over the place and they’re incredibly useful.)

A diagram of a Snobol pattern from a book

Okay, now we are ready for the good stuff.

Pattern matching is what makes Snobol interesting and why people still mention it after all these years.

Snobol patterns are:

"First class"
Composable
Light on syntax
More powerful than regex
"Programmable"

Again, my purpose here is not to teach Snobol, so this coverage of Snobol’s patterns will be far from exhaustive. But I’ll try to do them justice by pointing out what I think makes them interesting and why I think they are very much worthy of further study.

As briefly mentioned above, patterns are the third part of the five optional parts of a line of Snobol:

[label] [subject] [pattern] [replacement] [goto]

A pattern can be as simple as a string literal:

    "HELLO" "EL"

"HELLO" is the subject of the matching.
"EL" is the pattern used to match the subject.

Since "EL" can be found in "HELLO", this pattern matching is successful.

That’s a real, working Snobol statement, but not terribly useful. We aren’t doing anything with the matching success.

Pattern Success and Failure

When a match succeeds or fails, we can act on it with a goto.

s(label) goes to the label on success
f(label) goes to the label on failure

Let’s add success and failure gotos to this pattern matching:

    "HELLO" "EL" :s(yup) f(nope)

yup output = "There is an EL in HELLO." :(end)
nope output = "LOL, what?"
end

When run, this little program outputs:

There is an EL in HELLO.

Since the match is successful, program execution goes to the yup label.

Also note that the yup line ends in an unconditional goto to end. We have to have this to keep the nope line from executing after the yup!

I know I keep saying this, but, again, this style of control flow logic will seem very weird and awkward to most modern programmers unless you’ve been doing a lot of assembly language.

Replacements

When you want to replace all of the subject with a new string, it looks like this:

        [subject] = [replacement]

We’ve seen this with assigning ("replacing") the contents of output with a string we want to print.

But you can also replace part of a string with something else by using a pattern to determine which part to replace. That form looks like this:

        [subject] [pattern] = [replacement]

Anything matched by the pattern is replaced with the replacement string, which can include pieces captured in the pattern (we’ll talk about capturing below in the "programmability" of patterns section).

Here’s a simple example:

    story = "The chicken says 'baa.'"
    story "chicken" = "sheep"
    output = story
end

Outputs:

The sheep says 'baa.'

So that’s simple find-and-replace in action.

It’s worth repeating that a line with a replacement can also have a label and/or gotos. Are you starting to see the brilliant simplicity and uniformity of Snobol’s syntax?

Snobol patterns are "first class"

What does it mean to have patterns as first class elements of the language?

We’ve already seen simple string patterns like this one:

    cat_pat = 'meow'
    "The animal goes meow." cat_pat :s(cat) f(notcat)
cat    output = "It was a cat." :(end)
notcat output = "It was NOT a cat."
end

Output:

It was a cat.

But patterns can be more interesting. This one allows a "meow" or a "purr":

    cat_pat = 'meow' | 'purr'

And this one takes either a "hiss" or "yowl" before a "meow" or "purr" (with a space between):

    cat_pat = ( 'hiss' | 'yowl' ) ' ' ( 'meow' | 'purr' )

This matches the strings "hiss meow" or "The cat says yowl purr". But it does not match "my meow" or "hiss hiss".

For those like myself who are used to Regular Expressions, this looks very similar to the regexp: (hiss|yowl) (meow|purr). The most obvious difference is that string literals are quoted in the Snobol pattern, which clearly differentiates them from the pattern operators | and ().

This separation means Snobol avoids an entire class of common regexp headaches including tricky cases of escaping of special characters.
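To make that headache concrete, here’s a tiny sketch using Python’s re module (Python is just my pick of host language for illustration, not something from the Snobol world). The literal text 1+1 quietly means something else as a regexp until you escape it:

```python
import re

text = "1+1=2"

# Unescaped, the "+" is a repetition operator: "one or more 1s, then a 1".
unescaped = re.search("1+1", text)

# re.escape() backslash-escapes the special characters for us.
escaped = re.search(re.escape("1+1"), text)

print(unescaped)           # None: there is no "11" anywhere in the text
print(escaped.group(0))    # 1+1
```

In a Snobol pattern the quoted literal '1+1' can only ever mean those three characters, so the question never comes up.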

Snobol’s patterns certainly have me questioning the wisdom of regular expressions' character literals, character classes, and automata operators all combined in one messy jumble!

Snobol Patterns are composable

For anything even remotely complicated, you’re encouraged to break your patterns into small, simple pieces and compose them into larger patterns.

The process feels natural, greatly improves the readability of large patterns, and lets you re-use common parts.

Here’s that cat sounds pattern again:

    cat_pat = ( 'hiss' | 'yowl' ) ' ' ( 'meow' | 'purr' )

Let’s break it into two parts and compose them into the final pattern:

    cat_complaint = 'hiss' | 'yowl'
    cat_happy = 'meow' | 'purr'
    cat_pat = cat_complaint ' ' cat_happy

This simple composition doesn’t seem all that interesting at first. But something has changed between this and the previous one-line cat_pat definition. Can you see it?

The answer is that the parentheses are no longer needed to enforce the order of operations.

The reason is that while concatenation is performed before alternation, the stored patterns cat_complaint and cat_happy are self-contained patterns. No parentheses are needed because these expressions are not just smashed together like a string. They are composed properly, retaining their original precedence.

Like the original, this composed pattern will match one of these four strings:

"hiss meow"
"hiss purr"
"yowl meow"
"yowl purr"

This simple example shows how first-class patterns allow composition in a way that is NOT easily replicated by building up a regular expression string!

If you tried to do this by gluing regular expression strings together, you’d end up with hiss|yowl meow|purr which would match one of these three strings:

"hiss"
"yowl meow"
"purr"

That’s not what I wanted.
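Here’s a quick sketch of that failure, and the usual workaround, using Python’s re module (my choice for illustration; the variable names are made up, and fullmatch is used so each candidate has to match in its entirety):

```python
import re

# Build a regexp the naive way: by gluing strings together.
cat_complaint = "hiss|yowl"
cat_happy = "meow|purr"
naive = cat_complaint + " " + cat_happy   # "hiss|yowl meow|purr"

# The workaround: wrap each piece in a non-capturing group before gluing.
grouped = "(?:" + cat_complaint + ") (?:" + cat_happy + ")"

candidates = ["hiss", "yowl meow", "purr", "hiss meow", "hiss purr", "yowl purr"]
naive_matches = [s for s in candidates if re.fullmatch(naive, s)]
grouped_matches = [s for s in candidates if re.fullmatch(grouped, s)]

print(naive_matches)    # ['hiss', 'yowl meow', 'purr']
print(grouped_matches)  # ['yowl meow', 'hiss meow', 'hiss purr', 'yowl purr']
```

The workaround does the job, but you have to remember to add the wrapping yourself every single time you glue pieces together. Snobol’s stored patterns just do the right thing.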

We could also make a different pattern that accepts either a cat complaint or a happy cat sound. To do that, we can create an alternation between the two whole patterns:

    cat_pat = cat_complaint | cat_happy

Composability is awesome for reuse and readability.

Snobol patterns are light on syntax

Believe it or not, you’ve already seen pretty much the entire symbolic pattern matching syntax of Snobol:

| creates an alternation ("or") between two patterns
( ) group patterns to affect the order of operations

Three more symbols are used in patterns: ., @, and $; but they are not part of the pattern matching. We’ll see what these do in the next section about "programming" patterns.

Everything else is named patterns and pattern functions.

Snobol4 provides seven pre-made patterns you can compose into your own patterns: arb, rem, fence, bal, abort, fail, and succeed.

And functions that take parameters and produce patterns for you, such as any(chars), break(chars), len(n), notany(chars), pos(n), etc.

Not only are these names easier to read and remember than regular expression glyphs, there are also way more of them - so you get a lot more expressive power right out of the box. It’s a richer vocabulary.

Let’s look at one of the provided primitive patterns.

arb matches zero or more characters. This pattern can match the string "meeeeoooow":

    'm' arb 'w'

It also matches "mow" or just "mw".

That’s roughly equivalent to the regular expression:

m.*w
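A note on the "roughly": arb tries the shortest possible match first and only grows when the rest of the pattern fails, whereas .* is greedy. Here’s a sketch of the difference in Python’s re module (Python is only my pick for illustration), where the lazy form .*? is the closer cousin of arb:

```python
import re

subject = "meow wow"

greedy = re.search(r"m.*w", subject).group(0)   # grabs as much as it can
lazy = re.search(r"m.*?w", subject).group(0)    # stops at the first "w"

print(greedy)  # meow wow
print(lazy)    # meow
```

Both forms accept "meeeeoooow", "mow", and "mw", so the distinction only matters once you care about which part of the subject got matched.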

Here’s what some of the simpler pattern functions do:

len(5) returns a pattern that matches any five characters
any('xyz') returns a pattern that matches any of the characters 'x', 'y', or 'z'
notany('xyz') does the opposite of any
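For anyone translating in their head, here’s my own rough mapping of those three functions onto regexp glyphs, sketched with Python’s re module (the constant names are made up, and the mapping is approximate: for instance, a regexp dot skips newlines by default):

```python
import re

# Rough regexp cousins of the Snobol pattern functions (my own mapping):
LEN_5 = r".{5}"         # len(5): any five characters (newlines aside)
ANY_XYZ = r"[xyz]"      # any('xyz'): one character from the set
NOTANY_XYZ = r"[^xyz]"  # notany('xyz'): one character NOT in the set

print(re.search(LEN_5, "coconuts").group(0))      # cocon
print(re.search(ANY_XYZ, "lazy fox").group(0))    # z
print(re.search(NOTANY_XYZ, "xyzzy!").group(0))   # !
```

Seen side by side, I know which spelling I’d rather read six months later.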

You get the idea. They compose well and they give you everything you need, though learning to use them effectively does take a little study.

I’ll admit that when I was writing Snobol patterns, I did feel like there was room for improvement or expansion in the pre-made patterns and functions for the sorts of things I wanted to do. But the beauty of avoiding regular expression-like glyph syntax is that it would be a piece of cake to extend the language.

Snobol Patterns are more powerful than regular expressions

To put it in formal language theory terms:

Regular expressions are limited to type 3 ("regular") languages in the Chomsky hierarchy.
Snobol patterns can recognize type 0 ("unrestricted") languages in the Chomsky hierarchy.

Snobol patterns can do things that are impossible for regular expressions.

The canonical example is matching arbitrarily nested parentheses. Classical regular expressions cannot do this. (It must be noted that many regexp implementations provide various extensions to allow recursive patterns or sub-expressions that do make this possible - but these extensions are largely incompatible with each other and make the syntax completely bonkers to read.)

By contrast, you can absolutely write a Snobol pattern to match nested parentheses.

(Actually, Snobol "golfs" that particular problem away entirely by providing a pre-made bal pattern that specifically matches arbitrarily nested and correctly balanced parentheses. So that one is kind of cheating. But you could also write it yourself.)
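For a feel of why this is out of reach for a classical regexp, here’s a sketch of the bookkeeping involved, written in Python (my choice for illustration). The depth counter can grow without bound, which is exactly what a finite automaton can’t do. The function name is hypothetical, and it only checks whether a whole string is balanced, which is a simpler job than what bal does inside a pattern:

```python
def is_balanced(text):
    """Return True if every "(" in text has a matching ")" in the right order."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ")" showed up before its "("
                return False
    return depth == 0          # anything left open is a failure too

print(is_balanced("(a(b)c)"))   # True
print(is_balanced("(a(b)c"))    # False
print(is_balanced(")("))        # False
```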

Let’s go further than that. One of the most famous answers on Stack Overflow is "You can’t parse HTML with regex". And that’s strictly true. I’ve matched parts of HTML on plenty of occasions. But you absolutely cannot parse the whole language with just classical regexps.

But you can parse HTML with Snobol patterns. I’d, uh, rather not. But you can. It’s possible.

(Aside: To expand on "you can’t parse HTML with regex" for a moment, I’d like to say that this is one of those things that scares people away from doing useful things. You absolutely cannot parse HTML-in-the-wild or HTML-the-W3-spec with regex. But you absolutely CAN parse subsets of HTML under your control and if doing so produces a useful tool, go for it! Demons will not emerge from the ground to swallow you. At least, I don’t think so. Anyway, they’ll get me first. So watch this space.)

Finally, perhaps you’ve seen BNF (or EBNF) grammar used to define Internet protocols in RFCs? Well, the composition of a big, multi-part Snobol pattern looks, and works, remarkably similar to BNF. The secret sauce is the composability mentioned above and the fact that Snobol patterns can be recursive like BNF!

Snobol Patterns are "programmable"

I use capture groups in regular expressions all the time, but it’s always bothered me that the group mechanism and the capture mechanism are one and the same:

/Foo (bar|baz)!/
/Size: (\d+)/

Maybe one of the above regular expressions uses grouping to allow the alternation of two strings, "bar" and "baz". And maybe the other one uses grouping to capture a string of digits. But we can’t be sure. The intent is lost and both things are happening.

(And yes, many regular expression engines have extensions which allow named capture groups and non-capture groups, which fixes this problem. The extensions get hard to read. But the real problem is that they are not standard, so patterns that use them are not portable between implementations.)
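Here’s a sketch of both behaviors, using the (?:...) non-capturing extension as it’s spelled in Python’s re module (Python is just my pick for illustration, and the digit-capturing regexp uses \d+):

```python
import re

capturing = re.search(r"Foo (bar|baz)!", "Foo baz!")
grouping_only = re.search(r"Foo (?:bar|baz)!", "Foo baz!")
digits = re.search(r"Size: (\d+)", "Size: 42")

print(capturing.groups())      # ('baz',)  grouped AND captured
print(grouping_only.groups())  # ()        grouped, nothing captured
print(digits.group(1))         # 42        captured, grouping incidental
```

So the intent can be expressed, but only by reaching for extra glyphs.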

Snobol has a much better mechanism for capturing any portion of a pattern’s match. (The syntax may not be exactly what I might have chosen, but I love the mechanism anyway.)

To capture a match, follow it with a dot (.) and then a variable name. The subject string matching the pattern will be assigned to that variable.

It’s easier just to show it than describe it:

   dog_pat = "woof" | "bark"
   "The dog says bark." dog_pat . dog_sound
    output = "The dog went '" dog_sound "'"
end

Output:

The dog went 'bark'

The exact [pattern] . [variable] syntax isn’t my favorite thing in the world, but the mechanism is perfect.

This example is silly and contrived but hopefully it’s pretty clear what it does:

   dog_pat = "woof" . w | "bark" . b
   "The dog says bark." dog_pat
    output = "Woof? " w
    output = "Bark? " b
end

Output:

Woof?
Bark? bark

As you can see, the w capture variable did not capture anything because the "woof" alternative was not matched.

You can compose patterns with captures and it still works:

   moo = "moooo" . moovar
   low = "lowww" . lowvar
   cow_pat = moo arb low

   "moooo... lowww!" cow_pat
   output = "Moo? " moovar
   output = "Low? " lowvar
end

Output:

Moo? moooo
Low? lowww

Pretty interesting, right?

By giving the capture pattern/variable pairs moovar and lowvar their own lines, the composed pattern is super readable.

I think it’s pretty intuitive. Global namespace has its pluses and minuses for sure, but for simplicity, global wins.

Contrast this with the equivalent Perl-style regexp named capture groups:

(?<moovar>moooo).*(?<lowvar>lowww)

And that’s underselling Snobol because it’s such an extremely simple pattern. A real example with a more complicated pattern involving a few character classes and some escaping would give Snobol the clear edge on readability.
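And to illustrate the portability gripe from earlier: in Python’s re module (again, just my pick for a sketch), named groups are spelled (?P<name>...) rather than the Perl-style (?<name>...) shown above:

```python
import re

cow_re = re.compile(r"(?P<moovar>moooo).*(?P<lowvar>lowww)")
match = cow_re.search("moooo... lowww!")

print("Moo? " + match.group("moovar"))  # Moo? moooo
print("Low? " + match.group("lowvar"))  # Low? lowww
```

Same idea, slightly different glyphs, and a regexp you can’t paste between languages without editing.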

Let’s put what we’ve learned so far into a small example that uses a bit of success/failure logic. This tiny program loops over input (either typed or piped) and tells you whether or not you’ve written a variable assignment in a made-up language that uses a dollar sign ('$') before the variable name:

   var_statement = ( '$' arb ) . varname ' = ' arb ';'
again
   line = input :f(end)
   line var_statement :s(yes_match) f(no_match)
yes_match output = 'Matched variable "' varname '"' :(again)
no_match output = "No match." :(again)
end

Both the yes_match and no_match lines print something and then unconditionally jump to the again label.

We don’t need to re-define the var_statement pattern each time the loop executes, so that’s assigned before again.

Here’s a sample session with this program:

hello
No match.

No match.
foo = bar
No match.
$monkey = "Mr. Nilsson";
Matched variable "$monkey"
$qty = 16;
Matched variable "$qty"
^D

Writing simple interactive interpreters seems like a pretty good use-case for Snobol. Writing a Forth interpreter in Snobol is a weirder case, but even with that, I did get some pretty good use out of the pattern matching, variable captures, and success/failure states.

Of course, this was nothing you couldn’t do with regular expressions with capture groups inside a host language like Ruby or Perl. So let’s look at another feature.
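Before moving on, and for the record, here’s roughly what that regexp version might look like. This is a Python sketch of my own (the function name is made up, and the lazy .*? is my stand-in for arb), fed the same lines as the session above rather than reading standard input:

```python
import re

# '$' arb captured as varname, then ' = ', arb, and ';'
VAR_STATEMENT = re.compile(r"(\$.*?) = .*?;")

def check_line(line):
    """Mimic the yes_match / no_match branches for one line of input."""
    match = VAR_STATEMENT.search(line)
    if match:
        return 'Matched variable "' + match.group(1) + '"'
    return "No match."

for line in ["hello", "", "foo = bar", '$monkey = "Mr. Nilsson";', "$qty = 16;"]:
    print(check_line(line))
```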

Another "programmable" feature of Snobol’s patterns is the ability to access the pattern matching "cursor" as it scans a string.

As with the ability to store a match in a variable, you can also store the cursor position using the @ operator like so:

    'Big Honk' 'o' @o_position
    output = "The 'o' is at: " o_position
end

Output:

The 'o' is at: 6
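The nearest regexp equivalent I know of is asking the match object where it ended. A small Python sketch (my choice of language for illustration):

```python
import re

match = re.search("o", "Big Honk")

# end() is the offset just past the match, like the cursor after 'o'.
o_position = match.end()
print("The 'o' is at: " + str(o_position))  # The 'o' is at: 6
```

The difference is that this only reports where the whole match (or a group) ended, after the fact. The @ operator can be dropped in at any point in the middle of a pattern.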

And by the way, one of my favorite Snobol "ah ha!" moments was when I realized you can assign matches and positions to output in the pattern:

    output = "Matching..."
    mypat = 'Foo' @output arb . output 'Baz'
    'Foo Bar Baz' mypat
    output = "Done."
end

Output:

Matching...
3
 Bar
Done.

Pretty neat, right?

This seems amazing for debugging, but there’s a problem. Assignments are conditional, so if the pattern didn’t match, they’re not assigned. Nothing would be printed to output.

So maybe assigning to output is not so useful for debugging?

Don’t worry, there’s also an "immediate assignment" operator, $, which always assigns matched strings, even if the rest of the pattern fails.

    output = "Matching..."
    mypat = 'Foo' $ output arb $ output 'Baz'
    'Foo Bar Barf' mypat
    output = "Done."
end

Output:

Matching...
Foo


 B
 Ba
 Bar
 Bar
 Bar B
Done.

We didn’t write a loop. Can you guess why that incremental output was printed?

If you guessed that you’re seeing the internal state of the scanner, you are correct.

I highly recommend throwing some $ output assignments in your failing patterns if you need help figuring out what’s happening. I wish I’d known about it sooner in my learning.

It feels naughty at first, but Snobol actively encourages you to make use of the scanner position. For example, the pattern functions pos(number) and rpos(number) verify the scanner position, measured from the left end and the right end of the string respectively.

The pattern function tab(number) matches everything until the given position, and you’ll most likely use it to assign a value to a variable. The name "tab" implies that you’re expected to use this with tabular data, but you can use it any time you want to start matching at a particular position in a string.

I’ll use tab() in an example in a moment.

Keep in mind that pattern functions don’t perform pattern matching, they return a pattern that will perform the matching. And you can save that pattern in a variable and compose with it later.

Stored patterns get even more interesting with unevaluated expressions. You can make an unevaluated expression with the '*' symbol.

Here’s an example ripped from my Snobol4th interpreter:

    wordpat = tab(*pos) anyspace (break(' ') | rem) . token anyspace @pos

This isn’t a simple example, but the part I want to focus on is tab(*pos), which uses the Snobol function tab() to create a pattern that matches until a given position in the subject string. In this case, I’m scanning a Forth program one "word" at a time and assigning the word to a variable named token.

The important thing is that I’m keeping track of my position in the Forth source input with the pos variable.

If I had written tab(pos) in the wordpat pattern, then pos would have been evaluated at the time the pattern was created, and it would have been hard-baked as tab(0).

But since I have tab(*pos), the pos variable is evaluated at the time the pattern is used, which is what I want.

Maybe a better name for this feature is "delayed evaluation" rather than "unevaluated"?

Lastly, and to really drive home the "programmability" of patterns, consider the potent combination of immediate assignment with an unevaluated expression:

    rle_str = "09:ABCDEFGHIOOOOOO"
    read_pat = span('0123456789') $ n ':' len(*n) . datastr
    rle_str read_pat
    output = datastr
end

Output:

ABCDEFGHI

Do you see how this works? The rle_str is a length-prefixed string (not run-length encoding in the strict sense, despite the variable name): the number of bytes to read from the string comes first, in the form length:data+padding.

The captured length is stored immediately as the subject is scanned into variable n. The value of n is evaluated as it is used in the len() function to return a pattern that matches a string of length n. And that result is stored in a variable datastr.
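To appreciate that, here’s a sketch of the same job with a regexp in a host language (Python, my pick for illustration). As far as I know, most regexp engines have no way to feed a captured number back in as a repeat count within the same expression, so the work gets split into a match plus some ordinary code:

```python
import re

rle_str = "09:ABCDEFGHIOOOOOO"

# Step 1: the regexp captures the length prefix and consumes the ':'.
prefix = re.match(r"(\d+):", rle_str)
n = int(prefix.group(1))

# Step 2: plain slicing does what len(*n) did inside the Snobol pattern.
start = prefix.end()
datastr = rle_str[start:start + n]

print(datastr)  # ABCDEFGHI
```

It works fine, of course. But in Snobol the whole thing is one pattern that you can store in a variable and compose into a larger one.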

Conclusion

The uniformity of Snobol’s five-element line syntax and the consistency with which it uses only that one syntax for everything makes the language easy to describe and easy to absorb (but maybe a little challenging to master?).

It’s weird, yes, but only if you’re used to "normal" languages.

One of the things that always disappointed me about Awk was that it starts off with this clean and effective implicit "match something and take an action" concept, but it only lets you apply that to one level of your input. After that, you’re hand-writing for loops. This has always felt like an arbitrary limitation and a huge missed opportunity in Awk.

Unlike Awk, Snobol takes the idea of "match something and take an action" and lets you (well, pretty much forces you to, really) use it at every level of your input data. You can match blocks and then lines and then words and then letters with the exact same syntax. I like that a lot.

There are things I would change about the language syntax if somebody asked me to make a Snobol5. The required "indentation" on lines without labels would be the first thing to go. Function support would be first-class. I’m sure I’d be tempted to add loops and iterators.

But, there are a lot of things I would keep. Part of my motivation for learning Snobol has always been to see if it had some good string processing ideas to "steal" for a possible future tool/language of my own. Having seen it and used it, I will definitely be coming back to Snobol4 to see how I might implement my own take on first-class pattern expressions.

Snobol is absolutely worth learning if you want to see an alternative to regular expressions. It’s also fascinating to study for its conceptual minimalism.

You might also enjoy learning a tiny bit about The Snobol Implementation Language (SIL).