Ratfactor's Judgement of Snobol4

SNOBOL. Perhaps you’ve heard of it? I often see it come up in conversations about alternatives to regular expressions. Snobol is a whole language based around the concept of string pattern matching.
(Note: Pattern matching is, by far, the most interesting part of Snobol. Feel free to jump down to the "Pattern Matching" section. Though I can’t promise it’ll make sense if you do.)
I’ve had it on my crowded mental "to-learn" list for ages but always found a way to put it off.
Learning Snobol4 in the 2020s
What you find when you start the journey to Snobol in the year 2024 is that Phil Budne of regressive.org has been maintaining a C implementation of Snobol called CSNOBOL4, "an open source port of Macro SNOBOL4." Budne also maintains an absolute treasure trove of other SNOBOL info here: https://www.regressive.org/snobol4/
Well, after the initial hurdle of figuring out how to write a simple "Hello World" program, I settled into learning it properly from a vintage source.

The book is The Snobol4 Programming Language by R.E. Griswold, J.F. Poage, and I.P. Polonsky.
What often strikes me about reading many of the older computer books is that the order of the content and the pacing clearly suggests that you’ll be reading the book away from a computer and you might get some time to try stuff out on a time-sharing mainframe or minicomputer system later. We moderns are so spoiled!
Anyway, I read the book and did the exercises. Great!
But I’m wise enough to not stop there. Past experience has taught me that I need a non-trivial project to really understand how to use a new language. (It always amazes me how I can fool myself into thinking I fully understand something after merely reading about it, you know?)
Here’s the ridiculous project I ended up building: the Snobol4th repo - "A Forth written in Snobol4".
I did reference the green book above while working on the project, but I found the Catspaw SNOBOL4 tutorial to be much more useful for looking up how to do things.
The completion of one project is just barely enough experience to speak intelligently about a programming language, but I think I learned enough to get a good feel for it. So, what do I make of Snobol?
Pluses and minuses of Snobol in a nutshell
To cut to the chase, I can confirm the rumors right off the bat: Snobol’s first-class support for string pattern matching is very good. Snobol patterns are expressive, fairly compact, and reasonably intuitive. When the moon and stars align and the existing primitives match your needs, it’s downright beautiful.
I can only imagine how magical Snobol must have felt in its heyday.
The biggest downside is the almost complete lack of support for structured programming. All control flow is basically labeled GOTOs. You can define functions, which take care of some housekeeping for you, but even those are shockingly awkward to write.
Other than the pattern matching, Snobol logic is like some weird, awkward high-level assembly. That’s perfectly reasonable considering this is a language that comes from the late 1960s when the programming language features we now take for granted were still hotly debated topics.
And yet. And yet, the conceptual simplicity of Snobol’s single type of statement is both a gift and a curse. It’s a gift to anyone who does not wish to be burdened with learning conceptually difficult control flow mechanisms. Anyone can understand GOTO. It’s simple.
But programming with GOTOs is a curse to those of us who have put the time and effort into learning the richer and more expressive control flow mechanisms of larger languages.
In the end, I’ll frame it in a positive way: Snobol is truly small and that can be a very wonderful thing.
Let’s dive in. This is not a Snobol4 tutorial. But I’m hoping that a handful of simple examples will be enough to demonstrate what makes this such an interesting language.
Implicit programming
It’s tempting to start with the pattern matching stuff, but I don’t think it will make proper sense until you know how a line of Snobol code is written.
A simple Hello World program demonstrates how weird Snobol is.
Check this out:
    output = "Hello world!"
end
That indent looks like a mistake, right? But I assure you that the indenting is not merely intentional. It is required for the example to work. And it turns out there is a good reason for that.
Let’s run it:
Hello world!
Nice.
So let’s talk about that indented first line. The indenting is required because anything in the first column is implicitly a label.
Every line in a Snobol program consists of exactly five parts:
[label] [subject] [pattern] [replacement] [goto]
All of those parts are optional. Snobol can tell which ones you included because a replacement starts with an equal sign (=) and a goto starts with a colon (:).
The end on the final line is a special label that, when reached, ends the program. You have to have an end in your program or Snobol4 will refuse to run it.
Again, if you unindent the first line, Snobol will think output is a label and you’ll get an error. Let’s try it:
OUTPUT = 'Hello world!'
       ^
hello_fail.sno:1: *** Erroneous subject
ERRORS DETECTED IN SOURCE PROGRAM
If output is interpreted as a label, then = is in the subject place, and that doesn’t make sense. So there you go.
Moving on, why does Hello world! print out when we run this program?
Clearly, it has something to do with assigning to a variable called output. And that’s true, but output is no ordinary variable. Any time you assign a string to output, it gets printed. Immediately!
It’s not a function. If you type output "Hello world!" without the equal sign (=), it won’t print anything. This ended up being the most common mistake I made over and over while I was working with Snobol. I couldn’t figure out why my output wasn’t showing up and it’s because I had typed output "Foo" rather than output = "Foo".
What’s really neat about assigning to a variable to print is that you can use it to debug things by having their assignments go directly to the terminal. This turns out to be especially useful for debugging patterns. I show an example of this in the section below titled Snobol Patterns are "programmable".
Fun aside: To complete the feeling that this is a language from the 1960s, Snobol4 also has a punch variable that works just like output except it prints to the punchcard device instead of your terminal or teletype. You do have your punchcard reader hooked up, right?
Another implicit operation is string concatenation. Any time you put a string (or something that evaluates to a string) next to another string, they’re concatenated.
    foo = 5
    output = "Coconuts: " foo
end
Prints:
Coconuts: 5
Can you guess what the following program does?
    output = "You typed '" input "'"
end
If you guessed that input is the complement to output and it gets a line from standard input, you’re right.
This program waits for you to enter your name and then greets you:
    output = "What is your name?"
    name = input
    output = "Hello " name "!"
end
Finally, comments are lines with a * in the label field:
* This is a comment
Control flow: Goto, success, failure, and functions
The last field in a Snobol statement is the "goto" field and it follows a colon (:).
The goto can be unconditional. The following program always gets a line of input and then goes to the label bar. Like end, bar goes in the label field.
    foo = input :(bar)
    output = "I'll be skipped!"
bar output = "You typed: " foo
end
Output:
horse
You typed: horse
An unconditional goto has its uses. But it’s much more interesting to have branching control flow based on a logical decision.
All control flow in Snobol depends on the success or failure of pattern matching. For each outcome, you can assign Success and Failure gotos.
The example below exits immediately if you don’t input anything (e.g. you explicitly type CTRL-D for EOF, "End-Of-File", or pipe no input to this program):
    foo = input :f(end) s(bar)
    output = "I'll be skipped!"
bar output = "You typed: " foo
end
Here’s how this works:
- The foo variable is the subject of our first statement.
- The value of input is the replacement.
- The value of input is whatever you type on your keyboard when you run the program.
- Snobol replaces the contents of foo with the contents of input.
- If the replacement fails because of lack of input, the statement fails.
- In that case, we goto end.
- Or, if the replacement is successful, we goto bar and print the results.
- In either case, the "I’ll be skipped!" replacement never happens and we won’t ever see that message.
Successful Snobol statements have a value (which may be null). Failed statements do not have a value. Likewise, a lack of value is a failure. In the case of the program above, the failure to assign/replace a value into foo is the reason for the failure.
Functions
If you don’t have a desire to know how user-defined functions work in Snobol, feel free to skip this section and continue to pattern matching!
Functions in Snobol are basically just some gotos with some extra bookkeeping. (Well, I guess deep down, that’s true in any language, really, but it’s super true in Snobol.) I’ve seen a couple different ways of formatting these, but here’s how I ended up making all of my functions in Snobol4th.
* The 'foo' function:
    define('foo(bar)') :(endfoo)
foo output = "I'll return the value bar+'F'."
    foo = bar 'F' :(return)
endfoo
The first line is a comment, but after that, I have a couple labels and gotos. Look at how many times I wrote the name "foo"! Every label in the Snobol program must be unique, so I fell into this pattern of naming for functions and stuck with it.
Let’s try out the function:
    output = foo('BAR')
end
Output:
I'll return the value bar+'F'.
BARF
Well, it works.
The reason we need to have the unconditional goto :(endfoo) after the define call is that Snobol barely has functions.
Without the endfoo label and goto, Snobol would keep executing the code in your function definition! define just does some internal setup and then returns control to you. It’s not actually establishing a structured block of code like you might expect.
Again, the resemblance to assembly language control flow is quite strong!
Going to the return label is special. It returns control to the callsite and the foo function "returns" the value set in the foo variable.
So that’s Snobol functions. Extremely primitive, but they work and you’ll be glad to have them in a larger program.
Pattern matching
(Quick Aside: I’m hard on regular expressions on this page, so I should set the record straight. I like regular expressions. Furthermore, learning the core set of regexp operators has been one of the best investments of my entire programming career. They show up all over the place and they’re incredibly useful.)

Okay, now we are ready for the good stuff.
Pattern matching is what makes Snobol interesting and why people still mention it after all these years.
Snobol patterns are:
- "First class"
- Composable
- Light on syntax
- More powerful than regex
- "Programmable"
Again, my purpose here is not to teach Snobol, so this coverage of Snobol’s patterns will be far from exhaustive. But I’ll try to do them justice by pointing out what I think makes them interesting and why I think they are very much worthy of further study.
As briefly mentioned above, patterns are the third part of the five optional parts of a line of Snobol:
[label] [subject] [pattern] [replacement] [goto]
A pattern can be as simple as a string literal:
"HELLO" "EL"
- "HELLO" is the subject of the matching.
- "EL" is the pattern used to match the subject.
Since "EL" can be found in "HELLO", this pattern matching is successful.
That’s a real, working Snobol statement, but not terribly useful. We aren’t doing anything with the matching success.
Pattern Success and Failure
When a match succeeds or fails, we can act on it with a goto.
- s(label) goes to the label on success
- f(label) goes to the label on failure
Let’s add success and failure gotos to this pattern matching:
    "HELLO" "EL" :s(yup) f(nope)
yup output = "There is an EL in HELLO." :(end)
nope output = "LOL, what?"
end
When run, this little program outputs:
There is an EL in HELLO.
Since the match is successful, program execution goes to the yup label.
Also note that the yup line ends in an unconditional goto to end. We have to have this to keep the nope line from executing after the yup!
I know I keep saying this, but, again, this style of control flow logic will seem very weird and awkward to most modern programmers unless you’ve been doing a lot of assembly language.
Replacements
When you want to replace all of the subject with a new string, it looks like this:
[subject] = [replacement]
We’ve seen this with assigning ("replacing") the contents of output with a string we want to print.
But you can also replace part of a string with something else by using a pattern to determine which part to replace. That form looks like this:
[subject] [pattern] = [replacement]
Anything matched by the pattern is replaced with the replacement string, which can include pieces captured in the pattern (we’ll talk about capturing below in the "programmability" of patterns section).
Here’s a simple example:
    story = "The chicken says 'baa.'"
    story "chicken" = "sheep"
    output = story
end
Outputs:
The sheep says 'baa.'
So that’s simple find-and-replace in action.
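For comparison, here is a minimal sketch of the same find-and-replace in Python. Python is used purely for illustration and is not part of the Snobol program above. A Snobol pattern match finds a single match, so the sketch limits the replacement to the first occurrence.

```python
# Rough Python analogue of the Snobol statement:  story "chicken" = "sheep"
story = "The chicken says 'baa.'"

# A Snobol match locates one matching substring, so only the first
# occurrence is replaced; count=1 mirrors that behavior.
story = story.replace("chicken", "sheep", 1)

print(story)
```

Notice that the Python version needs a method call with three arguments, where Snobol uses the same subject/pattern/replacement statement form it uses for everything else.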
It’s worth repeating that a line with a replacement can also have a label and/or gotos. Are you starting to see the brilliant simplicity and uniformity of Snobol’s syntax?
Snobol patterns are "first class"
What does it mean to have patterns as first class elements of the language?
We’ve already seen simple string patterns like this one:
    cat_pat = 'meow'
    "The animal goes meow." cat_pat :s(cat) f(notcat)
cat output = "It was a cat." :(end)
notcat output = "It was NOT a cat."
end
Output:
It was a cat.
But patterns can be more interesting. This one allows a "meow" or a "purr":
cat_pat = 'meow' | 'purr'
And this one takes either a "hiss" or "yowl" before a "meow" or "purr" (with a space between):
cat_pat = ( 'hiss' | 'yowl' ) ' ' ( 'meow' | 'purr' )
This matches the strings "hiss meow" or "The cat says yowl purr". But it does not match "my meow" or "hiss hiss".
For those like myself who are used to Regular Expressions, this looks very similar to the regexp: (hiss|yowl) (meow|purr).
The most obvious difference is that string literals are quoted in the Snobol pattern, which clearly differentiates them from the pattern operators | and ().
This separation means Snobol avoids an entire class of common regexp headaches including tricky cases of escaping of special characters.
Snobol’s patterns certainly have me questioning the wisdom of regular expressions’ character literals, character classes, and automata operators all combined in one messy jumble!
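To make the escaping headache concrete, here is a small Python sketch (Python's re module is just one regexp engine chosen for illustration, and the subject string is made up). A piece of literal text that happens to contain regexp operators silently means something else until it is escaped.

```python
import re

subject = "Total (approx.): $5.00 + tax"
literal = "$5.00 + tax"

# Used raw, the metacharacters in the literal change its meaning:
# "$" anchors at end of input, "." matches any character, "+" repeats.
raw_match = re.search(literal, subject)

# re.escape() backslash-escapes the metacharacters so the text is literal.
escaped_match = re.search(re.escape(literal), subject)

print(raw_match)              # None
print(escaped_match.group())  # $5.00 + tax
```

In a Snobol pattern the literal would simply be a quoted string, so there is nothing to escape.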
Snobol Patterns are composable
For anything even remotely complicated, you’re encouraged to break your patterns into small, simple pieces and compose them into larger patterns.
The process feels natural, greatly improves the readability of large patterns, and lets you re-use common parts.
Here’s that cat sounds pattern again:
cat_pat = ( 'hiss' | 'yowl' ) ' ' ( 'meow' | 'purr' )
Let’s break it into two parts and compose them into the final pattern:
    cat_complaint = 'hiss' | 'yowl'
    cat_happy = 'meow' | 'purr'
    cat_pat = cat_complaint ' ' cat_happy
This simple composition doesn’t seem all that interesting at first. But something has changed between this and the previous one-line cat_pat definition. Can you see it?
The answer is that the parentheses are no longer needed to enforce the order of operations.
The reason is that while concatenation is performed before alternation, the stored patterns cat_complaint and cat_happy are self-contained patterns. No parentheses are needed because these expressions are not just smashed together like a string. They are composed properly, retaining their original precedence.
Like the original, this composed pattern will match one of these four strings:
- "hiss meow"
- "hiss purr"
- "yowl meow"
- "yowl purr"
This simple example shows how first-class patterns allow composition in a way that is NOT easily replicated by building up a regular expression string!
If you tried to do this with regular expressions, you’d end up with hiss|yowl meow|purr which would match one of these three strings:
- "hiss"
- "yowl meow"
- "purr"
That’s not what I wanted.
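Here is a short Python sketch of that precedence trap, using Python's re module as a stand-in for regular expressions in general. The variable names mirror the Snobol example, and fullmatch() is used only to make the three alternatives easy to see.

```python
import re

cat_complaint = "hiss|yowl"
cat_happy = "meow|purr"

# Naive string concatenation: "|" binds loosest, so this is really
# three alternatives: "hiss", "yowl meow", and "purr".
naive = re.compile(cat_complaint + " " + cat_happy)

# Wrapping each piece in a non-capturing group (?:...) restores the
# intended grouping, but you have to remember to do it every time.
grouped = re.compile("(?:" + cat_complaint + ") (?:" + cat_happy + ")")

for text in ["hiss", "purr", "hiss meow", "yowl purr"]:
    print(text, bool(naive.fullmatch(text)), bool(grouped.fullmatch(text)))
```

The grouped version behaves like the composed Snobol pattern, at the cost of extra syntax that Snobol's stored patterns never need.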
We could also make a different pattern that accepts either a cat complaint or a happy cat sound. To do that, we can create an alternation between the two whole patterns:
cat_pat = cat_complaint | cat_happy
Composability is awesome for reuse and readability.
Snobol patterns are light on syntax
Believe it or not, you’ve already seen pretty much the entire symbolic pattern matching syntax of Snobol:
- | creates an alternation ("or") between two patterns
- ( ) group patterns to affect the order of operations
Three more symbols are used in patterns: a dot (.), @, and $; but they are not part of the pattern matching itself. We’ll see what these do in the section below about "programming" patterns.
Everything else is named patterns and pattern functions.
Snobol4 provides seven pre-made patterns you can compose into your own patterns: arb, rem, fence, bal, abort, fail, and succeed.
And functions that take parameters and produce patterns for you, such as any(chars), break(chars), len(n), notany(chars), pos(n), etc.
Not only are these names easier to read and remember than regular expression glyphs, there are also way more of them - so you get a lot more expressive power right out of the box. It’s a richer vocabulary.
Let’s look at one of the provided primitive patterns.
arb matches zero or more characters.
This pattern can match the string "meeeeoooow":
'm' arb 'w'
It also matches "mow" or just "mw".
That’s roughly equivalent to the regular expression:
m.*w
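One reason the equivalence is only rough: arb tries the shortest match first and grows only when the rest of the pattern fails, so it behaves more like a lazy .*? than a greedy .* in the regexp world. The Python sketch below (Python chosen for illustration) shows where the two regexp forms differ.

```python
import re

greedy = re.compile(r"m.*w")   # grabs as much as it can
lazy = re.compile(r"m.*?w")    # grabs as little as it can, closer to arb

for text in ["meeeeoooow", "mow", "mw"]:
    print(text, bool(greedy.search(text)), bool(lazy.search(text)))

# The two differ when there is more than one possible ending point.
print(greedy.search("mow mow").group())  # mow mow
print(lazy.search("mow mow").group())    # mow
```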
Here’s what some of the simpler pattern functions do:
- len(5) returns a pattern that matches any five characters
- any('xyz') returns a pattern that matches any of the characters 'x', 'y', or 'z'
- notany('xyz') does the opposite of any
You get the idea. They compose well and they give you everything you need, though learning to use them effectively does take a little study.
I’ll admit that when I was writing Snobol patterns, I did feel like there was room for improvement or expansion in the pre-made patterns and functions for the sorts of things I wanted to do. But the beauty of avoiding regular expression-like glyph syntax is that it would be a piece of cake to extend the language.
Snobol Patterns are more powerful than regular expressions
To put it in formal language theory terms:
- Regular expressions are limited to type 3 ("regular") languages in the Chomsky hierarchy.
- Snobol patterns can recognize type 0 ("unrestricted") languages in the Chomsky hierarchy.
Snobol patterns can do things that are impossible for regular expressions.
The canonical example is matching arbitrarily nested parentheses. Classical regular expressions cannot do this. (It must be noted that many regexp implementations provide various extensions to allow recursive patterns or sub-expressions that do make this possible - but these extensions are largely incompatible with each other and make the syntax completely bonkers to read.)
By contrast, you can absolutely write a Snobol pattern to match nested parentheses.
(Actually, Snobol "golfs" that particular problem away entirely by providing a pre-made bal pattern that specifically matches arbitrarily nested and correctly balanced parentheses. So that one is kind of cheating. But you could also write it yourself.)
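The reason classical regular expressions can't do this is that checking balance needs a counter that can grow without limit, and a finite automaton only has a fixed number of states. The Python sketch below shows that counting idea. It is only an illustration of the problem, not how bal is implemented, and the function name is_balanced is made up for this example.

```python
def is_balanced(text: str) -> bool:
    """Return True if every "(" in text has a matching ")"."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ")" with no "(" before it
                return False
    return depth == 0          # no "(" left unclosed


print(is_balanced("(a(b)c)"))  # True
print(is_balanced("(()"))      # False
```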
Let’s go further than that. One of the most famous answers on Stack Overflow is "You can’t parse HTML with regex". And that’s strictly true. I’ve matched parts of HTML on plenty of occasions. But you absolutely cannot parse the whole language with just classical regexps.
But you can parse HTML with Snobol patterns. I’d, uh, rather not. But you can. It’s possible.
(Aside: To expand on "you can’t parse HTML with regex" for a moment, I’d like to say that this is one of those things that scares people away from doing useful things. You absolutely cannot parse HTML-in-the-wild or HTML-the-W3-spec with regex. But you absolutely CAN parse subsets of HTML under your control and if doing so produces a useful tool, go for it! Demons will not emerge from the ground to swallow you. At least, I don’t think so. Anyway, they’ll get me first. So watch this space.)
Finally, perhaps you’ve seen BNF (or EBNF) grammar used to define Internet protocols in RFCs? Well, the composition of a big, multi-part Snobol pattern looks, and works, remarkably similar to BNF. The secret sauce is the composability mentioned above and the fact that Snobol patterns can be recursive like BNF!
Snobol Patterns are "programmable"
I use capture groups in regular expressions all the time, but it’s always bothered me that the group mechanism and the capture mechanism are one and the same:
/Foo (bar|baz)!/
/Size: (\d+)/
Maybe one of the above regular expressions uses grouping to allow the alternation of two strings, "bar" and "baz". And maybe the other one uses grouping to capture a string of digits. But we can’t be sure. The intent is lost and both things are happening.
(And yes, many regular expression engines have extensions which allow named capture groups and non-capture groups which fixes this problem. The extensions get hard to read. But the real problem is that the extensions are not standard between implementations, so they’re not portable.)
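As one concrete instance of such an extension, here is how Python's re module separates the two intents. This is Python's spelling only, and the sample strings are made up for illustration.

```python
import re

# (?:...) groups for alternation without capturing anything.
group_only = re.compile(r"Foo (?:bar|baz)!")

# Plain (...) both groups and captures.
capture = re.compile(r"Size: (\d+)")

print(group_only.search("Foo baz!").groups())  # ()
print(capture.search("Size: 16").groups())     # ('16',)
```

The intent is now visible, but only to a reader who knows what (?: means.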
Snobol has a much better mechanism for capturing any portion of a pattern’s match. (The syntax may not be exactly what I might have chosen, but I love the mechanism anyway.)
To capture a match, follow it with a dot (.) and then a variable name. The subject string matching the pattern will be assigned to that variable.
It’s easier just to show it than describe it:
    dog_pat = "woof" | "bark"
    "The dog says bark." dog_pat . dog_sound
    output = "The dog went '" dog_sound "'"
end
Output:
The dog went 'bark'
The exact [pattern] . [variable] syntax isn’t my favorite thing in the world, but the mechanism is perfect.
This example is silly and contrived but hopefully it’s pretty clear what it does:
    dog_pat = "woof" . w | "bark" . b
    "The dog says bark." dog_pat
    output = "Woof? " w
    output = "Bark? " b
end
Output:
Woof?
Bark? bark
As you can see, the w capture variable did not capture anything because the "woof" alternative was not matched.
You can compose patterns with captures and it still works:
    moo = "moooo" . moovar
    low = "lowww" . lowvar
    cow_pat = moo arb low
    "moooo... lowww!" cow_pat
    output = "Moo? " moovar
    output = "Low? " lowvar
end
Output:
Moo? moooo
Low? lowww
Pretty interesting, right?
By giving the capture pattern/variable pairs moovar and lowvar their own lines, the composed pattern is super readable.
I think it’s pretty intuitive. Global namespace has its pluses and minuses for sure, but for simplicity, global wins.
Contrast this with the equivalent Perl-style regexp named capture groups:
(?<moovar>moooo).*(?<lowvar>lowww)
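The portability complaint shows up right away if you try this in Python, whose re module spells named groups (?P<name>...) and does not accept the (?<name>...) form. A minimal sketch, reusing the subject string from the Snobol example:

```python
import re

# Python spells named groups (?P<name>...) rather than (?<name>...).
cow_pat = re.compile(r"(?P<moovar>moooo).*(?P<lowvar>lowww)")

m = cow_pat.search("moooo... lowww!")
print(m.groupdict())  # {'moovar': 'moooo', 'lowvar': 'lowww'}
```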
And that’s underselling Snobol because it’s such an extremely simple pattern. A real example with a more complicated pattern involving a few character classes and some escaping would give Snobol the clear edge on readability.
Let’s put what we’ve learned so far into a small example that uses a bit of success/failure logic. This tiny program loops over input (either typed or piped) and tells you whether or not you’ve written a variable assignment in a made-up language that uses a dollar sign ('$') before the variable name:
    var_statement = ( '$' arb ) . varname ' = ' arb ';'
again line = input :f(end)
    line var_statement :s(yes_match) f(no_match)
yes_match output = 'Matched variable "' varname '"' :(again)
no_match output = "No match." :(again)
end
Both the yes_match and no_match lines print something and then unconditionally jump to the again label.
We don’t need to re-define the var_statement pattern each time the loop executes, so that’s assigned before again.
Here’s a sample session with this program:
hello
No match.

No match.
foo = bar
No match.
$monkey = "Mr. Nilsson";
Matched variable "$monkey"
$qty = 16;
Matched variable "$qty"
^D
Writing simple interactive interpreters seems like a pretty good use-case for Snobol. Writing a Forth interpreter in Snobol is a weirder case, but even with that, I did get some pretty good use out of the pattern matching, variable captures, and success/failure states.
Of course, this was nothing you couldn’t do with regular expressions with capture groups inside a host language like Ruby or Perl. So let’s look at another feature.
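To back that up, here is a rough Python counterpart of the matching step. It is a sketch, not a line-for-line port: the lazy .*? stands in for arb, and the names VAR_STATEMENT and check_line are invented for this example.

```python
import re

# Rough regex counterpart of: ( '$' arb ) . varname ' = ' arb ';'
VAR_STATEMENT = re.compile(r"(\$.*?) = .*?;")


def check_line(line: str) -> str:
    """Return the message the Snobol program would print for one line."""
    m = VAR_STATEMENT.search(line)
    if m:
        return 'Matched variable "' + m.group(1) + '"'
    return "No match."


for line in ["hello", "foo = bar", '$monkey = "Mr. Nilsson";', "$qty = 16;"]:
    print(check_line(line))
```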
Another "programmable" feature of Snobol’s patterns is the ability to access the pattern matching "cursor" as it scans a string.
As with the ability to store a match in a variable, you can also store the cursor position using the @ operator like so:
    'Big Honk' 'o' @o_position
    output = "The 'o' is at: " o_position
end
Output:
The 'o' is at: 6
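The 6 is the cursor position just past the matched 'o', counting from zero at the start of the string. For readers who think in regexp terms, the closest Python equivalent (shown only for comparison) is the end offset of a match object:

```python
import re

m = re.search("o", "Big Honk")
o_position = m.end()   # offset just past the match, like the Snobol cursor
print("The 'o' is at:", o_position)
```

The difference is that Python reports the position after the match is finished, while @ can record it at any point inside a Snobol pattern.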
And by the way, one of my favorite Snobol "ah ha!" moments was when I realized you can assign matches and positions to output in the pattern:
    output = "Matching..."
    mypat = 'Foo' @output arb . output 'Baz'
    'Foo Bar Baz' mypat
    output = "Done."
end
Output:
Matching...
3
 Bar
Done.
Pretty neat, right?
This seems amazing for debugging, but there’s a problem. Assignments are conditional, so if the pattern didn’t match, they’re not assigned. Nothing would be printed to output.
So maybe assigning to output is not so useful for debugging?
Don’t worry, there’s also an "immediate assignment" operator, $, which always assigns matched strings, even if the rest of the pattern fails.
    output = "Matching..."
    mypat = 'Foo' $ output arb $ output 'Baz'
    'Foo Bar Barf' mypat
    output = "Done."
end
Output:
Matching...
Foo
 B
 Ba
 Bar
 Bar
 Bar B
Done.
We didn’t write a loop. Can you guess why that incremental output was printed?
If you guessed that you’re seeing the internal state of the scanner, you are correct.
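Here is a toy Python model of what the scanner appears to be doing with arb in that run. It is a sketch under assumptions, not a description of the real implementation: it assumes the subject begins with the literal prefix, and it assumes the scanner stops growing arb once fewer characters remain than the literal 'Baz' needs, which is consistent with the output above ending at " Bar B". The function name arb_attempts is made up.

```python
def arb_attempts(subject: str, prefix: str, suffix: str) -> list[str]:
    """List what arb holds each time the scanner retries it.

    Assumes the subject starts with prefix, as 'Foo Bar Barf' does.
    """
    start = len(prefix)
    # Assumption: the scanner stops growing arb once fewer characters
    # remain than the literal suffix needs.
    longest = len(subject) - start - len(suffix)
    return [subject[start:start + n] for n in range(longest + 1)]


# repr() makes the empty and whitespace-only attempts visible.
for attempt in arb_attempts("Foo Bar Barf", "Foo", "Baz"):
    print(repr(attempt))
```

Each attempt is one character longer than the last, and each is followed by a failed try at matching 'Baz', which is why the pattern as a whole fails.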
I highly recommend throwing some $ output assignments in your failing patterns if you need help figuring out what’s happening. I wish I’d known about it sooner in my learning.
It feels naughty at first, but Snobol actively encourages you to make use of the scanner position. For example, the pattern functions pos(number) and rpos(number) both verify the scanner position from either the left or right end of the string.
The pattern function tab(number) matches everything until the given position, and you’ll most likely use it to assign a value to a variable. The name "tab" implies that you’re expected to use this with tabular data, but you can use it any time you want to start matching at a particular position in a string.
I’ll use tab() in an example in a moment.
Keep in mind that pattern functions don’t perform pattern matching, they return a pattern that will perform the matching. And you can save that pattern in a variable and compose with it later.
Stored patterns get even more interesting with unevaluated expressions. You can make an unevaluated expression with the '*' symbol.
Here’s an example ripped from my Snobol4th interpreter:
wordpat = tab(*pos) anyspace (break(' ') | rem) . token anyspace @pos
This isn’t a simple example, but the part I want to focus on is tab(*pos), which uses the Snobol function tab() to create a pattern that matches until a given position in the subject string.
In this case, I’m scanning a Forth program one "word" at a time and assigning the word to a variable named token.
The important thing is that I’m keeping track of my position in the Forth source input with the pos variable.
If I had written tab(pos) in the wordpat pattern, then pos would have been evaluated at the time the pattern was created, and it would have been hard-baked as tab(0).
But since I have tab(*pos), the pos variable is evaluated at the time the pattern is used, which is what I want.
Maybe a better name for this feature is "delayed evaluation" rather than "unevaluated"?
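The same distinction can be sketched in Python with a closure. Everything here is a hypothetical stand-in: this tab() is a drastically simplified toy that just returns the text up to a position, and a zero-argument lambda plays the role of the * operator.

```python
pos = 0


def tab(n):
    """Hypothetical stand-in for a pattern: the text up to position n.

    n may be a number (fixed when the pattern is built) or a
    zero-argument callable (looked up each time the pattern is used).
    """
    def match(subject):
        target = n() if callable(n) else n
        return subject[:target]
    return match


eager = tab(pos)           # like tab(pos): pos is read now, so tab(0)
lazy = tab(lambda: pos)    # like tab(*pos): pos is read at match time

pos = 5
print(repr(eager("hello world")))  # ''
print(repr(lazy("hello world")))   # 'hello'
```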
Lastly, and to really drive home the "programmability" of patterns, consider the potent combination of immediate assignment with an unevaluated expression:
    rle_str = "09:ABCDEFGHIOOOOOO"
    read_pat = span('0123456789') $ n ':' len(*n) . datastr
    rle_str read_pat
    output = datastr
end
Output:
ABCDEFGHI
Do you see how this works? The rle_str is a length-prefixed string (a loose sort of run-length encoding) in which the number of bytes to read from the string comes first in the form length:data+padding.
The captured length is stored immediately as the subject is scanned into variable n. The value of n is evaluated as it is used in the len() function to return a pattern that matches a string of length n. And that result is stored in a variable datastr.
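For contrast, here is a Python sketch of the same job. A classical regexp has no way to feed a captured number back in as a repeat count within the same match, so the work splits into a match step and a manual slicing step. Variable names are borrowed from the Snobol example; using re.match() to anchor at the start of the string is a simplification.

```python
import re

rle_str = "09:ABCDEFGHIOOOOOO"

# Step 1: match the digits and the colon.
m = re.match(r"(\d+):", rle_str)
n = int(m.group(1))

# Step 2: use the captured length to slice the data by hand.
start = m.end()
datastr = rle_str[start:start + n]

print(datastr)
```

The Snobol version does both steps inside a single pattern.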
Conclusion
The uniformity of Snobol’s five-element line syntax and the consistency with which it uses only that one syntax for everything makes the language easy to describe and easy to absorb (but maybe a little challenging to master?).
It’s weird, yes, but only if you’re used to "normal" languages.
One of the things that always disappointed me about Awk was that it starts off with this clean and effective implicit "match something and take an action" concept, but it only lets you apply that to one level of your input. After that, you’re hand-writing for loops. This has always felt like an arbitrary limitation and a huge missed opportunity in Awk.
Unlike Awk, Snobol takes the idea of "match something and take an action" and lets you (well, pretty much forces you to, really) use it at every level of your input data. You can match blocks and then lines and then words and then letters with the exact same syntax. I like that a lot.
There are things I would change about the language syntax if somebody asked me to make a Snobol5. The required "indentation" on lines without labels would be the first thing to go. Function support would be first-class. I’m sure I’d be tempted to add loops and iterators.
But, there are a lot of things I would keep. Part of my motivation for learning Snobol has always been to see if it had some good string processing ideas to "steal" for a possible future tool/language of my own. Having seen it and used it, I will definitely be coming back to Snobol4 to see how I might implement my own take on first-class pattern expressions.
Snobol is absolutely worth learning if you want to see an alternative to regular expressions. It’s also fascinating to study for its conceptual minimalism.
You might also enjoy learning a tiny bit about The Snobol Implementation Language (SIL).