Dave's Slackware Package Blog

coreutils

The mega-package of GNU core utilities!
Page created: 2019-09-22, updated: 2023-05-08 (Many updates!)

This package is not what I had in mind when I started this project! It’s quite a large collection of utilities and it’s going to take me a while to get through them all.

I’m going to simply tackle each executable in alphabetical order.

Note: If you really want to dig into the ultimate "source" of truth (pun intended) for these tools, check out the GitHub mirror of the coreutils source:

https://github.com/coreutils/coreutils/tree/master/src

Well, let’s get on with it: an exploration of each of the coreutils commands!

[ and test

These both take an expression as a command line argument, evaluate it, and exit with a 0 (success) or 1 (fail) status.

In terms of the values of the argument string itself, empty args are fail/false and everything else is success/true. We can see that with the special variable $? in bash, which holds the last exit status:

$ test;       echo $? # 1
$ test 1;     echo $? # 0
$ test 0;     echo $? # 0
$ test true;  echo $? # 0
$ test false; echo $? # 0

Or to perhaps make it clearer:

$ if test 1 ; then echo 'true' ; else echo 'false' ; fi  # true
$ if test ;   then echo 'true' ; else echo 'false' ; fi  # false
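
The examples above only check whether the argument is empty, but test is mostly used with operators. Here is a minimal sketch of a few common ones. Note that status_of is a hypothetical helper I made up for this example (it is not part of coreutils or Bash); it just prints the exit status of whatever command it is given:

```shell
# status_of: hypothetical helper that runs a command and prints its exit status.
status_of() { if "$@"; then echo 0; else echo "$?"; fi; }

s_empty=$(status_of test "")           # an empty string is false
s_streq=$(status_of test abc = abc)    # string equality
s_numlt=$(status_of test 2 -lt 10)     # integer comparison
s_isdir=$(status_of test -d /)         # is / a directory?
s_brack=$(status_of [ 10 -lt 2 ])      # same operators in the [ ... ] form
echo "$s_empty $s_streq $s_numlt $s_isdir $s_brack"   # prints: 1 0 0 0 1
```

Unless you spell out a path, these calls run the shell's built-in test and [ rather than the coreutils executables.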

Hey, wait a second, aren’t test and [ provided by Bash!? Yes, that’s correct.

Unless we give an explicit path to the executables, Bash supplies its own built-in versions of these:

$ [ --version
bash: [: missing `]'

So to call the executables from the coreutils package, we have to supply an explicit path:

$ /bin/[ --version
[ (GNU coreutils) 8.25
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Kevin Braunsdorf and Matthew Bradburn.

I’ll be honest, I couldn’t be bothered to see if there were any differences between the GNU Bash and GNU coreutils versions of test. My eyes glazed over after staring at the output of help test (Bash) and man test (coreutils) for a while.

I did think it was interesting that [ and test are separate executables as installed by this package (as opposed to one of them being a symlink to the other). And the executables are not identical:

$ md5sum /bin/[ /bin/test
6e6588788b3ec5110b4532f8f1d912e3  /bin/[
76db2a10639424c0ba7c09c9d6626ec5  /bin/test

So I took a look at the coreutils source (GitHub mirror) of test.c and found that they are the same program, but compiled with slightly different options.

...
  if (LBRACKET)
    {
      /* Recognize --help or --version, but only when invoked in the
         "[" form, when the last argument is not "]".  Use direct
         parsing, rather than parse_long_options, to avoid accepting
         abbreviations.  POSIX allows "[ --help" and "[ --version" to
         have the usual GNU behavior, but it requires "test --help"
         and "test --version" to exit silently with status 0.  */
...

That’s the only difference.

Here is the entire contents of lbracket.c:

#define LBRACKET 1
#include "test.c"

arch

Prints the machine architecture.

Example:

$ arch
x86_64

The man page points out that this is the same as running uname -m, which I can confirm so you don’t have to.

base32

Encodes and decodes (typically binary) data as a string using A-Z and 2-7. Here I encode and decode a string. This is generally a silly thing to do, but the base32 encoded string is a valid filename, unlike the input string (which contains a '/').

$ echo "Hello! Foo/Bar" | base32
JBSWY3DPEEQEM33PF5BGC4QK
$ echo "JBSWY3DPEEQEM33PF5BGC4QK" | base32 -d
Hello! Foo/Bar

The -w option to wrap columns may be quite handy if you’re sending this data through a communication platform that will do hard line wrapping for you. Or perhaps for printing if you want to have hardcopy of some small piece of binary data?

I enjoy the --ignore-garbage option, which I also use around the house when I want to relax.
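
A quick sketch of both options, assuming GNU base32 is on your PATH. The "garbage" in the second command (a space and a '!') is something I added on purpose for the demonstration:

```shell
# -w sets the column at which encoded output wraps (-w 0 turns wrapping off).
wrapped=$(printf 'Hello! Foo/Bar\n' | base32 -w 8)
printf '%s\n' "$wrapped"
# JBSWY3DP
# EEQEM33P
# F5BGC4QK

# When decoding, --ignore-garbage skips characters outside the base32
# alphabet, such as the space and '!' deliberately added here.
decoded=$(printf 'JBSW Y3DP!EE======\n' | base32 -d --ignore-garbage)
printf '%s\n' "$decoded"   # Hello!
```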

base64

Works exactly the same way, but uses 64 characters (including '/'), so the result can’t reliably be used as a filename, etc.:

$ echo "Hello! Foo/Bar" | base64
SGVsbG8hIEZvby9CYXIK
$ echo "SGVsbG8hIEZvby9CYXIK" | base64 -d
Hello! Foo/Bar

Notice the inclusion of lower case characters and the better information density of the base64 encoding.

Any time you find yourself in a text-only medium, but you need to send or store binary data, base32 and base64 can help.

basename

A very handy utility to know for both scripting and constructing cool one-off commands. Best demonstrated and learned by trying it out:

$ basename /foo/bar.txt
bar.txt
$ basename --suffix=.txt /foo/bar.txt
bar
$ basename /foo/bar.txt /foo/baz.txt
bar.txt
$ basename --multiple /foo/bar.txt /foo/baz.txt
bar.txt
baz.txt
$ basename --suffix=.txt /foo/bar.txt /foo/baz.txt
bar
baz

There are also short options for each of these (-s for suffix and -a for multiple ("all"?)).
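
The third example above printed only bar.txt because, without -a or -s, basename treats a second operand as a suffix to strip rather than as another name. A small sketch of that traditional form, plus the sort of thing I'd do in a script (the path and the .html extension are just made-up examples):

```shell
# Traditional two-operand form: the second operand is a suffix to strip.
stem=$(basename /foo/bar.txt .txt)
echo "$stem"                      # bar

# Typical scripting use: derive an output name from an input path.
src=/foo/bar.txt                  # hypothetical input path
out="$(basename -s .txt "$src").html"
echo "$out"                       # bar.html
```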

cat

That amazingly handy command we all know and love.

I often use it to quickly write multi-line files and to build up larger commands starting with getting input from a file. Some people say that this is a "useless use of cat", but I think that 'rule' has been parroted far too much.

Sprinkle cat around as much as you like. Do it with pride. Do it with a "meow". You can always remove it later if you need to.

Knowing cat is arguably more about knowing your shell (with redirection, pipes, etc.) but it does have some interesting options of its own such as --squeeze-blank, --number, --number-nonblank, --show-nonprinting, --show-ends, --show-tabs, and --show-all. All of these have short option names too.
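
Here is a tiny sketch of two of those options in action, assuming GNU cat. The input strings are made up for the demonstration:

```shell
# --show-all (-A) makes tabs (^I) and line ends ($) visible.
visible=$(printf 'a\tb\n' | cat -A)
echo "$visible"                   # a^Ib$

# --squeeze-blank (-s) collapses a run of blank lines into one blank line,
# so the five input lines below come out as three.
squeezed=$(printf 'one\n\n\n\ntwo\n' | cat -s | wc -l)
echo "$squeezed"
```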

Of course, the intended function of cat is to conCATenate files together, which cat does by simply streaming each file sequentially to STDOUT:

$ cat foo1.txt foo2.txt > multifoo.txt

Now multifoo.txt contains the contents of both foos.

chcon

Change the security context of a file.

Okay, what is a "security context"? It comes from SELinux, which sounds like the name of a Linux distro. But it’s not, it’s a kernel module.

I’m not really sure what the current story is with Slackware and SELinux and casual searching didn’t reveal any easy answers. As near as I can tell, it’s not a first-class citizen on Slackware. So I’m not going to dive into this right now.

chgrp, chmod, chown

Change a file’s group, "mode" (permissions), owner:

$ chgrp apache foo.html # apache group now owns file
$ chmod a+r foo.html    # give "all" the read permission
$ chown dave foo.html   # dave now owns file. hell yeah
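
chmod accepts both octal and symbolic modes. A minimal sketch on a scratch file, checking the result with stat (also part of coreutils); the -c '%a' format is GNU-specific:

```shell
f=$(mktemp)                 # scratch file for the demonstration
chmod 600 "$f"              # octal: owner read/write only
chmod a+r "$f"              # symbolic: add read for user, group, and others
mode=$(stat -c '%a' "$f")   # print the permission bits in octal
echo "$mode"                # 644
rm -f "$f"
```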

chroot

I’d seen this mentioned in the context of sandboxing or "jailing" processes as a security measure. But Michael Kerrisk makes it clear in The Linux Programming Interface that this is not the purpose nor strength of chroot. Instead, there are plenty of useful reasons to give a process a new root directory. Wikipedia has a history of chroot which indicates it is possible the command was created by Bill Joy in order to test Version 7 UNIX’s installation and build system.

The other place I’d seen chroot mentioned was in recovering a system using the Slackware install media.

What chroot does is quite simple: it modifies the apparent root directory for all file access for the affected process. In other words, it sets / and everything under / to a different directory of your choice.

So, how do we go about trying it out?

The man page is a little…​uh…​light on details. As usual, we’re supposed to use the info documentation instead. I’ll perhaps rant about that some other day.

At any rate, we do learn that the most basic invocation is (roughly):

chroot NEWROOT [COMMAND]

and that

If no command is given, run '"$SHELL" -i' (default: '/bin/sh -i').

Well, that sounds easy. Let’s try:

$ mkdir newroot
$ chroot newroot
chroot: cannot change root directory to 'newroot': Operation not permitted

Uh, okay. The man page failed to mention that we also have to run chroot as the superuser.

$ sudo chroot newroot
chroot: failed to run command '/bin/bash': No such file or directory

Ah, now that makes sense! Indeed, there is no /bin/bash under the new root - it’s completely empty!

So how do we get Bash in there? Well, we can just copy it:

$ mkdir newroot/bin
$ cp /bin/bash newroot/bin
$ tree newroot
newroot
`-- bin
    `-- bash

1 directory, 1 file

That ought to do it:

$ sudo chroot newroot
chroot: failed to run command '/bin/bash': No such file or directory

Huh? Okay, so it turns out the error message here is extremely unhelpful - there is, indeed, such a file. However, it could not be executed because it relies on dynamic libraries. That executable is all alone in a cruel, formless void.

Fine, we can fix that, but I want to see something running right now! I’m going to compile a static executable.

$ cat > hello.c
#include <stdio.h>
int main(){
	printf("Hello world!");
	return 0;
}

$ gcc -static hello.c -o hello
$ cp hello newroot/bin/

$ sudo chroot newroot /bin/hello
Hello world!

Ha!

Now, back to the problem at hand: we need an environment that can actually run regular executables that use dynamic shared libraries. Well, I could copy all of those. But I don’t want to. And I can’t just make a symlink because symlinks resolve using paths…​and those paths would point outside of my new root.

That’s where "bind mounts" come in. You can call mount --bind <fromdir> <todir> which will make the contents of fromdir accessible in both places.

$ mkdir -p newroot/lib64 newroot/lib newroot/usr
$ sudo mount --bind /lib64 newroot/lib64
$ sudo mount --bind /lib newroot/lib
$ sudo mount --bind /usr newroot/usr

Now let’s see if Bash is happy:

$ sudo chroot newroot
$

No news is good news. Now let’s look around:

$ ls
bash: ls: command not found

Huh? But I thought ls was in /usr/bin?

$ exit
$ which ls
/usr/bin/ls

$ ls -l newroot/usr/bin/ls
lrwxrwxrwx 1 root root 12 Apr 15 02:35 newroot/usr/bin/ls -> ../../bin/ls

Ohhhhh, that’s a broken symlink to a /bin/ls that doesn’t exist. Since we already created /bin to put our copy of Bash in, I’ll just copy ls over there too:

$ cp /bin/ls newroot/bin
$ sudo chroot newroot
$ ls
bin  lib  lib64  usr

Hey! There we are!

So, I would probably want to mount the real /bin in my newroot as we have with /lib and /usr. And we might also need /dev and /proc and the like. But this definitely works.
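
For completeness, here is a rough sketch of the "copy everything" approach I skipped in favor of bind mounts. It assumes GNU cp (for --parents) and a glibc-style ldd. copy_with_libs is a hypothetical helper name of my own, not a real tool, and I've used /bin/sh as the example binary:

```shell
# copy_with_libs: hypothetical helper, not a coreutils tool.
# Copies a binary plus every shared library ldd reports into a new root,
# recreating the original directory layout with GNU cp's --parents.
copy_with_libs() {
    bin=$1
    root=$2
    mkdir -p "$root"
    cp --parents "$bin" "$root"
    # ldd prints absolute library paths; grep -o keeps just those paths.
    for lib in $(ldd "$bin" | grep -o '/[^ ]*'); do
        cp --parents "$lib" "$root"
    done
}

demo_root=$(mktemp -d)              # stand-in for newroot
copy_with_libs /bin/sh "$demo_root"
find "$demo_root" -type f | sort    # the shell and the libraries it needs
# Then, as root: chroot "$demo_root" /bin/sh
```

This only handles libraries that ldd can see, so anything a program loads at runtime (or reads from /etc, /dev, /proc) would still be missing.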

cksum

This does a cyclic redundancy check (CRC) of a given file (or STDIN).

$ echo "hello" | cksum
3015617425 6

The first number is the CRC sum, the second number is the size of the input data in bytes.

I reckon the real question is: are there still uses for cksum when MD5 and SHA-based hashing methods are so much more popular for checking file integrity?

I haven’t been able to find any evidence that there are any compelling reasons to use cksum. It is simple and fast, but so is md5sum. It appears this utility simply exists for POSIX compatibility. So perhaps you can dig up some ancient scripts which rely on it. I didn’t find any on my Slackware 14.2 system using a naive search.

comm

Compares sorted lines of files and displays the results in three columns. The output is intuitively understood using an example:

$ cat foo1.txt
ant
bat
cat
dog

$ cat foo2.txt
ant
bat
crunch
dog

$ comm foo1.txt foo2.txt
               ant
               bat
cat
      crunch
               dog

As you can see, this is perfect for comparing lists of items. You can turn specific columns off (-1, -2, -3) and specify the output delimiter between columns (--output-delimiter).
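
Suppressing columns is where comm gets really useful. A sketch using the same two files as above, recreated in a temporary directory so the example stands alone:

```shell
tmp=$(mktemp -d)
printf 'ant\nbat\ncat\ndog\n'    > "$tmp/foo1.txt"
printf 'ant\nbat\ncrunch\ndog\n' > "$tmp/foo2.txt"

# -1, -2 and -3 each suppress one column, so -12 leaves only column 3.
common=$(comm -12 "$tmp/foo1.txt" "$tmp/foo2.txt")
only_first=$(comm -23 "$tmp/foo1.txt" "$tmp/foo2.txt")
only_second=$(comm -13 "$tmp/foo1.txt" "$tmp/foo2.txt")

echo "$common"        # ant, bat, dog (one per line)
echo "$only_first"    # cat
echo "$only_second"   # crunch
rm -rf "$tmp"
```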

I’m probably most likely to reach for diff to compare files 99.9% of the time out of sheer muscle memory. But this could certainly be extremely handy in specific circumstances.

It looks like later versions of GNU comm have a --total option which prints a numeric summary of each column at the end. My version doesn’t have this option.

cp

Copy file(s)!

There are some really interesting options to cp which, I have to admit, I didn’t know existed:

       -i, --interactive
              prompt before overwrite (overrides a previous -n option)

       -n, --no-clobber
              do not overwrite an  existing  file  (overrides  a  previous  -i
              option)

       -s, --symbolic-link
              make symbolic links instead of copying

       -u, --update
              copy only when the SOURCE file is  newer  than  the  destination
              file or when the destination file is missing

       --backup[=CONTROL]
              make a backup of each existing destination file

The backup feature is interesting. You can have cp make backups of any files which are about to be overwritten. You can even specify how the backups are named. See the man page for details.
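
A minimal sketch of the numbered backup style, assuming GNU cp. The file names and contents are invented for the example:

```shell
tmp=$(mktemp -d)
echo 'old contents' > "$tmp/dst.txt"
echo 'new contents' > "$tmp/src.txt"

# Keep a numbered backup of the file that is about to be overwritten.
cp --backup=numbered "$tmp/src.txt" "$tmp/dst.txt"

current=$(cat "$tmp/dst.txt")
backup=$(cat "$tmp/dst.txt.~1~")    # numbered backups are named FILE.~N~
echo "$current"   # new contents
echo "$backup"    # old contents
rm -rf "$tmp"
```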

I do use -r sometimes to copy directories:

$ cp -r foo foo2 # where foo is a dir

UPDATE: Scroll down to 'dd' for more about the AMAZING POWER of cp!

csplit

Like comm above, csplit is a utility that may come in very handy for specific needs. In this case, you can split an input file (or STDIN if you use - as the filename) into separate files using the delimiter(s) of your choice. There are options for specifying output filenames, etc.

Sure, it’s not as flexible as a bespoke script (AWK, Perl, etc.), but it’s a lot easier to learn than a full-blown programming language. And it’s pretty dang flexible. I’m not going to go into a deep exploration of all of the options, but I did learn enough to come up with this example:

$ cat | csplit - /===/ {*}
Line 1
Line 2
===
14
Line 3
Line 4
Line 5
===
25
Line 6
Line infinity.
26

(I typed everything above except the numbers after each === line - csplit wrote those - they’re the number of bytes written to each file it created.)

By default, the output files are named xxNN where NN is a number starting with 00. Here’s the output:

$ head xx*
==> xx00 <==
Line 1
Line 2

==> xx01 <==
===
Line 3
Line 4
Line 5

==> xx02 <==
===
Line 6
Line infinity.

As you can see, csplit created three files for us: xx00, xx01, xx02.
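
In a script you would probably want the non-interactive version. Here is a sketch using -s to silence the byte counts and -f to pick the output prefix; the "part-" prefix and the temporary directory are my own choices for the example:

```shell
tmp=$(mktemp -d)
printf 'Line 1\nLine 2\n===\nLine 3\nLine 4\nLine 5\n===\nLine 6\n' |
    csplit -s -f "$tmp/part-" - '/===/' '{*}'

pieces=$(ls "$tmp" | wc -l)       # three files: part-00, part-01, part-02
first=$(cat "$tmp/part-00")
echo "$first"
# Line 1
# Line 2
rm -rf "$tmp"
```

Quoting '{*}' keeps the shell from trying to interpret the braces or the asterisk.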

cut

cut splits up lines by delimiters and prints out only the fields (columns) you specify. Example:

$ cat | cut -d' ' -f2 -
cat cow dog
cow
moose lemon snakes
lemon

As you can see, we specified a space ' ' as the delimiter and requested field 2. So we get the second item of each list echoed back at us.

Honestly, I have no use for cut. AWK is just so much better.

Here’s the same thing in AWK:

$ cat | awk '{print $2}'
cat cow dog
cow
moose lemon snakes
lemon

By default, AWK breaks up tokens by whitespace. cut would fail in the above example if there were two spaces (or a tab) between items. AWK would succeed "out of the box".
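
To make the difference concrete, here is a small sketch with two spaces between the first and second items. The tr -s workaround is one common way to rescue cut, not the only one:

```shell
line='cat  cow dog'     # note the two spaces after "cat"

# cut sees an empty field between the two spaces, so field 2 is empty.
cut_field=$(printf '%s\n' "$line" | cut -d' ' -f2)

# awk splits on runs of whitespace, so field 2 is still "cow".
awk_field=$(printf '%s\n' "$line" | awk '{print $2}')

# Squeezing repeated spaces first lets cut find the same field.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

echo "[$cut_field] [$awk_field] [$squeezed]"   # [] [cow] [cow]
```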

Note: also see paste below.

date

By default, date gives you a human-readable date and time:

$ date
Thu Jan 23 20:07:38 EST 2020

You can also set the date with this utility or parse and reformat a date. It’s pretty impressive what it can understand:

$ date -d'feb 29 2020'
Sat Feb 29 00:00:00 EST 2020  # the 29th will be on a Saturday

$ date -d'monday'
Mon Jan 27 00:00:00 EST 2020  # the next Monday will be on the 27th

Handy output formatting options include:

$ date -u
Fri Jan 24 01:10:12 UTC 2020 # UTC date

$ date +%F
2020-01-23  # YYYY-MM-DD format

$ date +%T
20:17:19    # HH:MM:SS format

$ date +%s
1579828611  # standard UNIX epoch timestamp

So to get the current datetime in the One True Datetime format, ISO 8601, you do this:

$ date +'%FT%T'
2020-01-23T20:30:07
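
The format above has no time zone, so it is only unambiguous if you know where it was produced. A sketch of one way to handle that, assuming GNU date: print UTC and tack on a literal "Z". The epoch value is the one from the +%s example above:

```shell
# UTC with an explicit "Z" suffix, so the timestamp is unambiguous.
now_utc=$(date -u +'%FT%TZ')
echo "$now_utc"

# -d @SECONDS parses a UNIX epoch timestamp.
from_epoch=$(date -u -d @1579828611 +'%FT%TZ')
echo "$from_epoch"    # 2020-01-24T01:16:51Z
```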

dd

Copy to and from files (including UNIX devices). Probably the weirdest old-school weapon of destruction you’re likely to actually use on a somewhat regular basis.

I use this all the time to write file system images to USB "thumb" flash memory drives. (Update: Ha ha, I don’t any more! Keep reading.) I have to look up the syntax with man every. Single. Damn. Time.

The basic usage is simple: specify an input file ("if") and output file ("of"). Writing a bootable ISO image to a USB drive might look like this:

$ dd if=/slackware14.2.iso of=/dev/sdb

There are about a billion options with this utility (at last, you can convert your old EBCDIC text files!).

One thing I’ve always wondered: how much of a performance hit am I taking if I don’t specify the block size to read/write?

In searching for an answer to this question (which amounts to, basically, "it varies"), I came across this really interesting answer on Unix & Linux Stack Exchange:

There’s a bit of a cargo cult around dd. Originally, there were two bugs in cp that caused problems: It would misdetect files as sparse when reported with a block size other than 512 (Linux used a block size of 1024), and it did not clear empty blocks from the destination when copying from a sparse file to a block device…​

I love this stuff! Turns out you can use cp or even cat (!) to perform exactly the same thing (and probably even better, since these tools do their best to figure out an optimal block size for you!)
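
You can convince yourself of this without risking a real drive. The sketch below copies a made-up 1 MiB "image" three ways to ordinary files (not a device) and compares the results; the file names are hypothetical:

```shell
tmp=$(mktemp -d)
# Hypothetical stand-in for an ISO image: 1 MiB of random data.
head -c 1048576 /dev/urandom > "$tmp/image.iso"

dd if="$tmp/image.iso" of="$tmp/via_dd" bs=64K 2>/dev/null   # explicit block size
cp "$tmp/image.iso" "$tmp/via_cp"
cat "$tmp/image.iso" > "$tmp/via_cat"

# cmp -s is silent and exits 0 when two files are byte-for-byte identical.
if cmp -s "$tmp/image.iso" "$tmp/via_dd" &&
   cmp -s "$tmp/image.iso" "$tmp/via_cp" &&
   cmp -s "$tmp/image.iso" "$tmp/via_cat"; then
    same=yes
else
    same=no
fi
echo "$same"    # yes
rm -rf "$tmp"
```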

I will try to remember to use cp next time I write an ISO. So long, dd.

A year later: I’m still writing this article and since then, I’ve written dozens of OS install images to USB using cp. It works great. I won’t claim any speed increases since I haven’t benchmarked the two methods, but the syntax is so much easier:

# cp downloads/os.3.9.iso /dev/sdb

Second year: I’ve written ISOs for all three major BSD installers (FreeBSD, OpenBSD, and NetBSD) and lost track of the number of Linux distro installers to several generations of SanDisk and Kingston USB "thumb" drives with the humble cp command like the example above and have had absolutely no issues with it! Do not waste any more of your brain cells on dd. "Burn" your ISOs with cp with this one weird trick!

Third year: LOL, I even used cp to recover an entire drive in 2022.

df

df stands for "disk free" and shows the used and free space on the mounted filesystems.

dave@europa~$ df
Filesystem     1K-blocks      Used Available Use% Mounted on
tmpfs              32768      1492     31276   5% /run
devtmpfs        16399100         0  16399100   0% /dev
/dev/sda3      894228804 113366324 735368584  14% /
tmpfs           16407384     28408  16378976   1% /dev/shm
cgroup_root         8192         0      8192   0% /sys/fs/cgroup
/dev/sda1         101590     17120     84470  17% /boot/efi
none              102400         8    102392   1% /var/run/user/1000

The best option is the -h flag which (like many utilities) displays "human readable" values rather than counts of 1K blocks:

dave@europa~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            32M  1.5M   31M   5% /run
devtmpfs         16G     0   16G   0% /dev
/dev/sda3       853G  109G  702G  14% /
tmpfs            16G   28M   16G   1% /dev/shm
cgroup_root     8.0M     0  8.0M   0% /sys/fs/cgroup
/dev/sda1       100M   17M   83M  17% /boot/efi
none            100M  8.0K  100M   1% /var/run/user/1000

dir

Amazingly, dir is not just an alias to ls. It is very, very similar, but does provide a slightly different output format.

Both utilities share the same source. Check it out!

#include "ls.h"
int ls_mode = LS_MULTI_COL;

The source for ls is the same thing, but with ls_mode = LS_LS;. You can see the little subtle differences by searching for ls_mode in ls.c.

dircolors

I had no idea this existed. It creates a pretty setting for the LS_COLORS environment variable used by ls. Check it out:

$ dircolors
LS_COLORS='rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*....

Wooooeeee, that would have been fun to type by hand. Good thing dircolors does it for us.

If you eval the output of dircolors, you’ll set the environment variable in your current shell session and ls will be mighty pretty.

$ eval $(dircolors)

dirname

You may have used this one in shell scripts before (I know I have). It chops the last item after the '/' slash directory separator from a string.

Note that a trailing slash doesn’t "count" as the final component of the path:

$ dirname /foo/bar/baz
/foo/bar
$ dirname /foo/bar/baz/
/foo/bar
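
dirname and basename are two halves of the same operation, which is how I usually end up using them in scripts. A small sketch with a made-up path:

```shell
path=/foo/bar/baz.txt             # hypothetical path
dir=$(dirname "$path")            # /foo/bar
file=$(basename "$path")          # baz.txt
echo "$dir/$file"                 # /foo/bar/baz.txt

# A name with no slash at all is treated as living in the current directory.
bare=$(dirname baz.txt)
echo "$bare"                      # .
```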

du

Stands for 'disk usage'. The default output is the size (in 1K blocks, not bytes) of every dang directory and subdirectory under the current directory; add -a to list every dang file as well.

Most of the time, you probably want the -s ("summarize") and -h ("human readable") options to clean up the output. Here’s the size of the current directory (including any subdirectories):

$ du -sh
448M    .

Given file/directory name arguments (or a glob), it will summarize those as well. This is what I want most of the time:

$ du -sh *
11M    bin
110M   docs
7.1M   dotfiles
53M    img
96M    proj
173M   wiki
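
To find the biggest offender, pipe the summary through sort -h, which understands the K/M/G suffixes. This is a sketch assuming GNU du and sort; the "small" and "big" directories are created just for the demonstration:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/small" "$tmp/big"
head -c 1024    /dev/urandom > "$tmp/small/file"
head -c 2097152 /dev/urandom > "$tmp/big/file"

# Sort the human-readable sizes; the largest entry comes last.
# du separates size and name with a tab, which is cut's default delimiter.
largest=$(cd "$tmp" && du -sh -- * | sort -h | tail -n 1 | cut -f2)
echo "$largest"    # big
rm -rf "$tmp"
```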

echo

Right up there with cat in terms of importance in CLI and shell script wizardry.

$ echo "Echo!"
Echo!

GNU echo has a couple really important options. -n does not output the trailing newline. -e turns on backslash string escape characters.

But, reading man echo will steer you to a really important note: the echo command is usually provided by your shell!

Check it out in Bash:

$ help echo
echo: echo [-neE] [arg ...]
    Write arguments to the standard output.

    Display the ARGs, separated by a single space character and followed by a
    newline, on the standard output.

    Options:
      -n    do not append a newline
      -e    enable interpretation of the following backslash escapes
      -E    explicitly suppress interpretation of backslash escapes
    ...

Of course, you can still call GNU echo explicitly like so:

$ /bin/echo "In the name of the Senate and the people of Rome."
In the name of the Senate and the people of Rome.

Either way, it’s cool to embed things like ANSI color codes in strings and output them with echo like so:

$ echo -e "\e[31mRed Text\e[0m"
Red Text

You can’t see it here, but in my terminal, the above "Red Text" is red.
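
Because the shell built-in usually wins, the behavior of -n and -e can differ from shell to shell. One way to be sure you get the coreutils version is to run it through env, which looks echo up on $PATH. A sketch, assuming GNU echo is the one on your PATH:

```shell
# "env echo" runs the echo found on $PATH instead of the shell built-in.
count=$(env echo -n 'Echo!' | wc -c)
echo "$count"                        # 5 bytes: no trailing newline was written

tabbed=$(env echo -e 'a\tb')         # -e turns \t into a real tab
plain=$(env echo 'a\tb')             # without -e the backslash stays literal
```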

It’s fun to read the source code for GNU echo (echo.c in the coreutils source linked at the top of this page).

The whole thing is under 300 lines and boils down to a series of calls to putchar().

env

I’m most familiar with this little util from writing scripts in languages such as Ruby where the standard "shebang" invocation is thus:

#!/usr/bin/env ruby

The description "run a program in a modified environment" simply means that env gives you control over the environment variables, working directory, signal handling (such as blocking signals from being delivered), and CLI arguments.

When we use env in a script shebang line, we’re typically not using any of those features. We’re just using it to invoke the executable using the current $PATH rather than by absolute path. In the above Ruby example, env will execute whichever version of ruby it can find based on my environment rather than executing a specific version such as whatever I have installed at /usr/bin/ruby.

The -S or --split-string option lets you pass multiple arguments to an executable on the "shebang" line, which is something that is not otherwise possible if you’re calling the executable directly.
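
Outside of shebang lines, the environment-variable side of env is easy to try. A minimal sketch; GREETING and ONLY are made-up variable names:

```shell
# VAR=value sets a variable just for the command being run.
greeting=$(env GREETING=hello sh -c 'echo "$GREETING"')
echo "$greeting"      # hello

# -u removes a variable from the command's environment.
removed=$(GREETING=hello env -u GREETING sh -c 'echo "${GREETING:-unset}"')
echo "$removed"       # unset

# -i starts from an empty environment, so only what you pass gets through.
# The inner env has no command to run, so it just prints its environment.
only=$(env -i ONLY=this /usr/bin/env)
echo "$only"          # ONLY=this
```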

expand

Convert tabs to spaces! expand -t 4 fart.c sets tab stops every four columns and replaces each tab with enough spaces to reach the next one. The -i option only converts the initial tabs on each line to spaces.

Fire the first shot in a "tabs vs spaces wars" with your friends!

Also see the unexpand command below, which does the reverse.
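
A quick sketch of expand and its opposite on some made-up input, assuming the GNU versions:

```shell
# A leading tab becomes four spaces with tab stops every 4 columns.
spaced=$(printf '\tx\n' | expand -t 4)
echo "[$spaced]"          # [    x]

# -i leaves tabs that come after the first non-blank character alone.
initial=$(printf '\ta\tb\n' | expand -i -t 4)

# unexpand goes the other way: four leading spaces become one tab.
round_trip=$(printf '    x\n' | unexpand -t 4)
```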

expr

According to Wikipedia, expr was born in 1979. It can be used as a serious scripting tool or just do some quick math at the command line:

$ expr 5 + 5
10

Supports string (including regex), numeric, and logical functions and relational tests.

$ expr length "Goat cheese"
11
$ expr index "Goat cheese" c
6
$ expr substr "Goat cheese" 2 8
oat chee
$ expr match foo foo
3
$ expr match foo bar
0

This is one of those legacy things that will probably be around forever. So you can count on it. But there are likely better alternatives for most of its functionality.
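
For example, any POSIX shell can do the arithmetic and string length itself. A sketch of those alternatives next to one expr gotcha (characters like * have to be escaped so the shell doesn't expand them first):

```shell
sum=$((5 + 5))            # shell arithmetic instead of: expr 5 + 5
s='Goat cheese'
len=${#s}                 # instead of: expr length "Goat cheese"
echo "$sum $len"          # 10 11

# With expr, characters the shell treats specially must be escaped or quoted.
product=$(expr 6 \* 7)
echo "$product"           # 42
```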

factor

Returns the prime factors of an integer:

$ factor 96
96: 2 2 2 2 2 3

This comes to us from 1974. Is it the weirdest thing that ships with coreutils? We’ll see…​

false

Always returns a failure (false) exit status. Though it can be used in shell scripts when we want a false condition to always occur, I think I’ve mostly seen it used as the "dummy shell" of user accounts that cannot log in. As in, "enjoy running that in false, ha ha ha."

I recall that Bash also has a false built-in and other shells may also. So unless you specifically call /usr/bin/false, you’re actually calling your shell’s command.

Not to be confused with the False esoteric programming language which was apparently one of the inspirations for everybody’s favorite esolang: Brainfuck.

fmt

Oooh, I have some experience with this one. It re-wraps lines to fit a certain width. I tried to script a Gopher content generator using a very light script wrapper around fmt. Sadly, it just wasn’t quite featureful enough to format all of my content (there was no way to have it ignore my "ASCII art" and stuff like that). Otherwise, it does a really nice job of formatting paragraphs of text using an algorithm that tries to even out the lengths of neighboring lines to make the output attractive.

$ fmt -w 80 meow
The Cats Will Know
By Cesare Pavese Translated by Geoffrey Brock Rain will fall again on your
smooth pavement, a light rain like a breath or a step.  The breeze and
the dawn will flourish again when you return, as if beneath your step.
Between flowers and sills the cats will know.

$ fmt -w 20 meow
The Cats Will Know
By Cesare Pavese
Translated by
Geoffrey Brock
Rain will fall
again on your
smooth pavement,
a light rain like
a breath or a step.
The breeze and the
dawn will flourish
again when you
return, as if
beneath your step.
Between flowers
and sills the cats
will know.

fold

Wraps lines at a fixed width (80 columns by default), breaking mid-word unless you add -s to break at spaces. You probably want fmt (above), which is generally superior. This version might be used in scripts that want very specific behavior.
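
A short sketch of that "very specific behavior" on a made-up line: fold cuts at exactly the width you ask for, and -s makes it back up to the last space instead.

```shell
# Hard wrap at 10 columns: the cut lands in the middle of "will".
hard=$(printf 'The cats will know\n' | fold -w 10)
printf '%s\n' "$hard"
# The cats w
# ill know

# -s breaks after the last space that fits within the width.
soft=$(printf 'The cats will know\n' | fold -s -w 10)
printf '%s\n' "$soft" | sed -n 2p     # will know
```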

groups

Prints the groups a user is in. If you don’t specify the user, it’s you! Here’s the interesting bit: if you don’t specify the user, you may see different groups based on the current process:

$ groups dave
dave : users wheel

$ groups
users lp wheel floppy audio video cdrom scanner

Hey, I can look in /etc/passwd and /etc/group and see that I’m explicitly a member of users and wheel, but where are all these extra "system" groups coming from?

I spent the better part of an evening trying to figure it out, then finally asked and got the answer:

They’re being assigned at runtime by the login executable. Specifically, they’re set in the CONSOLE_GROUPS setting in /etc/login.defs:

$ grep floppy /etc/login.defs
CONSOLE_GROUPS         floppy:audio:cdrom:video:lp:scanner

head

One of those truly great little utilities that has all kinds of uses. One of the most important options to know is -n, which lets you specify the number of lines to display from the "head" of a text file:

$ head -n 2 meow
 The Cats Will Know
By Cesare Pavese

I use head all the time to preview files. I especially like using it to preview multiple files at once:

$ head -n 2 meow*
==> meow <==
 The Cats Will Know
By Cesare Pavese

==> meow2 <==
 The cat's song
By Marge Piercy

==> meow3 <==
 A Little Language
By Robert Duncan

I’ll often use it with a glob and pipe it through a pager to view a whole directory’s worth of files really quickly.

UPDATE: I also recently used this to send part of a binary file over a serial connection while debugging some microcontroller code. To end the stream after a number of bytes, use the -c option like so:

$ head -c 1024 rickroll.ogg > /dev/boombox0

You can also end the stream some number of bytes before the end. Check the man page. This is the sort of utility that makes "The Unix Way" glorious sometimes.
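
With GNU head, "some number before the end" is spelled as a negative count. A minimal sketch on made-up input:

```shell
# -n -2: print everything except the last two lines.
all_but_last_two=$(printf '1\n2\n3\n4\n5\n' | head -n -2)
printf '%s\n' "$all_but_last_two"    # 1, 2, 3 (one per line)

# -c -2: print everything except the last two bytes.
trimmed=$(printf 'abcdef' | head -c -2)
echo "$trimmed"                      # abcd
```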

hostid

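Huh, this was completely new to me. Prints a numeric ID for the current machine, in hexadecimal. It is supposed to be unique, though the man page excerpt below admits it may not be.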

$ hostid
007f0100

Basically a wrapper for gethostid(). See man gethostid:

NOTES
In the glibc implementation, the hostid is stored in the file /etc/hostid.  (In
glibc  versions before 2.2, the file /var/adm/hostid was used.)

In  the  glibc implementation, if gethostid() cannot open the file containing
the host ID, then it obtains the hostname using gethostname(2), passes that
hostname to gethostbyname_r(3) in order  to  obtain the host's IPv4 address,
and returns a value obtained by bit-twiddling the IPv4 address.  (This value
may not be unique.)

Apparently this is mostly used for software licensing. A lazy Web search on my part returned no other uses.

id

The output of id is a little geekier than groups (above), but I like that it shows the uid and gid along with the supplementary groups. Otherwise, it’s exactly the same info:

$ id
uid=1000(dave) gid=100(users) groups=100(users),7(lp),10(wheel),11(floppy),17(audio),18(video),19(cdrom),93(scanner)

$ id dave
uid=1000(dave) gid=100(users) groups=100(users),10(wheel)

(Note that my current shell has inherited membership in additional groups from the login process at "runtime" - default Slackware behavior.)

install

I’d never heard of this util. I’m not surprised. It’s hilariously difficult to look up using a Web search (with a name like "install"…​)

It can create directories and copy files and set attributes (such as permissions) on them.

My first thought was "is this traditionally related to the 'install' step of some Makefiles?" Looks like that can be the case.

I suspect you’re generally better off with cp (which can do recursive copying and has a lot of the same features, such as --backup) or something much more powerful like rsync -a.

I’d love to know if there’s a compelling use case for install.

2021-05-11 Update

Danny writes in:

I actually know the reason for install! In unix-likes (I’ve verified on Linux and OpenBSD) you can’t overwrite the file of a running executable. If you have, say, vi running and your install script tries to cp over /bin/vi it’ll fail. Install does the proper dance to handle this (rename /bin/vi to something like /bin/vi.old, cp vi into /bin/vi, rm /bin/vi.old)

Which makes total sense. Thank you Danny!

I just tried it. Terminal 1:

$ cp /usr/bin/ed .
$ ./ed

Terminal 2:

$ cp foo ed
cp: cannot create regular file 'ed': Text file busy

But it would let me delete and replace ed!

2022-10-25 Update

Julien writes:

I use install a lot to make one-liners: install -d -m0750 /tmp/toto, instead of mkdir + chmod. …​And yes, install is very much used in autoconf/automake dev setups.
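
Here is a sketch of Julien's one-liner plus the copy-and-set-mode form, run in a temporary directory instead of /tmp/toto. The hello.sh script is invented for the example, and stat -c is GNU-specific:

```shell
tmp=$(mktemp -d)

# Create a directory and set its mode in one step (instead of mkdir + chmod).
install -d -m 0750 "$tmp/toto"

# Copy a file into place and set its mode in one step (instead of cp + chmod).
printf '#!/bin/sh\necho hi\n' > "$tmp/hello.sh"
install -m 0755 "$tmp/hello.sh" "$tmp/toto/hello"

dir_mode=$(stat -c '%a' "$tmp/toto")
file_mode=$(stat -c '%a' "$tmp/toto/hello")
echo "$dir_mode $file_mode"    # 750 755
rm -rf "$tmp"
```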

join

I’ve seen examples of join before, but never used it myself. It does a relational join (like database tables) on two files based on a common field. If you don’t specify the common field to join on, it defaults to the first field. Of course, the concept of "field" is common to a lot of UNIX tools (like awk) and means "whitespace-separated stuff" unless you specify otherwise.

I’ll make a little contrived example:

$ cat > users
101 Dave
102 Skeletor
103 Beansmaster
104 Thegoat

$ cat > email
101 dave@example.com
102 skeleton99@example.com
103 iluvbeans@example.com
104 goat.goaterson@example.com

$ join users email
101 Dave dave@example.com
102 Skeletor skeleton99@example.com
103 Beansmaster iluvbeans@example.com
104 Thegoat goat.goaterson@example.com

Boom! You have a text file database!

There are all kinds of ways to use this utility and the man page could sure use some more examples! But I think the most important thing is to just remember that it exists. I could easily see this saving me from writing a shell script someday!
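
Here is one more example of the kind I wish the man page had: finding the rows that don't match. In this sketch the user 105 (a made-up entry) has no email address, and -v 1 prints only the unpairable lines from the first file:

```shell
tmp=$(mktemp -d)
printf '101 Dave\n102 Skeletor\n105 Nomail\n' > "$tmp/users"
printf '101 dave@example.com\n102 skeleton99@example.com\n' > "$tmp/email"

# Plain join: only lines whose first field appears in both files.
matched=$(join "$tmp/users" "$tmp/email" | wc -l)

# -v 1: only the lines from file 1 that found no partner in file 2.
orphans=$(join -v 1 "$tmp/users" "$tmp/email")
echo "$orphans"      # 105 Nomail
rm -rf "$tmp"
```

Like comm, join expects both inputs to be sorted on the join field.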

link

The moment I saw this I typed "ln vs link" into my search engine of choice.

Short answer: link always creates hard links and differs from ln in some other details. But I can’t fathom why you would use link on purpose! Perhaps someone can enlighten me.

ln

Ah, this is the linking tool we’re looking for. You can create hard and soft links with ln. I make a fair number of soft links to make my life easier, often to put executable stuff in my path without having to add to my path or move files out of their homes (like when I compile Git projects I’ve cloned).

There are a modest number of options, so check out man ln for the full details. Basically, you can have ln prompt you before overwriting, or make a backup, or force, or make hard/soft/"physical"/relative links.
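
A quick sketch of the soft/hard difference, assuming GNU stat and readlink and some made-up file names in a scratch directory:

```shell
scratch=$(mktemp -d)
echo 'original' > "$scratch/target.txt"

# Soft (symbolic) link: a tiny file that stores a path to the target.
ln -s target.txt "$scratch/soft.txt"

# Hard link: a second directory entry for the same inode.
ln "$scratch/target.txt" "$scratch/hard.txt"

soft_points_to=$(readlink "$scratch/soft.txt")
target_inode=$(stat -c %i "$scratch/target.txt")
hard_inode=$(stat -c %i "$scratch/hard.txt")
echo "soft -> $soft_points_to; inodes: $target_inode $hard_inode"

# -f replaces an existing link instead of failing.
echo 'other' > "$scratch/other.txt"
ln -sf other.txt "$scratch/soft.txt"
soft_now=$(readlink "$scratch/soft.txt")
echo "soft -> $soft_now"

rm -rf "$scratch"
```

The two inode numbers match because a hard link is the same file under another name, while the soft link only remembers a path.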

logname

Nobody knows who wrote this and it doesn’t work?

$ man logname
logname - print user's login name

$ echo $LOGNAME
dave

$ logname
logname: no login name

On Slackware, I rate this: "1/10 Not recommended."

UPDATE: This is a fractal of garbage. I was amused by my original review and decided to pursue it further. The source of logname.c has 73 lines of GNU boilerplate code and then this:

  /* POSIX requires using getlogin (or equivalent code) and prohibits
     using a fallback technique.  */
  cp = getlogin ();
  if (! cp)
    die (EXIT_FAILURE, 0, _("no login name"));

  puts (cp);

From this we learn two things:

  1. This is just a front-end for the Lib C function, getlogin().

  2. POSIX requires that it behaves this way.

The man page for getlogin() appears to be the output of some tortured souls. Here’s the BUGS section:

Unfortunately, it is often rather easy to fool getlogin().  Sometimes it does
not  work  at  all, because some program messed up the utmp file.  Often, it
gives only the first 8 characters of the login name.  The user currently logged
in on the controlling terminal of our program need not  be the user who started
it.  Avoid getlogin() for security-related purposes.

Note  that  glibc  does not follow the POSIX specification...

And the best part:

Nobody knows precisely what cuserid() does; avoid it in portable programs.  Or
avoid it altogether: use getpwuid(geteuid()) instead, if that is what you
meant.  Do not use cuserid().
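
If you just need a user name in a script, here’s a hedged sketch that takes the man page’s advice. The current_user function name is my own invention, and id -un is the closest shell counterpart to getpwuid(geteuid()) that I know of. Note that this is exactly the kind of fallback POSIX forbids logname itself from doing, which is fine in your own script:

```shell
# Try logname first, then the effective user's name, then the numeric ID.
current_user() {
    logname 2>/dev/null || id -un 2>/dev/null || id -u
}

current_user
```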

ls

List files! Often used as an example of a utility that breaks the "do one thing and do it well" maxim. Check out the man page for the (surprisingly long) list of options. Truth is, I think it’s pretty handy to have all of those output options for ls. I pretty much never need them, but when I do, a lot of them would be really hard to replicate using other tools, right?

md5sum

Gotta have this. People love taking a big old stinky dump on MD5 all the time, but it still does exactly what it was meant to do: fast, low-collision hashes. It’s the right tool for the job when the job calls for MD5 sums!

$ md5sum meow
2731deb26ce04aa850042dcc40cccdb3  meow

Having said that, coreutils also comes with sha1sum (see below) which is "better". But it really depends on what you’re trying to accomplish.

mkdir

Make directories! Looking at the man page, I see that you can set the mode (as in permissions) as you create the directory, which is nice. But I think the best option to memorize is the -p option to make any parent directories as needed:

$ tree
.

0 directories, 0 files
$ mkdir -p a/b/c
$ tree
.
`-- a
    `-- b
        `-- c

3 directories, 0 files
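
Here’s a sketch combining -p with the mode option, assuming GNU mkdir and stat. One thing worth knowing: with -p, the -m mode applies to the final directory only, and any parents that get created receive the usual umask-derived mode:

```shell
scratch=$(mktemp -d)

# Create the whole chain; only the leaf directory "c" gets mode 0700.
mkdir -p -m 0700 "$scratch/a/b/c"
leaf_mode=$(stat -c %a "$scratch/a/b/c")
echo "$leaf_mode"

# -p also makes mkdir succeed quietly when the directory already exists.
mkdir -p "$scratch/a/b/c" && already_there=ok
echo "$already_there"

rm -rf "$scratch"
```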

mkfifo

As a developer, I know the acronym FIFO means "First In, First Out", which describes a queue - the first thing into the queue is the first thing that comes out. (The opposite is what we call a "stack" - FILO or "First In, Last Out" in which items are pushed onto and popped off of the top of a stack like dinner plates - you can’t get to the first plate until all of the ones on top of it have been removed.)

It turns out, mkfifo creates a queue mechanism we’re all very familiar with, a pipe. Specifically a named pipe. I’d heard of named pipes, but had never used them before.

It’s really interesting. First you create the pipe "file":

dave@europa~/tmp/foo$ mkfifo foo
dave@europa~/tmp/foo$ ls -l
prw-r--r--  1 dave users    0 Aug 30 14:43 foo

Note the p at the beginning of prw-r--r--.

Now I’m going to start displaying the "contents" of the foo pipe "file":

$ cat < foo

Then in another terminal, I’ll send something to that "file":

$ date > foo

In my first terminal, I now see this:

$ cat < foo
Sun Aug 30 14:48:47 EDT 2020

The pipe closed when date was finished. Now, in the second terminal I can repeat the output…​

$ date > foo

…​and it sits there waiting until I open the pipe up for output again.

It’s a fun party trick, but I’ll be keeping an eye out for uses in my real computing life.
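
The two-terminal dance above can be squeezed into a single script by putting the writer in the background. This is just a sketch, with a made-up pipe name in a scratch directory:

```shell
scratch=$(mktemp -d)
mkfifo "$scratch/pipe"

# The writer blocks until a reader opens the pipe, so run it in the background.
echo 'hello from the writer' > "$scratch/pipe" &

# Opening the pipe for reading unblocks the writer; cat finishes when the
# writer closes its end.
received=$(cat < "$scratch/pipe")
wait

echo "$received"
rm -rf "$scratch"
```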

mknod

This command line tool seems to be almost entirely a historical curiosity.

Wikipedia has a great writeup in its Device file article.

In Unix-like operating systems, a device file or special file is an interface to a device driver that appears in a file system as if it were an ordinary file.

and

Nodes are created by the mknod system call. The command-line program for creating nodes is also called mknod. Nodes can be moved or deleted by the usual filesystem system calls (rename, unlink) and commands (mv, rm).

mknod takes four parameters: a filename (to create), a type (b=block, c|u=character, p=FIFO), and then two numbers: major and minor device type IDs. The device numbers are the most opaque (and stupid) thing about this system.

The best explanation came from Chapter 14 (File Systems) in the book The Linux Programming Interface. Reading that is how I learned that you can list these magic numbers with ls -l:

$ ls -l /dev
total 0
crw-------   1 root root     10, 235 Sep  3 10:44 autofs
drwxr-xr-x   2 root root         600 Sep  3 10:44 block
drwxr-xr-x   2 root root          60 Sep  3 10:44 bsg
crw-------   1 root root     10, 234 Sep  3 10:44 btrfs-control
drwxr-xr-x   3 root root          60 Sep  3 10:44 bus
drwxr-xr-x   2 root root        5600 Sep  3 14:48 char
crw-------   1 root root      5,   1 Sep  3 14:44 console
lrwxrwxrwx   1 root root          11 Sep  3 10:44 core -> /proc/kcore
drwxr-xr-x  18 root root         380 Sep  3 10:44 cpu
crw-------   1 root root     10,  60 Sep  3 14:44 cpu_dma_latency
crw-------   1 root root     10, 203 Sep  3 10:44 cuse
drwxr-xr-x   6 root root         120 Sep  3 10:44 disk
drwxr-xr-x   3 root root         100 Sep  3 10:44 dri
crw-------   1 root root    239,   0 Sep  3 14:44 drm_dp_aux0
...

Huh, so can I make a "fun" one like /dev/random? Let’s see. I’ll start by looking at the device numbers on the current one:

$ ls -l /dev/random
crw-rw-rw- 1 root root 1, 8 Sep  3 14:44 /dev/random

$ mknod rando c 1 8
mknod: rando: Operation not permitted

$ sudo mknod rando c 1 8

$ ls -l rando
crw-r--r-- 1 root root 1, 8 Sep  3 17:37 rando

$ head --bytes=16 rando | hexdump
0000000 1c1a 28e5 8690 7c84 486b 2d64 504a f88f
0000010

Ha ha, that’s pretty cool. By the way, I totally guessed the c option for "character" device.
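
For a version that doesn’t need sudo, here’s a sketch using the p (FIFO) type, which takes no device numbers and can be created by a regular user. The queue name is arbitrary:

```shell
scratch=$(mktemp -d)

# Type p needs no major/minor numbers and no root privileges.
mknod "$scratch/queue" p

# The shell's -p test is true for named pipes.
if [ -p "$scratch/queue" ]; then
    node_type=fifo
else
    node_type=other
fi
echo "$node_type"

rm -rf "$scratch"
```

The result is the same kind of named pipe that mkfifo makes.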

The list of device numbers should be here: http://lanana.org/ But when I checked, The Linux Device List page link was throwing a 404 error.

The Wayback Machine has a copy of the list that was current in 2009 (archived in 2019): https://web.archive.org/web/20190429050512/http://www.lanana.org/docs/device-list/devices-2.6+.txt

And yeah, check it out, there’s random and some other character devices near the top of the list:

  1 char    Memory devices
          1 = /dev/mem      Physical memory access
          2 = /dev/kmem     Kernel virtual memory access
          3 = /dev/null     Null device
          4 = /dev/port     I/O port access
          5 = /dev/zero     Null byte source
          6 = /dev/core     OBSOLETE - replaced by /proc/kcore
          7 = /dev/full     Returns ENOSPC on write
          8 = /dev/random   Nondeterministic random number gen.
          9 = /dev/urandom  Faster, less secure random number gen.
         10 = /dev/aio      Asyncronous I/O notification interface
         11 = /dev/kmsg     Writes to this come out as printk's

The rest of the list contains devices from the general (my main SSD drive is a block,8,0 "First SCSI disk whole disk") to the specific (char,10,4 "Amiga mouse (68k/Amiga)").

There are a couple great references to mknod in the Unix Admin Horror Story list here: http://www-uxsup.csx.cam.ac.uk/misc/horror.txt

mktemp

Note that depending on when you read this, mktemp might be the GNU version or an old Debian version, which Slackware used for backward compatibility with scripts until 2018. If you have the new GNU version, you’ll see this:

$ mktemp --version
mktemp (GNU coreutils) 8.32
...

…​in which case, the old version is still available as mktemp-debianutils.

The olde mktemp that comes with Slackware 14.2 tells you its version like so:

$ mktemp -V
mktemp version debianutils-2.7

mktemp is an incredibly handy tool for a lot of file-handling tasks, especially in scripts. If you don’t know about it, you’ll end up re-implementing it poorly (like me).

The important thing to know is that if you run it, it’ll create a unique temporary file for you in /tmp/ and return the name so that you can capture it in your script:

$ mktemp
/tmp/tmp.tVn7z3

I think the most important option to remember is -d to make a new directory rather than a file (which is usually what I need):

$ mktemp -d
/tmp/tmp.ecmk4s

$ ls -ld /tmp/tmp*
drwx------ 2 dave users 4096 Sep  4 00:49 /tmp/tmp.ecmk4s/
-rw------- 1 dave users    0 Sep  4 00:48 /tmp/tmp.tVn7z3

Both versions of mktemp work the same way for simple file and directory creation, so just consult the man page for the installed version if you need to get advanced (like using templates for the filename format).
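
Here’s a sketch of how I’d use it in a script, with cleanup on exit. The function name and data.txt are made up. The function body is wrapped in parentheses so it runs in a subshell, and the EXIT trap fires as soon as the function finishes:

```shell
# The parentheses make the body a subshell, so the trap cleans up on return.
count_lines_in_scratch() (
    scratch=$(mktemp -d) || exit 1
    trap 'rm -rf "$scratch"' EXIT

    printf '%s\n' one two three > "$scratch/data.txt"
    wc -l < "$scratch/data.txt"
)

count_lines_in_scratch
```

The trap means the temporary directory goes away even if a command in the middle fails.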

mv

Move files or directories!

$ mv source destination

There are a lot of nice options and you should definitely skim the man page real quick if you’re about to use it in a script.

Otherwise, just know that it can let you control how to deal with existing files with the same names including prompting you (interactively) for each one rather than just overwriting them.
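
As an example of one of those options, here’s a sketch of the GNU backup feature, which keeps a copy of the file that would otherwise be overwritten. The file names and the .bak suffix are my own choices:

```shell
scratch=$(mktemp -d)
echo 'old contents' > "$scratch/report.txt"
echo 'new contents' > "$scratch/draft.txt"

# Keep the file that is about to be overwritten as report.txt.bak
mv --backup=simple --suffix=.bak "$scratch/draft.txt" "$scratch/report.txt"

current=$(cat "$scratch/report.txt")
backup=$(cat "$scratch/report.txt.bak")
echo "current: $current / backup: $backup"

rm -rf "$scratch"
```

For the prompting behavior, mv -i asks before each overwrite.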

nice

You know, I have yet to find a compelling reason to set the priority of processes on my systems. Maybe I’m just not cool enough. I did know about nice, though.

I reckon there are two hard things to remember about the command.

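First, the - in front of the priority argument is not a negation, but the argument prefix, and it goes before the command:

$ nice -10 foo
$ nice --15 bar

launches foo with a priority of positive ten and bar with negative fifteen! (Negative values normally require root. The less confusing spelling is nice -n 10 foo.)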

Second, the priority scale goes from -20 to 19 where the lower the number, the higher the priority given to the process.

I ran some examples and looked at them with htop and…​the results are too boring even for me to care about.

Oh, and if you run it without any arguments, nice will tell you what your current shell’s priority is:

$ nice
0

Boring.
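
Slightly less boring: a sketch that uses nice to inspect itself. The inner nice, run with no arguments, prints the niceness it was started with, so you can watch the -n adjustment get added to the shell’s own value:

```shell
base=$(nice)                # niceness of the current shell
bumped=$(nice -n 5 nice)    # inner nice runs with the niceness raised by 5
echo "base=$base bumped=$bumped"
```

On a shell at the usual 0, I’d expect this to report 5. The value is capped at 19, so a shell that is already very nice won’t go any higher.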

nl

Huh, I had no idea this existed. nl stands for "number lines" and here’s what it does:

$ cat foo
Wolves enjoy waterballoons.
Don't eat the wrong fruit.
My skull is made of cheese.
Welcome to the pain castle.

$ nl foo
     1  Wolves enjoy waterballoons.
     2  Don't eat the wrong fruit.
     3  My skull is made of cheese.
     4  Welcome to the pain castle.

Clearly it is giving us some padding in the gutter for multiple digits. By the way, that’s also a TAB character after the numbers. As you’d expect, you can adjust these settings:

$ nl -w2 -s' ' foo
 1 Wolves enjoy waterballoons.
 2 Don't eat the wrong fruit.
 3 My skull is made of cheese.
 4 Welcome to the pain castle.

It’s actually a pretty cool tool. You can specify the format of the numbering, choose to number only non-blank lines, choose a delimiter for "sections" (and restart numbering for sections). I would seriously consider reading the man page for this tool before embarking on any sort of scripted numbering task. It looks to be flexible enough to cover most uses. I like it.
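
Here’s a sketch of the blank-line behavior, using -b (body numbering style). The default style, t, numbers only non-empty lines, while a numbers everything. The awk bit just grabs the last line number that was printed:

```shell
# Three lines of input, the middle one empty.
sample() { printf 'alpha\n\nbeta\n'; }

# Default (-b t): the blank line gets no number.
last_default=$(sample | nl | awk 'NF { n = $1 } END { print n }')

# -b a: every line is numbered, blank ones included.
last_all=$(sample | nl -b a | awk 'NF { n = $1 } END { print n }')

echo "default: $last_default, all: $last_all"
```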

nohup

Like nice above, I’d heard of but never used the "no hangup" command nohup. I understood that it lets your process run even when the terminal "hangs up the phone" (more or less literally back in the early days of UNIX).

But this leads to those "when should I use" and "why should I use" questions that the man pages and other documentation carefully avoid answering. (Grrrr.)

Thankfully, there is this excellent explanation here on serverfault:

nohup makes a program ignore the HUP signal, allowing it to run after current terminal is closed / user logged out. Nohup does not send program to background.

Usually nohup and & are combined to launch program which runs after logout of user and allows to continue work at current shell session.

So, if you want to launch a process from the current terminal and not have it die when you exit that terminal, use nohup.

But, as Tim O’Reilly (yes, that Tim O’Reilly) points out in my copy of the book Unix Power Tools, maybe what you really want to do is start a job with something like at, cron, or batch.

nproc

This was a surprise. Did you know there is a command that returns the number of processors ("processing units") available to the current process?

Not to brag (in the year 2020 - I realize that when you are reading this, you’ll have a much larger number), but here’s mine:

$ nproc
16

With apologies to Tennessee Ernie Ford:

You have sixteen cores, what do you get?
Another day older and deeper in technical debt.
Saint IGNUcius don't you call me, 'cause I can't go
I owe my soul to the company repo.

numfmt

Today is just full of surprises. Need that number in a different (human-readable) format? GNU’s got the coreutil for you!

Let’s jump straight to some examples:

$ numfmt --to=si 10000
10K

$ numfmt --to=iec 10000
9.8K

$ numfmt --grouping 8712634782364
8,712,634,782,364

Note: the "iec" format above refers to the International Electrotechnical Commission binary prefix designations, where 1K is 1024, as in 2^10. Not to be confused with the International System of Units (SI), where 1K is 1000.
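
The conversion also works in the other direction with --from. A tiny sketch showing the same "1K" read under both conventions:

```shell
from_iec=$(numfmt --from=iec 1K)   # binary prefix: 1024
from_si=$(numfmt --from=si 1K)     # decimal prefix: 1000
echo "$from_iec $from_si"
```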

od

As you’re probably used to hearing a lot around the house, it’s "time to take an od!" Referring, of course, to an "octal dump".

$ od foo
0000000 067527 073154 071545 062440 065156 074557 073440 072141
0000020 071145 060542 066154 067557 071556 005056 067504 023556
0000040 020164 060545 020164 064164 020145 071167 067157 020147

It’s just like the hexdump command (I was familiar with) from the util-linux package…​but in octal!

Actually, od can output hexadecimal and a ton of other units and formats. Here’s hex:

$ od -x foo
0000000 6f57 766c 7365 6520 6a6e 796f 7720 7461
0000020 7265 6162 6c6c 6f6f 736e 0a2e 6f44 276e

There’s a huge number of formatting options including the endianness of the bytes, so if you have some specific needs while trying to view the contents of a file or stream, check out the man page!
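
Here’s a sketch of the two combinations I’d reach for first. Notice that od -x above shows "Wo" as 6f57: the bytes are grouped into two-byte words and printed in the machine’s byte order. Asking for single bytes avoids that:

```shell
# -A n drops the offset column; -t x1 prints one byte at a time in hex.
bytes=$(printf 'Wolves' | od -A n -t x1)
printf '%s\n' "$bytes"

# -c shows printable characters and backslash escapes instead of numbers.
chars=$(printf 'a\tb\n' | od -A n -c)
printf '%s\n' "$chars"
```

The first command should show 57 6f 6c 76 65 73, which is "Wolves" in file order.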

paste

This is a weird utility.

$ paste foo bar
Wolves enjoy waterballoons. Bar line one.
Don't eat the wrong fruit.  Bar line two.
My skull is made of cheese. Bar line three.
Welcome to the pain castle. Bar line four.
    Bar line five!

In the words of the man page, it "Write lines consisting of the sequentially corresponding lines from each FILE, separated by TABs, to standard output."

So the question is not "what", but "why". Why would we use this?

First of all, it’s clearly a companion to the cut command (reviewed above):

$ cut -d' ' -f 1 foo | paste - bar
Wolves  Bar line one.
Don't   Bar line two.
My  Bar line three.
Welcome Bar line four.
    Bar line five!

or with AWK (which I find a lot easier to remember):

$ awk '{print $1}' foo | paste - bar
Wolves  Bar line one.
Don't   Bar line two.
My  Bar line three.
Welcome Bar line four.
    Bar line five!

In the above examples, I’ve used the standard - filename placeholder for STDIN.

A neat trick is to use multiple - to put any input into columns:

$ cat | paste - - -
dog
cat
cow
dog cat cow

$ cat foo | paste - -
Wolves enjoy waterballoons. Don't eat the wrong fruit.
My skull is made of cheese. Welcome to the pain castle.

This isn’t something you’re going to need every day, but when you do, it will be waiting, ever faithful.
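
Two more tricks, sketched with throwaway input: -s pastes all the lines of one input together serially, and -d swaps the TAB for a delimiter of your choice:

```shell
# -s joins every line of the input into one; -d picks the delimiter.
csv=$(printf '%s\n' dog cat cow | paste -s -d , -)
echo "$csv"

# Two "-" placeholders plus -d turn alternating lines into key=value pairs.
pairs=$(printf '%s\n' name Dave id 101 | paste -d = - -)
printf '%s\n' "$pairs"
```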

pathchk

Wow, the man page for this is useless! pathchk takes a pathname as a parameter and returns a non-zero exit code if it’s not a valid path.

Mind you, this utility only tells you if a path is, like, potentially valid.

$ if pathchk 'foo'; then echo "ok"; else echo "bad"; fi
ok
$ if pathchk '~f'; then echo "ok"; else echo "bad"; fi
ok
$ if pathchk '/////~f'; then echo "ok"; else echo "bad"; fi
ok
$ if pathchk '/////~f\9000'; then echo "ok"; else echo "bad"; fi
ok
$ if pathchk '/////~f\x9'; then echo "ok"; else echo "bad"; fi
ok
$ if pathchk '/////~f$meow!@#$'; then echo "ok"; else echo "bad"; fi
ok
$ if pathchk ''; then echo "ok"; else echo "bad"; fi
pathchk: '': No such file or directory
bad

It’s actually pretty hard to make a path that isn’t valid. What’s probably more useful is knowing if a path actually points to a file or directory and your shell has way better utilities for that.

Even a full StackExchange search for pathchk reveals very little. I would say that uses of this utility are going to be pretty rare.
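
The one case where I can picture it earning its keep is the -p option, which checks a name against POSIX portability rules (a limited character set and short length limits) instead of what the current system allows. A sketch, with a made-up check_portable helper:

```shell
check_portable() {
    # -p: check against POSIX portability limits, not this system's limits
    if pathchk -p "$1" 2>/dev/null; then
        echo ok
    else
        echo bad
    fi
}

check_portable 'notes.txt'       # letters, digits, ".", "_" and "-" are fine
check_portable 'my notes.txt'    # a space is outside the portable set
```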

pinky

From the man page:

pinky - lightweight finger

Ha ha, what?

Oh yeah, that finger! Okay, how do they compare?

$ finger dave
Login: dave                     Name:
Directory: /home/dave                   Shell: /bin/bash
On since Sun Sep 13 10:55 (EDT) on tty1   2 hours 25 minutes idle
No mail.
No Plan.

$ pinky dave
Login    Name                 TTY      Idle   When             Where
dave                          tty1     02:25  2020-09-13 10:55

Okay. Ha ha, looks like I need to make a .plan file so all of my friends on this system can see what I’m up to these days, huh? :-)

pr

Considering all of the time I’ve put into learning about the text formatting tools available on Linux, I’m kind of amazed I hadn’t run into pr before. It’s for converting text files "for printing".

$ pr foo


2020-09-13 11:09                       foo                        Page 1


Wolves enjoy waterballoons.
Don't eat the wrong fruit.
My skull is made of cheese.
Welcome to the pain castle.

Huh, so it’s like groff/troff/roff with plaintext output. Neat.

It’s actually got a pretty great set of options like columnar output:

$ pr foo -2


2020-09-13 11:09                       foo                        Page 1


Wolves enjoy waterballoons.     My skull is made of cheese.
Don't eat the wrong fruit.      Welcome to the pain castle.

Oooh! Or check out the merge (-m) feature (eat your heart out, paste):

$ pr -m foo bar


2020-09-13 13:31                                                  Page 1


Wolves enjoy waterballoons.     Bar line one.
Don't eat the wrong fruit.      Bar line two.
My skull is made of cheese.     Bar line three.
Welcome to the pain castle.     Bar line four.
                    Bar line five!

This is seriously great. Waaaay lighter and easier to use than anything similar (like the aforementioned 'roffs). I have no idea when I’ll need to paginate some text files…​but I’ll try to remember it exists.

printenv

Super useful. Prints all of the current environment variables. Usually best with grep or a pager, since you’re likely to have screenfuls of output:

$ printenv | grep PERL
PERL5LIB=/home/dave/perl5/lib/perl5:/home/dave/perl5/lib/perl5
PERL_MB_OPT=--install_base "/home/dave/perl5"
PERL_MM_OPT=INSTALL_BASE=/home/dave/perl5
PERL_LOCAL_LIB_ROOT=/home/dave/perl5:/home/dave/perl5
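
You can also skip the grep and ask for a variable by name. A sketch (GREETING and SURELY_NOT_SET_12345 are made-up variable names):

```shell
GREETING=hello
export GREETING

# With a name, printenv prints just the value (no NAME= prefix).
one_value=$(printenv GREETING)

# An unset name prints nothing and gives a non-zero exit status.
if printenv SURELY_NOT_SET_12345 > /dev/null; then
    unset_status=set
else
    unset_status=unset
fi
echo "$one_value $unset_status"
```

That exit status makes it handy for "is this variable exported?" checks in scripts.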

printf

When you want precise control over output to the terminal, look no further than printf. By default, it does not print newlines:

$ printf hello
hello$

But if you’re a developer, the usual escapes work exactly as expected. Here’s a newline:

$ printf "hello\n"
hello

A Unicode smiley:

$ printf "\u263a\n"
☺
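
The format specifiers work as a C programmer would expect, too. A sketch of a few common ones (as with test and [, your shell will most likely run its own built-in printf here; env printf should reach the coreutils one):

```shell
# Zero-padded integer, width 5.
padded=$(printf '%05d' 42)

# Left-justified string in a 10-character column, then a marker.
column=$(printf '%-10s|' name)

# The format string is reused until the arguments run out.
pairs=$(printf '%s=%s\n' a 1 b 2)

printf '%s\n' "$padded" "$column" "$pairs"
```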

ptx

The man page is useless, just this description and a list of command options:

ptx - produce a permuted index of file contents

The info document is somehow even worse because it’s just additional explanation of the options.

Output is interesting. My foo file contains four short sentences, one per line.

$ ptx foo
                                       Don't eat the wrong fruit.
                                       My skull is made of cheese.
                                       Welcome to the pain castle.
                                       Wolves enjoy waterballoons.
                 Welcome to the pain   castle.
                 My skull is made of   cheese.
                               Don't   eat the wrong fruit.
                              Wolves   enjoy waterballoons.
                 Don't eat the wrong   fruit.
                            My skull   is made of cheese.
                         My skull is   made of cheese.
                    My skull is made   of cheese.
                      Welcome to the   pain castle.
                                  My   skull is made of cheese.
                                Don'   t eat the wrong fruit.
                           Don't eat   the wrong fruit.
                          Welcome to   the pain castle.
                             Welcome   to the pain castle.
                        Wolves enjoy   waterballoons.
                       Don't eat the   wrong fruit.

Fascinating!

Here’s a paragraph from The Lord of the Rings:

$ cat baggins.txt
I am old, Gandalf. I don't look it, but I am beginning to feel it in my heart of hearts. Well-preserved indeed! Why, I feel all thin, sort of stretched, if you know what I mean: like butter that has been scraped over too much bread. That can't be right. I need a change, or something.
$ ptx baggins.txt
   beginning to feel/      I am old,   Gandalf. I don't look it, but I am
   , but I am beginning to/            I am old, Gandalf. I don't look it
   beginning/     I am old, Gandalf.   I don't look it, but I am
    /, Gandalf. I don't look it, but   I am beginning to feel it in my/
      /. Well-preserved indeed! Why,   I feel all thin, sort of stretched/
     /of stretched, if you know what   I mean: like butter that has been/
        /bread. That can't be right.   I need a change, or something.
       /scraped over too much bread.   That can't be right. I need a/
     /feel it in my heart of hearts.   Well-preserved indeed! Why, I feel/
      hearts. Well-preserved indeed!   Why, I feel all thin, sort of/   /of
...
(continues for a total of 59 lines)

What the heck are we looking at here?

Wikipedia has a useful writeup here with an additional clue: https://en.wikipedia.org/wiki/Ptx_(Unix)

ptx is a Unix utility, named after the permuted index algorithm which it uses to produce a search or concordance report in the Keyword in Context (KWIC) format.

Ah, now I see it! If you look at both examples above, you’ll see that the second column is in alphabetical order. What made it tricky to recognize this is that upper and lower case letters are treated differently by default.

Using the -f or --ignore-case option fixes this and now it’s much more obvious:

$ ptx -f baggins.txt
       . That can't be right. I need   a change, or something.       /bread
     /-preserved indeed! Why, I feel   all thin, sort of stretched, if/
   but I am beginning to feel/     I   am old, Gandalf. I don't look it,
     Gandalf. I don't look it, but I   am beginning to feel it in my/    /,
    /over too much bread. That can't   be right. I need a change, or/
       /I mean: like butter that has   been scraped over too much bread./
        /. I don't look it, but I am   beginning to feel it in my heart/
     /has been scraped over too much   bread. That can't be right. I need/
     /old, Gandalf. I don't look it,   but I am beginning to feel it in/
    /, if you know what I mean: like   butter that has been scraped over/
...and so forth...

The first column is textual context that comes before the alphabetized term at the beginning of the second column. The rest of the second column is context after the term. It’s a little unusual, but pretty nice once you get used to reading it.

My copy of the book Unix Power Tools doesn’t mention ptx, but it does have "permuted indexes" in the index which leads to…​an example of how these permuted indexes were used in the traditional UNIX manuals and an explanation of how and why they’re used.

It turns out, there’s currently a copy of UNIX System User’s Manual, Release 5.0 by Bell Laboratories in the Internet Archive. Here’s a screenshot of a portion of the permuted index:

a small portion of the permuted index at the front of the unix system user’s manual

You can see the rest here.

After having spent a couple days learning about these as I complete this coreutils entry over a series of evenings, I’ve come to really like this style of index. It takes up more room than a normal book index with a list of terms and page numbers. But it’s way more useful. Having context for each term makes it incredibly quick and easy to find the usage you’re looking for (it might even provide the answer you’re looking for without having to go to the actual source material).

That’s neat and all. But how can we use this tool for other interesting stuff?

One of the most useful/interesting options is -o or --only-file, which lets you specify a file that contains a list of terms to include in the index. (Note that there is also an ignore-file option.)

You can use the old - trick to specify STDIN as the filename, so making an index for a single term could be done like this:

$ echo foo | ptx --only-file=- input.txt

And now I’ve got a compelling example of usage. Compare the useless grep output when I search for the word "I" in our Bilbo Baggins quote:

$ grep I baggins.txt
I am old, Gandalf. I don't look it, but I am beginning to feel it in my heart of hearts. Well-preserved indeed! Why, I feel all thin, sort of stretched, if you know what I mean: like butter that has been scraped over too much bread. That can't be right. I need a change, or something.

Versus the excellent output of ptx for the same query:

$ echo I | ptx --only-file=- baggins.txt
   , but I am beginning to/            I am old, Gandalf. I don't look it
   beginning/     I am old, Gandalf.   I don't look it, but I am
    /, Gandalf. I don't look it, but   I am beginning to feel it in my/
      /. Well-preserved indeed! Why,   I feel all thin, sort of stretched/
     /of stretched, if you know what   I mean: like butter that has been/
        /bread. That can't be right.   I need a change, or something.

I just wish it were a little more intuitive to do this sort of search. At least it’s easily scripted.
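
Here’s a sketch of such a script: a tiny wrapper function (kwic is my own name for it) that feeds a single term to --only-file on STDIN, exactly like the echo trick above. The sample file is thrown together just for the demo:

```shell
# kwic TERM FILE...: keyword-in-context lines for a single term.
kwic() {
    term=$1
    shift
    printf '%s\n' "$term" | ptx --only-file=- "$@"
}

scratch=$(mktemp -d)
printf '%s\n' 'Wolves enjoy waterballoons.' 'Welcome to the pain castle.' \
    > "$scratch/foo"

hits=$(kwic pain "$scratch/foo")
printf '%s\n' "$hits"

rm -rf "$scratch"
```

Remember that matching is case-sensitive unless you also pass -f.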

pwd

Always handy: "print working directory":

$ cd ~/tmp/foobar
$ pwd
/home/dave/tmp/foobar
$ echo $PWD
/home/dave/tmp/foobar

Also interesting are the -L (logical) and -P (physical) options, which change how symlinks are treated:

$ ln -s /usr/bin bonk
$ cd bonk
$ pwd
/home/dave/tmp/foobar/bonk
$ pwd -L
/home/dave/tmp/foobar/bonk
$ pwd -P
/usr/bin

readlink

Read a file link. You can also use -f to get the canonical path.

$ ln -s foo bar
$ readlink bar
foo
$ readlink -f bar
/home/dave/tmp/foo

Consider using the next command, realpath, instead of readlink -f. Check out some historical background here.

realpath

Basically readlink -f. Has a lot more options and flexibility, so use this instead and check out the man page.

$ realpath bar
/home/dave/tmp/foo
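
One of those extra options is --relative-to, which prints a path relative to some other directory. A sketch assuming GNU realpath, with an invented directory layout:

```shell
base=$(mktemp -d)
mkdir -p "$base/src/lib" "$base/docs"

# How do I get from src/lib to docs?
rel=$(realpath --relative-to="$base/src/lib" "$base/docs")
echo "$rel"

rm -rf "$base"
```

That should print ../../docs, which is handy when generating relative symlinks.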

rm

You know this one from the first Star Wars prequel:

sidious# rm -rf /naboo/gungans/ # wipe them out, all of them

But wow, next time you need to delete a bunch of stuff, check out the man page. Like cp and friends, there are a lot of options including two levels of interactive prompting.

I wish the -d option were on by default (remove empty directories). Sure, I could make an alias, but my point is that I wish this were the default behavior. I’m not about to start messing with the default behavior of rm on some machines. That leads to fear. Fear leads to hate. Hate leads to the dark side. And prequels.
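
For the record, here’s a sketch of what -d does and doesn’t do, using a scratch directory with made-up names:

```shell
scratch=$(mktemp -d)
mkdir "$scratch/empty" "$scratch/full"
touch "$scratch/full/file.txt"

# -d removes an empty directory, much like rmdir.
rm -d "$scratch/empty" && empty_result=removed

# On a non-empty directory, -d alone refuses; that still takes -r.
if rm -d "$scratch/full" 2>/dev/null; then
    full_result=removed
else
    full_result=kept
fi
echo "$empty_result $full_result"

rm -rf "$scratch"
```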

rmdir

Remove a directory. The nice feature I didn’t know about was -p, which removes empty ancestor directories (exactly the opposite of mkdir -p)!

$ tree foo
foo
`-- bar
    `-- biz
        `-- baz

3 directories, 0 files
$ rmdir -p foo/bar/biz/baz

runcon

Another SELinux thing (the first was chcon above).

To quote Wikipedia from the SELinux article:

The command runcon allows for the launching of a process into an explicitly specified context (user, role, and domain), but SELinux may deny the transition if it is not approved by the policy.

I have no interest in this until someone demonstrates how SELinux helps you make a more delicious hot dog.

seq

My favorite type of UNIX utility! It simply prints a sequence of numbers. At its simplest:

$ seq 4
1
2
3
4

Nice and terse usage, too. Here’s a sequence separated by commas counting down from ten to zero subtracting twos:

$ seq -s, 10 -2 0
10,8,6,4,2,0

It’ll save you a minute writing a script to do the same thing some day.
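
Two more options I’d expect to be useful, sketched here with GNU seq (the page file names are made up):

```shell
# -w pads with leading zeros so every number has the same width.
padded=$(seq -w 8 10 | paste -s -d ' ' -)

# -f takes a printf-style floating-point format for each number.
names=$(seq -f 'page%03g.txt' 2 | paste -s -d ' ' -)

echo "$padded"
echo "$names"
```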

sha1sum

Pretend that you are Git by computing some SHA-1 hashes for files! Here’s a mini-tutorial for the lifecycle of a SHA-1 checksum for a file:

$ sha1sum foo.txt
fa7dd7e51436401f0555f0cb6030393a0f18cfd5  foo.txt
$ sha1sum foo.txt > foo.sha1
$ sha1sum -c foo.sha1
foo.txt: OK

There are also the sha224sum, sha256sum, sha384sum, and sha512sum strengths available:

$ sha1sum foo.txt
fa7dd7e51436401f0555f0cb6030393a0f18cfd5  foo.txt
$ sha224sum foo.txt
0dbfebfe2057dd9b63ebbbeb8d21925323bc4ea293e4b23e1eb4a66b  foo.txt
$ sha256sum foo.txt
959a0da619f2594a481ee6e78c7c11f3686abdbbbab91f5b7d429ef8a0b46383  foo.txt
$ sha384sum foo.txt
ce87107ae3baa9f2217266770d37ddc8350609f856fd4441b6a80dd7a1fb0c362bdc427f5505a56e70aed083154fce2f  foo.txt
$ sha512sum foo.txt
0e83f638730bec5d0531382a4e40ea4fe9b1da05e444833282af16af03020697faf0baaa8db23b05a650b210477b7e50618a903584d140529cb2203198906b92  foo.txt

I’m sure someday there will be a sha9087123448sum available in coreutils to fill your screen with hex goop. (Not to be confused with "octal dump" - see od above.)
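
Here’s a sketch of the part of the lifecycle that matters most: what happens when the file changes. It uses sha256sum, but the options are the same across the family. The file contents are invented:

```shell
scratch=$(mktemp -d)
echo 'original text' > "$scratch/foo.txt"

# Record the checksum from inside the directory so the stored name is relative.
(cd "$scratch" && sha256sum foo.txt > foo.sha256)
before=$(cd "$scratch" && sha256sum -c foo.sha256)

# Tamper with the file. --status prints nothing and reports via exit code.
echo 'tampered text' > "$scratch/foo.txt"
if (cd "$scratch" && sha256sum --status -c foo.sha256); then
    after=OK
else
    after=FAILED
fi
echo "$before / $after"

rm -rf "$scratch"
```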

shred

This is a super-cool spy command. It overwrites a file multiple times with random data, which makes it very hard to recover the file…​from traditional, old-school spinning magnetic platters with traditional, old-school file systems where writing data to the same file would likely overwrite the same physical space on the storage media.

Shred is pretty much useless in our modern times, but I shredded a file just for fun:

$ cat > foo.txt
Super cool spy stuff.
I am a secret agent from Mars.
$ file foo.txt
foo.txt: ASCII text
$ shred foo.txt
$ file foo.txt
foo.txt: data
$ hexdump foo.txt
0000000 5b90 3445 6e50 da24 69f4 5f77 4ee9 3f9e
0000010 6d1b ddfe 47d8 ba69 bd10 72cc a59f ee52
0000020 2184 3f03 3d29 8de9 fb32 3bc2 f758 242e
...

Now no one will know my secret.

shuf

This one is great, and I didn’t even know it existed! How have you people been keeping this a secret from me for so long?

shuf is short for shuffle (say that three times fast) and it randomly shuffles elements from a file:

$ cat > words.txt
Apple
Bat
Cat
Donkey
Elephant
Fruit
Goat
Horse

$ shuf words.txt
Elephant
Donkey
Bat
Goat
Fruit
Cat
Apple
Horse

It’s a really nice tool with excellent, useful options. For example, -e shuffles the input arguments:

$ shuf -e A B C D E F G
E
F
B
G
A
C
D

Or you can give it a range of numbers:

$ shuf -i 1-10
10
5
2
1
4
8
3
9
7
6

And you can request a certain number of results for any of the above:

$ shuf -n 1 words.txt
Fruit
$ shuf -n 1 -e A B C D E F G
E
$ shuf -n 1 -i 1-10
2

I just added this:

$ alias rolld6='shuf -n 1 -i 1-6'
$ rolld6
6
$ rolld6
4

This is the most fun utility yet and it combines great with a lot of the others. I can’t believe this is the first time I’ve encountered it!

Check out this handy one-liner to delete a random file from the current directory:

$ alias randorm='ls | shuf -n 1 | xargs rm'

$ touch foo{1..5}.txt
$ ls
foo1.txt  foo2.txt  foo3.txt  foo4.txt  foo5.txt
$ randorm
$ ls
foo1.txt  foo2.txt  foo4.txt  foo5.txt

Bye bye foo3.txt. I bet you’ll be using that one in your next project. I know I will.
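
If you do, here’s a sketch of a slightly sturdier version as a function. Piping ls through xargs trips over file names with spaces, so this one hands the glob straight to shuf -e. It still assumes no newlines in file names and a non-empty directory, and it is every bit as irresponsible as the original:

```shell
# randorm [DIR]: delete one randomly chosen entry from DIR (default: .)
randorm() {
    victim=$(shuf -n 1 -e -- "${1:-.}"/*) || return 1
    rm -- "$victim"
}

scratch=$(mktemp -d)
touch "$scratch/foo1.txt" "$scratch/foo 2.txt" "$scratch/foo3.txt"

randorm "$scratch"
remaining=$(ls "$scratch" | wc -l)
echo "$remaining files left"

rm -rf "$scratch"
```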

sleep

Pauses the current process (script or terminal) for a specified amount of time, in seconds by default:

$ sleep 5
$ sleep 10m
$ sleep 3d

So the question is: why would you want to do this? The most common case seems to be when scripting a loop where you don’t want something to happen too frequently - maybe some sort of network request.
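
That loop case might look something like this. It’s only a sketch: retry is a hypothetical helper name, RETRY_DELAY is a made-up override variable, and the three attempts and one-second pause are arbitrary choices:

```shell
# retry CMD [ARGS...]: run CMD up to 3 times, sleeping between failed attempts.
retry() {
  tries=3
  delay=${RETRY_DELAY:-1}   # seconds between attempts (RETRY_DELAY is hypothetical)
  n=1
  while [ "$n" -le "$tries" ]; do
    "$@" && return 0                          # success: stop retrying
    [ "$n" -lt "$tries" ] && sleep "$delay"   # pause before the next attempt
    n=$((n + 1))
  done
  return 1                                    # every attempt failed
}

# Example use (not run here): retry ping -c 1 phobos
```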

Another fun use I just came up with is a simple command line timer. This plays a bell MP3 sound after one minute (note the && rather than a pipe, since a pipe would start both commands at the same time):

$ sleep 60s && play ~/Downloads/bell.mp3

sort

Well, this one I use all the time. It’s one of the classic, indispensable UNIX tools.

Let’s get these lines into alphabetical order:

$ cat foo
Wolves enjoy waterballoons.
Don't eat the wrong fruit.
My skull is made of cheese.
Welcome to the pain castle.

$ sort foo
Don't eat the wrong fruit.
My skull is made of cheese.
Welcome to the pain castle.
Wolves enjoy waterballoons.

I’m not going to list all of the options for this command, but there are a ton of them and they’re really useful and helpful. You can ignore leading whitespace, case, non-printable characters, and sort by numeric order.

You can even sort by month order! Check it out:

$ cat | sort -M
June - eat a cake
August - burn wood
January - melt everything
December - eat wax
July - drink all the liquids

January - melt everything
June - eat a cake
July - drink all the liquids
August - burn wood
December - eat wax

By far, the option I use most is -n for numeric sort:

$ cat | sort -n
198 clowns
16 dogs
985 snakes
84 goats

16 dogs
84 goats
198 clowns
985 snakes
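
Numeric sort gets even more useful combined with -r, which reverses the order, and -k, which picks the field to sort on. A small sketch reusing the list above (the temp file is only there to make the example self-contained):

```shell
animals=$(mktemp)
printf '%s\n' '198 clowns' '16 dogs' '985 snakes' '84 goats' > "$animals"

# Biggest count first: -n numeric, -r reversed.
sort -nr "$animals"

# Alphabetical by the second field (the name) instead of the number.
sort -k2,2 "$animals"
```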

Brings order to shuf. Love it. Can’t do without it.

split

Splits a file (or STDIN) into files based on size in bytes, lines, or by generating a specified number of files and letting split figure out how big each one should be.

It’s a great little utility. I just haven’t had need for it yet.

Here I split a text file into separate files with one file per line:

$ cat foo
Wolves enjoy waterballoons.
Don't eat the wrong fruit.
My skull is made of cheese.
Welcome to the pain castle.

$ split --lines=1 foo
$ ls
foo  xaa  xab  xac  xad
$ head x*
==> xaa <==
Wolves enjoy waterballoons.

==> xab <==
Don't eat the wrong fruit.

==> xac <==
My skull is made of cheese.

==> xad <==
Welcome to the pain castle.

The other super useful option is -t or --separator to specify something other than newline to separate "records".

Here I separate the Bilbo Baggins quote on the word "I":

$ cat baggins.txt
I am old, Gandalf. I don't look it, but I am beginning to feel it in my heart of hearts. Well-preserved indeed! Why, I feel all thin, sort of stretched, if you know what I mean: like butter that has been scraped over too much bread. That can't be right. I need a change, or something.

$ split -t 'I' -l 1 baggins.txt
$ head x*
==> xaa <==
I
==> xab <==
 am old, Gandalf. I
==> xac <==
 don't look it, but I
==> xad <==
 am beginning to feel it in my heart of hearts. Well-preserved indeed! Why, I
==> xae <==
 feel all thin, sort of stretched, if you know what I
==> xaf <==
 mean: like butter that has been scraped over too much bread. That can't be right. I
==> xag <==
 need a change, or something.
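
The "specified number of files" mode mentioned at the top is -n. A quick sketch, assuming GNU split: l/2 asks for two chunks without cutting any line in half, -d switches to numeric suffixes, and part_ is a prefix picked for the example. Concatenating the pieces in order gives back the original:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Wolves enjoy waterballoons.' "Don't eat the wrong fruit." \
  'My skull is made of cheese.' 'Welcome to the pain castle.' > foo

# Two chunks, whole lines only, named part_00 and part_01.
split -n l/2 -d foo part_
ls part_*

# The glob expands in sorted order, so cat reassembles the file.
cat part_* | cmp - foo && echo "round trip OK"
```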

The man page for split lists Richard Stallman as one of the authors. I wonder how many of the coreutils share this distinction?

stat

Sigh. Man pages should really be required to have some examples up at the top. This tool’s man page is not an especially bad example. I’m just getting worn down by the annoying format after using it so much lately (man <thing> is 100% better than trying to do a Web search for most of these tools)…​ So, I’m super thankful for man. I just wish most of them were better.

Anyway, point stat at a file and you can see details about it:

$ stat foo
  File: foo
  Size: 111         Blocks: 8          IO Block: 4096   regular file
Device: 803h/2051d  Inode: 1573151     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/    dave)   Gid: (  100/   users)
Access: 2020-09-15 18:09:48.662941892 -0400
Modify: 2020-09-13 11:09:19.077060019 -0400
Change: 2020-09-13 11:09:19.077060019 -0400
 Birth: 2020-09-13 11:08:12.213055161 -0400

We can also learn about the filesystem that the file lives on with the -f option:

$ stat -f foo
  File: "foo"
    ID: 53e08d3f116ca070 Namelen: 255     Type: ext2/ext3
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 223557201  Free: 193352658  Available: 181979184
Inodes: Total: 56852480   Free: 55538042

You can even give it a format to output, making it extremely useful for scripting. Everybody’s favorite thing to do (after telling you that you’re using cat wrong) is to tell you that you shouldn’t parse the output of ls. Formatted output from stat seems like a great alternative.

Here are the octal permissions and owner of foo:

$ stat --printf="perms: %a, owner: %U\n" foo
perms: 644, owner: dave
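
In a script, the same idea replaces parsing ls. A minimal sketch using -c (like --printf, but it adds the trailing newline for you); the file name and temp directory are made up for the demo:

```shell
workdir=$(mktemp -d)
printf 'hello' > "$workdir/small.txt"
chmod 640 "$workdir/small.txt"

# %a = octal permissions, %s = size in bytes, %n = file name
for f in "$workdir"/*; do
  stat -c '%a %s %n' "$f"
done

# Grab a single value into a variable, no ls parsing required.
size=$(stat -c '%s' "$workdir/small.txt")
echo "size is $size bytes"
```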

stdbuf

Interesting. This lets you set the buffering modes of the STDIN, STDOUT, and STDERR streams of a process.

And the man page for stdbuf has an interesting example (which immediately displays unique entries from a file called access.log):

tail -f access.log | stdbuf -oL cut -d ' ' -f1 | uniq

Unfortunately, when I tried a contrived example, it worked just fine without using stdbuf:

# terminal 1:
$ echo "cat" >> lawg.log
$ echo "cat" >> lawg.log
$ echo "cat" >> lawg.log
$ echo "cow" >> lawg.log

# terminal 2:
$ tail -f lawg.log | uniq
cat
cow

So that kind of ruins my plans to demonstrate a "before and after" example of stdbuf in action. (But I love the idea of piping tail -f (which I use all the time to follow changes to Apache web logs while developing) through uniq to get only unique messages.)
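
My best guess at why the contrived example worked anyway: uniq was the last command in the pipeline and was writing straight to the terminal, where output is normally line-buffered. The man page example has cut in the middle, writing into a pipe, and that is where output usually gets held back in a larger buffer. Here’s a sketch of the shape where stdbuf should matter. The final result is the same either way; -oL only changes how promptly lines show up when the input trickles in:

```shell
# cut writes to a pipe here, so without stdbuf its output may sit in a
# buffer until the buffer fills or cut exits. -oL asks for line-buffered stdout.
printf '%s\n' 'cat 1' 'cat 2' 'cow 3' |
  stdbuf -oL cut -d ' ' -f1 |
  uniq
```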

There’s a really great article with information about UNIX buffering here. But I am running out of interest in this topic quickly.

The important thing is that if you need it, this tool exists and may someday solve a problem you have.

stty

Terminal settings are a deep subject.

This utility can either print the current settings or change them. Here are my current settings:

$ stty
speed 38400 baud; line = 0;
-brkint -imaxbel

Change them at your own risk. I put this in the same category as stdbuf above - it’s there in the unlikely event that you need it.

UPDATE: I ended up using this to set the baud rate of a serial device to talk to a microcontroller. Check it out here. But the usage is beyond arcane and should only be used as a last resort. You’re better off doing almost anything else before using this utility.

sum

This one appears to be strictly historic. I haven’t been able to find anything interesting about this command. Here it is:

$ sum foo
49295     1

The first number is a checksum (using one of two algorithms available) and the second is the number of disk blocks it’s using.

I’d love to know if any person or thing is still using this utility in the year 2020 and why.

sync

There’s a good write-up of this command here.

In short, Linux buffers data in memory and writes to disk in an efficient manner. This utility forces all drive data or even just a single file to be written immediately to disk.

Why would you need to do this? I guess if you’re about to throw your computer in a lake, you’d want to do it?

$ sync
$ # throw computer in lake

Otherwise, I’m a big believer in letting the OS do its thing.

Also, there is zero reason to do this on a device that you are about to unmount. Unmount does a better job of making sure data is synced than you can. Have faith.

tac

Reverse lines! (It’s cat backwards, get it?)

$ tac
Cat is a cool guy.
Ant is a friend.
Termite eats houses.
Termite eats houses.
Ant is a friend.
Cat is a cool guy.

But it can also reverse using any separator:

$ tac --separator=,
a,b,c,d,e
e
d,c,b,a,$

(You can fix the trailing comma by using the --before option. Experiment until your output is delicious.)

tail

I use this pretty much daily, much like head above.

The killer feature (other than the ability to see the end of files, of course) is the -f ("follow") option. It shows additional lines as they are appended to the file:

$ tail -f /var/log/httpd/error_log
[Wed Sep 16 20:46:53.530838 2020] CRM9921: Web crimes detected.
[Thu Sep 17 19:04:49.254350 2020] SML0012: Smells too intense. Stop it.

It’s hard to demonstrate this in action on a web page, because it’s a dynamic thing. But as new errors roll into this unfortunate server, we’d see them pop up in realtime. I use this all the time for PHP error debugging and such.
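
The non-follow options are easier to show on a page. Two handy ones: -n N for the last N lines, and -n +N to start at line N, which is a neat way to skip a header. A small sketch with a made-up CSV:

```shell
csv=$(mktemp)
printf '%s\n' 'name,count' 'dogs,16' 'goats,84' 'clowns,198' > "$csv"

tail -n 2 "$csv"    # last two lines: goats and clowns
tail -n +2 "$csv"   # everything from line 2 on, i.e. minus the header
```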

tee

Such a clever name. The tee is like a "T" junction in a water pipe. It lets you send output to multiple places at once! Ever want to redirect output to a file but also see it? Check it out:

$ uname | tee outpoot
Linux
$ cat outpoot
Linux

I can see it in the outpoot file and I can see it on the screeeeeeen.
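
By default tee overwrites its output file. The -a option appends instead, and you can name more than one file. A short sketch (copy1 and copy2 are placeholder names):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

uname | tee outpoot             # screen + file
date  | tee -a outpoot          # -a appends rather than clobbering
echo "both" | tee copy1 copy2   # several files at once
```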

Okay, these coreutils have got me a bit slap-happy at this point.

Anybody still reading?

timeout

Oooh, this is cool! I had no idea this existed. It runs a command and then kills it after a specified timeout:

$ timeout 5 ping phobos
PING phobos.localdomain (10.0.0.37) 56(84) bytes of data.
64 bytes from phobos.localdomain (10.0.0.37): icmp_seq=1 ttl=64 time=0.161 ms
64 bytes from phobos.localdomain (10.0.0.37): icmp_seq=2 ttl=64 time=0.914 ms
64 bytes from phobos.localdomain (10.0.0.37): icmp_seq=3 ttl=64 time=0.900 ms
64 bytes from phobos.localdomain (10.0.0.37): icmp_seq=4 ttl=64 time=0.919 ms
64 bytes from phobos.localdomain (10.0.0.37): icmp_seq=5 ttl=64 time=0.904 ms
$

But this is a super flexible tool. You can specify the signal to send (default is TERM) after the timeout. The duration can be really long (like 16d is sixteen days). And a handful of other useful settings. Check out the man page!
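
For scripting, the useful detail is the exit status: GNU timeout exits with 124 when the command ran out of time, and otherwise normally passes along the command’s own status. A minimal sketch, using sleep as a stand-in for something slow:

```shell
# sleep 5 would take five seconds, but we only allow it one.
status=0
timeout 1 sleep 5 || status=$?

if [ "$status" -eq 124 ]; then
  echo "gave up waiting"
else
  echo "finished with status $status"
fi
```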

touch

Another essential in the UNIX toolbox. Learn it. Know it. Live it.

$ touch foobar
$ ls -l foobar
-rw-r--r-- 1 dave users 0 Sep 17 20:07 foobar

I reached out my hand and touched that file right into existence. Time and space mean nothing to me. All matter is an extension of my mind.

Also, you can update just the access time, modification time, set times to specific dates (default is right the heck now), or even say "set this file’s access and mod time to be identical to that other file’s time". Whatever. The world is your oyster. Carpe diem.
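
A quick sketch of those options, assuming GNU touch: -d sets a specific time, -r copies the times from a reference file, and -m limits the change to the modification time. The file names other and modonly are placeholders:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch -d '2020-09-17 20:07:00' foobar   # set access + mod time to a date
touch -r foobar other                   # copy foobar's times onto other
touch -m -d '2021-01-01' modonly        # -m: modification time only

stat -c '%y  %n' foobar other modonly
```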

tr

Translate (or remove) characters from input stream and write out the result.

Gosh, I always just use sed for this stuff. Or awk or even whatever general purpose programming language I’m particularly into that week.

Having said that, sometimes these really specific tools are way more efficient than a more powerful/general tool.

I guess if you’re really just replacing all instances of single characters, tr is shorter than the equivalent sed command because it’s global by default and doesn’t need regex syntax…​

$ echo "foo" | tr 'o' 'z'
fzz
$ echo "foo" | sed 's/o/z/g'
fzz

But for me, learning sed lets you do so much more. Typing, what?, three more characters on the command line is nothing compared to having to remember how to use a different tool.

I guess the -s ("squeeze repeated characters into one") is also a place where the tr command is going to be way more terse:

$ echo "fooooo" | tr -s 'o'
fo
$ echo "fooooo" | sed 's/o\+/o/g'
fo

Clearly you’d want to type the tr command instead. But, again, if I already have the sed syntax memorized, I can type that faster than I can look up the squeeze option in the tr man page…

I don’t mean to pick on the tr command. But suffice it to say that I don’t think you need to memorize everything. Just know it exists. Ideally, do memorize a few of the tools you use the most often.
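
In tr’s defense, here are a few jobs where it stays short because it thinks in character sets rather than patterns. A small sketch:

```shell
echo "hello, world" | tr 'a-z' 'A-Z'           # HELLO, WORLD
echo "hello, world" | tr -d 'lo'               # he, wrd
printf 'one two\tthree\n' | tr -s ' \t' '\n'   # one word per line
```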

true

See what I wrote about false above. This is like the lawful good to false’s lawful evil.

truncate

Interesting. Hacks the ends off of files, making them the size you specify.

$ cat foo2
Wolves enjoy waterballoons.
Don't eat the wrong fruit.
My skull is made of cheese.
Welcome to the pain castle.

$ truncate --size=16 foo2
$ cat foo2
Wolves enjoy wat$

It also grows files to make them longer (where the rest of the file becomes "sparse"):

$ ls -l foo2
-rw-r--r-- 1 dave users 16 Sep 17 20:33 foo2
$ truncate --size=128 foo2
$ ls -l foo2
-rw-r--r-- 1 dave users 128 Sep 17 20:35 foo2
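
To see the "sparse" part, compare the size a file claims with the disk blocks it actually occupies. A sketch, assuming a filesystem that supports sparse files; the 10M size and the temp file are arbitrary:

```shell
sparse=$(mktemp)
truncate --size=10M "$sparse"   # grow an empty file to 10 MiB

# %s = apparent size in bytes, %b = blocks actually allocated
apparent=$(stat -c '%s' "$sparse")
allocated=$(stat -c '%b' "$sparse")
echo "apparent: $apparent bytes, allocated: $allocated blocks"

truncate --size=0 "$sparse"     # and back to nothing
```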

Very interesting. Hmm. Maybe some day I’ll find a use for that.

tsort

This tool’s man page really needs an example or two.

tsort - perform topological sort

The info page(s) are almost as bad, but if you read the following sentences slowly enough, and run the example, it does actually make sense:

tsort reads its input as pairs of strings, separated by blanks, indicating a partial ordering. The output is a total ordering that corresponds to the given partial ordering.

Alright, let’s give it some pairs:

$ cat | tsort
d e
b c
a b
c d
e f

a
b
c
d
e
f

Neat. A sorted alphabet. What’s going on here? Okay, so the input consists of pairs of letters. The first pair is:

d e

Which simply means that d comes before e (or "points to" or "is the parent of" or whatever directed relationship you’d like). That pair becomes the rule "d comes before e".

Now let’s add the rest of the rules: "b before c", "a before b", "c before d", "e before f".

(Now, you and I know that this is just alphabetical order. But tsort doesn’t know that. I’m just using this example because it’s easy to type and easy to confirm correctness.)

What’s cool is that tsort took all of these rules and then constructed the sorted list correctly for us: a b c d e f.

Just for fun, let’s also use Graphviz (the dot command and others) to draw this set of rules - which, as we say in graphing terminology, is a directed graph.

$ cat | dot -Tsvg > tsort1.svg
digraph G {
  rankdir=LR;
  node [shape=circle]

  d->e
  b->c
  a->b
  c->d
  e->f
}

Which produces the delightful graphic below:

a directed graph of the above created with graphviz

But that first example was pretty weak. Let’s give it something to chew on. I’ve made this very abridged history of programming language evolution:

$ cat langs_unordered
bcpl b
c c++
algol simula
fortran algol
b c
smalltalk object_pascal
c++ java
algol pascal
simula smalltalk
algol bcpl
smalltalk java

Which looks like this when I use AWK and dot (from Graphviz) to turn this list of pairs into an SVG of the directed graph:

a directed graph of the computer language evolution
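
The AWK step isn’t shown here, so this is a guess at the general shape rather than the actual script: wrap each pair in Graphviz’s digraph syntax, quoting the names so that c++ doesn’t confuse dot. The pairs_to_dot name and the langs.svg output file are made up:

```shell
# Turn "parent child" pairs on stdin into a Graphviz digraph on stdout.
pairs_to_dot() {
  awk 'BEGIN   { print "digraph G {"; print "  rankdir=LR;" }
       NF == 2 { printf "  \"%s\" -> \"%s\"\n", $1, $2 }
       END     { print "}" }'
}

printf '%s\n' 'bcpl b' 'b c' 'c c++' | pairs_to_dot
# To render: pairs_to_dot < langs_unordered | dot -Tsvg > langs.svg
```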

Now let’s tsort this to see in what order these languages have to be born in order to make chronological sense:

$ tsort langs_unordered
fortran
algol
bcpl
pascal
simula
b
smalltalk
c
object_pascal
c++
java

Yup! That’s exactly right.

Apparently, the historical purpose for this tool was for input to the old UNIX linker. Info has the whole story, which you can read here. Cool.

I don’t have any immediate uses for this. But it’s certainly interesting.

I have a really similar dataset for my website - it’s a list of all of the redirects I’ve ever had (as pages change names and locations). To generate my Apache aliases, I have a Ruby script that gets all of the unique old URLs and finds the new URL endpoints (following multiple moves and renames as needed). Sadly, this tool doesn’t do anything like that. But I’m just saying, I see how this sort of thing can be useful.

This sort of thing.

This topological sort of thing.

tty

Nice and simple: prints the device filename of your current terminal:

$ tty
/dev/pts/1

If we run a command from a pipe, that command is not connected directly to the terminal. Same if it’s launched from cron or another process. In that case, we get this output:

$ ls | tty
not a tty

Ah, but beyond that message, we can also use the exit code from tty to determine if we’re attached to a terminal or not. We don’t want any additional messages in that case, so we can use the -s (silent) option to suppress the messages and just get the exit status:

$ if tty -s; then echo "Hello, hoopy frood!"; else echo "gruntbuggly"; fi
Hello, hoopy frood!

$ ls | if tty -s; then echo "Hello, hoopy frood!"; else echo "gruntbuggly"; fi
gruntbuggly

Obviously, this is something you’re more likely to do in a script than on the command line.
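
In a script, that might look like the sketch below. The shell’s own [ -t 0 ] test asks the same question about standard input, so both are shown; the describe_stdin name is invented for the example:

```shell
describe_stdin() {
  if tty -s; then
    echo "interactive"
  else
    echo "piped or redirected"
  fi
}

# With input coming from a pipe, both checks agree there's no terminal.
echo | describe_stdin
echo | { [ -t 0 ] && echo "interactive" || echo "piped or redirected"; }
```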

uname

The standard tool for getting some basic information about the system:

$ uname
Linux

$ uname -a
Linux europa.ratfactor.com 5.4.35 #1 SMP Thu Apr 23 13:47:56 CDT 2020 x86_64 AMD Ryzen 7 3700X 8-Core Processor AuthenticAMD GNU/Linux

unexpand

The opposite of expand! Converts spaces to tabs. It’ll be especially important to take a look at the man page to make sure you’re converting the right number of spaces to tabs because the document will (presumably!) use a specific number of spaces as tab stops.

Hint: use the -t option to specify the number of spaces per tab stop in the input:

$ ed tab_poem
tab_poem: No such file or directory
i
Start of poem
  Indented by two
    I say this to you
  Enjoy the indent
  Expanded intent
    It's always nice
      to have tabs
        when they'll do!
.
wq
158 bonus Unix points awarded for the use of ed.

$ unexpand -t 2 tab_poem
Start of poem
    Indented by two
        I say this to you
    Enjoy the indent
    Expanded intent
        It's always nice
            to have tabs
                when they'll do!

My terminal displays tabs as 8 characters, so we know this has worked because the indented lines are now way more indented.
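
If you’d rather not trust your eyes, cat -A (GNU cat) prints each tab as ^I and marks line ends with $. A small sketch, which also shows expand undoing the change:

```shell
printf '  two\n    four\n' | unexpand -t 2 | cat -A
# ^Itwo$
# ^I^Ifour$

# expand with the same tab width converts the tabs back to spaces.
printf '  two\n    four\n' | unexpand -t 2 | expand -t 2
```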

Also, note the bonus points awarded to me for the use of ed, the standard text editor. Start earning yours today!

uniq

Given a sorted list of items, returns only the unique items:

$ cat > animals
cow
cow
chicken
chicken
chicken
pig
pig
pig
pig
chicken
cow

$ uniq animals
cow
chicken
pig
chicken
cow

Note it only works as you’d expect for a sorted list. I can write a better uniq in one line of AWK:

$ awk '{lines[$0]=$0} END{for(l in lines) print l}' animals
chicken
pig
cow

To get the same effect, we can run our file through sort first:

$ sort animals | uniq
chicken
cow
pig

And that is certainly easier to type than the AWK program.

Plus, GNU uniq has got some other great features such as counting (sorted_animals is the same list run through sort and saved to a file):

$ uniq -c sorted_animals
      4 chicken
      3 cow
      4 pig

Or printing only the duplicated entries (with -d).

Or showing all lines, but grouping them:

$ uniq --group sorted_animals
chicken
chicken
chicken
chicken

cow
cow
cow

pig
pig
pig
pig

Which I imagine would be most useful when paired with -w, which lets you specify how many characters to check:

$ uniq -w 1 --group sorted_animals
chicken
chicken
chicken
chicken
cow
cow
cow

pig
pig
pig
pig

(Note how the chickens and cows are now in the same group because they both start with 'c'.)

It’s not perfect, but give uniq a good, hard look before you write your own script for this sort of task.
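
Two idioms worth having in your back pocket. The first is the classic frequency count: sort, count with uniq -c, then sort the counts numerically in reverse. The second is an AWK one-liner that removes duplicates from unsorted input while keeping the first-seen order, which uniq can’t do on its own. A sketch using the animals list from above (recreated in a temp file so it runs by itself):

```shell
animals=$(mktemp)
printf '%s\n' cow cow chicken chicken chicken pig pig pig pig chicken cow > "$animals"

# Most common lines first.
sort "$animals" | uniq -c | sort -rn

# Order-preserving de-duplication: print a line only the first time it's seen.
awk '!seen[$0]++' "$animals"
```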

unlink

Unlinking a file means detaching it from a "link" (filename). I’m most familiar with "unlink" as unlink(), the system call to delete a file.

Of course, that’s what rm does, too. So what’s the difference? In practice, unlink is just a much less useful and less safe rm.

Use rm.

users

Who’s logged into this system?

$ users
dave

Heck yeah. And loving it.

vdir

Huh, another historical way to display files. See dir above.

This one does a "long" output, much like ls -l:

$ vdir
total 8
-rw-r--r-- 1 dave users 285 Sep 15 21:06 baggins.txt
-rw-r--r-- 1 dave users 111 Sep 15 20:58 foo

It has many options.

wc

Word count. One of my favorites! I use this all the time.

$ cat foo
Wolves enjoy waterballoons.
Don't eat the wrong fruit.
My skull is made of cheese.
Welcome to the pain castle.

$ wc foo
  4  19 111 foo

The output above is as follows:

  4 lines
 19 words
111 bytes

You can also request one of those three items:

$ wc -w foo
19 foo

It also understands the difference between bytes and characters. For historical reasons, -c is bytes, and the newer -m is actual characters.

Here’s an interesting option I didn’t know about, -L, for max line width:

$ wc -L foo
27 foo
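
One scripting tip: when wc reads standard input there’s no file name to print, so you get just the number. A minimal sketch (the temp file stands in for foo):

```shell
foo=$(mktemp)
printf '%s\n' 'Wolves enjoy waterballoons.' "Don't eat the wrong fruit." > "$foo"

lines=$(wc -l < "$foo")   # redirect instead of passing the name
echo "file has $lines lines"
```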

who

Who is currently logged in?

$ who
dave     tty1         2020-09-19 12:19

There are quite a few options. You can see most of them with -a for "all", and -H for "headers" makes the output more readable:

$ who -a -H
NAME       LINE         TIME             IDLE          PID COMMENT  EXIT
                        2020-09-19 12:19               478 id=si    term=0 exit=0
           system boot  2020-09-19 12:19
           run-level 3  2020-09-19 12:19                   last=S
                        2020-09-19 12:19              1114 id=rc    term=0 exit=0
dave     + tty1         2020-09-19 12:19 02:18        1439
LOGIN      tty2         2020-09-19 12:19              1440 id=c2
LOGIN      tty3         2020-09-19 12:19              1441 id=c3
LOGIN      tty4         2020-09-19 12:19              1442 id=c4
LOGIN      tty5         2020-09-19 12:19              1443 id=c5
LOGIN      tty6         2020-09-19 12:19              1444 id=c6

whoami

Simple existential answers:

$ whoami
dave

yes

Repeats the string of your choice (default "y") and a newline to STDOUT forever.

$ yes
y
y
y
y
...

$ yes no
no
no
no
no
...

Typically used to "answer" programs which expect you to answer "y" to confirm things interactively.

$ yes | annoying_script.sh
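
Since yes never stops on its own, it’s worth knowing that it goes away once whatever is reading from it exits. A small sketch using head as the reader:

```shell
# head takes three lines and exits; the next write from yes then fails
# (normally via SIGPIPE) and yes stops too.
yes no | head -n 3
```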

GNU yes is hilariously good at its job. Check out this classic: How is GNU yes so fast?.

Conclusion

Holy cow. After…​(checks date)…​almost exactly a year, I have finally completed this entry. Arguably a bunch of these could have been (and should be) separate articles/blog posts in their own right. Then people would at least have a chance to find the info they’re looking for when a search engine sends them here, right?

TODO: Split some of these into their own pages for quicker reference.

Anyway, after a year (mostly a long hiatus, mind you), I’ve completed this. I seriously doubt I’ll have too many more entries as big as this. The previous record was bash:

$ wc -w coreutils.adoc bash.adoc
 13797 coreutils.adoc
  8602 bash.adoc

Wow, over 13,000 words. Now I’m getting into NaNoWriMo territory.

I’m extremely excited to get to the next package, which looks to be the straightforward utility, cpio. Should be able to knock that one out in way less than a year.

Until next time, happy hacking!