This is a card in Dave's Virtual Box of Cards.


Created: 2022-03-19

What is "plaintext"? To me, in the year 2022, this means UTF-8 text files. As an English speaker, this usually means the ASCII subset of UTF-8. ASCII text files are probably the most universally compatible series of 1s and 0s you could hope to deal with on computing devices.

$ cat > foo.txt
hello
$ xxd foo.txt
00000000: 6865 6c6c 6f0a     hello.

The only tricky part of this file is the invisible 0x0a "Line Feed" (LF, \n, "newline", etc.) character at the end. But as of 2018, even MS Notepad knows what to do with a single LF as opposed to the native Windows CRLF (0x0d0a).
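As a quick illustration of the two line-ending conventions, here is a minimal Python sketch (Python is my choice for illustration, and the variable names are made up). It shows the bytes for the same line with an LF ending and with a CRLF ending, and that a line splitter can treat them the same way.

```python
# The same one-line file with Unix (LF) and Windows (CRLF) line endings.
lf = "hello\n".encode("ascii")
crlf = "hello\r\n".encode("ascii")

print(lf.hex())    # 68656c6c6f0a
print(crlf.hex())  # 68656c6c6f0d0a

# bytes.splitlines() recognizes both endings, so the content is identical.
print(lf.splitlines() == crlf.splitlines())  # True
```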

There are no big-endian or little-endian byte-ordering problems in ASCII because every ASCII character fits in 7 bits, and therefore in a single byte. And unlike UTF-16 and other mistakes, UTF-8 has no endianness issues either: even in multi-byte characters, the individual units are still single bytes!
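To make the endianness point concrete, here is a small Python sketch (again, Python is just my pick for illustration) that encodes the pound sign three ways. UTF-16 has two possible byte orders for the same character, while UTF-8 has exactly one byte sequence.

```python
text = "£"  # U+00A3, POUND SIGN

utf8 = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")  # little-endian 16-bit units
utf16_be = text.encode("utf-16-be")  # big-endian 16-bit units

print(utf8.hex())      # c2a3
print(utf16_le.hex())  # a300
print(utf16_be.hex())  # 00a3

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
print("hello".encode("ascii") == "hello".encode("utf-8"))  # True
```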

$ cat > foo.txt
£
$ xxd foo.txt
00000000: c2a3 0a            ...

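(In the above, the c part of the byte c2 is 1100. Its leading bits, 110, tell us that the pound sign character is two bytes long, so one additional byte follows, which happens to be a3. That second byte starts with the bits 10, which marks it as a continuation byte.)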
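The bit-twiddling described above can be sketched in Python. This is a simplified illustration, not a real decoder: the helper name utf8_sequence_length is hypothetical, and it only looks at the leading bits of the first byte without rejecting the lead byte values that UTF-8 forbids.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 sequence has, judged by its first byte.

    Simplified sketch: checks only the leading bit pattern.
    """
    if lead >> 7 == 0b0:       # 0xxxxxxx: plain ASCII, one byte
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: one more byte follows
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: two more bytes follow
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: three more bytes follow
        return 4
    raise ValueError("not a UTF-8 lead byte (10xxxxxx is a continuation byte)")


data = bytes([0xC2, 0xA3])  # the pound sign bytes from the xxd output

length = utf8_sequence_length(data[0])

# Keep the 5 payload bits of the lead byte and the 6 payload bits
# of the continuation byte, then join them into one code point.
code_point = ((data[0] & 0b00011111) << 6) | (data[1] & 0b00111111)

print(length)            # 2
print(hex(code_point))   # 0xa3
print(chr(code_point))   # £
```

The payload bits work out to 00010 and 100011, which join to 10100011, or U+00A3.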

I agree with this manifesto: (Which also happens to be a very helpful resource for understanding Unicode in general!)

See also text-editing and text-markup