This is a card in Dave's Virtual Box of Cards.

Linux Data Recovery - the stupid way!

Created: 2022-09-25 Updated: 2022-11-09

Short version:

cp /dev/sda3 foo.img
strings foo.img > foo.txt
grep "my precious data" foo.txt

And yes, you can use cp to copy entire partitions and disk images. See my huge coreutils write-up and scroll down to dd (or search for "weirdest old-school weapon").

Long version:

So my EeePC 701's original SSD drive finally gave up the ghost the other day. Linux died three different ways on boot before finally settling on the same kernel panic each time after that.

Oh dear, I’d been caught with a nascent project on there that had not yet been backed up anywhere else! Terrible timing. But that’s Murphy’s Law.

Well, let’s see what can be recovered…

I started the machine up with a USB drive containing a 32-bit Slackware 14.2 installation image (the handiest thing I happened to have on hand).

lsblk clearly showed the three partitions on the drive, so the device still worked and could be read.

I tried fsck (which runs fsck.ext4 for an ext4 filesystem) and related tools from the util-linux package. I even played around with tools from e2fsprogs, like debugfs. It was educational and the man pages are very good. But nothing resulted in finding my files. The blame lies in my lack of knowledge. I don’t know anything about finding and restoring files in a journaling filesystem like ext4.

But playing with those tools did give me a dumb idea. And in the end, it worked well enough and didn’t take very long.

What I’ve got:

  • /dev/sda is my corrupt drive

  • /dev/sda3 is the ext4 partition with the data I want to recover and is not mounted

  • /dev/sdd1 is a USB drive I’ll use for data recovery

(So yeah, I’ve got a USB drive sticking out of either end of my little netbook like a pair of ears: one providing Linux to boot from, the other as a destination for my recovered data.)

# mount /dev/sdd1 /mnt/usb             # mount my storage
# cp /dev/sda3 /mnt/usb/recovery.img  # copy the entire partition!

The drive on the EeePC is tiny and this partition was under 3 GB.
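My drive could still be read from end to end, so plain cp was enough. If it had been throwing read errors, I would have expected cp to stop at the first bad spot. Here is a minimal sketch of what I might have tried instead, using dd. The helper name image_partition and the 64k block size are my own picks for illustration, not something I used in the recovery above:

```shell
# Hypothetical helper: copy a partition (or any file) to an image file,
# carrying on past read errors instead of giving up.
#   noerror - keep going after a read error
#   sync    - pad short or failed reads with zeros so later data keeps its offset
# Note: because of sync, a final partial block is padded out to a full 64k.
image_partition() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=64k conv=noerror,sync
}

# Example (as root, with the source partition NOT mounted):
#   image_partition /dev/sda3 /mnt/usb/recovery.img
```

A smaller block size loses less data around each bad spot, but makes the copy slower.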

Then on another computer:

# mount /dev/sdd1 /mnt/usb   # mount my storage on working computer
$ mkdir recover
$ cp /mnt/usb/recovery.img recover/    # drive image on working computer

All of the data I wanted to recover was text (including source code). So I ran the strings program, specifying an arbitrary minimum length of string as 20 contiguous characters:

~/recover$ strings -n 20 recovery.img > str.txt

The result was 430 MB of text. I’d originally planned to use something like sed or awk to extract parts of the file to work with. But since it was merely hundreds of MB, I decided to just open the darned thing in Vim and write out parts manually.
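For the record, the sed route I had in mind would have looked roughly like this. It’s only a sketch: extract_lines is a made-up helper name, and the line numbers in the example are placeholders, not the real ones from my file.

```shell
# Hypothetical helper: print lines START..END of FILE to standard output.
# The trailing "q" makes sed quit at END instead of reading the rest of a huge file.
extract_lines() {
    file=$1
    start=$2
    end=$3
    sed -n "${start},${end}p;${end}q" "$file"
}

# Example: find where something interesting starts, then cut out a chunk.
#   grep -n -F "my precious data" str.txt     # prints "LINE:matching text"
#   extract_lines str.txt 120000 120400 > hopefully.asm
```

grep -n gives the line numbers to feed in, so this is really just the Vim workflow with more typing.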

Looking at the file timestamps, I can see it only took me about a half hour to get what I needed. Learning how to really recover these files properly could have taken me days.

The end result looked like this:

~/recover$ ls -lh
total 3.2G
-rwxr-xr-x 1 dave users 2.7G Sep 24 17:58 recovery.img
-rw-r--r-- 1 dave users 8.7K Sep 24 19:04 latest_maybe.txt
-rw-r--r-- 1 dave users 1.3K Sep 24 18:20 log1
-rw-r--r-- 1 dave users  12K Sep 24 18:21 log2
-rw-r--r-- 1 dave users   3K Sep 24 18:21 log2_partial
-rw-r--r-- 1 dave users  42K Sep 24 18:27 hopefully.asm
-rw-r--r-- 1 dave users  21K Sep 24 18:27 hopefully2.asm
-rw-r--r-- 1 dave users 2.6K Sep 24 18:33 hopefully_readme
-rw-r--r-- 1 dave users 431M Sep 24 18:01 str.txt

You can see my degrees of certainty with the filenames. :-)

This would have been terrible if I had a bunch of files to recover. But in my case, there were only a handful of files. Everything else on this computer was also stored elsewhere. Which is, of course, always the plan.

(In fact, the timing was particularly unlucky since I had just created a new Git repo for this project and was going to upload it the very next evening. It’s totally not fair!)

It was also interesting to see that the drive seemed to contain a complete copy of my file(s) for every single save I had made. I’m not sure if this is "journaling" in action or just copy-on-write. Either way, it makes sense that writing a new copy in a new spot and updating references is much safer than actually overwriting the old file, and I assume that’s what’s happening.

I found that searching from the end of my giant str.txt for a match seemed to always yield the most recent change. But I don’t know how reliable that is. I assume the old spaces would have been overwritten eventually with newer copies once the space was needed?
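I did the searching from the end in Vim, but the same trick can be sketched on the command line. The helper name last_match_line is hypothetical, and this inherits the same shaky assumption that the last match in the file is the newest copy:

```shell
# Hypothetical helper: print the line number of the LAST line in FILE that
# contains the fixed string PATTERN (prints nothing if there is no match).
last_match_line() {
    pattern=$1
    file=$2
    grep -n -F -- "$pattern" "$file" | tail -n 1 | cut -d: -f1
}

# Example:
#   last_match_line "my precious data" str.txt
```

The -F flag treats the pattern as plain text rather than a regular expression, which is handy when searching for snippets of source code full of punctuation.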

It was only about another couple hours of editing to get the project completely back to functioning again. Of course, I also improved some things in the editing process, so it’s actually nicer now than it was before the Great Corruption.

Here’s the project I managed to save, by the way:

(It’s a little assembly toy I’ve wanted to build ever since I had the idea halfway into nasmjf.)