A Tale of Failure, Success and Open Source.


Everyone talks about making sure that you have adequate backups, especially when upgrading your operating system, but how many of us actually carry out regular backups? As an IT professional, I can say that in my working environment I’m completely paranoid about making sure that I have proper backups and that they work. However, as far as my personal laptop is concerned, I don’t tend to bother, and I know that this is completely wrong in so many ways!

I wanted to carry out an upgrade of my laptop from OpenSuse 11.2 to 11.4 and, at the same time, recover an extra 100 GB of disk space that had previously been used by Windows 7. Additionally, I wanted to combine two separate data partitions into one larger one. I knew that I needed to plan my actions carefully and, as repartitioning was required, make sure I had a working backup.

I started by installing KBackup and ran a backup of the required directories. This seemed to work, flagging only a couple of warnings about files that were not readable. As I was logged in with my own account, this was not a problem, and I resolved the issue by logging in as root and re-running KBackup on the failed directories. I now had two tar files: one of 140 GB and another of 46 GB. Life is good.

A quick listing of the backup files with tar -tf appeared to show that they were readable and all ok – how little I knew.

With my data safely backed up and stored on the network drive, I began the task of repartitioning my hard disk. The root and home partitions were increased to 30 GB each, swap was set to 8 GB – the size of my RAM – and the remainder was assigned to my new “all in one” /data partition. I admit, I did pause before writing the partition table back to the disk – this was my last chance to abort after all!

OpenSuse 11.4 installed easily, and after a reboot I was ready to restore all my data to a safe place, from where I could copy it into its final position as required.

I configured the network storage, mounted it and checked that my backup files were still there – they were. I then ran the following commands before going off for the evening to do something sociable:

cd /data
mkdir BACKUP
cd BACKUP
export BACKUP_FILE=/media/nfs/norman/BACKUP/backup_1.tar
tar -xvf $BACKUP_FILE

When I returned, all was not well. The tar command had encountered problems with the backup file, and the restore had errored out after restoring only part of the old /data partition. At this point, panic was considered a viable option!

I tried to run the tar command to list the contents of the backup file, and sure enough, it too gave up at exactly the same point. The error reported was something like “skipping to next header” and then tar simply died.

tar -tf $BACKUP_FILE
…
data/VirtualBox/
data/VirtualBox/HardDisks/
data/VirtualBox/HardDisks/OracleEnterpriseLinux5.5.vdi
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

Googling for the error turned up nothing of any use – other people reported the same error message as I was getting, but they had additional error messages telling them what was wrong; I was sadly lacking in that extra information. I appeared to be on my own.

My options appeared to be to give up and live with the loss of data, or to try to find out what had gone wrong and whether I could work around the problem. Having been a programmer for many years, I decided to take the latter option.

First, I needed to find out exactly what is inside a tar file. Google helped out, and I discovered that a tar file consists of a sequence of entries: each is a 512-byte header block followed by the contents of the file itself, zero-padded up to the next multiple of 512 bytes. This is repeated for each file in the archive.
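
As a made-up illustration of that layout, a 700-byte file would occupy 512 bytes of header plus 1,024 bytes of zero-padded contents – a quick shell calculation confirms it:

echo $(( 512 + ((700 + 511) / 512) * 512 ))   # 1536 bytes of archive for a 700-byte file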

Google also pointed me to the fact that there is a limit of 8 GB for files being added to a tar archive. This is because only 11 digits are allocated in the header block for the original file length. As these are octal digits, the maximum is 77,777,777,777, which equates to one byte less than 8 GB.
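
That limit is easy to verify with bash's base-8 arithmetic:

echo $(( 8#77777777777 ))   # 8589934591 bytes – one byte short of 8 GB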

Many of my VirtualBox hard disks, included in the backup, are bigger than this limit. Hmmm! However, because tar applies no compression or encryption, my data should be extractable – provided I could get past the corrupted section of the tar file.

I decided that I needed to write some code to “walk” through the tar file and list the places within it where my data files could be found. The following (bad!) C code does exactly that:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define VERSION "0.01"
#define USTAR "ustar"

struct tarHeader {
    char name[100]; // full file path and name.
    char stuff[24]; // Ignore this bit.
    char size[12]; // File size in OCTAL digits.
    char stuff2[121]; // More stuff to ignore.
    char magic[6]; // Tar magic character sequence.
};

int main (int argc, char *argv[])
{
    struct tarHeader th;
    size_t offset = 0;
    size_t rounding = 511;
    size_t blockSize = 512;
    size_t mask = 0xfffffffffffffe00; // Clears the low 9 bits; with the 511 rounding this rounds up to a multiple of 512.
    char *dummy;

    printf("\nTarSlicer by Norman Dunbar.");
    printf("\nVersion %s\n", VERSION);

    // Read the first header which should (!) be valid.
    fread((void *)&th, sizeof(th), 1, stdin);

    while (1) {
        if (feof(stdin)) break;

        if (!strncmp(th.magic, USTAR, 5)) {
            size_t size = strtol(th.size, &dummy, 8);

            // Print offset to actual file not to the tar header.
            printf("\n%p: %s", (void *)(offset + blockSize), th.name);

            // Add the file size rounded up to the next multiple of 512.
            // Another 512 for the header still needs adding, see below...
            offset += ((size + rounding) & mask);
        }

        // If this was a valid header, add 512 to point at the next one.
        // Else, add 512 and keep scanning for the next valid header.
        offset += blockSize;

        // Read the next header, or keep looking in 512 increments until
        // one is found.
        fseek(stdin, offset, SEEK_SET);
        fread((void *)&th, sizeof(th), 1, stdin);
    }

    printf("\n\n");
    return 0;
}

The code starts by reading the beginning of the file into a buffer and then enters the main loop. The program will exit from the loop when end of file is detected.

If the buffer contains the start of a valid tar header block – determined by the characters “ustar” being found in the “magic” field of the header block – then the location and name of the archived file is printed. The start of the file is 512 bytes further on from the header, so the current position has 512 added to it for printing purposes.

The code then extracts the file’s original size from the header block, rounds it up to the next multiple of 512 bytes, and adds that to the current offset to hopefully point at a position 512 bytes short of where we need to be to read the next header block.

If, because of the 8 GB limit, the extracted file size is wrong, then the next pass through the loop will not find a valid header block. In this case, all the code can do is add 512 bytes to the current position – which will always be a multiple of 512 – and try again. This 512-byte offset is the size of a header block and is also required when the program found a valid header last time.

At the end of the loop, the new offset is either pointing at the next valid header block or at a position in the file where the next header block might be. In the event that a file bigger than 8 GB was archived, this causes the code to step through the tar file in 512-byte steps looking for a new header block.

An example of the program’s output is:

TarSlicer by Norman Dunbar.
Version 0.01

0x200: data/
0x400: data/.Trash-1000/
0x600: data/.Trash-1000/files/
0x800: data/.Trash-1000/info/
0xa00: data/audacity/
0xc00: data/databases/
0xe00: data/DigiKam/
0x1200: data/DigiKam/.directory
0x1400: data/DigiKam/Arthur/
0x1600: data/DigiKam/Arthur/arthur_150dpi.tiff
0x13f400: data/DigiKam/Arthur/arthur_2_150dpi.tiff
...

I compiled the code above, ran it against my corrupted tar file and, redirecting the program output to a file, ended up with a full listing of the contents of the corrupt tar file.
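
For anyone following along, something like the commands below will build and run it – tarslicer.c is simply the name I would give the source file, and the tar file is fed in on stdin, redirected from a real file so that it can be seeked:

gcc -o tarslicer tarslicer.c
./tarslicer < $BACKUP_FILE > tar_listing.txt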

Actually, I also ended up with some interesting output. It’s entirely possible that, within one or more of my VirtualBox hard drive files, I have a tar file stored. Those files are also aligned on 512-byte boundaries, so sometimes, during the scan for the next legal tar header block, you get the embedded ones as well. It makes for an interesting printout – but the file names themselves give the game away, so it’s not such a massive problem.

From the output listing, I found that the corruption occurred at the location of my first VirtualBox hard disk file, which was 15 GB in size. Looking at the offsets to that file and the one following it, I determined that the whole 15 GB had been added to the tar file rather than just 8 GB. This was good news. It appears that tar stores any size of file; it just cannot extract or list past any file bigger than 8 GB.

I wanted to try to fool tar into extracting all the normal files, skipping the files it had already restored and ignoring all my VirtualBox hard disk files. To do this, I needed to slice the top section off my tar file so that extraction began at the location of the first file header past the VirtualBox ones.

I knew from my listing where the next file began, so all I had to do was calculate some parameters to pass to the “dd” command and pipe its output into tar. Obviously, I needed the header block located 512 bytes before the listed file start, as opposed to the position of the file contents themselves.

I calculated that I had to extract as much as possible from a position 0xB294b6000 bytes into the tar file.
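
A quick shell calculation converts that byte offset into dd's 512-byte blocks:

echo $(( 0xB294b6000 / 512 ))   # 93627824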

This offset is 93,627,824 blocks of 512 bytes into the file and should be the location of the tar header. I then ran the extraction, as follows:

dd if=$BACKUP_FILE bs=512 skip=93627824 | tar -xvf -

After a short delay, I saw a list of file names scrolling up the screen. Success! I had persuaded tar to read past the “corruption” and extract some more of my data.

I left this running and managed to restore all my data except for the VirtualBox hard disks. Progress!

I now needed to attempt to extract the large files that had caused my original problem, so for each of my VirtualBox hard disks I noted the start position from my listing and calculated the length – both in 512-byte blocks. These values were plugged into “dd”, and extraction of one VirtualBox hard disk file was tested with the following command (a rough sketch of how those two block values are derived appears after it):

export VDI_FILE=data/VirtualBox/HardDisks/OracleEnterpriseLinux5.5.vdi
dd if=$BACKUP_FILE bs=512 skip=93627825 count=31457401 of=$VDI_FILE
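
For the record, this is roughly how the skip and count values drop out of the listing. The two offsets below are illustrative values that I have reconstructed to match the command above, not ones copied from my actual output:

# THIS_FILE is the listed offset of the .vdi contents; NEXT_FILE is the
# listed offset of the entry that follows it (illustrative values only).
THIS_FILE=0xB294B6200
NEXT_FILE=0xEE94C5600
echo $(( THIS_FILE / 512 ))                       # skip  = 93627825
echo $(( (NEXT_FILE - THIS_FILE - 512) / 512 ))   # count = 31457401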

After a brief delay, I found myself with a 15 GB VirtualBox hard disk. Crossing my fingers, I ran up VirtualBox and started the Oracle Enterprise Linux VM that this file was attached to. OEL started perfectly with no problems. Success! This proved that tar had indeed stored the entire file, regardless of what it stores in the file header for the file size.

I repeated the above process and extracted each of my hard disk files with only one problem – for some reason, my Linux Mint 10 virtual hard disk caused the Mint VM to hang solid while booting.

Linux Mint is the VM I use to demonstrate to Windows users exactly what they are missing. It was simple to wipe the affected file, create a new one and reinstall Mint 10. Other than this problem, everything that I needed to preserve during the upgrade had been restored.

There are a few morals to this tale:

  • When you do take backups, always take more than one.
  • Use a different tool and file format for each.
  • Don’t assume that, because there were no errors, all went well.
  • Test that your backups can be restored. The more time you spend making sure that backups are usable, the less time you will spend attempting to recover your precious data afterwards.
  • Always use Open Source. Would things have been so easy (and I use the term loosely) if I’d been backing up a Windows system using some proprietary file format, most likely not documented anywhere? Somehow I doubt it.
  • Finally, beware. Tar archives huge files without error, but you cannot extract them again afterwards. Well, you can now!

 
