Computer Archeology | Linux Journal


by Charles Curley

on January 26, 2004

The term "computer archeology" has two meanings. One meaning is the art of using computers in archeology. But that's only because they haven't found a computer in a dig--yet. Here, we're using the term in the way such science fiction writers as James P. Hogan and Vernor Vinge use it: the art of recovering data from defunct, possibly alien, computers.

This article explains how to recover data from an Atari ST hard drive using PC hardware, Linux and a bit of care. The effort benefits from two decisions by the Atari ST's designers, both of which show the value of open standards. First, the Atari ST can use a standard SCSI-1 hard drive with an Atari-specific host adapter. Second, the ST uses 12-bit FAT filesystems, so I did not have to develop a filesystem driver. Linux's open architecture, open-source nature and excellent native development tools make it an ideal platform for such a project.

The target hardware consists of a 234MB Quantum ProDrive LPS hard drive (large in those days) originally connected to the Atari by an ICD host adapter.

Recovering the Hard Drive

Recovering the data from the hard drive was easy. I had already installed a SCSI host adapter in a PC. All I needed to do was make sure the hard drive's SCSI address did not conflict with the other devices on the bus, add it to the SCSI cable and provide power. When I booted Linux, the hard drive was associated with a device file in /dev. As I already had two other SCSI drives on the computer, the Atari drive showed up as /dev/sdc. So far, so good. The first thing to do was read the master boot record (MBR) to make sure I was accessing the correct drive. In UNIX, everything is a file, even memory (/proc/kcore), so I was able to get the data from the hard drive with a one-line command:



dd of=table if=/dev/sdc bs=512 count=1

Examining the file table in Emacs' hexl mode (Listing 1) convinced me I was in the right place, for two reasons. First, the opening bytes are 68000 instructions that make an operating system call, not the Intel 808x CLI and branch instructions one would expect in a PC MBR. (See the LILO and/or GRUB documentation for the gory details of a PC MBR.) Listing 2 shows a partial Forth disassembly (in Forth's Reverse Polish notation) I later made of the MBR. Second, the word GEM, the name of the Atari graphical user interface, or GUI, shows up in the region where the partition table should be.
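If you would rather script the second check than eyeball a hex dump, here is a minimal Python sketch. It assumes the 512-byte sector saved by the dd command above, and it tests only the GEM clue. It looks for the ASCII string GEM inside the 64-byte region at offset 0x1BE, which is where a PC MBR keeps its partition table. The function names are illustrative and do not come from the article's listings.

```python
SECTOR_SIZE = 512
PC_TABLE_START = 0x1BE  # a PC MBR's four 16-byte partition entries begin here
PC_TABLE_END = 0x1FE    # the PC boot signature bytes 0x55 0xAA follow the table


def read_sector(path: str, lba: int = 0) -> bytes:
    """Read one 512-byte sector from a raw image or device file."""
    with open(path, "rb") as f:
        f.seek(lba * SECTOR_SIZE)
        return f.read(SECTOR_SIZE)


def has_gem_marker(sector: bytes) -> bool:
    """Heuristic: True if b'GEM' appears where a PC partition table would be.

    A PC MBR stores binary fields in that region, so readable ASCII there
    hints that the sector was written by an Atari host adapter driver.
    """
    return b"GEM" in sector[PC_TABLE_START:PC_TABLE_END]
```

For example, has_gem_marker(read_sector("table")) checks the saved file. Reading /dev/sdc directly would work the same way but needs root.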

Convinced I was in the right place, I copied the entire hard drive to a file:



dd of=quantum if=/dev/sdc bs=512

That gave me a file image of the hard drive. Now I could shut down, remove the Quantum hard drive from the PC, put it back in its static-free bag and boot to Linux.

Accessing the Partitions

One problem with the Atari ST is that hard-drive partitioning, if done at all, had to be done by a driver specific to the host adapter. Unfortunately, this meant that each host adapter's driver used its own partitioning scheme. Fortunately, back when this Atari ST was my active machine, Mike Yantis had reverse-engineered the ICD partition table and had written code in Forth to access the table. I had ported his code to my Atari, and it was in a file on the hard drive. So I had to recover the partition table so I could read the file that had the code I needed in order to recover the partition table. Great.

There is, however, a workaround. I tried reading the entire hard-drive image into that great programmers' Swiss Army Knife, Emacs, but Emacs balked at the size of the file. So I made a copy of the first 60MB:



dd if=quantum of=test bs=1024 count=61440

I loaded that into Emacs, switched to hexl mode (M-x hexl-mode) and searched for the word "partition". The search came up empty, so I tried the same thing with the next 60MB:



dd if=quantum of=test bs=1024 count=61440 skip=61440

That file contained the code I needed. I now knew the layout of the partition table: each entry was twelve bytes long. The first one, for C:, started at an offset of 454 (0x1C6) bytes. It was followed by three more, for D: through F:. The entry for G: started at 342 (0x156), and it was followed by five more, for H: through L:. The ICD partition scheme could support up to 13 partitions, but I never used more than 10, so that was as far as my code went.
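You can split the image into pieces with dd and search each one, but a word that happens to straddle a split is invisible to both searches. Below is a minimal Python sketch of the same hunt. It scans the whole image in fixed-size chunks and carries the last len(needle) - 1 bytes of each chunk into the next, so matches that span a boundary are still found. Unlike an interactive Emacs search, it is case-sensitive. The function name and chunk size are illustrative choices.

```python
CHUNK = 61440 * 1024  # same size as the dd copies above (bs=1024 count=61440)


def find_in_image(path: str, needle: bytes, chunk_size: int = CHUNK):
    """Yield the absolute byte offset of every occurrence of needle in path.

    The file is read one chunk at a time, so the whole image never has to
    fit in memory. The tail kept between reads is shorter than the needle,
    so a match can never be reported twice.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    with open(path, "rb") as f:
        base = 0   # file offset corresponding to buf[0]
        buf = b""
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            buf += data
            start = 0
            while True:
                i = buf.find(needle, start)
                if i < 0:
                    break
                yield base + i
                start = i + 1
            # Keep only the bytes that could begin a match spanning chunks.
            keep = min(overlap, len(buf))
            base += len(buf) - keep
            buf = buf[len(buf) - keep:]
```

For example, `for off in find_in_image("quantum", b"partition"): print(off, off // 512)` prints each hit's byte offset and the sector it falls in.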

There are no extended or logical partitions here as there are in PC MBRs; thanks, Murphy. Whoever named them logical partitions had a vicious sense of humor; they aren't.

Each partition table entry consists of the following:

A byte of binary data. At a guess, bit 7 may indicate this partition is to be booted, while bit 0 says it has a valid filesystem.
Three characters of text indicating, possibly, the filesystem or OS that uses the partition. The Atari used the string "GEM". There is no zero termination in this field.
A four-byte value (in 68000 byte order, which also happens to be network byte order, that is, big-endian) indicating the location of the first sector of the partition. As SCSI drives are addressed in a strictly linear fashion, a simple sector number is sufficient. As with IDE's LBA addressing mode, cylinder, head and sector manipulations are internal to the drive.
Another four-byte value indicating the size in sectors of the partition.
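The four fields total 1 + 3 + 4 + 4 = 12 bytes, so one entry maps neatly onto a fixed binary layout. The sketch below uses Python's struct module instead of the Perl unpack the article relies on; a Perl template such as "C a3 N N" describes the same layout. The format string ">B3sII" means big-endian (which handles the 68000 byte order) followed by one unsigned byte, a three-byte string and two unsigned 32-bit integers. The PartitionEntry field names and the meaning of the flag bits are assumptions that follow the guesses above.

```python
import struct
from typing import NamedTuple

# '>' selects big-endian byte order with no padding; B = unsigned byte,
# 3s = three raw bytes, I = unsigned 32-bit integer.
ENTRY_FORMAT = ">B3sII"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 12 bytes


class PartitionEntry(NamedTuple):
    flags: int    # one byte of binary flags
    ident: bytes  # e.g. b"GEM"; not zero-terminated
    start: int    # first sector of the partition
    size: int     # length of the partition in sectors

    @property
    def valid(self) -> bool:
        return bool(self.flags & 0x01)  # assumption: bit 0 = valid filesystem

    @property
    def bootable(self) -> bool:
        return bool(self.flags & 0x80)  # assumption: bit 7 = boot this one


def parse_entry(raw: bytes) -> PartitionEntry:
    """Decode one 12-byte ICD partition table entry."""
    if len(raw) != ENTRY_SIZE:
        raise ValueError(f"expected {ENTRY_SIZE} bytes, got {len(raw)}")
    return PartitionEntry(*struct.unpack(ENTRY_FORMAT, raw))
```

The byte order matters. Reading the start bytes 00 00 00 01 with "<I" (little-endian, as on a PC) would give 16777216 instead of 1.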

Designing Code

Now I had the information I needed. I wasn't writing an operating system, so speed of development was more important than speed of execution. There was a byte-order issue, and the code had to convert packed binary data into individual variables. Perl's unpack function is just the thing for this sort of hacking.

Writing Code

The first marginally useful program is partition, a Perl script that prints out the partition data on an Atari/ICD hard drive (see Listing 3). It consists of a subroutine to parse each entry, called apartition, and the main-line code.

Because of the discontinuity in the partition table, I chose to store the starting offsets of the partition table entries in an array, which I then could use as a lookup table. C: would use the 0th element, G: the fourth and so on. The result was the array @bases, and the code that immediately follows it. Being lazy, I wrote code to fill in the array.
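Here is a Python rendering of the same idea, as a sketch. The offsets of the two runs of entries come from the layout described earlier: four entries starting at 0x1C6 and six starting at 0x156. As in the article, a little code fills in the lookup table instead of typing ten numbers. The names are hypothetical; this is not the code in Listing 3.

```python
import struct

ENTRY_SIZE = 12
DRIVES = "CDEFGHIJKL"  # C:-F: in the first block, G:-L: in the ICD extension


def build_bases():
    """Byte offsets of each entry inside sector 0, indexed like @bases:
    element 0 is C:, element 4 is G:, and so on."""
    bases = [0x1C6 + i * ENTRY_SIZE for i in range(4)]   # C: .. F:
    bases += [0x156 + i * ENTRY_SIZE for i in range(6)]  # G: .. L:
    return bases


def read_table(sector: bytes):
    """Return (drive, flags, ident, start, size) for each of the ten slots."""
    rows = []
    for drive, base in zip(DRIVES, build_bases()):
        flags, ident, start, size = struct.unpack_from(">B3sII", sector, base)
        rows.append((drive, flags, ident.decode("latin-1"), start, size))
    return rows


def print_table(image_path):
    """Print the partition table found in sector 0 of an image file."""
    with open(image_path, "rb") as f:
        for row in read_table(f.read(512)):
            print("%s: flags=%02x id=%-3s start=%8d size=%8d" % row)
```

Calling print_table("quantum") lists all ten slots, including unused ones, so the caller can decide which entries to trust.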

Running partition on the hard-drive image file produced the output shown in Listing 4. Because the hard drive is a SCSI drive, there were no cylinder boundaries or other nonsense to worry about. Each partition simply started on the sector immediately after the last sector of the previous one. This makes a bit of integrity checking easy to do, and partition does some.
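The integrity check follows directly from that contiguity: each partition should start at the previous partition's start plus its size. Here is a small Python sketch of such a check. It is only an illustration. The optional comparison against the image size, with total_sectors computed as os.path.getsize("quantum") // 512, is an extra this sketch adds and is not described for partition.

```python
def check_layout(partitions, total_sectors=None):
    """Check in-use partitions, given as (drive, start, size) tuples in
    table order. Returns a list of human-readable problems (empty if none).

    Each partition must begin on the sector right after the previous one
    ends. If total_sectors is given, none may run past the end of the image.
    """
    problems = []
    for (d0, s0, n0), (d1, s1, _n1) in zip(partitions, partitions[1:]):
        expected = s0 + n0
        if s1 != expected:
            problems.append(
                f"{d1}: starts at sector {s1}, expected {expected} (end of {d0}:)")
    if total_sectors is not None:
        for d, s, n in partitions:
            if s + n > total_sectors:
                problems.append(
                    f"{d}: ends at sector {s + n}, past end of image ({total_sectors})")
    return problems
```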

Extracting a Partition

The next step was to write code to extract a partition. The program to do that is extract (see Listing 5). It extracts a partition into a specified directory and creates a mountpoint so that the user can mount the file easily with the loopback device.

The structure of extract is similar to that of partition, because it is a direct steal from it. Instead of simply printing out each entry in the partition table, extract accepts the file from which to extract the partition, the drive letter of the partition to extract (for example, c:) and the path to where the partition file should be built. The main-line code validates the inputs, and the subroutine apartition does the actual work. Extracting C: worked like so:



[ccurley@charlesc atari]$ mkdir disks
[ccurley@charlesc atari]$ ./extract quantum c: disks/
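At its core, extracting a partition means copying a run of sectors out of the image. Below is a minimal Python sketch, assuming the start and size (in 512-byte sectors) have already been read from the partition table. The disks/cdrive and disks/c names mirror the mount command that follows, and the function names are hypothetical. The same copy could be done with dd if=quantum of=disks/cdrive bs=512 skip=START count=SIZE, taking START and SIZE from the partition listing.

```python
import os

SECTOR_SIZE = 512


def extract_partition(image_path, out_path, start, size, chunk=1 << 20):
    """Copy sectors [start, start + size) of the image to out_path,
    a chunk at a time, so large partitions never sit wholly in memory."""
    remaining = size * SECTOR_SIZE
    with open(image_path, "rb") as src, open(out_path, "wb") as dst:
        src.seek(start * SECTOR_SIZE)
        while remaining > 0:
            data = src.read(min(chunk, remaining))
            if not data:
                raise ValueError("image ends before the partition does")
            dst.write(data)
            remaining -= len(data)


def extract_drive(image_path, outdir, letter, start, size):
    """Write <outdir>/<letter>drive and create an empty mountpoint
    <outdir>/<letter> for mounting it with the loopback device."""
    letter = letter.rstrip(":").lower()
    image_file = os.path.join(outdir, letter + "drive")
    mountpoint = os.path.join(outdir, letter)
    extract_partition(image_path, image_file, start, size)
    os.makedirs(mountpoint, exist_ok=True)
    return image_file, mountpoint
```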

The next thing to do was mount the newly minted partition image file read-only, like so:



[ccurley@charlesc atari]$ su -c "mount -o ro -o loop -t msdos disks/cdrive disks/c"

Password:

Only root can mount a filesystem, unless the filesystem is listed in /etc/fstab with the user option. So you have to su to root temporarily and run the required command; su asks for root's password before it executes the given command. Later, to unmount it, you need to do almost the same thing:



[ccurley@charlesc atari]$ su -c "umount disks/c"

Password:

You might find it useful to modify extract to extract all of the partitions on your hard drive at once. The mod consists of a simple loop, and you can comment out all the code that validates the drive-name argument.
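Here is a Python sketch of that all-at-once variant. It assumes the table layout described earlier and skips entries whose flag byte lacks bit 0 or whose size is zero. Both rules are assumptions based on the guess about that bit. It is not a modified Listing 5, and the names are hypothetical. It does not mount anything; each image still has to be mounted as root, as shown above.

```python
import os
import struct

SECTOR_SIZE = 512
DRIVES = "CDEFGHIJKL"
# Entry offsets in sector 0: C:-F: from 0x1C6, G:-L: from 0x156, 12 bytes each.
BASES = [0x1C6 + 12 * i for i in range(4)] + [0x156 + 12 * i for i in range(6)]


def extract_all(image_path, outdir):
    """Extract every in-use partition into outdir/<letter>drive and create
    an empty mountpoint outdir/<letter>. Returns the files written."""
    made = []
    with open(image_path, "rb") as img:
        sector0 = img.read(SECTOR_SIZE)
        for letter, base in zip(DRIVES, BASES):
            flags, _ident, start, size = struct.unpack_from(">B3sII", sector0, base)
            if not (flags & 0x01) or size == 0:
                continue  # assumption: bit 0 clear means "unused slot"
            name = letter.lower()
            out_path = os.path.join(outdir, name + "drive")
            img.seek(start * SECTOR_SIZE)
            remaining = size * SECTOR_SIZE
            with open(out_path, "wb") as out:
                while remaining > 0:
                    data = img.read(min(1 << 20, remaining))
                    if not data:
                        raise ValueError(f"{letter}: image ends before partition does")
                    out.write(data)
                    remaining -= len(data)
            os.makedirs(os.path.join(outdir, name), exist_ok=True)
            made.append(out_path)
    return made
```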

Using the Data

Now that you can access your Atari ST filesystem, what can you do with the data on it? Well, you can recover source code, old articles, old love letters and so on, to see what you really did say. Linux has some excellent tools for this sort of thing. For example, the GNU program strings can extract runs of printable characters from binary files, such as old WordPerfect files. However, what if there is data in an application's proprietary format and you'd like to keep running that application?
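For a quick look from within a script, here is a rough Python approximation of what strings does. It finds runs of at least four printable ASCII characters; four is GNU strings' default minimum length. This is only a sketch. GNU strings has many more options, such as support for other character encodings.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII (space through '~', plus tab) that are
    at least min_len bytes long, roughly what GNU strings prints by default."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Pass it the bytes of any recovered file, for example one read from the mounted disks/c directory.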

For that you need an emulator. There is something elegant about an Atari ST emulator, because at its peak the ST had more emulators of other machines available to run on it than any other computer did. Several Atari emulators are available for Linux; see Resources for suggestions. I ended up with Steem, even though it does not include source, because it works well enough for my purposes. It uses Linux's filesystem, so you can run it without a hard-drive image. That means you can use Linux's file manipulation tools, editors and so forth on your Atari files.

One of my purposes was to be able to run the 32-bit Forth I wrote more than 15 years ago. Having recovered the executable and source files from the hard drive, I can now run a 32-bit 68000 Forth on top of Linux. There is more elegance to this, as Forth itself is an emulation of the Platonic ideal stack machine that Chuck Moore (the inventor of Forth) imagined.

Conclusion

Linux is a powerful platform for computer archeology. In addition, open standards, such as the SCSI bus, greatly simplify computer archeology.

Resources

Atari ST Emulators for Linux, BSD and more.

GEM, the GUI on the ST, also is available for the PC.

Hatari is an Atari ST emulator for Linux, BSD, BeOS, Mac OS X and other systems supported by the SDL library. See especially the Hatari links page.

James P. Hogan

Little Green Desktop. Worth it for the domain name alone.

Steem is a Freeware Atari STE emulator for Windows and Linux.

Vernor Vinge

Charles Curley lives in Wyoming, where he has deer, buffalo, archaeologists, hot springs, and the Wind River Canyon for neighbors.
