Archiving CDs to ISO from the Command Line

A few weeks ago I was working on a PC when I needed to grab the motherboard driver CD.  In a perfect world, the CD would be tucked in a nice protective sleeve, safely away from the nasty elements of the IT tech area (read: coffee, scratches, and the occasional jelly doughnut).  But in this case, it appeared someone had taken the CD and wiped it across a Brillo pad.  I'm sure you have all had this problem from time to time.  Heck, my toddlers tend to use CDs as Frisbees around the house when they find my stash.

But I wasn't worried about this minor setback.  Why? Because when I get new hardware that comes with a CD, one of the first things I do is create an ISO image of the CD/DVD and put a copy of that image on one of my servers.  This effectively rules out the chance of losing drivers that I might need later on to a destroyed CD.

Some people may argue that you can always go online and get the drivers, and that's true.  But if you need older drivers that are no longer available, or drivers for hardware from a company that's been bought out, it can become a royal pain in the butt trying to track down the software.  If you have copies of the CD/DVD in an ISO folder, getting the software back is as easy as burning a new CD or DVD.

Now, before I get started, I'm going to start with the customary disclaimer and get all that nasty but necessary legal mumbo jumbo out of the way.  This might not be 100% necessary, but hey, it covers mine and LinuxJournal's butt in the long run so I might as well get it over with.

Depending upon your State/Country/Planet/Solar System, copying a CD or DVD may be against state/federal/planetary regulations.  Not only that, but it might violate the software's End User License Agreement (EULA). LinuxJournal and I are not responsible if you decide to rip CD/DVDs into ISO images to take over the world from your mother's basement. Please abide by state/country/planetary/EULA regulations before making ISO images.

With that said, I personally don't see any harm in creating an ISO image of driver/software CDs for archiving purposes as long as said ISO image is not given away or sold to anyone else.  I don't share my personal ISO images and never will.

In this blog post, I'm going to show you how not to make coasters out of CDs.  What a lot of people don't realize is that it's not as simple as running dd if= of= and going about your merry way. In order to make a proper burnable ISO image you need to take blocksize and blockcount into account.  Not only that, once the ISO image is complete, you really need to compare the MD5 hash of the CD against the ISO image itself.  I'll go into detail about each one and provide a nice script that I whipped up for this article.

Why dd if= of= is a bad idea

A plain dd if= of= copy is fine in certain situations, but not when it comes to copying CDs and DVDs to ISO images.  Do a google.com search sometime for "linux make an ISO image" and a dozen results come up where people recommend just using dd if=/dev/sd0 of=/pathto/file.iso.

Now I'm not saying the internet is full of bad information; maybe this worked out for someone at one point and they passed it on to someone else.  Then someone blogged it, and the cycle repeats itself.  In any case, if you want a proper ISO image of that CD you need to get the blocksize and blockcount correct before you create your image.  When the CD was originally created it had a logical block size associated with it.  For the most part I have seen 1024 and 2048.  The other thing to look at is the block count, otherwise known as the volume size.  This is the amount of data stored on the CD, measured in logical blocks.  We pass both pieces of information to dd when creating the ISO image in order to tell dd the proper blocksize and blockcount to copy.
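To put numbers on that: an image read with a given blocksize and blockcount should come out at exactly blocksize times blockcount bytes. Here's a rough sketch of that check in shell. The function names expected_bytes and image_size_ok are my own, made up for illustration, and the figures in the comments are example values rather than output from a real disc.

```shell
# Sketch only: expected_bytes and image_size_ok are hypothetical helper names.

# Print the expected image size in bytes: block size times block count.
expected_bytes() {
    # $1 = logical block size, $2 = block count (volume size)
    echo $(( $1 * $2 ))
}

# Succeed only if the file is exactly blocksize * blockcount bytes long.
image_size_ok() {
    # $1 = image file, $2 = block size, $3 = block count
    actual=$(wc -c < "$1")
    [ "$actual" -eq "$(expected_bytes "$2" "$3")" ]
}

# Example with made-up values (2048-byte blocks, 1000 blocks):
#   expected_bytes 2048 1000
#   image_size_ok /path/to/isoimage.iso 2048 1000 && echo "size looks right"
```

A size check like this is quick, but it only tells you the image is the right length. It says nothing about whether the contents are right, which is what the MD5 comparison is for.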

If you want to follow along and see where I'm getting this information from, find the location of your cdrom (check /etc/fstab, but it's usually linked to /dev/cdrom) and run the following command from the command line:

isoinfo -d -i /dev/cdrom

This command will scan your CD and print its volume information.  The two lines to look for are "Logical block size is:" and "Volume size is:", which give the blocksize and block count you need for imaging the CD. If you feel like skipping the rest of this post and burning coasterless CDs, then you can stop now and use the following command:

dd if=/dev/cdrom bs=blocksize count=count of=/path/to/isoimage.iso

Obviously replace blocksize and count with your blocksize and count collected from isoinfo.
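If you'd rather not copy the numbers by hand, the two values can be pulled out of the isoinfo output with a few lines of shell. This is only a sketch: iso_geometry is a name I made up, and it assumes the output contains the "Logical block size is:" and "Volume size is:" lines with the number as the last word on each.

```shell
# Sketch only: iso_geometry is a hypothetical helper, not part of isoinfo.

# Read `isoinfo -d` output on stdin and print "<blocksize> <blockcount>".
# Exits non-zero if either line is missing.
iso_geometry() {
    awk '
        /^Logical block size is:/ { bs = $NF }
        /^Volume size is:/        { count = $NF }
        END {
            if (bs == "" || count == "") exit 1
            print bs, count
        }'
}

# Possible use, assuming the drive is /dev/cdrom:
#   geometry=$(isoinfo -d -i /dev/cdrom | iso_geometry) || exit 1
#   set -- $geometry
#   dd if=/dev/cdrom bs="$1" count="$2" of=/path/to/isoimage.iso
```

Keeping the parsing separate from the dd call means you can look at the two numbers before anything gets written to disk.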

MD5sum

You know that phrase? The one your teacher probably drilled into you as a school child? The proverbial "Don't count your chickens before they hatch"? Well in this case, we do want to count our chickens before they hatch.  Before you make an ISO image of a CD and file the CD away for good you want to make sure the ISO image you created is a good, clean copy.  This is where MD5 hashes come into play.  I'm not going to go into great detail about MD5, but if you haven't looked into checking MD5 hashes against downloaded files, now's the time to open up a linuxjournal.com search and check out some articles.  In this case, we are going to check the MD5 hash of the CDROM against the ISO image that you may have created in the previous step (if you're following along.  If not, the script provided at the end of this post will do this for you).

So, if you have already created an ISO image with the above command, let's check that MD5 hash.  With the CD loaded run the following command:

dd if=/dev/cdrom bs=blocksize count=count | md5sum

This will spit out a 128-bit cryptographic hash based on the contents of the CD.  Now let's check it against the ISO image you generated with the following command:

cat imagename.iso | md5sum

The output should match the MD5 sum generated above.  If they match, then you can rest assured that the ISO image that you generated will be good enough to burn CDs from.  If the MD5 sums don't match, then make sure that you entered the correct values from isoinfo into dd and try again.
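The two commands above can be rolled into one comparison so you don't have to eyeball the hashes. What follows is a sketch under my own naming: image_md5 and verify_image are hypothetical helpers, and the device path in the usage comment is just an example.

```shell
# Sketch only: image_md5 and verify_image are hypothetical helper names.

# Hash the first COUNT blocks of BS bytes read from a device or file.
image_md5() {
    # $1 = device or file, $2 = block size, $3 = block count
    dd if="$1" bs="$2" count="$3" 2>/dev/null | md5sum | cut -d ' ' -f 1
}

# Succeed only if the CD (read with bs/count) and the ISO file hash the same.
verify_image() {
    # $1 = source device, $2 = ISO file, $3 = block size, $4 = block count
    src_sum=$(image_md5 "$1" "$3" "$4")
    iso_sum=$(md5sum < "$2" | cut -d ' ' -f 1)
    [ "$src_sum" = "$iso_sum" ]
}

# Possible use, with blocksize and count taken from isoinfo:
#   verify_image /dev/cdrom /path/to/isoimage.iso blocksize count \
#       && echo "MD5 sums match" || echo "MD5 MISMATCH, rip again"
```

The cut calls strip the trailing file name column from md5sum's output, so only the hashes themselves get compared.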

Script

The script itself is pasted at the bottom of this post, but I will quickly touch on what it does.  I typed this script up specifically for this blog post.  Usually I just run isoinfo, grab my blocksize and count, make the image, then check the md5sum and go from there.  Why hadn't I created a script before?  We will call it professional laziness. I'm sure I will be using this script from now on though.  The script lets you find out the physical path of your CDROM, specify a path for your ISO image, rip the CD to that image, and check the image's MD5 sum against the CDROM.  I did take a portion of the script from Troubleshooters.com's 'Coasterless CD Burning' while working on it; the URL is in the comments section of the script if you wish to look into the website further.

Note: This is a 'demonstration script'.  Bugs might come crawling out of your screen and up your pants leg.  It works IF you have a CD in the tray and you have the proper HAL package.  Feel free to modify the script to your needs, or pick it apart and use it as you see fit.  I have described all of the commands above in case you wish to not use the script.
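If you don't have the HAL tools installed, one way around that dependency is to test a short list of likely device paths yourself. This is a sketch, not what the script below does: find_cdrom is a made-up helper, and the candidate paths in the usage comment (/dev/cdrom, /dev/sr0, /dev/dvd) are assumptions that may not match your system.

```shell
# Sketch only: find_cdrom is a hypothetical helper name.

# Print the real path of the first argument that is a block device
# (symlinks such as /dev/cdrom are followed). Fails if none qualifies.
find_cdrom() {
    for candidate in "$@"; do
        if [ -b "$candidate" ]; then
            readlink -f "$candidate"
            return 0
        fi
    done
    return 1
}

# Possible use; adjust the candidate list to your machine:
#   device=$(find_cdrom /dev/cdrom /dev/sr0 /dev/dvd) || {
#       echo "No CD/DVD device found" >&2
#       exit 1
#   }
```

The check only confirms that a block device exists at that path. It does not tell you whether there is a disc in the tray, so isoinfo can still fail afterwards.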

Conclusion

Well there you have it: how to archive CDs into an ISO image.  As I said earlier, I tend to take any new CD/DVD out there and create ISO images for archiving purposes.  You never know when someone will scratch that one-of-a-kind HP Utilities CD containing the Array Diagnostic Utilities that you need to run at 2am against an old server.  But if you have an ISO image of that CD all you have to do is create a CD and away you go.  Once you have an ISO image, it's as simple as burning a CD; of course that's a blog post for another time. :-)

#!/bin/bash

## ArchiveCD.sh Script whipped up for LinuxJournal.com Blog
## Post on Archiving CD's to ISO Images.  Written by Jayson
## Broughton.  Script updates may be found at the following
## website: www.jaysonbroughton.com
##
## blocksize and blockcount variables taken from Steve Litt's
## script in the Troubleshooters.com article 'Coasterless CD
## Burning'.
## URL: http://www.troubleshooters.com/linux/coasterless.htm
##
## Last Updated: 05/15/2011

## Check HAL for CDrom and grab UDI (requires the HAL tools)
UDI=$(hal-find-by-capability --capability storage.cdrom)

## Look up the block device that belongs to the UDI
device=$(hal-get-property --udi "$UDI" --key block.device)

## Get Block size of CD (5th word of "Logical block size is: N")
blocksize=$(isoinfo -d -i "$device" | grep "^Logical block size is:" | cut -d " " -f 5)
if test "$blocksize" = ""; then
        echo "catdevice FATAL ERROR: Blank blocksize" >&2
        exit 1
fi

## Get Block count of CD (4th word of "Volume size is: N")
blockcount=$(isoinfo -d -i "$device" | grep "^Volume size is:" | cut -d " " -f 4)
if test "$blockcount" = ""; then
        echo "catdevice FATAL ERROR: Blank blockcount" >&2
        exit 1
fi

usage()
{
cat <<EOF

usage: $0 options
-h      Show this message
-d      Report the Location of your Device
-m      Check your MD5Hash of CD against Image (Run AFTER making Image)
-l      Location and name of ISO Image (/path/to/image.iso)
-r      Rip CD to ISO image
I'm lazy and didn't build much error checking into this script, so here's how to run it.
Give -l before -r or -m. Anything else might break the script.

Example 1: Report location of drive
archiveCD.sh -d

Example 2: Rip a CD to ISO
archiveCD.sh -l /path/to/isoimage.iso -r

Example 3: Check MD5Hash (Run AFTER ripping CD to ISO)
archiveCD.sh -l /path/to/isoimage.iso -m


EOF
}



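while getopts "hdml:r" OPTION; do
  case $OPTION in
    h)
      usage
      exit 0
      ;;
    d)
      echo "Your CDrom is located on: $device"
      ;;
    m)
      ## Options are handled in order, so -l must come before -m
      if test -z "$LFLAG"; then
        echo "FATAL ERROR: give -l /path/to/image.iso before -m" >&2
        exit 1
      fi
      echo "Checking MD5Sum of CD and New ISO Image"
      md5cd=$(dd if="$device" bs="$blocksize" count="$blockcount" | md5sum)
      md5iso=$(md5sum < "$LFLAG")
      echo "CD MD5 is:  $md5cd"
      echo "ISO MD5 is: $md5iso"
      ;;
    l)
      LFLAG="$OPTARG"
      ;;
    r)
      ## Options are handled in order, so -l must come before -r
      if test -z "$LFLAG"; then
        echo "FATAL ERROR: give -l /path/to/image.iso before -r" >&2
        exit 1
      fi
      dd if="$device" bs="$blocksize" count="$blockcount" of="$LFLAG"
      echo "Archiving Complete.  ISO Image located at: $LFLAG"
      ;;
    *)
      ## Unknown option: show the help text and signal an error
      usage
      exit 1
      ;;
  esac
done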
 
______________________

www.jaysonbroughton.com

Comments

I just added a space before

Anonymous's picture

I just added a space before "CD MD5 is: ..." for easier viewing.

Discrepancy in Block Counts

Bruce Fowler's picture

First, in my previous post I meant "kB" not "megs" of padding. That's about a quarter of a meg, not a third of the disk!
To further check this out, I created my own .iso, of a couple of directories of .jpg files, using "genisoimage." Then wrote it to disk with "wodim." Then I checked out the .iso file, the one read using the blockcount from "isoinfo" but incremented by 125, and a version read using "dd" with no block count. All three matched in size and md5sum. I get lost really quickly in both the genisoimage and wodim man pages, and am no expert on optical media. But it sure looks like the block count returned by isoinfo needs that additional 125 added on to get consistent results. Or, don't use a block count at all, which hugely simplifies the script. Has anyone else run into this? Does the extra padding matter? What is the right answer?

Discrepancy in Block Counts of .iso File System and CD Image

Bruce Fowler's picture

Um, I've been fooling around with the Fedora-15 iso image I recently downloaded, and I'm finding that the downloaded image is 125 blocks (256 megs) larger than the iso file size indicated by running isoinfo against the CD. If I read back the CD using the isoinfo block count + 125, all is well, the iso file created is the same as the one downloaded originally.
There seems to be some padding on the CD that is not part of the iso file system. So if I use your script to "rip" a data cd, the new cd I burn from that image will be 125 blocks shorter. Doesn't seem that this is a good thing. I'll experiment more with this over the next couple of days...

Just copy the files for drivers?

Anonymous's picture

I have done this on occasion, and all I have needed to do is copy the driver files from a CD with anything from GUI drag-n-drop to xcopy (Windoze) or cp/cpio/tar for *n*x systems.

I would think ISO's would only be useful for bootable CD images, otherwise simple file copies are a lot easier to work with later (do use very descriptive names for the destination directories, so it will be recognizable years later), and should use a lot less space (they are usually already compressed, so using zip/gzip/7zip often does not gain any space savings).

HTH
RO

copying cd

namo's picture

$ man readcd
or
$ man readom

Mixed content CDs

cbware's picture

The problem I ran into is dd only captures the ISO image. It doesn't grab the HFS data for Apple systems if it's a mixed mode CD. For that I use readom dev=/dev/scd0.

MD5 sums of individual files still match

Anonymous's picture

I was never able to get the md5sum of ripped CD images to match before. However, the md5sums of individual files on a mounted image still matched those of the original disc.

I guess I haven't been doing

Isaac's picture

I guess I haven't been doing it the "right way" at all.

I've just been running:

$ cat /dev/cdrom > ~/cdimage.iso

LOL! Luckily I haven't ever run into any problems!

Isaac, From what you can see

Jayson Broughton's picture

Isaac,
From what you can see in the comments. There's really no 'right way' or 'wrong way' to do this. Sure, you can cat /dev/cdrom, dd if= of= or use other utilities. I think some people might have lost the big picture of the article. There are many ways that you can burn a CD to an ISO image. But the point of the article was on creating an ISO image for archiving purposes. I want to make sure 100% that 5 years from now the driver CD, motherboard CD, or software that I made an ISO image of, is still a valid copy from day 1 when I originally created the ISO. If I cat or dd the cd/dvd to an ISO image without looking at blocksize, or bytesize, or verifying that the md5sum matches between the image and the CD then there is no sure way to know that the image I created is a valid image. I would hate to toss the CD (or have it covered in jelly doughnuts) and really need to get my hands on that Image years down the road, to find out that the image is invalid. :-) So I guess to each their own, whatever works to create an image, I just prefer to double and triple check my copies before I archive it off to something I may or may never use in the future.

In the case where a Live CD

Anonymous's picture

In the case where a Live CD is already running, one should first issue the mount command to identify the target device already mounted on /cdrom, and then substitute that devicename in your first command:
$ isoinfo -d -i /dev/
where corresponds to the output devicename from the mount command for /cdrom.

This will avoid getting the following error in the above case:
Errno: 5 (Input/output error), test unit ready scsi sendcmd: no error
CDB: 00 00 00 00 00 00
status: 0x2 (CHECK CONDITION)
Sense Bytes: 70 00 02 00 00 00 00 0A 00 00 00 00 3A 00 00 00
Sense Key: 0x2 Not Ready, Segment 0
Sense Code: 0x3A Qual 0x00 (medium not present) Fru 0x0
Sense flags: Blk 0 (not valid)
cmd finished after 0.001s timeout 20s
isoinfo: No such file or directory. Unable to open /dev/cdrom

Looks like devicename

Anonymous's picture

Looks like devicename enclosed in brackets (less-than, greater-than) was removed automatically from my post and should have read as:
$ isoinfo -d -i /dev/devicename
where devicename corresponds to the output devicename from the mount command for /cdrom.

Very useful info and script

Akhil Oniha's picture

Very useful information ... i too am a "paranoid penguin" and I'll definitely be taking the extra precautions even if they are a bit of overkill.

Thank you.

ddrescue is better

Gavin's picture

I highly recommend using ddrescue over regular old dd for dumping DVD/CD images. It will work even if there are "bad sectors" on that disc of yours. Also, ddrescue is also great for recovering hard drives with bad sectors.

# ddrescue -n -b 2048 /dev/dvd dvd.iso dvd.log

Gavin, Oh I agree that

Jayson Broughton's picture

Gavin,
Oh I agree that ddrescue is awesome. I just tailored my blogpost using common tools that (hopefully) you wouldn't have to install extra packages onto another machine in order to create CD's. Going with that, there are some GUI cd/dvd creators that make awesome ISO images from CD, and ddrescue is also a really good utility. I haven't tried using that command before but I just might have to give it a shot on my personal computer. Thanks!

Use of HAL -> BAD

Anonymous's picture

Looks like most distros are removing HAL, I know Fedora is.

oh wow, they are removing HAL

Jayson Broughton's picture

oh wow, they are removing HAL out of F15. Wonder where the time has gone. I guess in that case if you knew the location of your cd/dvd (mount, cat /etc/fstab, so on and so forth) you could just use that. Thanks for that little tidbit of knowledge.

DD and ddrescue

Anonymous's picture

Just a little tip for when you need to read a CD that has been used as a Frisbee.

Google "ddrescue", it's a great tool

oh yes, ddrescue is just

Jayson Broughton's picture

oh yes, ddrescue is just awesome (sheesh I know I responded to you. I think my comment went into the same place my socks go in the dryer). If you haven't given systemrescueCD a chance (it comes with ddrescue, among other great utilities) you should check it out http://www.sysresccd.org/Main_Page I always keep at least 2 CD's and an ISO image of sysrescueCD handy for those emergencies. It's saved my bacon more times than I can count. Another commenter posted how you can use ddrescue to create iso images with the utility, which is just great!

Good to know another way to create iso images

jors's picture

You can also always use genisoimage (formerly mkisofs) and/or xorriso (both in the Debian repositories).

blocksize is irrelevant

Gert's picture

Hi,

sorry, but the block size given to dd has nothing to do with the block size used on the CD or DVD. it is just the number of bytes that is read and written at a time. You can easily prove this by doing
isoinfo -d -i /dev/dvd
dd if=/dev/dvd of=some.iso bs=32000
isoinfo -d -i some.iso
and you will see that the block size information is the same in both cases regardless of what number you give at the command line.
Without additional arguments dd does a bit-wise copy and preserves all information that may be stored on the CD (including the block size).

In addition, if copying went without (physical) errors, then you will also get equal results for "md5sum /dev/dvd" and "md5sum some.iso". No need to use dd to just call md5sum.

Setting the block size for dd only makes sense if you want to get a high throughput, and the best number is related to the general IO stack, and the hardware buffers available.

Besides, on Linux you can even do "cat /dev/dvd >some.iso" to get a proper copy of your disk.

Best,
Gert

Gert, I'll argue that point

Jayson Broughton's picture

Gert,
I'll argue that point with you. The default blocksize on DD (if memory serves me correctly) is 512 (granted I'd have to pull up a manpage). And yes you could create a blocksize of 32000, or 4000, etc. I've just found it useful over the years to use the blocksize of what the disk was created as, to create the image itself. I could create a larger blocksize to speed things up and find that sweet spot (sometimes too fast causes it to slow down).

As far as your comment about no need to call dd to call md5sum, you're right about that. In a perfect world where there are no physical errors (and dd reports them) there would be no need to run md5sum twice, once on the CD and once on the ISO image, and compare the two. But I'm a paranoid little penguin. When I download ISO images of distros, I also grab the md5sums to check that the image was downloaded properly and not modified. The same thing goes for burning CDs. I can't tell you how many times I've used various GUI utilities (no, I'm not starting a GUI war, I swear) where I don't get a proper verification of the CD and I end up with a nifty coaster. Checking the CD's MD5sum against the ISO image's MD5sum is just a double-check that the ISO image is indeed an exact copy of the disc before I upload it to my little space in the world.

On cat /dev/dvd >some.iso. Maybe my computers over the years hate me, but I have a stack of coasters here from cat /dev/cdrom > some.iso and not being able to read some of the ISO images. Thus the post on the script that I use to use dd and verify with md5sum. You did raise some valid points though :-)

well ...

Gert's picture

About the block size, you made it sound like the chosen block size has something to do with the disk being correctly copied to a file and I wanted to point out that this is not the case when you make a backup of a CD to an ISO file. Larger block sizes essentially avoid context switches and make copying a little bit faster, but there is of course a sweet spot. Since Linux does its own buffering, a difference in block size on the CD shouldn't be important to the user, but I've read elsewhere that they suggest using a multiple of the disk block size when copying. Usually I use some 2^n bytes, n > 10.

Also, I didn't want to imply that is not a good idea to run md5sum after copying the CD to an image file, just that it is not necessary to pipe it through a dd command when reading from the CD/DVD.

Best

True, there is always that

Jayson Broughton's picture

True, there is always that sweet spot. But not knowing what each person's sweet spot is, I usually go with the disk's blocksize rather than the default 512. I've never been good at guessing a sweet spot. Oh, and you're right about that md5sum btw, good eye. I've always just dd'd and run it through md5sum, but your way works as well. I guess there's more than one way to skin a carrot huh? At the least I hope you found the article informative :-)

Brilliant

lefke123's picture

Just as I was googling for Linux iso converters, this pops up in my feed list! Thanks!
