Subversion: Not Just for Code Anymore
Have you ever needed some information from a file, only to remember that you modified the file a week ago and removed the very information you're interested in? Or, have you ever spent hours sifting through dozens of inconsistently named copies of the same file trying to find one particular version? If you're like me, the answer is probably a resounding yes to both questions. Of course, if you're a programmer, you've probably already solved that problem in your development activities by using a version control system like CVS or Subversion. What about everything else though? Mom's cherry pie recipe may not change as frequently as rpc_init.c, but if you do decide to create a low-cal version, you're not going to want to lose the original. As it turns out, version control isn't only for source files anymore. Many of the features of Subversion make it ideal for versioning all kinds of files.
With Subversion, you can keep a history of changes made to your files. That way, you easily can go back and see exactly what a given file contained at a particular point in time. You also save space, because Subversion stores only the deltas from one version to the next. When you make a change to a versioned file, the repository needs only enough extra space to store the changes rather than a complete second copy of the file. And, unlike CVS, Subversion applies delta storage to binary files as well as text files.
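To make the idea of going back in time concrete, here is a minimal sketch that commits two versions of a file and then retrieves the first. It runs entirely in a scratch directory created with mktemp; the file name pie.txt and its contents are made up for illustration.

```shell
#!/bin/sh
# Scratch demo: every path lives under a temporary directory (an assumption
# for illustration), so nothing in your real home directory is touched.
set -e
demo=$(mktemp -d)
svnadmin create "$demo/repo"
svn checkout -q "file://$demo/repo" "$demo/wc"
cd "$demo/wc"

# Revision 1: the original recipe.
echo "2 cups sugar" > pie.txt
svn add -q pie.txt
svn commit -q -m "original cherry pie recipe"

# Revision 2: the low-cal version overwrites the working file.
echo "1 cup sweetener" > pie.txt
svn commit -q -m "low-cal version"

# The original is still retrievable by revision number.
svn log -q pie.txt        # list the revisions that touched the file
svn cat -r 1 pie.txt      # print the file as it was in revision 1
svn diff -r 1:2 pie.txt   # show what changed between the two revisions
```

The working file now holds the low-cal version, but svn cat -r 1 still prints the original line.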
Subversion makes it easy to access your files from multiple computers too. Instead of worrying whether the copy of the budget report on your laptop reflects the changes you made last night on your desktop system at home, you can simply run an update on your laptop and Subversion automatically updates your file to the latest version in the repository. Also, because all of the versions are stored in a single repository, there is a single location that you need to back up in order to keep all of your data safe.
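As a rough illustration of that workflow, the sketch below checks out the same repository twice on one machine. The two working copies, named desktop and laptop here purely for illustration, stand in for two computers; with a real server, the file:// URL would be replaced by the server's URL.

```shell
#!/bin/sh
# Scratch demo: two working copies of one repository, both in a temp directory.
set -e
demo=$(mktemp -d)
svnadmin create "$demo/repo"
svn checkout -q "file://$demo/repo" "$demo/desktop"
svn checkout -q "file://$demo/repo" "$demo/laptop"

# Last night's change, made and committed on the "desktop".
echo "Q3 budget draft" > "$demo/desktop/budget.txt"
svn add -q "$demo/desktop/budget.txt"
svn commit -q -m "add budget report" "$demo/desktop"

# On the "laptop", one update brings the working copy up to date.
svn update -q "$demo/laptop"
cat "$demo/laptop/budget.txt"
```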
So your interest is piqued. You're sold on the advantages of versioning your files, and you'd like to give it a try. The first question to answer is which files you're going to put under version control. One obvious possibility would be to version your entire hard drive. That's not a very practical approach, though. When you store a portion of a repository's contents locally (in what's called a working copy), Subversion stores a second, pristine copy of each file so that it can compare the changes you have made locally with the last version from the repository. Therefore, if you version the entire hard drive, you'll need roughly twice as much disk space.
There's also little reason to keep full revision history on the largely static parts of your filesystem, such as /usr or /opt. On the other hand, directories that contain a lot of custom files/modifications, such as /etc or /home, are prime candidates for versioning, because the advantage of tracking those changes is more likely to outweigh the disadvantages of extra storage requirements. Furthermore, with Subversion, you can opt to create a working copy from a subtree in the repository hierarchy. That way, you don't need to store any copies of infrequently accessed data locally, which often results in a net reduction in hard drive requirements, even though the files you are storing locally take up twice as much space.
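Checking out a subtree is just a matter of pointing svn checkout at a directory inside the repository rather than at its root. The following sketch assumes a hypothetical layout with two top-level directories, recipes and archive, and checks out only the first.

```shell
#!/bin/sh
# Scratch demo: the recipes/ and archive/ directory names are hypothetical.
set -e
demo=$(mktemp -d)
svnadmin create "$demo/repo"

# Create two top-level directories directly in the repository.
svn mkdir -q -m "create layout" \
    "file://$demo/repo/recipes" "file://$demo/repo/archive"

# Check out only the recipes subtree; archive/ is never copied locally.
svn checkout -q "file://$demo/repo/recipes" "$demo/recipes"
svn info "$demo/recipes"
```

The working copy behaves like any other, except that updates and commits cover only the subtree it was checked out from.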
Now, let's dive in and get Subversion running on your machine. Installing is generally pretty easy. You can, of course, download the Subversion source and compile that, but in most cases, it's going to be much easier to install the precompiled binary package for your Linux distribution of choice. Fortunately, Subversion has matured to the point where such a package is available for almost every major distribution. In fact, I don't know of any off the top of my head that it isn't available for.
Once you have Subversion installed, it's time to create a repository. Let's say you have a documents directory in your home that you'd like to version. First, you need to create a new empty repository using the svnadmin create command. For instance, the following creates a new repository in your home directory:
$ svnadmin create $HOME/.documents_repository
Next, you need to import your existing documents into the newly created repository. To do that, use the svn import command with the directory to import and a URL that points to the repository. In this example, the URL refers directly to the repository using a file://-type URL. If your repository will be used only locally, the file:// URL is the easiest way to access a repository (there are other, better ways to access repositories that I'll discuss in a bit though):
$ svn import $HOME/documents file://$HOME/.documents_repository
When you run the import command, Subversion opens an editor and asks you for a log message. Whatever message you enter will be associated with the newly created repository revision and can be seen by examining the repository history logs. Enter something brief, such as “imported documents directory”. As soon as you save the log message and leave the editor, Subversion performs the import and outputs something like the following:
Adding documents/file1.txt
Adding documents/file2.txt
Adding documents/file3.jpg
Committed revision 1.
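If you would rather not have an editor open, svn import also accepts the log message on the command line through the -m option. The sketch below tries this against a scratch directory standing in for $HOME/documents, then reads the message back with svn log.

```shell
#!/bin/sh
# Scratch demo: $demo/documents stands in for $HOME/documents.
set -e
demo=$(mktemp -d)
mkdir "$demo/documents"
echo "hello" > "$demo/documents/file1.txt"
svnadmin create "$demo/repo"

# -m supplies the log message, so no editor is started.
svn import -q -m "imported documents directory" \
    "$demo/documents" "file://$demo/repo"

svn log "file://$demo/repo"    # shows the new revision with its log message
svn list "file://$demo/repo"   # shows what the repository now contains
```

Note that the contents of the imported directory land at the root of the repository, which is why the later checkout can target the repository URL directly.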
You can now safely remove the original $HOME/documents and then re-create it as a working copy of the repository, using the svn checkout command:
$ rm -rf $HOME/documents
$ svn checkout file://$HOME/.documents_repository $HOME/documents
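Once the directory is a working copy, day-to-day use comes down to a handful of commands. Here is a minimal sketch of that cycle, again in a scratch directory; the file names are placeholders.

```shell
#!/bin/sh
# Scratch demo: $demo/documents stands in for the checked-out $HOME/documents.
set -e
demo=$(mktemp -d)
svnadmin create "$demo/repo"
svn checkout -q "file://$demo/repo" "$demo/documents"
cd "$demo/documents"
echo "first draft" > file1.txt
svn add -q file1.txt
svn commit -q -m "add file1.txt"

echo "second paragraph" >> file1.txt   # edit a file that is already versioned
echo "scratch notes" > notes.txt       # create a file Subversion does not know about

svn status                # M = modified, ? = not under version control
svn add -q notes.txt      # new files must be added explicitly
svn commit -q -m "revise file1.txt and add notes.txt"
svn status                # prints nothing once everything is committed
```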
So far, so good. However, if you want to take advantage of Subversion from multiple machines, you're going to need to set up a server. Several options are available to you, but the best choice is generally to use Apache with mod_dav, which serves a Subversion repository using the WebDAV protocol.
From a basic Apache installation, getting WebDAV to work is fairly simple. First, you need to make sure that mod_dav and mod_dav_svn are being loaded:
LoadModule dav_module modules/mod_dav.so
LoadModule dav_svn_module modules/mod_dav_svn.so
Next, you need to set up a <Location> directive to point to your repository. For example, if you want your repository to be referenced with the URL http://example.net/bill/documents, and the repository is located in /srv/repositories/bill_documents, you could use the following Location directive:
<Location /bill/documents>
    DAV svn
    SVNPath /srv/repositories/bill_documents
    AuthType None
</Location>
Or, if you want more security, you could allow for valid users only:
<Location /bill/documents>
    DAV svn
    SVNPath /srv/repositories/bill_documents
    AuthType Basic
    AuthName "Bill's Documents"
    AuthUserFile /srv/repositories/bill_documents/passwd
    Require valid-user
</Location>