The Subversion Project: Buiding a Better CVS

This open-source project aims to produce a compelling replacement for the Concurrent Versions System (CVS).
Subversion's Design

Subversion has a modular design; it's implemented as a collection of C libraries. Each layer has a well defined purpose and interface. In general, code flow begins at the top of the diagram and flows downward—each layer provides an interface to the layer above it (see Figure 1). Let's take a short tour of these layers, starting at the bottom.

Figure 1. Design Layers of Subversion

The Subversion Filesystem

The Subversion Filesystem is not a kernel-level filesystem that one would install in an operating system (like the Linux ext2 fs). Instead, it refers to the design of Subversion's repository. The repository is built on top of a database, currently Berkeley DB, and thus is a collection of .db files. However, a library accesses these files and exports a C API that simulates a filesystem, specifically a versioned filesystem.

This means that writing a program to access the repository is like writing against other filesystem APIs: you can open files and directories for reading and writing as usual. The main difference is that this particular filesystem never loses data when written to; old versions of files and directories always are saved as historical artifacts.

Whereas CVS's back end (RCS) stores revision numbers on a per-file basis, Subversion numbers entire trees. Each atomic commit to the repository creates a completely new filesystem tree and is individually labeled with a single, global revision number. Files and directories that have changed are rewritten (and older versions are backed up and stored as differences against the latest version), while unchanged entries are pointed to via a shared-storage mechanism. This is how the repository is able to version tree structures, not just file contents.

Finally, it should be mentioned that using a database like Berkeley DB immediately provides other nice features that Subversion needs: data integrity, atomic writes, recoverability and hot backups. See www.sleepycat.com for more information.

The Network Layer

Subversion has the mark of Apache all over it. At its very core, the client uses the Apache Portable Runtime (APR) library. In fact, this means that a Subversion client should compile and run anywhere Apache httpd does. Right now, this list includes all flavors of UNIX, Win32, BeOS, OS/2, Mac OS X and possibly NetWare.

However, Subversion depends on more than just APR; the Subversion server is Apache httpd itself. Why was Apache chosen? Ultimately, the decision was about not re-inventing the wheel. Apache is a time-tested, open-source server process ready for serious use, yet is still extensible. It can sustain a high-network load. It runs on many platforms and can operate through firewalls. It's able to use a number of different authentication protocols. It can do network pipelining and caching. By using Apache as a server, Subversion gets all these features for free. Why start from scratch?

Subversion uses WebDAV as its network protocol. DAV (distributed authoring and versioning) is a whole discussion in itself (www.webdav.org), but in short, it's an extension to HTTP that allows reads/writes and versioning of files over the Web. The Subversion Project is hoping to ride a slowly rising tide of support for this protocol; all of the latest file browsers for Win32, Mac OS and GNOME speak this protocol already. Interoperability will (hopefully) become more and more of a bonus over time.

For users who simply wish to access Subversion repositories on local disk, the client can do this too; no network is required. The Repository Access (RA) layer is an abstract API implemented by both the DAV and local-access RA libraries. This is a specific benefit of writing a “librarized” revision control system; it's a big win over CVS, which has two very different, difficult-to-maintain code paths for local vs. network repository-access. Feel like writing a new network protocol for Subversion? Just write a new library that implements the RA API.

The Client Libraries

On the client side, the Subversion working copy library maintains administrative information within special /SVN subdirectories, similar in purpose to the /CVS administrative directories found in CVS working copies.

A glance inside the typical /SVN directory turns up a bit more than usual, however. The entries file contains XML that describes the current state of the working copy directory (and that basically serves the purposes of CVS's Entries, Root and Repository files combined). But, other items present (and not found in /CVS) include storage locations for the versioned properties (the metadata mentioned in the “Subversion Features” section above) and private caches of pristine versions of each file. This latter feature provides the ability to report local modifications and do reversions without network access. Authentication data also is stored within /SVN, rather than in a single .cvspass-like file.

The Subversion client library has the broadest responsibility. Its job is to mingle the functionality of the working-copy library with that of the repository-access library, and then to provide a highest-level API to any application that wishes to perform general revision control actions.

For example, the C routine svn_client_checkout() takes a URL as an argument. It passes this URL to the repository-access library and opens an authenticated session with a particular repository. It then asks the repository for a certain tree and sends this tree into the working-copy library, which then writes a full working copy to disk (/SVN directories and all).

The client library is designed to be used by any application. While the Subversion source code includes a standard command-line client, it should be easy to write any number of GUI clients on top of the client library. Hopefully, these GUIs should someday prove to be much better than the current crop of CVS GUI applications, which are no more than fragile wrappers around the CVS command-line client.

In addition, proper SWIG bindings (www.swig.org) should make the Subversion API available for any number of languages: Java, Perl, Python, Guile and so on. In order to Subvert CVS, it helps to be ubiquitous.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

regarding the svn+ssh

kashide's picture

where do I find the detailed steps for setting up the server configuration.

My requirement is that I have database on server A and working forlder on server B.

I am not able to use the tunneling over the SSH.

I think I am doing some mistake.
for the following steps :
$ svn list svn+ssh://host.A.com/repo/project
Kashide@host.A.com's password: *****

I get an error message
"Authontication failed. Please explain me why? and what is the solution for this.

Not as compelling as one would hope

Brian Gallew's picture

I've been a systems administrator for 15 years. I've been a revision control evangelist for at least 10 of those years. I've used RCS, CVS, and PerForce. I've never actually used subversion, though, and for good reason: it's impossible to build.

I'll admit, things have certainly improved over the past few years. At least it's actually possible to get a linux distribution with Subversion installed. Of course, if you're already a user, then you're probably using an older version, and the one that comes with your distro of choice isn't compatible.

Then again, if you happen to actually work for a living with Unix(tm), then you probably aren't in a linux-only environment. How about HP-UX? Solaris? AIX? The only way to actually build subversion on those platforms is to either completely ignore several vendor-supplied packages and build them all from scratch, or else to dig back through Usenet and try to find the magic combination of Subversion, Neon, ... that will work with your installed version of Apache. Or Python. Or Swig. Or whatever.

While I understand that the features offered by Subversion are not only useful, but cover some extremely large gaps in CVS, I can build CVS on any platform, and the *only* dependency is RCS, which also builds cleanly on any platform. Frankly, at this point the only way Subversion is going to become interesting to a huge number of people out there is if they ship it in a pure Python/Java/Perl/Ruby/whatever form (i.e. write it in a high-level scripting language that stands a reasonable chance of running on multiple platforms).

Re: The Subversion Project: Buiding a Better CVS

Anonymous's picture

Subversion is really a great system now. It's making strides against CVS in terms of market share.

From my customers forums:
"One of the features that sold me is the ease of access via http. Using CVS, you always have to struggle with firewalls and setting up ssh tunnels or else, put up with the weak security of pserver."

Feel free to check out our Subversion [wush.net] hosting service if you want to give Subversion a spin. We've got instant setup and a week long free trial. You'll be able to start playing with your repository in 5 minutes :)

regarding the download Subversion for Linux

Ritu's picture

i want to free download subversion on fedora linux.So please tell me the way for this.

Release Date

Anonymous's picture

As of today, more than a year after the proposed v1.0 date, it looks like it's at v0.27 :/

v0.37 = v1.0 RC1 (New target is Feb 23, 2004)

Anonymous's picture

According to their site, v0.37 (dubbed 1.0 RC1) will become 1.0 in another week or so if nothing major gets discovered during testing.

Re: Release Date

Anonymous's picture

Yes, what happened?

Subversion rocks!

Anonymous's picture

This is going to be really good... -- mbp

What is the 'intelligent merging' that is slated?

Anonymous's picture

What does this term 'intelligent merging' mean? It sounds like 'intelligent merging' means if I check in a file that's been updated (ie, changed by someone else) since I checked it out, the VCS *intelligently* merges the changes. Does this system currently support concurrent development or not?

Re: What is the 'intelligent merging' that is slated?

Anonymous's picture

Actually, "intelligent merging" means a solution to the classic "repeated merge" problem you see so often in CVS.

That is, you merge some changes from the trunk to a branch. Then later on, you want to merge more changes from trunk to branch, but end up getting *conflicts*, because some of the changes have already been ported over previously.

With intelligent merging, the system remembers which changesets have been ported to each branch. So it can automatically avoid repeated merges.

This feature won't be in 1.0, but should follow soon after.

Re: What is the 'intelligent merging' that is slated?

Anonymous's picture

Ofcourse it supports concurrent development....

Re: What is the 'intelligent merging' that is slated?

Anonymous's picture

It probably refers to the merging of a "sandbox" sub-project, where, say new functionality is being added to a project, with the main project code tree. That way, new functionality can be added and tested seperately without affecting the stability of the main project code. Once it is ready, this stream can be merged into the main project. Just a guess.

Re: The Subversion Project: Buiding a Better CVS

Anonymous's picture

Found the following link to the project:

http://subversion.tigris.org/

Re: The Subversion Project: Buiding a Better CVS

Anonymous's picture

Was there some reason not to include the project URL in this article? Not that it took long to find with Google, but it does seem like an obvious oversight. The URL in question is http://subversion.tigris.org/.

Re: The Subversion Project: Buiding a Better CVS

Anonymous's picture

I had exactly the same question, why no link available for subversion?

Thanks anyway for posting it here, though it shows up only to those who read beyond the article's end :-)

-shalz

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState