An Introduction to the arch Version Control System

by Ralph Krause

arch is a software version control system similar to CVS and Subversion that was started by programmer Tom Lord to meet his version control needs. This article introduces some of arch's key features and presents a brief tutorial on using version 1.0pre11.

One reason for arch's creation was to overcome some weaknesses in existing version control systems, such as the lack of atomic commits, the inability to keep track of file renames and difficulties when working on different branches of a project.

arch also provides support for easily and intelligently merging code from several different branches (e.g., stable, development, feature-test) of a project. Projects and revisions stored in arch have globally unique names, which allows branch and merge operations to span network boundaries.

arch was started in November 2001 and since then it has gone from a concept to having early adopters and distributed repositories. arch uses many small specialized tools to get the job done and is written using a combination of C code, awk and shell scripts.

Installing arch

First, download arch from www.regexps.com and untar the source file.

arch's configure script frowns upon building arch in the same directory as the source code, so first you must create a build directory under arch's src directory. From the build directory, run ../configure, make and make install to get a working copy of arch.

If you don't specify an installation prefix for the configure script, arch's executables will be put in a directory called /install under the build directory. To control where make install puts files, use the --prefix= option when you run the configure script. If arch's bin directory isn't in your path, you can either add it to your PATH variable or copy the files from the =install/bin directory to your normal bin directory (e.g., /usr/local/bin). If you move the other arch directories (include, lib, libexec), arch won't work.
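
The build and PATH steps above can be sketched as two small shell functions. This is only a sketch: the function names build_arch and add_to_path, the build directory name build and the paths in the usage comment are assumptions made for illustration, not names taken from the arch distribution.

```shell
# Sketch: configure and build arch in a separate build directory.
# build_arch and the directory name "build" are illustrative assumptions.
build_arch() {
    src_dir=$1      # arch's src directory from the unpacked tarball
    prefix=$2       # where "make install" should put the files

    mkdir -p "$src_dir/build" &&
    cd "$src_dir/build" &&
    ../configure --prefix="$prefix" &&
    make &&
    make install
}

# Sketch: put a directory at the front of PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already present, nothing to do
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

# Example use with hypothetical paths (not run here):
#   ( build_arch "$HOME/arch-1.0pre11/src" "$HOME/arch" )
#   add_to_path "$HOME/arch/bin"
```

Calling build_arch inside parentheses runs it in a subshell, so the cd it performs does not change the directory of your interactive shell.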

For shared archives, arch can be configured to send e-mails when project changes occur. If sendmail isn't in your path, you have to pass the --with-sendmail= option to the configure script, followed by sendmail's location, to tell arch where it is.

While arch is the project name, larch is the program that runs all of arch's commands. You can get a list of arch commands by typing larch --help-commands and command-specific help by typing larch command --help.

Setting up arch

The first thing to do is set up your user ID so arch can correctly identify your changes. An arch ID consists of two parts: a free-form part (typically your name) and a unique part (typically your e-mail address). To create your user ID, use the following command:

larch my-id "John Doe <johndoe@somewhere.com>"

Next, you will want to create an archive for arch to use. arch archives can be located on your machine or accessed over a network using FTP. To create an archive on your local machine use the following command:

larch make-archive name location

An archive's name should be globally unique, so the arch manual suggests that you use your e-mail address as the base of the name, followed by a pair of dashes and then a label for the archive. An example of this type of archive name would be johndoe@somewhere.com--archive. If you are creating a shared archive, make sure that other users have write permissions to it.

Location specifies the directory to create for the archive.
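
As a minimal sketch, the naming suggestion can be wrapped in a helper that joins the e-mail address and a label with a pair of dashes before calling larch make-archive. The function names archive_name and make_local_archive and the location in the usage comment are hypothetical; only the larch make-archive name location form comes from the text above.

```shell
# Sketch: build an archive name of the form EMAIL--LABEL.
# archive_name and make_local_archive are hypothetical helper names.
archive_name() {
    printf '%s--%s\n' "$1" "$2"
}

# Create a local archive: make_local_archive EMAIL LABEL LOCATION
make_local_archive() {
    name=$(archive_name "$1" "$2")
    larch make-archive "$name" "$3"
}

# Example use with a hypothetical location (not run here):
#   make_local_archive johndoe@somewhere.com archive "$HOME/archives/main"
```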

To access shared archives or archives over a network you need to tell arch where they are with larch register-archive name location.

Here the archive's name should be provided by its maintainer. The archive's location can be something local, such as /usr/local/arch/{archive}, or an FTP site such as ftp://somewhere.com/pub/archive.

How arch Identifies Projects in the Archive

The arch archive is organized along development paths that are made up of successive revisions of your source code files. A development path is identified by a combination of user-specified and arch-generated names. The user-specified names describe a project's category, branch and version, while the arch-generated names identify each revision of code put into the archive.

One choice for the archive category would be your application's name. The following command will create a new category in the archive called myprogram:

larch make-category myprogram

A branch name can be used to differentiate between sets of code such as development code and released code. For example, you could make a development branch with larch make-branch myprogram--development.

If you don't want to use a branch name for your code, issue the make-branch command with only the category name.

Finally, each project in the archive has a version number made up of digits in the format major.minor. The archive version number is not a revision number; it does not change each time you check in changed code. A version number can match your application's version numbers (e.g., myprogram 1.0, myprogram 1.5, myprogram 2.0, etc.). Set the version number with larch make-version myprogram--development--1.0.
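
The three naming commands can be chained in one helper, sketched below. The function name make_development_path is hypothetical, and the version check only mirrors the major.minor format described above; arch's own validation rules may differ.

```shell
# Sketch: create category, branch and version in one step.
# make_development_path is a hypothetical helper around the three
# larch commands described in the text.
make_development_path() {
    category=$1   # e.g. myprogram
    branch=$2     # e.g. development
    version=$3    # major.minor, e.g. 1.0

    # Accept only digits around a single dot (assumed from the article).
    case $version in
        *[!0-9.]* | *.*.* | .* | *. | "")
            echo "version must look like major.minor: $version" >&2
            return 1 ;;
        *.*) ;;
        *)
            echo "version must look like major.minor: $version" >&2
            return 1 ;;
    esac

    larch make-category "$category" &&
    larch make-branch "$category--$branch" &&
    larch make-version "$category--$branch--$version"
}

# Example use (not run here):
#   make_development_path myprogram development 1.0
```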

Importing Source Files into arch

To put your source files in the archive, you have to initialize the project tree, create a project log and then import your files into the archive.

Initializing the project tree creates an {arch} directory where arch stores information such as a project inventory and a patch log. Change into your project's source directory and then execute larch init-tree. Next, tell arch which version of your software this tree represents with larch set-tree-version myprogram--development--1.0.

Then tell arch to create a patch log, which stores log entries for revisions in the development path, with larch add-log myprogram--development--1.0.
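
Taken together, the tree setup steps might look like the following sketch. The function name prepare_tree and the directory in the usage comment are hypothetical; the larch commands are the three just described.

```shell
# Sketch: initialize a project tree and attach it to a version.
# prepare_tree is a hypothetical helper name.
prepare_tree() {
    tree_dir=$1        # your project's source directory
    version_name=$2    # e.g. myprogram--development--1.0

    cd "$tree_dir" &&
    larch init-tree &&
    larch set-tree-version "$version_name" &&
    larch add-log "$version_name"
}

# Example use with a hypothetical directory (not run here):
#   ( prepare_tree "$HOME/src/myprogram" myprogram--development--1.0 )
```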

Each revision committed to the archive must be accompanied by a log message created from a template. To create an empty log message, run larch make-log. This creates an empty log file whose name begins with "++log". This file contains a Summary: line, a Keywords: line and a free-form area for longer comments.

Open the empty log file with your editor and fill it in. You need to put something after Summary, but you don't need to add anything after Keywords. There needs to be a blank line below Keywords, and then you must type a log message below the blank line.
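
If you prefer to fill in the log from a script rather than an editor, the layout described above can be written out directly. This sketch assumes the log file contains nothing beyond the Summary: line, the Keywords: line, a blank line and the message; the function name write_log is hypothetical, and the file name is passed in by you because the full name of the "++log" file is not given here.

```shell
# Sketch: write a log message in the layout described in the text.
# Usage: write_log FILE SUMMARY MESSAGE
# write_log is a hypothetical helper; it overwrites FILE.
write_log() {
    log_file=$1
    {
        printf 'Summary: %s\n' "$2"   # required
        printf 'Keywords:\n'          # may stay empty
        printf '\n'                   # blank line before the message
        printf '%s\n' "$3"            # free-form log message
    } > "$log_file"
}
```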

Once these values are filled in and the log is saved, you can import your source files into the arch archive by using larch import.

arch uses one of three methods to determine which of your project files need to be controlled. These three methods are naming conventions, explicit inventories and implicit inventories. The method arch uses is set with the larch tagging-method command; to see what the tagging method for a project is, type larch tagging-method.

If you use naming conventions, arch assumes that almost every file in the directory is a source file. There are some exceptions, and these are explained in section 5.3 of the arch manual. Using naming conventions requires the least amount of work on your part, but under this method arch won't be able to tell when files have been renamed. Renaming will show up in arch as one file being deleted and a new file added.

With explicit tagging you have to tell arch which files to control by using the larch add command to add them to the project's inventory. Using explicit tagging requires more work because you have to tell arch every time you add, delete or rename a file but this allows arch to keep precise track of changes to your source code tree.

The third method, implicit tagging, combines features of the other two methods. Every file that matches the source code naming convention is assumed to be source code but you can also embed arch tags in specific source files. Using explicit arch add, delete, and rename commands or embedding tags allows arch to track filename changes.

If you use either implicit or explicit tagging, the larch tree-lint command will check your source tree for missing files, untagged files, duplicate file tags and files not matching naming conventions. Use larch inventory to see what arch considers source files in your project tree.

arch allows you to create a list of source files required by your project, which is called a manifest, by typing larch set-manifest. To see if files have been added or removed from your project tree use larch check-manifest.

Committing Changes to arch

The sequence for committing your changed files back into the archive entails a number of steps. Before committing your changes, generate a log form using larch make-log. Edit it with your favorite editor, following the same rules explained above for project logs. After editing the log file, commit the changes to the archive by issuing the commit command, larch commit. arch doesn't provide a mechanism to commit individual files to the archive; the entire project tree is archived.
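
The commit sequence can be sketched as a single helper. Two things here are assumptions rather than documented behavior: that larch make-log leaves its "++log" file in the current directory, and that the first such file found is the one to edit. The function name commit_with_log is hypothetical.

```shell
# Sketch: generate a log, edit it, then commit the whole tree.
# commit_with_log is a hypothetical helper; run it from the project tree.
commit_with_log() {
    larch make-log || return 1

    # Assumption: the log file appears in the current directory and its
    # name begins with "++log". Pick the first match.
    log_file=
    for candidate in ./++log*; do
        if [ -f "$candidate" ]; then
            log_file=$candidate
            break
        fi
    done
    if [ -z "$log_file" ]; then
        echo "no ++log file found in $(pwd)" >&2
        return 1
    fi

    "${EDITOR:-vi}" "$log_file" || return 1
    larch commit
}
```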

The commit command causes arch to generate a patch file based on the changes. If you receive an error like unrecognized option --posix during this operation, check your version of patch. GNU patch 2.5 doesn't recognize the --posix option used by arch but version 2.5.4 does.

arch uses a unique revision naming scheme to differentiate among the revisions of code placed into the archive. The first time files are committed to the archive they are given a revision name of base-0.

Subsequent commit actions start the revision value at patch-1 and then increment the digits (e.g., patch-1, patch-2, etc.). These patch revisions are meant to indicate a prerelease project tree.

Once the files are in a releasable state, you would commit them in the archive with larch commit --seal. This sets the revision level to version-0. Once the code is sealed, using larch commit without any flags will generate the error message: too late for ordinary patches. To commit changes (e.g., bug fixes) to a sealed project tree, use the --fix commit flag. This resets the revision to versionfix-1 and increments the number on subsequent fix commits.
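
The revision naming rules can be summarized as a small state table. The sketch below is a model of the rules as described in this article, not code from arch; the function name next_revision and the action names import, commit, seal and fix are chosen for illustration, and any transition the article does not describe is reported as an error.

```shell
# Sketch: model of the revision names described in the text.
# Usage: next_revision CURRENT ACTION
#   CURRENT: current revision name, or "" before the first import
#   ACTION:  import | commit | seal | fix
next_revision() {
    current=$1
    action=$2
    number=${current##*-}     # digits after the last dash, if any

    case "$action:$current" in
        import:)
            echo "base-0" ;;
        commit:base-0)
            echo "patch-1" ;;
        commit:patch-*)
            echo "patch-$((number + 1))" ;;
        seal:base-0 | seal:patch-*)
            echo "version-0" ;;
        fix:version-0)
            echo "versionfix-1" ;;
        fix:versionfix-*)
            echo "versionfix-$((number + 1))" ;;
        commit:version-0 | commit:versionfix-*)
            echo "too late for ordinary patches" >&2
            return 1 ;;
        *)
            echo "transition not described: $action from '$current'" >&2
            return 1 ;;
    esac
}

# Example use:
#   next_revision patch-1 commit      # prints patch-2
#   next_revision version-0 fix       # prints versionfix-1
```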

While the different commit flags may sound complicated, they help formalize a develop-release-fix cycle for the software. However, nothing mandates that you have to use anything other than larch commit to put your source code in the archive.

When it is time to work on the next version of the software (e.g., myprogram 2.0), make the new version in the archive, make a log for the new version and then commit your source code to the new version using the larch commit --continuation command. These operations are explained in more detail in section 11.10 of the arch manual.

In commit operations, arch can automatically update a changelog for your project. Create the initial changelog file by issuing larch changelog > ChangeLog. This creates a file called ChangeLog in your source directory. If you are using explicit tagging, add this file to your project's inventory. From then on, each larch commit command looks at this file and updates it with the log entry you created for the commit operation.

Getting Source Files from the Archive

arch doesn't have a formal check-out procedure to get a working copy of code. Any directory with the {arch} control directory is a working directory. To copy a project from the archive and place it in a local directory, you use the get command: larch get revision target-dir.

You can prepend a specific archive name to revision or leave it off to use your default archive. If you don't specify the target directory in which to store the project tree, a subdirectory will be created under your current directory.

Specify the revision of the code to get using the category--branch--version--revision format. If you don't specify a revision, arch will get the latest revision of code. If you don't specify a version, arch will get the latest version and revision from the specified branch.
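
A short sketch of how the name passed to larch get is put together follows. The helper names revision_spec and get_tree are hypothetical; the category--branch--version--revision format and the optional trailing parts come from the description above.

```shell
# Sketch: build a name in category--branch--version--revision format.
# Usage: revision_spec CATEGORY BRANCH [VERSION [REVISION]]
# revision_spec and get_tree are hypothetical helper names.
revision_spec() {
    category=$1
    branch=$2
    version=${3:-}
    revision=${4:-}

    # A revision only makes sense within a version.
    if [ -z "$version" ] && [ -n "$revision" ]; then
        echo "a revision needs a version" >&2
        return 1
    fi

    spec="$category--$branch"
    if [ -n "$version" ]; then spec="$spec--$version"; fi
    if [ -n "$revision" ]; then spec="$spec--$revision"; fi
    printf '%s\n' "$spec"
}

# Fetch a tree: get_tree SPEC [TARGET_DIR]
get_tree() {
    if [ -n "${2:-}" ]; then
        larch get "$1" "$2"
    else
        larch get "$1"
    fi
}

# Example use (not run here):
#   get_tree "$(revision_spec myprogram development 1.0 patch-2)" work
```

Leaving off the revision, or both the version and the revision, yields the shorter names for which arch picks the latest code.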

Merging Source Code Changes

If you are working on a project with other developers, there will be instances when you need to bring your copy of the source files up to date with changes that have been committed to the archive. arch provides two methods of getting up-to-date files: larch update and larch replay.

These commands apply patches against your local source code to bring it up to date against a specific revision. The update command determines all of the changes between your code and the requested revision and attempts to bring your code up to date with one patch operation. Replay determines which patches you need applied to your code and then attempts to apply them one after the other. If a conflict arises when patching your files, the process halts and you are informed of where the problem occurred.

arch also contains powerful commands for merging code from multiple branches of a project: larch star-merge and larch reconcile. Sections 14 and 17 in the arch manual present detailed examples and instructions for using these commands.

Conclusion

arch is a new version control system that attempts to overcome some shortcomings of current systems. While arch's way of doing things may seem unfamiliar or arbitrary, it is flexible enough to be adapted to your ways instead of forcing you to change. Appendix B in the arch manual explains how to use arch to merge changes in projects using milestone numbering and odd/even numbering.

arch is still under development, and its web site is a good place for information and help. arch also comes with a 160+ page manual, which, while not complete, does provide explanations and examples of most of arch's features.

I would like to thank arch's author, Tom Lord, for taking the time to answer my questions while I was writing this article. If you are interested in arch you can contact him at lord@emf.net.

Ralph Krause is a writer, webmaster, programmer, dog owner and avid reader. He has been working with Linux for over three years.
