Revision Control with Arch: Maintenance and Advanced Use
Arch is part of a recent generation of revision control systems that provide an important architectural advantage over the old Concurrent Version System (CVS) and its work-alikes. As a decentralized revision control system, Arch allows remote users to join large development efforts without needing to acquire special access privileges. Arch also provides powerful inter-archive operations that encourage participation from third-party contributors.
The previous article in this series [LJ, November 2004] demonstrated basic Arch operations, such as checking out code and creating branches from remote archives. This installment shows how to revert changes in an archive, how to publish your private archives to public mirrors and how to move a copy of your changes from archive to archive when you forget to make a new branch.
The Arch program is called tla. The name arch already belongs to a standard system utility, /bin/arch, which reports the machine architecture. Running tla help lists the available commands. To see the arguments for a particular command, such as commit, run tla commit -H.
One of the more immediate benefits of any revision control system is the ability to undo a change or set of changes. Everyone makes mistakes now and again, and it is important for your tools to provide the means to a graceful recovery.
The quickest way to return a checked-out tree to its state before your local changes is to run tla undo. This removes the changes from the tree and saves them in a directory called ,,undo-1/. To re-apply them, run tla redo. For example:
$ tla register-archive http://www.lnx-bbc.org/arch
$ tla get \
    firstname.lastname@example.org/lnx-bbc--stable bbc
$ cd bbc/
$ echo "BIG MISTAKE" > robots.txt
$ echo "#smaller change" >> Makefile
$ tla undo
$ tla redo
The tla undo command is most useful during hold-that-thought moments, when a line of work needs to be set aside briefly for a quick change of some sort. Arch uses the undo and redo commands internally when performing operations such as update or star-merge.
If a mistake is localized to a single file, the entire changeset doesn't need to be backed out. Arch lets you revert the changes made to a single file by generating a unified diff representing that file's changes since the last commit. This diff then can be fed into the patch program in reverse mode, which causes the changes to be unpatched out of the file.
$ tla file-diffs robots.txt | patch -R
If the file had been deleted accidentally, it would be necessary to do touch robots.txt before executing this command. Without a file (even an empty one), Arch has no basis from which to generate the file-diffs. When working with complete changesets, however, Arch is far more intelligent.
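The reverse-patch step can be tried without an Arch archive at hand. The following is a minimal sketch that uses plain diff in place of tla file-diffs; the scratch directory, the file contents and the name robots.txt.committed are made up for the illustration.

```shell
#!/bin/sh
# Sketch: undo local edits to one file by reverse-applying a unified diff.
# Plain diff(1) stands in for `tla file-diffs`; all file names are illustrative.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# The state of the file at the last commit.
printf 'User-agent: *\nDisallow: /private/\n' > robots.txt.committed
cp robots.txt.committed robots.txt

# The local edit we want to back out.
echo "BIG MISTAKE" >> robots.txt

# diff exits with status 1 when the files differ, so guard it under `set -e`.
diff -u robots.txt.committed robots.txt > local.diff || true

# -R swaps the old and new sides, so the edit is removed rather than applied.
patch -R robots.txt < local.diff

cmp robots.txt.committed robots.txt && echo "robots.txt restored"
```

Because the diff records both sides of the change, the same file can be applied forward to redo the edit or in reverse to remove it.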
One of the big advantages Arch has over its predecessor, CVS, is that it permits the creation and manipulation of changesets. A changeset is a complete collection of all the edits, renames, added and deleted files and log entries recorded during a single tla commit invocation.
Sometimes a changeset is committed that shouldn't be, or a temporary approach to something needs to be backed out before a more permanent one can be implemented. In these cases, revert the changeset by replaying it in reverse:
$ tla replay --reverse \
    email@example.com/foo--bar--1.0--patch-4
$ tla sync-tree \
    firstname.lastname@example.org/foo--bar--1.0--patch-4
The first command reverts the fourth changeset in the 1.0 version of the bar branch of the foo tree, even if it is not the most recent revision. This has the added effect of backing out the log entry for that changeset as well, so you can use the tla sync-tree command to put the commit log back the way it ought to be.
The patch-4 changeset still is stored in the email@example.com--projects archive, and the tree still can be checked out in that state. Only the current working copy of the code has been affected by the above commands. When the user runs tla commit, a new changeset is added that contains the inverse of patch-4.
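Fully qualified revision names such as the one above pack several fields into one string. As a rough sketch, assuming the four-part ARCHIVE/CATEGORY--BRANCH--VERSION--PATCHLEVEL form used in this article, a small shell function can pull the fields apart. The function name split_revision is hypothetical and is not part of tla.

```shell
# Split ARCHIVE/CATEGORY--BRANCH--VERSION--PATCHLEVEL into labeled fields.
# Assumes the four-part form; split_revision is an illustrative helper only.
split_revision() {
    full=$1
    archive=${full%%/*}       # everything before the first slash
    rest=${full#*/}           # category--branch--version--patchlevel
    category=${rest%%--*}
    rest=${rest#*--}
    branch=${rest%%--*}
    rest=${rest#*--}
    version=${rest%%--*}
    level=${rest#*--}
    printf 'archive=%s\ncategory=%s\nbranch=%s\nversion=%s\nlevel=%s\n' \
        "$archive" "$category" "$branch" "$version" "$level"
}

split_revision 'email@example.com/foo--bar--1.0--patch-4'
```

For the name used in the replay example, this reports category foo, branch bar, version 1.0 and patch level patch-4.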
The tla replay command can be used for more powerful operations than a simple undo. One of the more compelling features of Arch is the ability to cherry-pick particular changesets from a remote archive without having to apply changes you don't need.
Consider the project, foo, maintained by Bob. Bob keeps a stable branch of the project (foo--stable) and an experimental branch (foo--experimental). All releases are generated from the stable branch—foo--stable--2.4.2 being the most recent. The experimental branch is where adventurous new features are made available in a somewhat official location.
Alice plans to work on some experimental code, so she tags off Bob's experimental branch to work in her own space:
$ tla my-id "Alice B. Hacker <firstname.lastname@example.org>"
$ tla make-archive -l email@example.com \
    sftp://firstname.lastname@example.org/home/abh/public_html/arch
$ tla archive-setup foo--hackery--1.0
$ tla register-archive http://entar.net/~bob/fooarch
$ tla tag \
    email@example.com/foo--experimental--0.0 \
    firstname.lastname@example.org/foo--hackery--1.0
In the process of working on her experimental features, Alice discovers a bug that Bob must have overlooked. The fix is simple, so she puts her current work aside with tla undo and checks in the fix:
$ tla undo
$ vi buggy_file.c another_buggy_file.c
$ tla commit
M  buggy_file.c
M  another_buggy_file.c
* committed email@example.com/foo--hackery--1.0--patch-9
$ tla redo
Alice soon finishes her changes and tells Bob where her archive lives. Bob decides that her code is acceptable for the experimental branch and star-merges it in:
$ tla get firstname.lastname@example.org/foo--experimental--0.0
$ cd foo--experimental--0.0/
$ tla register-archive http://zork.net/~abh/arch/
$ tla star-merge \
    email@example.com/foo--hackery--1.0
While reading Alice's changelog, Bob realizes the bug she fixed exists in the stable branch as well. Because he doesn't want to grab all of the experimental code from her hackery branch, Bob cherry-picks only the changeset that contains the bug fix:
$ tla get firstname.lastname@example.org/foo--stable--2.4.2
$ cd foo--stable--2.4.2/
$ tla replay \
    email@example.com/foo--hackery--1.0--patch-9
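The idea behind cherry-picking can be imitated with diff and patch alone. The sketch below is not how Arch stores or replays changesets. It only shows one isolated fix being carried from one tree to another while unrelated work stays behind. The directory names and file contents are invented for the illustration.

```shell
#!/bin/sh
# Sketch: carry a single fix from one tree to another with diff and patch.
# This imitates the idea behind `tla replay`; directory names are illustrative.
set -e
work=$(mktemp -d)
cd "$work"
mkdir stable hackery-before hackery-after

# The same buggy file exists in both lines of development.
printf 'int limit = 10;\nint count = limit + 1;\n' > stable/buggy_file.c
cp stable/buggy_file.c hackery-before/buggy_file.c

# hackery-after is hackery-before plus the bug fix and nothing else.
sed 's/limit + 1/limit/' hackery-before/buggy_file.c > hackery-after/buggy_file.c

# Capture only the fix. diff exits 1 when trees differ, so guard it.
diff -ru hackery-before hackery-after > fix.patch || true

# Apply the fix to the stable tree; -p1 strips the leading directory name.
(cd stable && patch -p1 < ../fix.patch)

cat stable/buggy_file.c
```

Arch does the bookkeeping that this sketch leaves out: it records which changesets a tree already contains, so a later star-merge knows that the fix has been applied.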
Alice and Bob were able to work together despite the fact that neither developer shared access to a single system. Neither developer had set up any sort of dedicated server; they were able to use standard stock protocols such as HTTP, SSH and SFTP. Alice's archive had the advantage of being accessible from a Web directory on the Internet, just as Bob's official archive was.
Arch provided the tools for Alice and Bob to manipulate their two separate archives, and the differences between them, using nothing more exotic than Apache and OpenSSH.
Sending so much code over the Internet always has made free software developers at least a little nervous, even if only in the back of their minds. The current system of peer review seems to have solved the problem of malicious code submissions quickly and effectively, but it would help to be able to identify each changeset's author beyond a reasonable doubt.
Arch allows developers to sign their changesets cryptographically, allowing verification of submitter identity through a web of trust. Although this does not conclusively prove the intentions of the developer in question, it raises the bar for forged submissions.
To use cryptographic signatures in Arch, you first must generate a GnuPG key.
$ gpg --gen-key
Unfortunately, signed archives are somewhat different functionally from the unsigned variety. This makes it necessary to keep a separate archive for signed commits. Running tla make-archive with the -s switch creates an archive capable of storing GnuPG signatures:
$ tla make-archive -ls firstname.lastname@example.org \
    ~/SIGNED-ARCHIVE
$ tla my-default-archive email@example.com
Finally, a few configuration files must be created in order for Arch to sign changesets and verify signatures. First, an awk script included in the tla distribution, called gpg-check.awk, must be installed somewhere on the system where Arch is run. The Debian tla packages install it to /usr/bin/tla-gpg-check by default. In order for Arch to verify signatures, the file ~/.arch-params/signing/=default.check should contain a single line that reads:
$ mkdir ~/.arch-params/signing/
$ echo \
    'tla-gpg-check gpg_command="gpg --verify-files -"' \
    > ~/.arch-params/signing/\=default.check
If you want keys to be downloaded automatically from a public keyserver as needed, you can add parameters such as --keyserver pgp.mit.edu --keyserver-options auto-key-retrieve to the gpg_command. This causes Arch to download keys from pgp.mit.edu as needed and verify the signatures in an archive against these keys during the get or update operations.
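As a sketch of how the extended line might be put together, the snippet below composes the =default.check contents with the keyserver options included. Placing the options before --verify-files is an assumption based on gpg's usual rule that options precede the command. The variable names are illustrative, and the snippet prints the line rather than installing it.

```shell
# Compose the =default.check line with automatic key retrieval enabled.
# Option order is an assumption: gpg options go before the --verify-files command.
keyserver=pgp.mit.edu
gpg_command="gpg --keyserver $keyserver --keyserver-options auto-key-retrieve --verify-files -"
check_line="tla-gpg-check gpg_command=\"$gpg_command\""

# Print the line for review before installing it.
printf '%s\n' "$check_line"

# To install it, redirect the same output into the signing directory:
#   printf '%s\n' "$check_line" > ~/.arch-params/signing/=default.check
```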
For Arch to sign changesets automatically that you commit to an archive created with the -s option, the ~/.arch-params/signing/=default file must be one single line like the following, substituting the address you used when you created your key:
$ echo \
    'gpg --default-key "<firstname.lastname@example.org>" --clearsign' \
    > ~/.arch-params/signing/\=default
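Because both signing files are expected to hold exactly one line, a quick sanity check can catch a stray newline or a missing file before the first signed commit. This is a minimal sketch: is_single_line is a hypothetical helper, and ARCH_SIGNING_DIR is a variable invented here so that the check can be pointed at a different directory. Neither is part of tla.

```shell
# Succeeds only if FILE exists and contains exactly one line.
# is_single_line is an illustrative helper, not a tla command.
is_single_line() {
    [ -f "$1" ] && awk 'END { exit (NR == 1 ? 0 : 1) }' "$1"
}

# ARCH_SIGNING_DIR is a made-up override; the default is the path from the text.
signing_dir=${ARCH_SIGNING_DIR:-$HOME/.arch-params/signing}

for name in =default =default.check; do
    if is_single_line "$signing_dir/$name"; then
        echo "ok: $name"
    else
        echo "check: $name is missing or is not exactly one line"
    fi
done
```

The check looks only at the shape of the files. It does not confirm that the gpg commands inside them are valid.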
In the above cherry-picking example, Alice B. Hacker used a Web-accessible directory for her personal archive. This is convenient, but it poses a problem for disconnected use. What if Alice wanted to work from her laptop during a long airplane flight or train ride? She either would have to generate changeset tarballs with tla changes or star-merge her various branches manually one by one from her laptop to her Web-space archive when she reached a network connection. Fortunately, Arch permits the creation of archives that are simply mirrors of other archives:
$ tla make-archive -ls --mirror-from \
    email@example.com \
    sftp://firstname.lastname@example.org/public_html/arch/
In this instance of make-archive, J. Random Hacker is creating an archive in his public_html directory on an Internet server. Once the mirror archive is created, it shows up in a tla archives listing as email@example.com-MIRROR. Now data can be pushed to it with a single command:
$ tla archive-mirror firstname.lastname@example.org
In addition to push mirrors that copy local archive data to remote systems, Arch allows pull mirrors that create local copies of remote archives:
$ tla make-archive -ls --mirror \
    email@example.com \
    /var/tmp/gar-cache
$ tla archive-mirror firstname.lastname@example.org
This can be handy during disconnected operation, when a local branch may not be sufficient. Pull mirrors allow read-only access to a remote archive's data while off the Net.
One drawback to the email@example.com--signed-MIRROR archive is that it is a separate signed archive in its own right. This means J. Random Hacker must sign each changeset as it is copied from the original archive to the mirror.
In some cases, this is the desired effect. A release manager personally vouches for each changeset that enters the public mirror, for example. In most cases, however, it is important simply to copy the existing signatures along with the changeset. This is achieved by creating a special file on the system where tla archive-mirror is run:
$ echo firstname.lastname@example.org > \
    ~/.email@example.com-MIRROR
Mirrors are extremely useful, but they are, by nature, read-only. The only way changes can be committed to a mirror is through the original archive by way of tla archive-mirror.
Consider Alice's laptop mirror situation. While sitting in the observation car of Amtrak's Coast Starlight, she pulls out her laptop and does tla get to grab some code out of a local mirror of firstname.lastname@example.org. Somewhere in the Willamette Valley, she finds inspiration and completes a remarkably useful hack.
Any attempt to commit her changes fails with the message "attempt to write directly to mirror". The simple solution is to wait until she reaches an Internet access point and then move the changes with the undo and redo commands:
$ tla undo ,changes-to-mirror
$ cd ~/real-project/
$ tla redo ~/mirror-checkout/,changes-to-mirror/
$ tla commit
This works fine if your changes are not enough to require more than one changeset. For longer detached sessions, you'll want to make a new local branch.
After her trip down the Pacific Coast, Alice takes the Zephyr to Chicago. It is a longer trip, and she finds herself working in a local mirror of email@example.com on the foo--stable--2.4.2 code. After a few hours of work, she decides to move her changes to a new branch.
First, she makes a new archive and branch on her laptop:
$ tla make-archive -l firstname.lastname@example.org ~/arch
$ tla my-default-archive email@example.com
$ tla archive-setup foo--laptop-hacks--1.0
Next, she tags off the mirror branch to her new archive. She runs the tla logs command in shell backticks so she doesn't have to remember which patch level and version she was working in at the moment:
$ tla tag `tla logs -r -f | head -n 1` \
    foo--laptop-hacks--1.0
Finally, Alice coerces the checked-out copy into believing it is the first revision in her new laptop-hacks branch:
$ tla sync-tree foo--laptop-hacks--1.0--base-0
$ tla set-tree-version foo--laptop-hacks--1.0
At this point, she has shifted her checked-out copy from the read-only mirror over to a read-write archive hosted on her laptop.
Setting up mirrors before long disconnected sessions is a lot like packing for a trip: you always forget the one thing you really needed. It would be frustrating to plug your laptop in to the light socket of your mountain cabin only to find that your checked-out copy of some crucial code came from an HTTP archive.
Fortunately, you can use some of the same techniques to move a checked-out copy to a new branch even if you can't reach the old read-only archive.
Alice checked out a copy of a project called bar while sitting in an Internet café in Chicago. On her return trip to California, she decides to work on the code. After another hour of prodigious efforts, she decides yet again that it is time to make her own branch in which to work.
Because the original archive is inaccessible, tagging off a branch is impossible. Fortunately, much of the changelog and history information is present in the checked-out tree, so Alice temporarily backs out her changes with tla undo and then forces the checked-out copy into her new branch:
$ tla archive-setup bar--train-ride--1.0
$ tla set-tree-version bar--train-ride--1.0
$ tla add-log-version bar--train-ride--1.0
$ tla import
Once this is done, Alice runs tla redo and then tla commit. Now the revision she grabbed in Chicago is bar--train-ride--1.0--base-0, and her changes are bar--train-ride--1.0--patch-1.
Although this method is not perfect, it still is possible to star-merge to and from the original branch without trouble. If Alice found her work on the bar project to be more involved, she most likely would merge with the upstream archive and make a proper branch when she found Internet access again.
You now know how to publish your archives to the Internet and how to work remotely with Arch. You even have a few tricks up your sleeve for when you make mistakes.
The third and final article in this series will show you how to administer a centralized official archive while retaining all of the benefits of Arch's distributed workings. You'll learn some tricks for scripting around your archives to create reports on development activity, as well as the creation of a test build infrastructure.
Nick Moffitt is a Linux professional living in the San Francisco Bay Area. He is the build engineer for the LNX-BBC Bootable Business Card distribution of GNU/Linux and the author of the GAR build system. When not hacking, he studies the history of urban public transportation.