Synchronizing Your Life
Once upon a time, one computer was all you needed. All of your documents lived on that computer, or on a stack of floppies or CD-ROMs nearby, and nowhere else. Those days are gone, much like the one-car, one-TV, and one-iPod days.
Today I have my home computer and my wife has hers. There's also my laptop, my daughter's laptop, my work computer, and my file server. At any time I could find myself sitting in front of any of these, and wherever I happen to be, there is bound to be a file on one of the others that I would like to have readily available. These files are mostly current projects I'm working on. If inspiration strikes, I want to be able to open the appropriate file, or create a new one, and start writing without worry. I worry because keeping these files synchronized across all my logins on the various computers I might use in a single day is a big issue.
There are many ways of keeping files up-to-date across multiple computers. The simplest is to carry everything around with me on a USB key or other writable removable media. I do use this for some files, mainly those that I want to keep very secure. USB keys are sometimes inconvenient though. My file server, for example, is stuck away in a closet, with the keyboard, monitor, and mouse routed out to a little desk that sits outside the closet. Getting to the USB ports on the back of the server is not easy.
Another simple method is to copy the files back and forth using scp like so:
scp -rp /home/me/Documents firstname.lastname@example.org:/home/me/
This works but I quickly run into problems when I have modified one file on the home computer, and a different file on the laptop. When I next go to scp I am going to overwrite one of the files depending on which computer I initiate the scp from. To prevent this, I need to always scp at the end of every editing session, but I don't always remember to do that.
Another problem with scp is that it always copies everything, even if an identical copy exists at the destination. This is one of the problems that rsync solves quite well. The above scp command can be duplicated like so with rsync:
rsync -avP /home/me/Documents email@example.com:/home/me
With rsync, files that already exist unchanged at the destination are not transferred, which speeds up the transfer considerably. However, there is still the problem of modifications made on both sides. By default, rsync only checks whether the files differ in size or timestamp. It doesn't care which file is newer; if the destination differs, it gets overwritten. You can pass the '--update' flag to rsync, which causes it to skip files on the destination that are newer than the file on the source, but only so long as they are the same type of file. This means that if, for example, the source file is a regular file and the destination is a symlink, the destination file will be overwritten regardless of timestamp. Even looking past its quirks, the --update flag does not solve the problem, because all it does is skip files on the destination that are newer; it doesn't pull those changes down to the source computer.
Another problem that both scp and rsync have is versioning. Once the files on the destination are overwritten there's no going back to what was there before.
To keep files in sync on multiple machines and keep a history of changes, the obvious choice is one of the many version control systems that are out there. Git and Bazaar are two popular choices. They have a steep learning curve, but once you get past that, they become very useful in many situations. Packages for both can be found in most package repositories. On Ubuntu the packages for git and Bazaar are called git-core and bzr respectively.
To use one of these to keep files in sync on multiple computers, the sequence of events goes something like the following. In the example I use git, but Bazaar is similar. One final note on the example: computer 1 has an IP address of 192.168.0.1 and computer 2 has an IP address of 192.168.0.2.
To get started with git on computer 1:
cd /home/me/Documents/shared
git init
git add *
git commit -a
In the above commands I switch to the directory I want to put under version control and use the 'git init' command to turn the directory into a git repository. I then use 'git add *' to add everything in the directory to the new repository. Lastly I check everything in. Now on computer 2 I do the following:
cd /home/me/Documents
git clone ssh://192.168.0.1/home/me/Documents/shared
The shared directory will be cloned from computer 1 to computer 2. I can now edit anything I want on computer 2. When I am done, I commit the changes on computer 2 like so:
git commit -a
Now when I get back to computer 1 I can pull down the changes I made on computer 2 like so:
cd /home/me/Documents/shared
git pull ssh://192.168.0.2/home/me/Documents/shared
Now both computer 1 and computer 2 are again in sync. On the off chance that the same files have been edited on computer 1 and computer 2, git will let me know that there is a merge conflict. These conflicts are usually easy to fix and nothing is lost since a history of all changes is kept and I can revert back to any previous version at any time.
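Going back to an earlier version might look something like the sketch below. The two function names are hypothetical wrappers of my own around ordinary git commands, and the revision can be a commit hash taken from the log or a relative name such as HEAD~1.

```shell
#!/bin/sh
# Hypothetical helper: list the commits that touched one file.
# $1 = repository directory, $2 = file path relative to it
file_history() {
    git -C "$1" log --oneline -- "$2"
}

# Hypothetical helper: put a file back the way it was at a given revision.
# $1 = repository directory, $2 = revision, $3 = file path
restore_version() {
    git -C "$1" checkout "$2" -- "$3"
}

# Example calls (the file name is a placeholder):
# file_history /home/me/Documents/shared chapter1.txt
# restore_version /home/me/Documents/shared HEAD~1 chapter1.txt
```

The restore overwrites the working copy of that one file and stages it, so it still needs a commit afterwards if the older version is to become the current one.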
The bad thing, as you may have noticed, is that the process is labor and memory intensive. I say memory because I need to remember to commit after making changes and then when I am on a different computer I need to remember to pull down the changes from the computer that I was on. In the example above that's not a huge problem because there are only two computers, but with all of the computers I regularly use, remembering where I've been is a problem. I could pull from every other computer every time I sit down in front of one, but that is tedious and disruptive to my work flow. What I really want is for the synchronization to happen in the background.
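The commit-and-pull routine could be wrapped in a small script, which takes away some of the remembering. This is a rough sketch and not something I would call finished: sync_repo is a made-up name, the commit message is arbitrary, and the list of peers is whatever you pass in. It could be run by hand or, presumably, from cron.

```shell
#!/bin/sh
# Hypothetical helper: commit local changes, then pull from each peer.
# $1 = local repository; remaining arguments = peer URLs or paths
sync_repo() {
    repo=$1
    shift
    # Stage everything, and commit only if something is actually staged
    git -C "$repo" add -A
    git -C "$repo" diff --cached --quiet ||
        git -C "$repo" commit -q -m "autosync from $(uname -n)"
    for peer in "$@"; do
        # A peer that is switched off should not stop the loop
        git -C "$repo" pull -q --no-rebase --no-edit "$peer" ||
            echo "could not pull from $peer" >&2
    done
}

# Example call, using the addresses from the walkthrough above:
# sync_repo /home/me/Documents/shared ssh://192.168.0.2/home/me/Documents/shared
```

A merge conflict still has to be sorted out by hand, and a computer that is turned off simply gets skipped, so this reduces the tedium without removing it.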
I should mention one other way of using git or Bazaar to manage documents: to work with one repository and then rsync it to the different computers. The benefits of rsync still apply, and you get versioning thanks to git or Bazaar. The downsides of each method still exist though, including the problem of rsync assuming that the source is the correct version and the destination can be overwritten. With the addition of versioning, this method is an improvement over rsync alone, but not by much.
Wua.la was one option that I considered. It allows you to trade storage with others on the Internet securely, and there is filesystem integration through a built-in NFS server. I wrote about Wua.la here: http://www.linuxjournal.com/content/online-storage-wuala. Even though I used Wua.la's NFS integration at that time, I don't do so now because I found it too buggy. So while I use Wua.la for backups, it's not something I trust for behind-the-scenes synchronization, and it does not do versioning.
What I want is something simple, integrated with my file manager (Nautilus) and which works in the background without me having to think about it. It should "just work".
There is one new program+service that, at first glance, fits the bill perfectly: Dropbox.
Dropbox allows you to store your files online and keep them synchronized between various computers. They provide clients for Windows, Macintosh, and Linux, so it is about as cross-platform as they come.
Setting up Dropbox on Linux involves installing their nautilus-dropbox plugin for Nautilus and the dropboxd daemon that communicates with the Dropbox servers. Packages that include both programs are available on the http://getdropbox.com website for Fedora 9, Ubuntu 7.10, and Ubuntu 8.04. The plugin is GPL'd, so the source to it can also be downloaded and compiled manually if you wish. The dependencies for the source include GTK 2.12 or higher, GLib 2.14 or higher, Nautilus 2.16 or higher, Libnotify 0.4.4 or higher, and Wget 1.10 or higher. The dropboxd daemon is closed-source and proprietary, unfortunately, so if you are not on an x86 or x86_64 platform, you are out of luck.
Once the package is installed, to get Dropbox working all you have to do is restart Nautilus with "killall nautilus" from a terminal window or you can log out and then back in.
With that done, a little icon will appear in the notification area and a configuration wizard will start. After you go through the simple signup process (or connect to an existing account), a "Dropbox" folder is created in your home directory and a brief tour is shown. There is also a Dropbox contextual menu that appears when right-clicking inside the Dropbox folder or its subfolders.
Every file you put or create in the Dropbox folder is automatically synchronized to your Dropbox account on getdropbox.com and from there to every other computer that you have Dropbox running on. This synchronization is automatic and happens every time a file is saved, moved, or updated in any way.
To help you keep track of the status of files, Dropbox adds several emblems to Nautilus. Emblems are little icons that you can add to other icons to indicate the status of a file. Dropbox automatically adds these emblems and changes them as necessary. This makes it very easy to see at a glance which files have been successfully synchronized (green circle with a checkmark), and which files are in the process of being synced (blue circle with arrows). The notification area icon also animates to indicate status.
Since Dropbox is a web-enabled technology, there is of course a web front-end to your files. This comes in very handy when I am on a computer I don't own and need to access a document.
One other nice thing that Dropbox does is versioning. Using the web interface you can see previous versions and revert back to them.
On the surface, Dropbox is everything I am looking for. It keeps the files I'm working on in sync across all of the computers I use, it does this in the background and provides simple versioning in case I want to revert back to a previous version of a file. Dropbox is not without issues though.
One issue is that it does not tolerate case changes in filenames. I had one directory in my Dropbox directory named 'writing' that for some reason I wanted to rename to 'Writing'. When I did this Dropbox went crazy and started creating new directories in an attempt to solve the conflict. These new directories kept proliferating to the point where I had to stop Dropbox and delete all of them and rename the 'Writing' directory to 'My Writing'.
Another issue is that when I'm editing a file I tend to save often and occasionally my text editor will report when I try to save that "the file has been modified" since I last saved. I don't know what Dropbox has done, but it has obviously done something to make my editor think that the file has been changed in some way outside of its control. I haven't lost any work as far as I can tell, but messages like that worry me.
Program bugs aside, the biggest issue I have with Dropbox is that it is not fully open source. The nautilus plugin is, but the plugin is useless without the behind-the-scenes service. With the dropboxd service daemon not being open, what happens if Evenflow (the company behind Dropbox) goes belly-up? I have no idea what their financial situation is, but in today's economic climate, anything is possible. Also, this is my data, and while they say it is encrypted and protected, I don't trust them (or anyone else, for that matter). There are too many horror stories of supposedly private and secured data that has "gone missing" or been out-and-out stolen.
I suppose what I really want to be able to do is run my own Dropbox-like server on hardware that I have complete control over using encryption I trust.
So, while Dropbox is a wonderfully useful program, the issue of personal control and trust pretty much rules them out for the synchronization of important files across the various computers I use. Instead I use it for project files where the convenience outweighs the loss of total security and control.
For those files I don't trust with Dropbox, I use a mixture of the other methods, depending on the file, how paranoid I am about losing it, and whether or not I want versioning. The great thing is, there are a lot of options out there for making sure I have the files I want, when and where I want them.