Synchronizing Your Life

Once upon a time, one computer was all you needed. All of your documents lived on that computer, or on a stack of floppies or CD-ROMs nearby, and nowhere else. Those days are gone, much like the one-car, one-TV, and one-iPod days.

Today I have my home computer and my wife has hers. There's also my laptop, my daughter's laptop, my work computer, and my file server. At any time I could find myself sitting in front of any of these, and wherever I happen to be sitting, there is bound to be a file on one of the others that I would prefer to have readily available. These files are mostly current projects I'm working on. If inspiration strikes I want to be able to open the appropriate file, or create a new one, and start writing without worry. And I do worry, because keeping these files synchronized across all my logins on the various computers I might sit in front of in a single day is a big issue.

There are many ways of keeping files up-to-date across multiple computers. The simplest is to carry everything around with me on a USB key or other writable removable media. I do use this for some files, mainly those that I want to keep very secure. USB keys are sometimes inconvenient though. My file server, for example, is stuck away in a closet, with the keyboard, monitor, and mouse routed out to a little desk that sits outside the closet. Getting to the USB ports on the back of the server is not easy.

Another simple method is to copy the files back and forth using scp like so:

scp -rp /home/me/Documents me@192.168.0.2:/home/me/

This works, but I quickly run into problems when I have modified one file on the home computer and a different file on the laptop. When I next scp, I am going to overwrite one of those files, depending on which computer I initiate the scp from. To prevent this, I need to scp at the end of every editing session, but I don't always remember to do that.

Another problem with scp is that it always copies everything, even if an identical copy already exists at the destination. This is one of the problems that rsync solves quite well. The above scp command can be replicated with rsync like so:

rsync -avP /home/me/Documents me@192.168.0.2:/home/me

With rsync, any files that already exist at the destination and are unchanged will not be transferred. This speeds up the transfer considerably. However, there is still the problem of modifications being made on both sides. By default, rsync only checks whether files differ in size and timestamp. It doesn't care which file is newer; if a file is different, it gets overwritten. You can pass the '--update' flag to rsync, which causes it to skip files on the destination that are newer than the file on the source, but only so long as they are the same type of file. This means that if, for example, the source file is a regular file and the destination is a symlink, the destination file will be overwritten regardless of timestamp. Even looking past its quirks, the --update flag does not solve the problem: all it does is skip the newer files on the destination; it doesn't pull those changes down to the source computer.
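The --update behavior is easy to see with a local experiment. This is a hypothetical sketch; the /tmp paths are made up for the demonstration and are not part of my setup:

```shell
#!/bin/sh
# Hypothetical local demonstration of rsync --update.
mkdir -p /tmp/sync-src /tmp/sync-dst
echo "old draft" > /tmp/sync-src/notes.txt
echo "newer draft" > /tmp/sync-dst/notes.txt
# Backdate the source copy so the destination copy is newer (GNU touch).
touch -d "1 hour ago" /tmp/sync-src/notes.txt

# Without --update, the older source file would clobber the newer
# destination file; with it, rsync skips files that are newer on the
# destination side.
rsync -a --update /tmp/sync-src/ /tmp/sync-dst/
cat /tmp/sync-dst/notes.txt   # prints "newer draft"
```

Run it the other way around (source newer than destination) and the file transfers as usual, which is exactly why --update alone can't reconcile edits made on both sides.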

Another problem that both scp and rsync have is versioning. Once the files on the destination are overwritten there's no going back to what was there before.

In order to keep files in sync on multiple machines and keep a history of changes, the obvious choice is to use one of the many version control systems out there. Git and Bazaar are two popular choices. They have a steep learning curve, but once you get past it, they become very useful in many situations. Packages for both can be found in most package repositories. On Ubuntu the packages for git and Bazaar are called git-core and bzr, respectively.

To use one of these to keep files in sync on multiple computers, the sequence of events goes something like the following. In the example I use git, but Bazaar is similar. One final note on the example: computer 1 has an IP address of 192.168.0.1 and computer 2 has an IP address of 192.168.0.2.

To get started with git on computer 1:

cd /home/me/Documents/shared
git init
git add .
git commit -m "Initial commit"

In the above commands I switch to the directory I want to put under version control and use the 'git init' command to turn the directory into a git repository. I then use 'git add .' to add everything in the directory, including any hidden files, to the new repository. Lastly I check everything in with an initial commit. Now on computer 2 I do the following:

cd /home/me/Documents
git clone ssh://192.168.0.1/home/me/Documents/shared

The shared directory will be cloned from computer 1 to computer 2. I can now edit anything I want on computer 2. When I am done, I commit the changes on computer 2 like so:

git commit -a

Now when I get back to computer 1 I can pull down the changes I made on computer 2 like so:

cd /home/me/Documents/shared
git pull ssh://192.168.0.2/home/me/Documents/shared

Now both computer 1 and computer 2 are again in sync. On the off chance that the same file has been edited on both computer 1 and computer 2, git will let me know that there is a merge conflict. These conflicts are usually easy to fix, and nothing is lost, since a history of all changes is kept and I can revert to any previous version at any time.
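Here is a sketch of that conflict case, using two local clones under /tmp to stand in for the two computers (the paths, file names, and commit messages are illustrative, not from my actual setup):

```shell
#!/bin/sh
# /tmp/repo1 plays computer 1, /tmp/repo2 plays computer 2.
set -e
rm -rf /tmp/repo1 /tmp/repo2
git init -q /tmp/repo1
cd /tmp/repo1
git config user.email me@example.com && git config user.name me
echo "original" > notes.txt
git add . && git commit -qm "initial"

git clone -q /tmp/repo1 /tmp/repo2
echo "edit on computer 1" > notes.txt
git commit -qam "computer 1"

cd /tmp/repo2
git config user.email me@example.com && git config user.name me
git config pull.rebase false   # merge (rather than rebase) on pull
echo "edit on computer 2" > notes.txt
git commit -qam "computer 2"

# The pull stops and marks the conflict instead of silently
# discarding either edit; notes.txt now shows both versions
# between <<<<<<< and >>>>>>> markers.
git pull || true
grep "<<<<<<<" notes.txt
```

After fixing the marked file, a `git add notes.txt` and `git commit` completes the merge, and the full history of both edits is preserved.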

The bad thing, as you may have noticed, is that the process is labor- and memory-intensive. I say memory because I need to remember to commit after making changes, and then when I am on a different computer I need to remember to pull down the changes from the computer I was on. In the example above that's not a huge problem because there are only two computers, but with all of the computers I regularly use, remembering where I've been is a problem. I could pull from every other computer every time I sit down in front of one, but that is tedious and disruptive to my workflow. What I really want is for the synchronization to happen in the background.
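For the record, that per-computer chore can at least be scripted. This is a minimal sketch, with a hypothetical pull_from_all helper and made-up peer addresses:

```shell
#!/bin/sh
# Hypothetical helper: pull into one repository from every peer in turn.
# Anything git can fetch from works as a peer: ssh:// URLs or local paths.
pull_from_all() {
    repo=$1; shift
    cd "$repo" || return 1
    for peer in "$@"; do
        git pull -q "$peer" || echo "could not cleanly pull from $peer"
    done
}

# Typical invocation from computer 1 (addresses are illustrative):
#   pull_from_all /home/me/Documents/shared \
#       ssh://192.168.0.2/home/me/Documents/shared \
#       ssh://192.168.0.3/home/me/Documents/shared
```

Even scripted, though, I still have to remember to run it, which is exactly the problem.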

I should mention one other way of using git or Bazaar to manage documents: to work with one repository and then rsync it to the different computers. The benefits of rsync still apply, and you get versioning thanks to git or Bazaar. The downsides of each method still exist though, including the problem of rsync assuming that the source is the correct version and the destination can be overwritten. With the addition of versioning, this method is an improvement over rsync alone, but not by much.

Wua.la was one option that I considered. It allows you to trade storage with others on the Internet securely, and there is filesystem integration through a built-in NFS server. I wrote about Wua.la here: http://www.linuxjournal.com/content/online-storage-wuala. Even though at that time I used Wua.la's NFS integration, I don't do so now because I found it too buggy. So while I use Wua.la for backups, it's not something I trust for behind-the-scenes synchronization, and it does not do versioning.

What I want is something simple, integrated with my file manager (Nautilus) and which works in the background without me having to think about it. It should "just work".

There is one new program+service that, on first glance, fits the bill perfectly: Dropbox.

Dropbox allows you to store your files online and keep them synchronized between various computers. They provide clients for Windows, Macintosh, and Linux, so it is about as cross-platform as they come.

Setting up Dropbox on Linux involves installing their nautilus-dropbox plugin for Nautilus and the dropboxd daemon that communicates with the Dropbox servers. Packages that include both programs are available on the http://getdropbox.com website for Fedora 9, Ubuntu 7.10, and Ubuntu 8.04. The plugin is GPL'd, so the source to it can also be downloaded and compiled manually if you wish. The dependencies for the source include GTK 2.12 or higher, GLib 2.14 or higher, Nautilus 2.16 or higher, Libnotify 0.4.4 or higher, and Wget 1.10 or higher. The dropboxd daemon is closed-source and proprietary, unfortunately, so if you are not on an x86 or x86_64 platform, you are out of luck.

Installing Dropbox

Once the package is installed, to get Dropbox working all you have to do is restart Nautilus with "killall nautilus" from a terminal window or you can log out and then back in.

Dropbox Setup

With that done, a little icon will appear in the notification area and a configuration wizard will appear. After going through the simple signup process (or connecting to an existing account) a "Dropbox" folder will appear in your home directory and a brief tour will appear. There is also a Dropbox contextual menu that appears when right-clicking while in the Dropbox folder or sub folders.

Dropbox Welcome Tour

Every file you put or create in the Dropbox folder is automatically synchronized to your Dropbox account on getdropbox.com and from there to every other computer that you have Dropbox running on. This synchronization is automatic and happens every time a file is saved, moved, or updated in any way.

To help you keep track of the status of files, Dropbox adds several emblems to Nautilus. Emblems are little icons that you can add to other icons to indicate the status of a file. Dropbox automatically adds these emblems and changes them as necessary. This makes it very easy to see at a glance which files have been successfully synchronized (green circle with a checkmark), and which files are in the process of being synced (blue circle with arrows). The notification area icon also animates to indicate status.

Dropbox uses emblems to show you the status of items.

Since Dropbox is a web-enabled technology, there is of course a web front-end to your files. This comes in very handy when I am on a computer I don't own and need to access a document.

The Dropbox web interface is another way to access your files.

One other nice thing that Dropbox does is versioning. Using the web interface you can see previous versions and revert back to them.

The Dropbox web interface lets you revert to previous versions of files.

On the surface, Dropbox is everything I am looking for. It keeps the files I'm working on in sync across all of the computers I use, it does this in the background, and it provides simple versioning in case I want to revert to a previous version of a file. Dropbox is not without issues, though.

One issue is that it does not tolerate case changes in filenames. I had one directory in my Dropbox directory named 'writing' that for some reason I wanted to rename to 'Writing'. When I did this Dropbox went crazy and started creating new directories in an attempt to solve the conflict. These new directories kept proliferating to the point where I had to stop Dropbox and delete all of them and rename the 'Writing' directory to 'My Writing'.

Another issue is that when I'm editing a file I tend to save often and occasionally my text editor will report when I try to save that "the file has been modified" since I last saved. I don't know what Dropbox has done, but it has obviously done something to make my editor think that the file has been changed in some way outside of its control. I haven't lost any work as far as I can tell, but messages like that worry me.

Program bugs aside, the biggest issue I have with Dropbox is that it is not fully open source. The Nautilus plugin is, but the plugin is useless without the behind-the-scenes service. With the dropboxd service daemon not being open, what happens if Evenflow (the company behind Dropbox) goes belly-up? I have no idea what their financial situation is, but in today's economic climate, anything is possible. Also, this is my data, and while they say it is encrypted and protected, I don't trust them (or anyone else, for that matter). There are too many horror stories of supposedly private and secured data that has "gone missing" or been out-and-out stolen.

I suppose what I really want to be able to do is run my own Dropbox-like server on hardware that I have complete control over using encryption I trust.

So, while Dropbox is a wonderfully useful program, the issue of personal control and trust pretty much rules them out for the synchronization of important files across the various computers I use. Instead I use it for project files where the convenience outweighs the loss of total security and control.

For those files I don't trust with Dropbox, I use a mixture of the other methods, depending on the file, how paranoid I am about losing it, and whether or not I want versioning. The great thing is, there are a lot of options out there for making sure I have the files I want, when and where I want them.

______________________

Comments



Encrypted volumes

ph7's picture

I'm no IT pro, but encrypted volumes are a neat way to ensure maximum security with Dropbox syncs. Forget any merging solution or multiple users dealing with the file at the same time, though... I've never heard of Truecrypt, but on OS X you can natively create small "sparseimage" files with Disk Utility that are encrypted and really easy to mount/unmount (like DMGs). Sparse images only take the space they need, though with a little overhead, maybe due to encryption.

Dropbox is really neat once you get used to it. I would really enjoy a similar solution that lets you choose your storage/versioning server to allow faster sync (like LAN or VPNs on T1).

sync files... ifolder

B-rad's picture

I have searched high and low for a good product to use for this. Most of them are Windows-only and you have to pay for the product, or they just don't do the right thing.

If you have patience, time, and pure will, you can give Novell's iFolder product a try (www.ifolder.com). They open-sourced some of it, but it's not very polished, and they don't appear to have updated the site in a long time. It has a Linux and a Windows client. The backend server runs on Apache/Linux.

It doesn't do anything with version control, but I don't have much use for that. It does notify you about file conflicts, though. You install the client, select which folders you want to sync, set the sync time, and let 'er rip. It chugs along in the background and syncs all my files between my two workstations and my server. Everything stays in my network, in my control, and on 3 machines. It's not perfect, but it has been syncing my files for the past 3 years.

bb

sshfs & central file server

Ryan Roth's picture

I have been using a central file server and mounting my folders over sshfs using the FQDN. This allows me to have full access to all of my data on my server wherever I go, as long as I have internet. I modified this recently to sync my Documents folder with my server rather than mounting it, so that I can access the data even when offline. I wrote a couple of scripts for NetworkManager and it will auto-sync my data whenever it has a connection.
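For reference, the sshfs mount itself can be done ad hoc or from /etc/fstab. This is a sketch with a made-up hostname and paths; it assumes the sshfs/FUSE packages are installed:

```shell
# One-shot mount (hostname and paths are illustrative):
sshfs me@fileserver.example.com:/home/me/Documents /home/me/server-docs

# Or as an /etc/fstab entry so the mount comes up with the network:
# me@fileserver.example.com:/home/me/Documents /home/me/server-docs fuse.sshfs _netdev,user,idmap=user 0 0

# Unmount with:
fusermount -u /home/me/server-docs
```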

Unison

Anonymous's picture

I have solved all your problems with a simple tool called Unison.

sudo aptitude install unison
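For a pair of machines, a small profile makes the sync repeatable. This is a sketch with illustrative paths and the article's example address, saved as ~/.unison/docs.prf:

```
# ~/.unison/docs.prf -- both roots are illustrative
root = /home/me/Documents
root = ssh://me@192.168.0.2//home/me/Documents
batch = true
```

Then `unison docs` reconciles changes in both directions; `batch = true` propagates non-conflicting changes without prompting, and dropping it makes Unison ask about each change.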

openvpn

Anonymous's picture

I don't see the point of synchronization if you have a (central) file server. I have a linode where I put my data and use openvpn on all my machines. The linode has webdav functionality protected by a self-signed browser cert (SSLVerifyClient require and SSLVerifyDepth 1).

I do see a point for using tools like subversion, git and the likes when you work with multiple people on the same (source) files.

Lastly, I cannot grasp why people *voluntarily* hand out their data to strangers (companies) and blindly trust those companies. Especially the audience reading this type of magazine should know better and, more importantly, can do better.

Just my 2c's worth.

DIY with webDAV & Subversion

Paul Archer's picture

It's entirely possible to build something like dropbox pretty easily using webDAV & Subversion:
http://svnbook.red-bean.com/en/1.5/svn-book.html#svn.webdav
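A sketch of the server side of that approach: create a repository and expose it via mod_dav_svn with autoversioning turned on, which makes plain WebDAV saves into automatic commits. The repository path and URL location here are illustrative:

```
# First: svnadmin create /srv/svn/shared
<Location /shared>
  DAV svn
  SVNPath /srv/svn/shared
  SVNAutoversioning on
</Location>
```

With that in place, any WebDAV client (including the ones built into most desktop file managers) gets versioned storage without knowing Subversion exists.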

OSS, mixed-source, and the cloud

bren haes's picture

this is a bit lengthy, but bear with me as I make a few points distinguishing 100% OSS from mixed-sourced software.

Companies behind Dropbox and other apps (like Zotero) may be releasing plugins for browsers, file managers, etc. as open source, but the services stay closed. I'd almost rather they choose one or the other: freedom/open or proprietary/closed. This practice of hiding proprietary applications behind an open plugin is deceptive. The plugin download page may have "open source" written all over it, but never is it mentioned that the core of the system is closed. Still, Mr. Bartholomew goes to great lengths talking about how to install and use Dropbox, and how it has helped remedy some of his problems. He is promoting it. He does mention that the core/backend is not open, but it comes almost as a side note, too close to the end, after he has spent almost half the article praising Dropbox. And still, at the end of the article, he says it's incomplete and he doesn't trust it. If that's the truth, then why did he bring our attention to it? Why not just post a feature list and ask if anyone knows of a 100% OSS product that fits it? Articles like this are not fostering open source.

Any part of a system or application that is closed is a component that could vanish or become incompatible at any time. Then, to get access to your pictures, documents, source code, helper scripts, recipes, address books, etc., you must do whatever it is the corporation wants before you can get access to YOUR data. The corp might just want you to use their new client, but that might have bugs in it that won't get fixed for a while, or it may be too processor-intensive to run on your computer. The corp may start charging for what was once a free service. The corp may decide that your data, in some part, violates a copyright and simply delete it, even if it doesn't.


We need to encourage the truly 100% open alternatives out there, and make them the best available. Use the 100% open software, give feedback, help code, donate cool graphics and stable plugins for browsers. Do this in your home, at your church, in your place of work. Talk about them at parties and other community events.

--

I just read Spideroak's comment. Thank you for letting us all know about that OSS option. I'll look into it tonight.

I've also used unison in the past, and I am learning about its descendant harmony/boomerang. I think if either of these can be combined with a version control system like bzr or git with some good plugins for access and conflict resolution, then we'd really have something.

Freedom -vs- Utility

Daniel Bartholomew's picture

Thank you for your comments.

You are correct in your assessment of open source applications that are tied to proprietary services: The proprietary bit could disappear at any time, and any data that is tied to it along with it. User beware.

That said, am I promoting Dropbox? Yes. Absolutely. The reason is because I don't know of any other application+service that does what Dropbox does that is as easy to setup and use. I find Dropbox useful in certain situations, and I'm betting that others will too, which is why I wrote the article. I did not write the article to foster or promote open source, as heretical as that may sound. The purpose of the article was to put forth some solutions to the problem of keeping data in sync across multiple computers.

Do I like that the backend is proprietary? No. Do I wish that there was some fully open source version that I could use on my own server and desktop computers? Yes. But those concerns are secondary to the purpose of the article.

Since you brought it up, are there any truly open alternatives to Dropbox? I don't know of any. Spideroak has been mentioned, but as far as I can see they have only released the code to certain tools and libraries that they use. This is at best the same effort that Evenflow has done in releasing the source to their Nautilus plugin. The backend services in both cases are controlled by their respective companies and you can lose your data at any time at their discretion. Make backups!

Does this lack of open alternatives mean we should stay away from the imperfect solutions that do exist and tell everyone else to stay away as well? No. In my mind an imperfect solution is better than no solution at all.

If anything, the existence of these services should inspire developers to come up with truly open alternatives. This is what happened in the case of Apache, MySQL, Samba, and a host of other open source projects. But until those alternatives arrive or are as easy to actually set up and use, there is nothing wrong in my mind with using the ones that exist today, even if they are flawed.

Of course, I may be completely wrong about there not being any completely open alternatives. If so, I welcome it. Please, bring out the alternatives! If they can pass my 10 minute up-and-running test, I might even use them. I'll promote them too. I'm already going to look into Spideroak, and I'm very willing to look at others.

making do

bren haes's picture

Thank you for the clarification. I'm very gung-ho about OSS (as if you can't tell), and I sometimes forget that not all solutions are available as 100% OSS. I did research Spideroak (a bit), discovered the same thing you did, and was disappointed. Apparently, companies who provide this kind of solution feel there is a need to hold back. Where I work, it is the perceived value of the company to investors/shareholders that influences management to disapprove of releasing products as OSS. I believe that they are afraid of the competition getting hold of their 'capital' and beating them at their own game. However, they forget that by using our code the competition would be required to release their additions to the community, so we can all benefit. This is the next battleground for OSS. So, if you, dear reader, work for a company like this, I encourage you to keep up vocal and constant encouragement of decision-makers so that everyone will benefit from open source.

Another alternative?

Jon Chamberlain's picture

I like Dropbox a lot - it's very polished and reliable (so far). As you say, the problem is that your data ends up behind a proprietary 'lock and key'. The other thing for me is the 2GB limit, although being free it is understandable. ;-)

As well as DropBox, I'm also using Jungledisk, which is a front-end to the Amazon S3 cloud. In this case there is no limit on size, although you do pay Amazon for storage - I paid the grand total of 24 cents last month!

I should point out I'm only using it as a backup target for multiple machines, not to keep them all in sync.

Cheers,
Jon

File Server Connections

Anonymous's picture
> My file server, for example, is stuck away in a
> closet, with the keyboard, monitor, and mouse routed
> out to a little desk that sits outside the closet.
> Getting to the USB ports on the back of the server
> is not easy.

Well, then I'd suggest unplugging the keyboard, monitor, and mouse, and turning your server around. Nobody needs a keyboard, monitor, mouse, or CD/DVD drives in a file server, anyway.

File server's purpose?

Anonymous's picture

Yes, either turn the machine around or use it as a real file server! My file server is actually a NAS running FreeNAS. I rsync to it every day so it always has the latest versions of my files. And of course, since it's NAS, it is accessible by all machines on my network.

So I guess I don't understand why you would ever need to use a USB stick to grab files off it unless for some reason you're not using it like a file server? There are still the versioning problems you mentioned and also when you're away from home, your documents aren't accessible by default but you could certainly open up your NAS so that they are!

File Servers

Daniel Bartholomew's picture

The USB key is not for pulling files off of the server, it is to access files that aren't on the server and that I don't trust to transfer over the network. There are some things I don't trust ssh for.

These files include things like my private pgp keys: basically stuff that I never want to reveal to anyone. That's why I keep them on my USB key inside an encrypted Truecrypt volume.

and finally?

Ionut's picture

You just gave some ideas about how to do the sync between *two* computers, but in the end you didn't give a final conclusion about how the method works for the many-computer case you said at the beginning that you have. You discussed an online service at length (there are many of them), a solution which doesn't work for people with a lot of files and limited bandwidth. Regarding your solution with a repo, I agree, and I am using something like that, but if you have binary files (and in my home directory I have many), it also doesn't work well. Thanks anyway.

Thanks for the comment.

Daniel Bartholomew's picture

My rsync and scp examples only used two computers because those synchronization methods are designed to work on a primary to secondary computer basis. What I mean by this is you have the primary source and then you have a secondary source (or multiple secondary sources). The secondaries are kept in sync with the primary, not the other way around. If you try to have more than one primary you will quickly run into trouble and you will lose data. These two methods can work well in a two computer setup, but not so well in a three or more computer setup.

I used two computers in the git example because it is easier to explain. To expand to 'N' computers, you just follow the same steps.

Once the repository is initialized, and you've cloned from computer one to computer two, do the same thing on computer three and computer four (and on all other computers) that you did on computer two. The trick is to remember to do a git pull from every computer you have made changes on whenever you sit down to make new changes (wherever you happen to be sitting). Like I said in the article:

I could pull from every other computer every time I sit down in front of one, but that is tedious and disruptive to my work flow.

The key word I want to point out is "every". In order to make git work across 2, 3, 4, or more computers, you have to make sure to pull from all others that you've made changes on.

That is why I like Dropbox. You just install it on every computer that you want to synchronize, and it handles everything. You can make changes to any of your documents in the Dropbox folder on any computer and Dropbox will keep things in sync. That's how I am able to keep my project files in sync across all of the computers I mentioned. Sorry if that wasn't clear enough in the article.

Thanks.

Synchronizing your life

Anonymous's picture

Plone has versioning features. And it is a CMS. It has a filesystem plugin.

you dont need nautilus

Anonymous's picture

I use awesome as my wm, plus Thunar and the console. The solution, for install and after install:

cd ~/.dropbox-dist && ./dropboxd

handmade :)

Jose luis's picture

You can use incron to watch for changes to any file in a directory; on any change, incron executes a script which:
- runs git commit
- then sshes to each machine and does a git pull (authenticating with an rsa_key or dsa_key)

handmade but works :)
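A minimal sketch of what that setup could look like, with made-up hostnames, paths, and script name. One caveat: incron watches a single directory non-recursively, and the commit itself changes files under .git, so the watch would need to be arranged so it doesn't retrigger itself:

```shell
#!/bin/sh
# /home/me/bin/sync-out.sh -- commit locally, then have each peer pull.
# Installed via "incrontab -e" with a line like:
#   /home/me/Documents/shared IN_CLOSE_WRITE /home/me/bin/sync-out.sh
cd /home/me/Documents/shared || exit 1
git commit -qam "auto-commit from $(hostname)"
for host in 192.168.0.2 192.168.0.3; do
    ssh "me@$host" "cd /home/me/Documents/shared && git pull -q"
done
```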

Spideroak

markba's picture

"...the biggest issue I have with Dropbox is that it is not fully open source"
You can use Spideroak: https://spideroak.com/
- completely Open Source
- client-side encryption to ensure privacy
- first 2 GB is free

Thanks for the tip!

Daniel Bartholomew's picture

I'll have to check them out and see how they compare.

Thanks!

ha, I love Ma Lin

zunumi's picture

ha, I love Ma Lin

I use a TrueCrypt volume on

Jason's picture

I use a TrueCrypt volume on Dropbox for a little extra security...

Truecrypt

Daniel Bartholomew's picture

I use truecrypt for my truly sensitive data. It's a good way to keep things secure.

unison

G's picture

I've found unison to be a nice tool to stay in sync between my laptop and desktop. But it's a simple scheme with 2 machines.
