Quantcast
Username/Email:  Password: 

Synchronizing FTP Files with Perl

The author explains ftpsync, his script that automatically uploads changed files and directories from a local site.

Originally, this article was stored on a
minimalistic
web site
. I, along with the thousands of users who use that
site, use FTP to maintain the content. This used to mean that after
making changes to one or more of the pages on the site, I had to go
to my FTP client and upload the changed files.Here is where Perl's laziness took over. I wanted a script
that could automatically figure out what needed to be uploaded from
my local copy of the site, as well as what needed to be deleted. I
called this script ftpsync, and I present it here. As customary,
you can
download
the complete script
.Beginning the ScriptAt the start, I summon strict warnings to help me catch
errors in my code. You should always do the same in your scripts,
as errors and warnings end up saving a lot of debugging time should
you, say, misspell a variable name somewhere in your 5,000 lines of
code. Don't panic, this script is much shorter.Net::FTP is invoked in order to provide the FTP client with
the functionality this script requires. Mainly, the client needs to
traverse a hierarchy of files and directories in the FTP server. To
do the same in my local disk, File::Find is used. File::Find makes
writing code that traverses a filesystem a snap. I will discuss how
a bit later.In order to easily produce the script's documentation for
futures users, Pod::Usage is invoked. This module converts and
displays the POD documentation from the script. You should always
write POD documentation for your scripts. I then import Getopt::Std
to parse the command-line options.

                1   #!/usr/bin/perl
                2
                3   # This script is (c) 2002 Luis E. Muñoz, All Rights Reserved
                4   # This code can be used under the same terms as Perl itself.
                5   # with absolutely NO WARRANTY. Use at your own risk.
                6
                7   use strict;
                8   use warnings;
                9   use Net::FTP;
               10   use File::Find;
               11   use Pod::Usage;
               12   use Getopt::Std;

At line 14, I tell Perl about the variables where the
command-line options are going to be stored by Getopt::Std. This
module actually has two ways of returning the command-line
information: a hash and a bunch of separate variables named after
the options they represent. I tend to prefer the later method,
because it makes it easier to spot the places where the variables
are being used. But this is mostly a matter of personal
taste.

               14   use vars qw($opt_s $opt_k $opt_u $opt_l $opt_p $opt_r $opt_h
               15               $opt_d $opt_P $opt_i $opt_o);
               16
               17   getopts('i:o:l:s:u:p:r:hkvdP');

The call to getopts() at line 17 actually tells Getopt::Std
about which options to look for and whether those options require
an argument. Following an option letter with a colon character (:)
tells the module to take one argument for the corresponding
option.Had I wanted to leave the result of parsing the command-line
options in a hash instead of a lot of $opt_ vars, I could have used
the following code:

               14   use vars qw(%my_opts);
               15
               16   getopts('i:o:l:s:u:p:r:hkvdP', \%my_opts);

Usage Messages and Default ValuesIn my scripts, I reserve the -h option to provide some form
of help, as is the custom of many other Perl programmers. The if
block on lines 19 through 23 arranges for a call to pod2usage with
the proper arguments when the -h option is specified in the command
line.

               19   if ($opt_h)
               20   {
               21       pod2usage({-exitval => 2,
               22                  -verbose => 2});
               23   }

pod2usage is defined by Pod::Usage, and as shown on line 21,
it causes the script to terminate after dumping the complete
documentation and returns a value of 2 to the operating system. You
also can have the script dump only its SYNOPSIS section, which
includes custom messages along with documentation and a variety of
other things. Please consult the documentation of Pod::usage for
complete information.

               25   $opt_s ||= 'localhost';
               26   $opt_u ||= 'anonymous';
               27   $opt_p ||= 'someuser@';
               28   $opt_r ||= '/';
               29   $opt_l ||= '.';
               30   $opt_o ||= 0;
               31
               32   $opt_i = qr/$opt_i/ if $opt_i;

Lines 25 through 30 define some sensible default values for
the command-line options not specified by the user. I do so neatly
with the ||= operator. This operator evaluates its left-hand side
or lvalue. A false value causes its right-hand value or rvalue to
be assigned to the lvalue. It works like this: if the command-line
option is not specified, its corresponding $opt_ variable will be
undef, which evaluates to false. It is thus set to whatever is on
the right hand of the ||= operator. If the command line option is
specified, the $opt_ will evaluate to true, and its value left
unchanged.At line 32, I ask Perl to compile whatever is the argument to
-i that is stored in $opt_i as a regular expression or rule, as
they are beginning to be called. This is done with the qr//
operator. You can learn more about this operator by looking at the
documentation for perlop.A Brief Discussion of the HowIn order to execute the synchronization, the script collects
information from two filesystem trees, one local and one remote. I
chose to collect all the information from both trees and perform
the synchronization later. In my experience, this translates to
cleaner and more maintainable code than if I tried to do everything
in one pass. I'll store the data about the remote and local trees
in two hashes, which I declare and initialize at lines 36 and
37.

               36   my %rem = ();
               37   my %loc = ();

Once the information is safely captured in the corresponding
hashes, the code can focus on the differences and take the
appropriate actions.Finding Out about the Local FilesWhen matching the contents of the remote FTP site with those
on the local copy, it is important to compare apples to apples.
Because the filesystem layout is not always straightforward, I
chose to compare relative pathnames. Therefore, before looking for
the local files, I do a chdir() to the path the
user specified with the -l option, as seen on line 44.

               44   chdir $opt_l or die "Cannot change dir to $opt_l:   $!\n";

After this step, I use find(), as provided by File::Find, to
traverse the local tree. Below is the code from line 46 to 69. I
will explain this code bit by bit, I promise.

               46   find(
               47        {
               48            no_chdir       => 1,
               49            follow         => 0,   # No symlinks, please
               50            wanted         => sub
               51            {
               52                return if $File::Find::name eq '.';
               53                $File::Find::name =~ s!^\./!!;
               54                if ($opt_i and $File::Find::name =~ m/$opt_i/)
               55                {
               56                    print "local: IGNORING $File::Find::name\n"
               57                    return;
               58                }
               59                my $r = $loc{$File::Find::name} =
               60                {
               61                    mdtm => (stat($File::Find::name))[9],
               62                    size => (stat(_))[7],
               63                    type => -f _ ? 'f' : -d _ ? 'd'
               64                        : -l $File::Find::name ? 'l' : '?',
               65                };
               66                print "local: adding $File::Find::name (",
               67                "$r->{mdtm}, $r->{size}, $r->{type})\n" if $opt
               68            },
               69        }, '.' );

At line 48, I tell find() that I don't want the script to
chdir() into each directory along the way using the no_chdir
argument. I want to be absolutely sure that I am seeing path names
that are consistent and relative to whatever the user specified
with the -l option.At line 49, I prevent following symbolic links with the
follow argument. I do not want to deal with them mainly because of
the infinite loops they can introduce when pointing upwards in the
filesystem. I do not use symlinks in my web site, so I see no
problem with this.At line 50, finally, there's some interesting code. find()
accepts a user-supplied function that will be called for each
filesystem object found. I specify it using the wanted argument,
although there are various ways to invoke this function. Please see
perldoc File::Find for more information.The wanted argument requires a reference to a sub, which I
define in lines 50 through 68. Line 52 makes sure that I do not
include the current directory (.) in the collection. find() passes
the complete path name of each filesystem object in the
$File::Find::name scalar.The find() function, when used as shown, produces relative
path names, such as ./dir/foo.htm. However, I consider it easier to
work with names such as dir/foo.htm. Both are legal forms of
relative paths. To accomplish this, line 53 removes the leading ./
from all the pathnames.After this is done, a check is performed on lines 54 through
58 to see if the current pathname matches the ignored regular
expression specified with -i. This is a matter of a simple pattern
match that exits the sub { .. } if a match is detected, after
providing an informational message when the -v option is
specified.By line 59 all tests have passed, so we should collect the
information from this file. I'll collect three elements of
information: the modification time or mdtm, the file size and the
file type. The first two are collected with the stat() call, which
returns a list of values. Notice that I use the special stat(_)
construct in line 62.It turns out that stat() is a somewhat expensive operation,
so it is better to do as few of them as possible. When Perl sees
bare _ as an argument to a function or operator
that would cause a call to stat(), it uses the results of the last
one performed. Therefore, the above construct causes only a single
stat() call, even when Perl's stat() function is used more than
once. The same applies to the -x file operators that I use in lines
63 and 64 to assign a type to this file.All of this information, kept in a reference to an anonymous
hash, is stored in %loc. Additionally, a copy of the last entry is
kept in the $r lexical scalar to provide the informational message
in lines 66 and 67, if the -v option was specified. So, when find()
calls this sub for each object found in the filesystem, %loc will
be populated with the data for the whole tree.Connecting to the FTP SiteI chose to connect to the FTP site after collecting the local
file information for various reasons. The first and most important
is that, as a rule of thumb, whenever you write network-client
code, keep your impact on the server to a minimum. In this way, I
spare the connection of the time it takes to collect the local
information.

               74   my $ftp = new Net::FTP ($opt_s,
               75                           Debug           => $opt_d,
               76                           Passive         => $opt_P,
               77                           );
               78
               79   die "Failed to connect to server '$opt_s': $!\n" unless $ftp
               80   die "Failed to login as $opt_u\n" unless $ftp->login($opt_u,
               81   die "Cannot change directory to $opt_r\n" unless $ftp->cwd($
               82   warn "Failed to set binary mode\n" unless $ftp->binary();
               83
               84   print "connected\n" if $opt_v;

Lines 74 through 77 handle the connection to the server,
which in case of failure trigger the error message from line 79. At
line 80, authentication with the credentials passed in the command
line is attempted, again producing a fatal error in case of
failure. A number of parameters for the FTP connection can be
controlled by passing arguments to the new() method. The definitive
information is in the module's documentation.At line 81 we change the remote directory to whatever the
user supplied. Finally, at line 82, we request binary mode for any
subsequent transfers. Line 84 prints a progress message if the -v
option is supplied in the command line.Finding Out about Remote FilesCollecting the same data from the remote files through FTP is
a more difficult task, because there's no equivalent to the stat()
call through FTP. Therefore, I resort to parsing the result of a
directory listing. I chose to support a UNIX-style directory
listing, as it is quite and also because it is what my FTP server
uses.Basically, the code from line 86 through line 132 emulates
the work done by File::Find, which was shown previously with a
recursive function, scan_ftp. This function takes a Net::FTP
object, a pathname to work with and a reference to %rem to register
the relevant information. I'll comment on each chunk of the code
below.

               86   sub scan_ftp
               87   {
               88       my $ftp     = shift;
               89       my $path    = shift;
               90       my $rrem    = shift;
               91
               92       my $rdir = $ftp->dir($path);
               93
               94       return unless $rdir and @$rdir;
               95
               96       for my $f (@$rdir)
               97       {
               98           next if $f =~ m/^d.+\s\.\.?$/;
               99
              100           my $n = (split(/\s+/, $f, 9))[8];
              101           next unless defined $n;
              102
              103           my $name = '';
              104           $name = $path . '/' if $path;
              105           $name .= $n;
              106
              107           if ($opt_i and $name =~ m/$opt_i/)
              108           {
              109               print "ftp: IGNORING $name\n" if $opt_d;
              110               next;
              111           }
              112
              113           next if exists $rrem->{$name};
              114
              115           my $mdtm = ($ftp->mdtm($name) || 0) + $opt_o;
              116           my $size = $ftp->size($name) || 0;
              117           my $type = substr($f, 0, 1);
              118
              119           $type =~ s/-/f/;
              120
              121           warn "ftp: adding $name ($mdtm, $size, $type)\n" if 
              122
              123           $rrem->{$name} =
              124           {
              125               mdtm => $mdtm,
              126               size => $size,
              127               type => $type,
              128           };
              129
              130           scan_ftp($ftp, $name, $rrem) if $type eq 'd';
              131       }
              132   }
              133
              134   scan_ftp($ftp, '', \%rem);

At line 92 I request a directory listing of the assigned
$path, which is stored as a reference to a list in $rdir. Each
entry in @$rdir contains one line of output from the FTP server.
With line 94, we abandon early in case we get an empty or invalid
answer.Between lines 96 and 132 is the loop where each line of the
command output is analyzed. The test at line 98 makes sure no time
is wasted in analyzing the current (.) and parent (..) directories.
Later, in line 100, an attempt is made to obtain the file name,
typically at the ninth column of output. If no name can be found,
the entry is skipped in line 101. As you can see, the split at line
100 limits the number of columns returned. I did this in case
someday I need to parse symlinks too, which are reported by many
servers as Foo -> Bar.In lines 103 through 105, I construct the complete pathname
using the $path passed to this function along with the recently
parsed directory entry. The if block starting at line 107 checks
for an entry that should be ignored. If so, lines 109 and 110 print
a suitable message and then skip to the next entry.Line 113 checks if we already have seen this entry. This
might happen if the remote FTP tree has a loop I couldn't detect. I
use the exists construct as a costume to avoid autovivifying an
entry in the %rem hash, although in this case it could have been
removed.Lines 115 to 117 are responsible for capturing the relevant
information for each item found in this phase. We use the MDTM FTP
command to obtain the modification time, the SIZE command to find
the size of the file in octets and, finally, the first letter of
the directory listing to guess the type. The MDTM method of
Net::FTP automatically translates the resulting date to the number
of seconds since the Epoch UTC, the same as other time-related
functions, such as time() and stat() return.Here, however, I allow for the -o argument, a time offset, to
be added to the result. This allows easy correction of time skew.
It also has proven useful to compensate for time-zone differences,
because sometimes the time returned by the FTP server is not in
UTC.I probably should do an MDTM only on files and not on
directories or any other kind of object. But I chose to leave it as
is because I want to check with various FTP servers, to see if they
return a meaningful timestamp for directories and other objects. In
any case, this code could be rewritten to avoid the useless MDTM
seen below.

              115           my $type = substr($f, 0, 1);
              116           my $mdtm = ($type eq 'f' ? $ftp->mdtm($name) || 0 : 
              117           my $size = $ftp->size($name) || 0;

Lines 123 through 128 store the information gathered about
this entry. In line 130, the recursive call is made if the current
entry is a directory. At line 134, I start the recursion.The SynchronizationOnce all the data has been nicely collected in %loc and %rem,
all that is left is to deal with the differences. The simple
synchronization I use uploads only the local files that are missing
or are too old on the remote side, and then remote files that are
not present locally are deleted. The code below takes care of the
uploading.

              138   for my $l (sort { length($a) <=> length($b) } keys %loc)
              139   {
              140       warn "Symbolic link $l not supported\n"
              141           if $loc{$l}->{type} eq 'l';
              142
              143       if ($loc{$l}->{type} eq 'd')
              144       {
              145           next if exists $rem{$l};
              146           print "$l dir missing in the FTP repository\n" if $o
              147           $opt_k ? print "MKDIR $l\n" : $ftp->mkdir($l)
              148               or die "Failed to MKDIR $l\n";
              149       }
              150       else
              151       {
              152           next if exists $rem{$l} and $rem{$l}->{mdtm} >= $loc
              153           print "$l file missing or older in the FTP repositor
              154               if $opt_v;
              155           $opt_k ? print "PUT $l $l\n" : $ftp->put($l, $l)
              156               or die "Failed to PUT $l\n";
              157       }
              158   }

The loop at line 138 iterates through all the local files
that must be matched. I do this from the shortest pathname to the
longest, so we can create any required directories in order. Lines
140 and 141 produce an adequate warning and skip any symbolic links
that might have been found.The code from line 143 through line 157 is worth some
explanation. The script expects to see two different classes of
filesystem objects, files and directories. In the case of a
directory, handled in lines 143 to 149, any remote object with the
same name causes the entry to be skipped. Otherwise, a proper
message is produced in lines 146 and 147, and if needed, the remote
directory is created. In case of failure, the script simply die()s
at line 148 to avoid causing more trouble.If the local object is not a directory, the skip condition
also states that the remote object is older. This check is done in
line 151. Then, similar code is executed from lines 152 to 157,
using the put() method of Net::FTP to upload the missing file. The
entire process is repeated for every local object.Other sets of tests are possible, but I chose to leave them
out. For instance, a local file could have the same name that a
remote directory has. In this case, it is not clear to me what to
do: remove the remote directory and upload the local file?; die()?
I'll leave that decision as an exercise for the reader.Next, the remote files are analyzed. In this case, the only
possible action is to delete the remote files if their local
counterpart is not present. The loop beginning in line 162 ensures
that the files are scanned in the exact reverse order of the loop
for the previous code snippet. This allows the DELE FTP command,
issued by the delete() method of the Net::FTP module, to remove all
files and empty directories automatically.

              162   for my $r (sort { length($b) <=> length($a) } keys %rem)
              163   {
              164       if ($rem{$r}->{type} eq 'l')
              165       {
              166           warn "Symbolic link $r not supported\n";
              167           next;
              168       }
              169
              170       next if exists $loc{$r};
              171
              172       print "$r file missing locally\n" if $opt_v;
              173       $opt_k ? print "DELETE $r\n" : $ftp->delete($r)
              174           or die "Failed to DELETE $r\n";
              175   }

As with local files, lines 164 through 168 issue an adequate
message for symlinks and skip them. Line 170 skips any remote file
whose local counterpart is in %loc. In lines 172 through 174, a
message is printed, and the FTP command to delete the file is
either echoed as an informational message or carried out. If an
error is detected, the script die()s preventively.ConclusionsThis script does not try to be a complete synchronization
solution, and certain cases are not handled at all. However, this
tool has served my needs well. Now, I can hack away in my local
copy of my web pages, and later simply run a command such
as:

            bash-2.05a$ ./perl/ftpsync -s my.ftp -u lem -p 37337 \
               -l /my/local/site -i 'CVS|(^\.)|/\.|(~$)' -v -o 14400
            connected
            perl/index-en.htm file missing or older in the FTP repository
            create_this dir missing in the FTP repository
            Untitled.gif file missing locally

and have an updated web site a couple of minutes later,
without having to remember which files I touched. The regexp I feed
to -i should prevent any CVS control files, which begins with a
dot, and any Emacs backups from being touched at all. I also
specify an offset of 14,400 seconds (4 hours) to compensate for the
fact that this FTP server is running in my local time zone instead
of in UTC, as it should be.

email: luismunoz@cpan.org

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

When I try to run this

Anonymous's picture

When I try to run this script, I got this error:

/var# /usr/bin/perl /var/backupfiles.pl
"my" variable $l masks earlier declaration in same scope at /var/backupfiles.pl line 170.
"my" variable $r masks earlier declaration in same scope at /var/backupfiles.pl line 218.
Global symbol "$type" requires explicit package name at /var/backupfiles.pl line 124.
syntax error at /var/backupfiles.pl line 126, near "/f/;"
Execution of /var/backupfiles.pl aborted due to compilation errors.

I hope someone can help me with this.

Thanks

warning - :Cannot cwd to ....svn: Permission denied

John Plumridge's picture

Hi, a very useful script, and neat.
I got something like this warning, when i accidently left a .svn dir in a duplicated image directory in my site folder, and it halted the process of transfer options = ( -cpv). SO i deleted these .svn dir on th elocal machine, and checked the remote for their absence, but the warning, ":Cannot cwd to /public_html/_img/icons/.svn/public_html/_img/icons/.svn: Permission denied" persists! these .svn directories dont exist any longer.
Why is this? Please email me if you know!
regards,
John

Just use lftp

Martin Guy's picture

Thanks for the perl tutorial, but there's already a utility to do this: the "mirror" command of lftp (or "mirror -R" if you want to upload).

M

Bug report: Doesn't like directories with spaces

David's picture

ftpsync seems to have a problem handling directory names like "test 1" and "test 2" with embedded spaces in the names. It will always mark all files in these directories as needing to be uploaded.

Unfortunately, my Perl-fu is insufficient to post a fix for this.

feature

Anonymous's picture

Some feature I'd like to see:
- Autodetection of the time difference. That is best done by uploading a dummy file then reading the timestamp.

Implemented some new useful features

Peter Orvos's picture

Dear All,

I found this utility useful, however I needed a synchronizer over FTP that could download files as necessary. Therefore I implemented the -D switch for Download mode.

I had to work with an FTP server running on windows, that was neither supported by this script, until now... :)

And the last utility I implemented was the -R n option, where n represents the maximum depth of directory recursing, 0 for no recurse, and unlimited if omitted.

Use the code if You find it useful:
http://home.mit.bme.hu/~orvos/progs/ftpsync

Regards.

FTPsync

Jonathan Hipkiss's picture

Just downloaded this, it's excellent, does exactly what it says on the tin!
Thanks.

Other use aside from web sync

Yaten Kou's picture

Hi again!
I forgot to share my experience on where I used Luis' script ^_^

I didn't actually use it for synchronizing my website, instead, I used it to synchronize my backups.

I just thought of sharing it coz someone might have the same needs as I am, it might save them some time.

Aside from Luis' scripts, I made 2 shell scripts, backup.sh and task.sh, both can be found at :
Mirror #1
Mirror #2

Basically, my objective is to backup important files on our server farm and send it to our backup server. Something like:
web01
web02
db01
db02
mail01
etc etc
then a backup server where all the files will be synced at.

each server, I have 3 files, the two files I mentioned above and Luis' ftpsync.pl

task.sh basically does the listed task:

#!/bin/bash

backupd=./backup.sh # path to my backupd script

$backupd -y -s /data/scripts -d default "*" # backup all files under /data/scripts
$backupd -y -s /etc -d default "*" # backup all files under etc
$backupd -y -s /usr/local/apache -d default "conf/" # backup conf directory of /usr/local/apache
$backupd -y -s /var/qmail -d default "alias/ bin/ boot/ control/ scripts/ users/" # backup selected directories of /var/qmail
./ftpsync -v -s [backup_server] -u [username] -p [password] -r [remote_dir] -l [local_dir]

exit

task.sh is usually being invoked through cron

backup.sh is the one that actually does the compressing, download the code to see how it works in detail. Basically, it just compress the list of files and checks the destination if it already exist using md5.

After saving all the backup on a certain backup directory, the backup directory is then synced with the backup server using Luis' ftpsync.pl.

That's just it ^_^ Just thought of giving back a little of something to the community than being the beneficiary all my life. ^_^

Best regards,

yaten

Very useful!

Yaten Kou's picture

Hi!
Great scripts! very useful! I'm using them now ^_^
Thanks to Luis Muñoz for the great script!
Thanks to Peter Orvos for the additional options!

I've also added a simple switch (-e), the script now defaults to 'no erase' so as to prevent accidents of deleting files. The script won't delete files locally or remotely without specifying the -e option.

Here's the link to the new script which contains Luis and Peter's scripts plus my -e option.

Mirror #1
Mirror #2

Best regards!

yaten

last version

M. Maahn's picture

Just a repost of the last version with the -e option, since the server is down:

#################################################################

#!/usr/bin/perl

# This script is (c) 2002 Luis E. Munoz, All Rights Reserved, original author
# (c) 2005 Peter Orvos, All Rights Reserved, added options D and R
# (c) 2006 Yaten Kou, All Rights Reserved, added option e and defaults no delete to avoid accidents
# This code can be used under the same terms as Perl itself. It comes
# with absolutely NO WARRANTY. Use at your own risk.

# see ./ftpsync -h for help

use strict;
use warnings;
use Net::FTP;
use File::Find;
use Pod::Usage;
use Getopt::Std;

use vars qw($opt_s $opt_k $opt_u $opt_l $opt_p $opt_r $opt_h $opt_v
$opt_d $opt_P $opt_i $opt_o $opt_D $opt_R $opt_e);

getopts('i:o:l:s:u:p:r:R:hkvdPDe');

if ($opt_h)
{
pod2usage({-exitval => 2,
-verbose => 2});
}
# Defaults are set here
$opt_s ||= 'localhost';
$opt_u ||= 'anonymous';
$opt_p ||= 'someuser@';
$opt_r ||= '/';
$opt_l ||= '.';
$opt_o ||= 0;
$opt_R = -1 if !defined $opt_R;

$opt_i = qr/$opt_i/ if $opt_i;

$|++; # Autoflush STDIN

my %rem = ();
my %loc = ();

print "Using time offset of $opt_o seconds\n" if $opt_v and $opt_o;

# Phase 0: Scan local path and see what we
# have

chdir $opt_l or die "Cannot change dir to $opt_l: $!\n";

find(
{
no_chdir => 1,
follow => 0, # No symlinks, please
wanted => sub
{
return if $File::Find::name eq '.';
$File::Find::name =~ s!^\./!!;
if ($opt_i and $File::Find::name =~ m/$opt_i/)
{
print "local: IGNORING $File::Find::name\n" if $opt_d;
return;
}
stat($File::Find::name);
my $type = -f _ ? 'f' : -d _ ? 'd' : -l $File::Find::name ? 'l' : '?';
my @dirs = split /\//, $File::Find::name if $opt_R >= 0;
if ($opt_R >= 0 && $opt_R + ($type eq 'd' ? 0 : 1) < @dirs) {
print "local: IGNORING(depth) $File::Find::name\n" if $opt_d;
return;
}
my $r = $loc{$File::Find::name} =
{
mdtm => (stat(_))[9],
size => (stat(_))[7],
type => $type,
};
print "local: adding $File::Find::name (",
"$r->{mdtm}, $r->{size}, $r->{type})\n" if $opt_d;
},
}, '.' );

# Phase 1: Build a representation of what ss
# in the remote site

my $ftp = new Net::FTP ($opt_s,
Debug => $opt_d,
Passive => $opt_P,
);

die "Failed to connect to server '$opt_s': $!\n" unless $ftp;
die "Failed to login as $opt_u\n" unless $ftp->login($opt_u, $opt_p);
die "Cannot change directory to $opt_r\n" unless $ftp->cwd($opt_r);
warn "Failed to set binary mode\n" unless $ftp->binary();

print "connected\n" if $opt_v;

sub scan_ftp
{
my $ftp = shift;
my $path = shift;
my $rrem = shift;
my $mdepth = shift;

my $rdir = length($path) ? $ftp->dir($path) : $ftp->dir();

return unless $rdir and @$rdir;

for my $f (@$rdir)
{
next if $f =~ m/^d.+\s\.\.?$/;

my @line = split(/\s+/, $f, 9);
my $n = (@line == 4) ? $line[3] : $line[8]; # Compatibility with windows FTP
next unless defined $n;

my $name = '';
$name = $path . '/' if $path;
$name .= $n;

if ($opt_i and $name =~ m/$opt_i/)
{
print "ftp: IGNORING $name\n" if $opt_d;
next;
}

next if exists $rrem->{$name};

my $mdtm = ($ftp->mdtm($name) || 0) + $opt_o;
my $size = $ftp->size($name) || 0;
my $type = (@line == 4) ? ($line[2] =~/\/i ? 'd' : 'f') : substr($f, 0, 1); # Compatibility with windows FTP

$type =~ s/-/f/;

warn "ftp: adding $name ($mdtm, $size, $type)\n" if $opt_d;

$rrem->{$name} =
{
mdtm => $mdtm,
size => $size,
type => $type,
} if $type ne 'd' || $mdepth != 0;

scan_ftp($ftp, $name, $rrem, $mdepth-1) if $type eq 'd' && $mdepth != 0;
}
}

scan_ftp($ftp, '', \%rem, $opt_R);

if ($opt_D) {
# Phase 2: Download "missing files"
for my $l (sort { length($a) <=> length($b) } keys %rem)
{
warn "Symbolic link $l not supported\n"
if $rem{$l}->{type} eq 'l';

if ($rem{$l}->{type} eq 'd')
{
next if exists $loc{$l};
print "$l dir missing in the local repository\n" if $opt_v;
$opt_k ? print "mkdir $l\n" : mkdir($l)
or die "Failed to MKDIR $l\n";
}
else
{
next if exists $loc{$l} and $rem{$l}->{mdtm} <= $loc{$l}->{mdtm};
print "$l file missing or older in the local repository\n"
if $opt_v;
$opt_k ? print "GET $l $l\n" : $ftp->get($l, $l)
or die "Failed to GET $l\n";
}
}
}
else
{
# Phase 2: Upload "missing files"
for my $l (sort { length($a) <=> length($b) } keys %loc)
{
warn "Symbolic link $l not supported\n"
if $loc{$l}->{type} eq 'l';

if ($loc{$l}->{type} eq 'd')
{
next if exists $rem{$l};
print "$l dir missing in the FTP repository\n" if $opt_v;
$opt_k ? print "MKDIR $l\n" : $ftp->mkdir($l)
or die "Failed to MKDIR $l\n";
}
else
{
next if exists $rem{$l} and $rem{$l}->{mdtm} >= $loc{$l}->{mdtm};
print "$l file missing or older in the FTP repository\n"
if $opt_v;
$opt_k ? print "PUT $l $l\n" : $ftp->put($l, $l)
or die "Failed to PUT $l\n";
}
}
}
# Phase 3: Delete missing files

exit if ! $opt_e;
if ($opt_D) {
for my $r (sort { length($b) <=> length($a) } keys %loc)
{
if ($loc{$r}->{type} eq 'l')
{
warn "Symbolic link $r not supported\n";
next;
}

next if exists $rem{$r};

print "$r file missing from the FTP repository\n" if $opt_v;
if ($loc{$r}->{type} eq 'd') {
$opt_k ? print "rmdir $r\n" : rmdir($r)
or die "Failed to DELETE $r\n";
} else {
$opt_k ? print "rm $r\n" : unlink($r)
or die "Failed to DELETE $r\n";
}
}
}
else
{
for my $r (sort { length($b) <=> length($a) } keys %rem)
{
if ($rem{$r}->{type} eq 'l')
{
warn "Symbolic link $r not supported\n";
next;
}

next if exists $loc{$r};

print "$r file missing locally\n" if $opt_v;
$opt_k ? print "DELETE $r\n" : $ftp->delete($r)
or die "Failed to DELETE $r\n";
}
}

__END__

=pod

=head1 NAME

ftpsync - Sync a hierarchy of local files with a remote FTP repository

=head1 SYNOPSIS

ftpsync [-h] [-v] [-d] [-k] [-P] [-s server] [-u username] [-p password] [-r remote] [-l local] [-i ignore] [-o offset]

=head1 ARGUMENTS

The recognized flags are described below:

=over 2

=item B<-h>

Produce this documentation.

=item B<-v>

Produce verbose messages while running.

=item B<-d>

Put the C object in debug mode and also emit some debugging
information about what is being done.

=item B<-k>

Just kidding. Only announce what would be done but make no change in
neither local nor remote files.

=item B<-P>

Set passive mode.

=item B<-D>

Download directory, rather than upload (default).

=item B<-i ignore>

Specifies a regexp. Files matching this regexp will be left alone.

=item B<-s server>

Specify the FTP server to use. Defaults to C.

=item B<-u username>

Specify the username. Defaults to 'anonymous'.

=item B<-p password>

Password used for connection. Defaults to an anonymous pseudo-email
address.

=item B<-r remote>

Specifies the remote directory to match against the local directory.

=item B<-l local>

Specifies the local directory to match against the remote directory.

=item B<-R max_recurse_depth>

Maximal depth of recursive directory synchron. 0 is for no recurse, -1 is for unlimited (default).

=item B<-o offset>

Allows the specification of a time offset between the FTP server and
the local host. This makes it easier to correct time skew or
differences in time zones.

=item B<-e>

Erases remote files that does not exists locally or
erases local files that does not exist remotely if used with -D option.

=back

=head1 DESCRIPTION

This is an example script that should be usable as is for simple
website maintenance. It synchronizes a hierarchy of local files /
directories with a subtree of an FTP server.

The synchronyzation is quite simplistic. It was written to explain how
to C and C.

Always use the C<-k> option before using it in production, to avoid
data loss.

=head1 BUGS

The synchronization is not quite complete. This script does not deal
with symbolic links. Many cases are not handled to keep the code short
and understandable.

=head1 AUTHORS

Luis E. Munoz , original author
Peter Orvos , added options D and R
Yaten Kou , added option e and defaults no delete to avoid accidents

=head1 SEE ALSO

Perl(1).

=cut

###################################################

download and upload depending on what's newer

Kalen's picture

this is really cool. almost exactly what i need but i'd like something that operates in both modes at the same time. so if the file on remote is newer than local, download. if the local file is newer, then upload. i must be missing something, b/c this should be a no-brainer??

mdtm not supported

JD's picture

The mdtm operation is unsupported on my ftp servers, I rewrote the scan_ftp function using the File::Listing module and it works fine now.
Thanx for your script, it was very helpful to start working on this subject.

sub scan_ftp2
{
my $ftp = shift;
my $path = shift;
my $rrem = shift;

my $rdir = length($path) ? parse_dir($ftp->dir($path)) : parse_dir($ftp->dir());

return unless $rdir and @$rdir;

foreach my $entry (@$rdir) {
my ($n, $type, $size, $mtime, $mode) = @$entry;

my $name = '';
$name = $path . '/' if $path;
$name .= $n;

if ($opt_i and $name =~ m/$opt_i/)
{
print "ftp: IGNORING $name\n" if $opt_v;
next;
}

next if exists $rrem->{$name};

my $mdtm = ($mtime || 0) + $opt_o;
$size = $size || 0;

warn "ftp: adding $name ($mdtm, $size, $type)\n" if $opt_d;

$rrem->{$name} =
{
mdtm => $mdtm,
size => $size,
type => $type,
};

scan_ftp2($ftp, $name, $rrem) if $type eq 'd';
} ## foreach
}

method contains a call to parse_dir which is not included

Cman's picture

Would it be possible to post also all the code at which this method depends on?
For example, "scan_ftp2" does a call to "parse_dir" which is not included in the code snipped.

Regards, Corry.

using mdtm to delete files older than n number of days

Anonymous's picture

I am working an a perl script that would connect to an ftp site and delete any files older than the number of days that I pass it. For example, if I pass it 30, it should delete any files older than 30 days. From the net::ftp doc I found the mdtm method that can return the last modification time of a file. How can I use this to delete a file? I'm at beginners level in perl.

Thanks for this wonderful site and line by line explanation. It helps tremendously.

Null $path to ftp->dir()

Anonymous's picture

The first time through scan_ftp() with an empty $path variable, the ftp->dir() in my Net::FTP package would send 'LIST ' (note trailing space) and get an unrecongized command error back. My horrible hack to fix was to change line 92 to:

my $rdir = (length($path)) ? $ftp->dir($path) : $ftp->dir();

This causes 'LIST' to be sent if length of $path is zero and all started working properly. (Perl 5.6.1 netlib/ftp 2.67 netlib 1.13 on Debian). I was using the -r ftpsync option to set a remote path, FWIW.

Re: Null $path to ftp->dir()

Anonymous's picture

Thanks a lot for the bug report/fix. I was already working on it as Tim Rowe pointed me to it. This does not break my FTP server, so I didn't catch it in time.

There's an updated version of this article at http://mipagina.cantv.net/lem/perl/ftpsync.htm. You can download a fixed version of the script there but please, keep coming to linuxjournal.com :)

Best regards and thanks a lot to all for the feedback.

-lem

Re: Synchronizing FTP Files with Perl

Anonymous's picture

great article Luis...

the use of underscore was new to me ;)

and thanks to Anonymous poster for putting me on to rsync

Re: Synchronizing FTP Files with Perl

wdtj's picture

Where is also a common perl script called mirror that does this, plus more.

It is a good example of perl though, and we can always use more perl examples.

Good article.

I found mirror a few years

Rick's picture

I found mirror a few years ago and have been using it since then. Unfortunately, I have been struggling with it lately. The first problem I ran into was that the ls -lRat was stopping after 5000 files. Now, I'm syncing many more files than that, and I've been having problems with Mirror. It does things like flag directories on the remote destination server (I'm uploading) for deletion, when the sources exist and haven't changed ever.

I've been starting down the path of making my own mirror script, and this code looks very promising. I was getting ready to do the work in shell, using ncftp, and I think this will work much better.

Thanks to all; I hope that I'm able to contribute something to the code as well.

Regards,
Rick

Re: Synchronizing FTP Files with Perl

Anonymous's picture

That's what I thought as well until I went to actually try Mirror. That is a complex program to setup and the way to sync files is not very clear. It turns out you have to set the get_files option to false to get mirror to put files from the local dirs to the server. Sure it's logical, just not intuitive. Don't get me wrong, I think it's a great tool, just not very well-documented. Seems that it was written with one-way mirroring in mind more than file synchronizing.

Re: Synchronizing FTP Files with Perl

Anonymous's picture

Wow! Thanks for this tip. I just deleted "mirror" from my system because I thought it could not PUT, only GET. (Typical developer myopia to spend hundreds of hours writing something comprehensive ike mirror and then skip writing a few decent introductory paragraphs outlining context and broad capabilities. Geez...)

Re: Synchronizing FTP Files with Perl

Anonymous's picture

what's wrong with rsync over ssh again?

Re: Synchronizing FTP Files with Perl

Anonymous's picture

As my knowledge rsync doesn't support nested directories. Only one level is supported:
http://optics.ph.unimelb.edu.au/help/rsync/rsync_pc1.html

this is just wrong. rsync is

Anonymous's picture

this is just wrong. rsync is recursive.
further more the beautifull about rsync is that it only transfers parts of files that have been modified:
lets say you have a 600MB file and you change only 2 bytes in the middle of it. rsync will transfert only one block (of 4Ko I guess).

Re: Synchronizing FTP Files with Perl

Anonymous's picture

If you don't have shell access to the remote machine you need something like this when you are PUTting files to it, since rsync will not be available.

Re: Synchronizing FTP Files with Perl

Anonymous's picture

Actually nothing if your hosting provider supports them.

recursive?

Börkur's picture

Is it hard to add recursive option to this?

If not please post the code ;)

already done, read first

Anonymous's picture

already done, read first comments

Problem with MKDIR after this line print "$l dir missing in the

Rajesh Muthu's picture

Hi
I'm new to perl . I want to backup the files from my production server which runs on linux to a window based ftp server. I'm using Filezilla FTP Server.
Question 1:
Is there files/folders size restriction ?
Question 2:
I have lots of files. But everyday I get only few new files that I have to backup to window based ftp server. Will there be performance issue if number of files is in large number?

Question 3:
Every time I run the script, it breaks up with this message. I use -v option
connected
temp/in dir missing in the FTP repository
Failed to MKDIR temp/in

But that folder exist in both local and FTP site ?
What could be the problem

Any help will be appreciated

Post new comment

  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <pre> <ul> <ol> <li> <dl> <dt> <dd> <i> <b>
  • Lines and paragraphs break automatically.
  • Use to create page breaks.

More information about formatting options