I wrote a shell script the other day to sync remote files using rsync. Thought I’d share it since it took me some time to get it exactly how I wanted.
rsync -rtvP --delete --include="$PATTERN*" --exclude='*' -e "ssh -i $SSH_KEY -p $SSH_PORT" "$USERNAME@$DOMAIN:$SERVER_PATH/" "$BACKUP_PATH/" 2> "$ERROR_LOG"
Per the man page, rsync is:
a fast, versatile, remote (and local) file-copying tool
It can also synchronize folders, so it’s more than just a file-copying tool like scp. Furthermore, rsync has a significant number of options, so the documentation is quite lengthy.
To explain the code snippet above, I’ll start with the options in order of use and why I used them. You’ll also notice that I used $VARIABLES throughout the script. The definitions of these variables (among a few others) were included in the original script, but their values were both private and irrelevant, so I’ve simply excluded them.
-rtvP
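The -r or --recursive option tells rsync to recurse into folders, so you can specify a folder as the source or destination. Don’t forget to add a trailing slash (/) to your source path: with the slash, rsync copies the folder’s contents; without it, rsync copies the folder itself into the destination.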
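To see what the trailing slash changes, here is a small local experiment. It is only a sketch: the folder names (src, with_slash, without_slash) are made up for the demo, it assumes rsync is installed, and it only touches a temporary directory.

```shell
# Local experiment in a throwaway directory; all names are illustrative.
work=$(mktemp -d)
mkdir -p "$work/src/sub"
echo "hello" > "$work/src/sub/file.txt"

# Trailing slash on the source: copy the *contents* of src.
rsync -r "$work/src/" "$work/with_slash/"

# No trailing slash: copy the src folder itself into the destination.
rsync -r "$work/src" "$work/without_slash/"

ls "$work/with_slash"      # sub
ls "$work/without_slash"   # src
```

The first destination ends up with sub/file.txt directly inside it, while the second gets an extra src/ level.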
The -t or --times option preserves the modification times on files when transferred. It’s often appropriate and preferred to use the -a or --archive option, which is the same as using -rlptgoD. Combined, these options recurse, copy symlinks as symlinks, and preserve permissions, modification times, group, owner, and device and special files (respectively). Perfect for archiving, but not what I wanted at the time.
The -v or --verbose option just causes rsync to be more ‘chatty’ and tell you what it’s doing.
The -P option is the same as adding --partial --progress. In essence, rsync will keep partial files if the transfer is interrupted and tell you the progress of the file transfer via standard output (your terminal screen…unless you redirect it). I wanted both these options so I chose -P.
--delete
Delete any files from the destination that do NOT exist in the source. There are a variety of other delete options to pick from should you need them.
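A quick way to get a feel for --delete is to try it on two scratch folders first. This is a sketch under a few assumptions: rsync is installed, and the file names keep.txt and stale.txt are invented for the demo.

```shell
# Scratch folders; file names are illustrative only.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
touch "$work/src/keep.txt"
touch "$work/dest/keep.txt" "$work/dest/stale.txt"

# stale.txt exists only in the destination, so --delete removes it.
rsync -rt --delete "$work/src/" "$work/dest/"

ls "$work/dest"   # keep.txt
```

Because the deletion happens on the destination side, it is worth double-checking which path is the source and which is the destination before running this against real data.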
--include=$PATTERN* --exclude=*
The --include and --exclude options take patterns that are matched against files in the source. The source folder included a number of files; however, I only wanted files that matched a specific pattern. In this case, all the files I wanted were prefixed with something like ‘backup’, so that’s the value I assigned to $PATTERN. The filenames also included variable data like a timestamp, so in addition to the prefix I used the wildcard (*) to match any suffix.
If I hadn’t added the --exclude option, I still would have transferred all the files from the source folder. --include only explicitly says what should be included. It is NOT exclusive. Thus, I added --exclude=* which matches all other files. rsync checks these filter rules in the order given, and the first rule that matches a file decides what happens to it, so the --include has to come before the --exclude. You can use multiple --include and --exclude options as needed. See man rsync for more info.
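The ordering of the filter rules is easy to check locally. The sketch below assumes rsync is installed; the file names and the "backup" prefix are stand-ins for whatever your real files look like.

```shell
# Scratch source folder with two matching files and one that should be skipped.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/reversed"
touch "$work/src/backup-2020-01-01.tar" \
      "$work/src/backup-2020-01-02.tar" \
      "$work/src/notes.txt"
PATTERN="backup"

# Quote the patterns so the local shell does not try to expand the * itself.
rsync -rt --include="$PATTERN*" --exclude='*' "$work/src/" "$work/dest/"
ls "$work/dest"       # lists only the two backup-* files

# Same rules in the opposite order: --exclude='*' matches first,
# so the --include is never reached and nothing is copied.
rsync -rt --exclude='*' --include="$PATTERN*" "$work/src/" "$work/reversed/"
ls "$work/reversed"   # prints nothing
```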
-e "ssh -i $SSH_KEY -p $SSH_PORT"
The -e or --rsh=COMMAND option allows you to specify which remote shell you want to use. ssh is the default remote shell in modern rsync (since version 2.6.0). However, I also wanted to specify the private key I would use to authenticate with the remote server and what port I would use. -e allows me to specify these configurations.
$USERNAME@$DOMAIN:$SERVER_PATH/
The source path. Since it’s on a remote host, I’ve specified the username and the hostname. Notice the trailing slash for my folder.
$BACKUP_PATH/
The destination path. Notice the trailing slash for my folder.
2> $ERROR_LOG
I chose to redirect all errors from stderr to a log file.
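If the effect of 2> isn't obvious, this sketch shows it without needing rsync at all. The noisy function is a made-up stand-in that writes one line to stdout and one to stderr, and the log path is a temporary file.

```shell
# Stand-in for rsync: one line on stdout, one on stderr, then a failure status.
noisy() {
  echo "progress: copied file.txt"
  echo "error: connection refused" >&2
  return 1
}

ERROR_LOG=$(mktemp)

# Only stderr goes to the log; stdout still reaches the terminal.
noisy 2> "$ERROR_LOG"
status=$?

cat "$ERROR_LOG"              # error: connection refused
echo "exit status: $status"   # exit status: 1
```

Note that 2> overwrites the log on every run; 2>> appends instead, which may be preferable if the script runs on a schedule.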
Final Thoughts
If you want to test the command to make sure it works, just add the --dry-run option. I highly recommend it.
I’d also recommend creating a shell script file where you can define all your variables. It makes your script more readable and easier to edit in the future. Then you can add the script to your personal bin of scripts.
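As a rough idea of what such a script file could look like, here is a minimal sketch. It is not my original script: every value is a placeholder, the file name sync-backups.sh and the DRY_RUN switch are hypothetical, and the actual rsync call is left commented out so the sketch only prints the command it would run.

```shell
#!/bin/sh
# sync-backups.sh: hypothetical wrapper; every value below is a placeholder.
PATTERN="backup"
SSH_KEY="$HOME/.ssh/id_example"
SSH_PORT=22
USERNAME="exampleuser"
DOMAIN="example.com"
SERVER_PATH="/srv/backups"
BACKUP_PATH="$HOME/backups"
ERROR_LOG="$HOME/rsync-errors.log"
DRY_RUN="${DRY_RUN:-1}"   # defaults to a dry run; set DRY_RUN=0 for a real transfer

# Build the argument list once so the preview and the real run stay identical.
set -- -rtvP --delete \
  "--include=${PATTERN}*" '--exclude=*' \
  -e "ssh -i $SSH_KEY -p $SSH_PORT" \
  "$USERNAME@$DOMAIN:$SERVER_PATH/" "$BACKUP_PATH/"

if [ "$DRY_RUN" = 1 ]; then
  set -- --dry-run "$@"
fi

# Show what would be executed. Uncomment the last line once the values are real.
echo "rsync $*"
# rsync "$@" 2> "$ERROR_LOG"
```

Defaulting to a dry run is a deliberate choice here, since --delete is part of the command: you have to opt in with DRY_RUN=0 before anything is actually transferred or removed.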