Abstract
Today, many people use several computers: one computer at home, one or several computers at the workplace, and possibly a laptop or PDA on the road. Many files are needed on all these computers. You may want to work on any of these computers, modify the files, and subsequently have the latest version of the data available on all of them.
Data synchronization is no problem for computers that are permanently linked by means of a fast network. In this case, use a network file system, like NFS, and store the files on a server, enabling all hosts to access the same data via the network. This approach is impossible if the network connection is poor or not permanent. When you are on the road with a laptop, copies of all needed files must be on the local hard disk. However, it is then necessary to synchronize modified files. When you modify a file on one computer, make sure a copy of the file is updated on all other computers. For occasional copies, this can be done manually with scp or rsync. However, if many files are involved, the procedure can be complicated and requires great care to avoid errors, such as overwriting a new file with an old file.
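The risk of overwriting a new file with an old one can be reduced by comparing modification times before copying. The following is a minimal sketch for two local paths. The function name copy_if_newer is an assumption made for illustration, and the sketch relies on the -nt file test, which common shells such as bash, dash, and ksh provide.

```shell
# copy_if_newer SRC DST
# Copy SRC over DST only when DST is missing or older than SRC.
# Returns 0 if a copy was made, 1 if DST was left untouched.
copy_if_newer() {
    src=$1
    dst=$2
    if [ ! -e "$dst" ] || [ "$src" -nt "$dst" ]; then
        cp -p "$src" "$dst"   # -p preserves the modification time
    else
        echo "skipped: $dst is not older than $src" >&2
        return 1
    fi
}
```

Modification times are only a heuristic: if both copies were edited since the last synchronization, the newer one still overwrites changes made in the older one.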
Risk of Data Loss: Before you start managing your data with a synchronization system, you should be well acquainted with the program used and test its functionality. A backup is indispensable for important files.
The time-consuming and error-prone task of manually synchronizing data can be avoided by using one of the programs that use various methods to automate this job. The following summaries are merely intended to convey a general understanding of how these programs work and how they can be used. If you plan to use them, read the program documentation.
CVS, which is mostly used for managing program source versions, offers the possibility to keep copies of the files on multiple computers. Accordingly, it is also suitable for data synchronization. CVS maintains a central repository on the server in which the files and changes to files are saved. Changes that are performed locally are committed to the repository and can be retrieved from other computers by means of an update. Both procedures must be initiated by the user.
CVS is very resilient to errors when changes occur on several computers. The changes are merged and, if changes took place in the same lines, a conflict is reported. When a conflict occurs, the database remains in a consistent state. The conflict is only visible for resolution on the client host.
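The line-based merge behavior described above can be tried out without a CVS server. The sketch below uses diff3 from GNU diffutils as a stand-in. This is an assumption: CVS performs its merge internally, but the principle of a three-way merge is the same. The function name try_merge and all file names are hypothetical.

```shell
# try_merge MINE BASE THEIRS
# Three-way merge: prints the merged text. Returns 0 if the merge was clean
# and 1 if both sides changed the same lines (conflict markers are printed).
try_merge() {
    diff3 -m "$1" "$2" "$3"
}

# Demo data in a scratch directory.
demo_dir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$demo_dir/base"
printf 'ONE\ntwo\nthree\nfour\nfive\n' > "$demo_dir/mine"     # edits line 1
printf 'one\ntwo\nthree\nfour\nFIVE\n' > "$demo_dir/theirs"   # edits line 5
printf 'one\ntwo\n3\nfour\nfive\n'     > "$demo_dir/clash_a"  # both clash files
printf 'one\ntwo\nIII\nfour\nfive\n'   > "$demo_dir/clash_b"  # edit line 3

# Changes in different lines are combined.
try_merge "$demo_dir/mine" "$demo_dir/base" "$demo_dir/theirs" && echo "clean merge"

# Changes in the same line are reported as a conflict.
try_merge "$demo_dir/clash_a" "$demo_dir/base" "$demo_dir/clash_b" || echo "conflict reported"
```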
When no version control is needed but large directory structures need to be synchronized over slow network connections, the tool rsync offers well-developed mechanisms for transmitting only changes within files. This not only concerns text files, but also binary files. To detect the differences between files, rsync subdivides the files into blocks and computes checksums over them.
Detecting the changes comes at a price. The systems to synchronize should be generously equipped for use with rsync. RAM is especially important.
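The idea of dividing files into blocks and comparing checksums can be illustrated with standard tools. This is a simplified sketch, not rsync's actual algorithm: it compares blocks at fixed offsets only and therefore does not recognize data that has merely shifted position. The function names block_sums and changed_blocks are hypothetical.

```shell
# block_sums FILE [BLOCKSIZE]
# Print "index checksum" for each BLOCKSIZE-byte block of FILE.
block_sums() {
    file=$1
    bs=${2:-1024}
    size=$(($(wc -c < "$file")))
    i=0
    while [ $((i * bs)) -lt "$size" ]; do
        sum=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null | cksum | cut -d' ' -f1)
        echo "$i $sum"
        i=$((i + 1))
    done
}

# changed_blocks OLD NEW [BLOCKSIZE]
# Print the indices of blocks in NEW whose checksum differs from the block
# at the same position in OLD, or that do not exist in OLD at all.
changed_blocks() {
    old_sums=$(mktemp)
    block_sums "$1" "${3:-1024}" > "$old_sums"
    block_sums "$2" "${3:-1024}" | grep -Fxv -f "$old_sums" | cut -d' ' -f1
    rm -f "$old_sums"
}
```

Only the blocks listed by changed_blocks would need to be transmitted. Computing the checksums requires reading both files completely, which is part of the cost mentioned above.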
There are some important factors to consider when deciding which program to use.
Two different models are commonly used for distributing data. In the first model, all clients synchronize their files with a central server. The server must be accessible by all clients at least occasionally. This model is used by CVS.
The other possibility is to let all networked hosts synchronize their data between each other as peers. rsync actually works in client mode, but any client can also act as a server.
CVS and rsync are also available for many other operating systems, including various Unix and Windows systems.
In CVS, the data synchronization is started manually by the user. This allows fine control over the data to synchronize and easy conflict handling. However, if the synchronization intervals are too long, conflicts are more likely to occur.
Conflicts only rarely occur in CVS, even when several people work on one large program project. This is because the documents are merged on the basis of individual lines. When a conflict occurs, only one client is affected. Usually conflicts in CVS can easily be resolved.
There is no conflict handling in rsync. The user is responsible for not accidentally overwriting files and for manually resolving all possible conflicts. To be on the safe side, a versioning system like RCS can additionally be employed.
In CVS, new directories and files must be added
explicitly using the command
cvs add.
This results in greater user control over the files to synchronize.
On the other hand, new files are often overlooked, especially when
the question marks in the output of
cvs update are ignored
due to the large number of files.
An additional feature of CVS is that old file versions can be reconstructed. A brief editing remark can be inserted for each change and the development of the files can easily be traced later based on the content and the remarks. This is a valuable aid for theses and program texts.
A sufficient amount of free space for all distributed data is required on the hard disks of all involved hosts. CVS requires additional space for the repository database on the server. The file history is also stored on the server, requiring even more space. When files in text format are changed, only the modified lines need to be saved. Binary files require additional space amounting to the size of the file every time the file is changed.
Experienced users normally run CVS from the command line. However, graphical user interfaces are available for Linux, such as cervisia, and for other operating systems, like wincvs. Many development tools, such as kdevelop, and text editors, such as Emacs, provide support for CVS. The resolution of conflicts is often much easier to perform with these front-ends.
rsync is rather easy to use and is also suitable
for newcomers. CVS is somewhat more difficult to
operate. Users should understand the interaction between the
repository and local data. Changes to the data should first be
merged locally with the repository. This is done with the command
cvs update. Then the
data must be sent back to the repository with the command
cvs commit. Once this
procedure has been understood, newcomers are also able to use CVS
with ease.
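The order of the two steps, update first and commit second, can be wrapped in a small helper. This is a sketch under assumptions: the function name cvs_sync is hypothetical, and it must be run inside a checked-out working directory.

```shell
# cvs_sync MESSAGE
# Merge the repository state into the local copy first, then commit
# only if the update did not report a conflict.
cvs_sync() {
    msg=$1
    update_log=$(cvs -q update 2>&1) || {
        echo "$update_log" >&2
        return 1
    }
    echo "$update_log"
    # cvs update flags conflicting files with a leading "C "
    if echo "$update_log" | grep -q '^C '; then
        echo "conflicts found, resolve them before committing" >&2
        return 1
    fi
    cvs commit -m "$msg"
}
```

A call such as cvs_sync 'updated chapter 3' then performs both steps, stopping before the commit if the update reports a problem.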
During transmission, the data should ideally be protected against interception and manipulation. CVS and rsync can easily be used via ssh (secure shell), providing security against attacks of this kind. Running CVS via rsh (remote shell) should be avoided. Accessing CVS with the pserver mechanism in insecure networks is likewise not advisable.
CVS has been used by developers for a long time to manage program projects and is extremely stable. Because the development history is saved, CVS even provides protection against certain user errors, such as unintentional deletion of a file.
Table 39.1. Features of the File Synchronization Tools: -- = very poor, - = poor or not available, o = medium, + = good, ++ = excellent, x = available

| | CVS | rsync |
|---|---|---|
| Client/Server | C-S | C-S |
| Portability | Lin,Un*x,Win | Lin,Un*x,Win |
| Interactivity | x | x |
| Speed | o | + |
| Conflicts | ++ | o |
| File Sel. | Sel./file, dir. | Dir. |
| History | x | - |
| Hard Disk Space | -- | o |
| GUI | o | - |
| Difficulty | o | + |
| Attacks | + (ssh) | + (ssh) |
| Data Loss | ++ | + |
CVS is suitable for synchronization purposes if individual files are edited frequently and are stored in a text-based file format, such as ASCII text or program source text. The use of CVS for synchronizing data in other formats, such as JPEG files, is possible, but leads to large amounts of data, because all variants of a file are stored permanently on the CVS server. In such cases, most of the capabilities of CVS cannot be used. The use of CVS for synchronizing files is only possible if all workstations can access the same server.
The server is the host on which all valid files are located, including the latest versions of all files. Any stationary workstation can be used as a server. If possible, the data of the CVS repository should be included in regular backups.
When configuring a CVS server, it might be a good idea to
grant users access to the server via SSH. If the user is known to
the server as tux and the CVS software is installed on
the server as well as on the client, the following environment
variables must be set on the client side:
CVS_RSH=ssh
CVSROOT=tux@server:/serverdir
The command
cvs init can be used to
initialize the CVS server from the client side. This needs to be
done only once.
Finally, the synchronization must be assigned a name. Select
or create a directory on the client exclusively to contain files to
manage with CVS (the directory can also be empty). The name of the
directory is also the name of the synchronization. In this example,
the directory is called synchome. Change to this
directory and enter the following command to set the synchronization
name to synchome:
cvs import synchome tux wilber
Many CVS commands require a comment. For this purpose, CVS
starts an editor (the editor defined in the environment variable
$EDITOR or vi if no editor was defined). The
editor call can be circumvented by entering the comment in advance
on the command line, such as in the following example:
cvs import -m 'this is a test' synchome tux wilber
The synchronization repository can now be checked out from all
hosts with cvs co
synchome. This creates a new subdirectory
synchome on the client. To commit your changes
to the server, change to the directory synchome
(or one of its subdirectories) and enter
cvs commit.
By default, all files (including subdirectories) are committed
to the server. To commit only individual files or directories,
specify them as in cvs commit
file1 directory1. New files and directories must be added
to the repository with a command like
cvs add file1
directory1 before they are committed to the server.
Subsequently, commit the newly added files and directories with
cvs commit file1
directory1.
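The setup steps described above can be collected in a short script. This is a sketch that reuses the example values from the text (user tux, host server, repository path /serverdir, module synchome). The function names first_time_setup and add_and_commit and the commit messages are hypothetical.

```shell
# Environment for access via SSH, using the example values from the text.
export CVS_RSH=ssh
export CVSROOT=tux@server:/serverdir

# first_time_setup DIR
# One-time steps: create the repository, then import DIR as module synchome.
first_time_setup() {
    cvs init &&
    ( cd "$1" && cvs import -m 'initial import' synchome tux wilber )
}

# add_and_commit MESSAGE FILE...
# Register new files or directories, then commit them to the server.
add_and_commit() {
    msg=$1
    shift
    cvs add "$@" && cvs commit -m "$msg" "$@"
}
```

After first_time_setup has run once, each host obtains a working copy with cvs co synchome. Note that cvs add does not descend into directories, so files inside a newly added directory have to be added as well.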
If you change to another workstation, check out the synchronization repository if this has not been done during an earlier session at the same workstation.
Start the synchronization with the server with
cvs update. Update
individual files or directories as in
cvs update file1
directory1. To see the difference between the current files
and the versions stored on the server, use the command
cvs diff or
cvs diff file1
directory1. Use cvs -nq
update to see which files would be affected by an update.
Here are some of the status symbols displayed during an update:
U: The local version was updated. This affects all files that are provided by the server and missing on the local system.
M: The local version was modified. If there were changes on the server, it was possible to merge the differences in the local copy.
P: The local version was patched with the version on the server.
C: The local file conflicts with the current version in the repository.
?: This file does not exist in CVS.
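In a large working copy, the status letters are easy to overlook. The following sketch counts the files per status letter in the output of cvs -nq update. The function name summarize_cvs_status is hypothetical, and only the five symbols U, P, M, C, and ? are considered.

```shell
# summarize_cvs_status
# Read "cvs -nq update" output on stdin and count files per status letter.
summarize_cvs_status() {
    awk '
        /^[UPMC?] / { count[$1]++ }
        END {
            n = split("U P M C ?", order, " ")
            for (i = 1; i <= n; i++)
                if (order[i] in count)
                    printf "%s %d\n", order[i], count[order[i]]
        }'
}
```

Use it as cvs -nq update | summarize_cvs_status. A nonzero count for ? points to files that have not yet been added, and a nonzero count for C points to conflicts that need attention.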
The status M indicates a locally modified file. Either commit the local copy to the server or remove the local file and run the update again. In this case, the missing file is retrieved from the server. If you commit a locally modified file and someone else has already changed and committed the same lines, you might get a conflict, indicated with C.
In this case, look at the conflict marks (<<<<<<< and >>>>>>>) in the file and decide between the two versions. As this can be a rather unpleasant job, you might decide to abandon your changes, delete the local file, and enter cvs up to retrieve the current version from the server.
This section merely offers a brief introduction to the many possibilities of CVS. Extensive documentation is available at the following URLs:
Rsync: http://www.gnu.org/manual
rsync is useful when large amounts of data need to be transmitted regularly while not changing too much. This is, for example, often the case when creating backups. Another application concerns staging servers. These are servers that store complete directory trees of Web servers that are regularly mirrored onto a Web server in a DMZ.
rsync can be operated in two different modes. It can be used to archive or copy data. To accomplish this, only a remote shell, like ssh, is required on the target system. However, rsync can also be used as a daemon to provide directories to the network.
The basic mode of operation of rsync does not require any special configuration. rsync directly allows mirroring complete directories onto another system. As an example, the following command creates a backup of the home directory of tux on a backup server named sun:
rsync -baz -e ssh /home/tux/ tux@sun:backup
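The following command is used to play the directory back. The trailing slash on backup/ makes rsync copy the contents of the directory instead of creating a backup subdirectory in /home/tux/:
rsync -az -e ssh tux@sun:backup/ /home/tux/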
Up to this point, the handling does not differ much from that of a regular copying tool, like scp.
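Before running a backup for the first time, it can be useful to see what rsync would transfer. The sketch below wraps the backup command from the text in a function that accepts additional options, so the same call can be used for a dry run with -n (--dry-run). The function name backup_home is hypothetical. Host, user, and paths are the example values from the text.

```shell
# backup_home [EXTRA_RSYNC_OPTIONS...]
# Mirror /home/tux/ to the directory "backup" on host sun.
# Extra options are inserted before the source and target arguments.
backup_home() {
    rsync -baz -e ssh "$@" /home/tux/ tux@sun:backup
}
```

Calling backup_home -n -v lists the files that would be copied without changing anything. Calling backup_home without arguments performs the actual backup.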
rsync should be operated in “rsync” mode to make
all its features fully available. This is done by starting the
rsyncd daemon on one of the systems. Configure it in the file
/etc/rsyncd.conf. For example, to make the
directory /srv/ftp available with rsync, use
the following configuration:
gid = nobody
uid = nobody
read only = true
use chroot = no
transfer logging = true
log format = %h %o %f %l %b
log file = /var/log/rsyncd.log
[FTP]
path = /srv/ftp
comment = An Example
Then start rsyncd with
rcrsyncd start. rsyncd
can also be started automatically during the boot process. Set this
up by activating this service in the runlevel editor provided by
YaST or by manually entering the command
insserv rsyncd. rsyncd
can alternatively be started by xinetd. This is, however, only
recommended for servers that rarely use rsyncd.
The example also creates a log file listing all connections.
This file is stored in /var/log/rsyncd.log.
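With the log format from the example configuration (%h %o %f %l %b), each transfer line contains the client host, the operation, the file name, the file length, and the number of bytes transferred. rsyncd prefixes every line in the log file with a time stamp and the process ID. The following sketch sums the transferred bytes per client. The function name bytes_per_host is hypothetical.

```shell
# bytes_per_host [LOGFILE]
# Sum the %b column (bytes transferred) per client host.
# Fields: date time [pid] host operation file... length bytes
bytes_per_host() {
    awk '$5 == "send" || $5 == "recv" { total[$4] += $NF }
         END { for (h in total) printf "%s %d\n", h, total[h] }' \
        "${1:-/var/log/rsyncd.log}" | sort
}
```

The host is read from the fourth field and the byte count from the last field, so file names containing spaces do not disturb the result.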
It is then possible to test the transfer from a client system. Do this with the following command:
rsync -avz sun::FTP
This command lists all files present in the directory
/srv/ftp of the server. This request is also
logged in the log file /var/log/rsyncd.log. To
start an actual transfer, provide a target directory. Use
. for the current directory. For example:
rsync -avz sun::FTP .
By default, no files are deleted while synchronizing with rsync. To force the deletion of files that no longer exist on the source, add the option --delete. To avoid overwriting files that are newer on the receiving side, use the option --update. Any conflicts that arise must be resolved manually.
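Because --delete removes files on the receiving side, it is worth checking its effect first. The following sketch combines --delete with a dry run (-n) and prints only the paths that would be removed. The function name list_deletions is hypothetical. It relies on the "deleting" lines that rsync prints in verbose mode.

```shell
# list_deletions SRC DST
# Dry run: print the paths that --delete would remove from DST.
list_deletions() {
    rsync -avn --delete "$1" "$2" | sed -n 's/^deleting //p'
}
```

If the list is acceptable, run the same rsync command without -n to carry out the synchronization.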
Important information about rsync is provided in the man pages
man rsync and
man rsyncd.conf. A
technical reference about the operating principles of rsync is
featured in
/usr/share/doc/packages/rsync/tech_report.ps.
Find the latest news about rsync on the project Web site at http://rsync.samba.org/.
If you want Subversion or other tools, download the SDK. Find it at http://developer.novell.com/wiki/index.php/SUSE_LINUX_SDK.