Re: [BackupPC-users] Linux backups with rsync vs tar
2011-09-02 10:52:10
charlesboyo <backuppc-forum AT backupcentral DOT com> wrote on 08/31/2011 05:53:43 AM:
> I'm using BackupPC to take daily backups of a maildir totaling 250
> GB with average file sizes of 500 MB (text mailboxes; these files
> change every day).
>
> Currently, my setup takes full backups once a week and incremental
> backups every day between the fulls. The servers are directly
> connected with a crossover cable, allowing 100 Mbps.
I have a very similar setup with several servers. They are often connected at 100 Mb/s simply because the clients haven't upgraded to gigabit switches. They also back up IBM Lotus Domino servers. In Domino, each mail user has their own mail database, which is typically gigabytes in size (except with this thing called DAOS, and even then they're still hundreds of MB). This is pretty comparable to your environment, though my *total* size is not usually 250 GB of just mail data... I have file servers that are bigger, but not mail servers.
(I have some servers that back up Microsoft Exchange. That is even worse: one monolithic file for the *ENTIRE* mail store. U G L Y... And incrementals *ARE* fulls! :) )
> However, these backups take about 8 hours to complete, averaging 8
> Mbps, and the BackupPC server is CPU-bound throughout the entire
> process.
Fulls or incrementals or both? If truly 90% of your files are changing daily, I'm going to assume both: there will be *very* little difference between a full backup and an incremental.
> Thus I have reason to suspect the rsync overhead as being guilty.
> Note that I have disabled hard links, implemented checksum caching,
> increased the block size to 512 KB and enabled --whole-file, to no avail.
I have done zero tuning of the rsync command: I use the 100% stock BackupPC command line for it.
> 1. since over 90% of the files change every day and "incremental"
> backups involve transferring the whole file to the BackupPC server,
> won't it make better sense to just run a full backup everyday?
Incremental backups end up with a whole new file on the server, but rsync does not get there by transferring the whole file: the rsync protocol sends just the changed parts. HOWEVER, the whole file is read on *BOTH* ends of the connection, so it doesn't save you a *BIT* of disk I/O; it only saves you NETWORK I/O. Seeing as you have only 100 Mb/s between them, that will improve performance, but not dramatically, and as you have found, it exacts a CPU hit to do it.
You may find that trading CPU for network performance is not a good trade in your case. Having said that, I run BackupPC on about the slowest systems you can actually buy new today: VIA EPIA EN 1500 system boards with 512 MB of RAM. Terrible performance, but they meet my BackupPC needs just *fine*.
Hard numbers from the nearest Domino server to me: 60 GB total backed up for a full, 18 GB for an incremental (this is a DAOS server). Fulls take about 150 minutes, incrementals about 40. 1/4 the data, 1/4 the time. And that's on the miserable hardware I described.
Scaling that up to your sizes, a full would take about 600 minutes, or 10 hours. So the 8 hours you're seeing sounds reasonable.
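That back-of-the-envelope scaling, using the figures above, is just a proportion:

```shell
full_gb=60; full_min=150; target_gb=250   # figures from the post
est_min=$(( target_gb * full_min / full_gb ))
echo "$est_min"   # → 625 minutes, i.e. a little over 10 hours
```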
The number one question I have is: is this really a problem? If you have a backup window that allows it, I would not worry about it. If you do *not*, then rsync might not be for you.
To address a couple of things said in other replies:
1) Avoiding building a file list is pointless. It takes my servers just a couple of minutes. It certainly uses RAM, but that is only an issue if you have millions of files, and in that case, simply add more RAM. I'm a glutton for punishment running with 512 MB of RAM (actually, I use 2 GB in new servers now: I just like to twist Les' tail! :) ).
2) Les' point about the format of the files (one monolithic file per mailbox vs. one file per e-mail) is dead on. One file per message allows 99% of the files to remain untouched once they're backed up *once*, which will *vastly* reduce the backup times. (DAOS does a similar thing for Domino by breaking attachments out into individual files and hashing and pooling them, in a manner very similar to a BackupPC pool, BTW. Before DAOS, my fulls and incrementals were indistinguishable; now they're 4:1 size-wise. Plus a 50% reduction in total disk usage. But I digress.)
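The pooling idea both systems share can be sketched in a few lines of shell (the names and paths here are made up for the demo): hash the content, store the first copy under its hash, and hard-link every later duplicate to it.

```shell
pool=$(mktemp -d)    # stands in for the pool directory
store() {
    h=$(sha1sum "$1" | cut -d' ' -f1)        # content hash
    [ -e "$pool/$h" ] || cp "$1" "$pool/$h"  # first sighting: add to pool
    ln -f "$pool/$h" "$1.pooled"             # duplicates just hard-link
}
work=$(mktemp -d)
printf 'same attachment bytes\n' > "$work/a"
printf 'same attachment bytes\n' > "$work/b"  # identical content
store "$work/a"
store "$work/b"
ls "$pool" | wc -l   # → 1: two files, one stored copy
```

Two mailboxes carrying the same attachment cost the pool one copy, which is where the disk savings come from.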
However, be aware that you then trade the "my backups take a long time and don't pool" problem for a "now I have to manage several *MILLION* files!" problem. fsck can become a major issue in that case; with 250 GB of e-mail as individual messages, even ls can be a major issue! Both formats have advantages and disadvantages. Just be aware that it's not a clear win either way.
And you might not have a choice, which makes the argument moot.
Now, for tar. Take my information with a grain of salt: I have *never* run tar with BackupPC...
> 2. from Pavel's questions, he observed that BackupPC is unable to
> recover from an interrupted tar transfer. Such interruptions simply
> cannot happen in my case. Should I switch to tar?
Is that a trick question? "This cannot happen. Should I do this?" Umm, no -- GIVEN the conditions you yourself set. :)
http://en.wikipedia.org/wiki/Tautology_%28logic%29
> And in the unlikely event that the transfer does get interrupted, what
> mechanisms do I need to implement to resume/recover from the failure?
To repeat another response: restart the backup...
> 3. What is the recommended process for switching from rsync to tar --
> since the format/attributes are reportedly incompatible? I would
> like to preserve existing compressed backups as much as possible.
Your old backups should be 100% fine. They will remain in the pool just fine, etc. I do not believe that files transferred by rsync will pool with files transferred by tar (due to the attribute issue you mention); however, for you that's a moot point: 90% of your files don't pool anyway.
As an aside, BackupPC (well, the pooling) buys you virtually *nothing* in your application. With a fast enough network connection, rsync buys almost everyone *nothing*, too. You are using two tools with very distinct advantages, but in an environment that largely negates those advantages.
This is not a *bad* thing. Every single one of my backup servers is based on BackupPC, and all but maybe 2 shares are backed up using rsync. (The only exceptions I can think of are where I'm backing up data on a NAS and I can't or won't run rsyncd on the NAS, so I have to use SMB.) Whether it's an advantage or a disadvantage, that's the setup I use. I vastly prefer consistency over performance. And I can live with 8-hour backup windows.
If you can't, then you may have to make different decisions. That's the fun of being the Administrator! :)
Timothy J. Massey