BackupPC-users

Re: [BackupPC-users] Linux backups with rsync vs tar

2011-08-31 12:08:57
Subject: Re: [BackupPC-users] Linux backups with rsync vs tar
From: Les Mikesell <lesmikesell AT gmail DOT com>
To: backuppc-users AT lists.sourceforge DOT net
Date: Wed, 31 Aug 2011 11:07:41 -0500
On Wed, Aug 31, 2011 at 4:53 AM, charlesboyo
<backuppc-forum AT backupcentral DOT com> wrote:
>
> I'm using BackupPC to take daily backups of a maildir totaling 250 GB with 
> average file sizes of 500 MB (text mailboxes, these files change every day).

'Maildir' usually refers to a format where each message is in its own
file.  However, this sounds like a directory of mailbox format files
where the file consists of many messages appended together and
modified for every change.   Maildir format is much more 'backup
friendly' because older messages don't change that often.
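
To make the difference concrete, here is a minimal Python sketch (not
anything BackupPC itself runs) that delivers one new message into a
mailbox-style file and into a one-file-per-message directory, then
counts how many bytes belong to files that changed.  The names
snapshot, changed_bytes and simulate are made up for illustration, and
the layout is simplified: a real Maildir has cur/new/tmp subdirectories.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file under root to (size, SHA-1 digest of its content)."""
    return {
        p.relative_to(root): (p.stat().st_size,
                              hashlib.sha1(p.read_bytes()).hexdigest())
        for p in root.rglob("*") if p.is_file()
    }


def changed_bytes(before: dict, after: dict) -> int:
    """Total size of files that are new or differ from the earlier snapshot."""
    return sum(size for path, (size, digest) in after.items()
               if before.get(path) != (size, digest))


def simulate(n_old: int = 1000, msg_size: int = 1024) -> tuple:
    """Deliver one message; return changed bytes for (mailbox file, per-message dir)."""
    msg = b"x" * msg_size
    with tempfile.TemporaryDirectory() as tmp:
        mbox = Path(tmp) / "mbox"
        maildir = Path(tmp) / "maildir"
        mbox.mkdir()
        maildir.mkdir()
        # Existing mail: one big appended file vs. one file per message.
        (mbox / "inbox").write_bytes(msg * n_old)
        for i in range(n_old):
            (maildir / f"msg{i}").write_bytes(msg)
        before_mbox, before_maildir = snapshot(mbox), snapshot(maildir)
        # Deliver one new message to each layout.
        with open(mbox / "inbox", "ab") as f:
            f.write(msg)
        (maildir / f"msg{n_old}").write_bytes(msg)
        return (changed_bytes(before_mbox, snapshot(mbox)),
                changed_bytes(before_maildir, snapshot(maildir)))
```

With the mailbox-style file, the whole file counts as changed after a
single delivery; with one file per message, only the new message does.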

> However, these backups take about 8 hours to complete, averaging 8 Mbps and 
> the BackupPC server is CPU-bound throughout the entire process. Thus I have 
> reason to suspect the rsync overhead as being guilty.
> Note that I have disabled hard links, implemented checksum caching, increased 
> the block size to 512 KB and enabled --whole-file to no avail.

Rsync isn't that great with large files that change.  Normally it will
copy parts of the file from the previous version to merge with the
changes being sent, resulting in a lot of extra disk traffic (and
Linux normally reports disk wait time as CPU time).   The --whole-file
option should change that behavior but I'm not sure how it is
implemented in BackupPC's version of rsync.
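
As a rough illustration of where that disk traffic comes from, here is
a simplified Python sketch of block matching.  It is not rsync's real
algorithm (rsync uses a rolling checksum so matches can start at any
offset, while this only compares blocks at aligned offsets), and the
function names are made up.  It counts how many bytes of the new file
could be copied from the old version on the receiving side versus how
many would have to be sent.

```python
import hashlib


def block_digests(data: bytes, block: int) -> set:
    """Digests of every fixed-size, aligned block of the old file."""
    return {hashlib.md5(data[i:i + block]).digest()
            for i in range(0, len(data), block)}


def delta_stats(old: bytes, new: bytes, block: int = 8) -> tuple:
    """Return (bytes copied from the old file, bytes sent over the wire)."""
    known = block_digests(old, block)
    copied = sent = 0
    for i in range(0, len(new), block):
        chunk = new[i:i + block]
        if hashlib.md5(chunk).digest() in known:
            copied += len(chunk)   # receiver reads this from the old copy
        else:
            sent += len(chunk)     # sender has to transmit this block
    return copied, sent
```

For a mailbox file that only had a message appended, almost nothing is
sent, but the receiver still reads the matching blocks from the old
copy and writes out a complete new file, so the disk does far more work
than the network figures suggest.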

> With this background, I will appreciate answers to the following questions:
>
> 1. since over 90% of the files change every day and "incremental" backups 
> involve transferring the whole file to the BackupPC server, won't it make 
> better sense to just run a full backup every day?

Changing to maildir format storage might change that.   'Full' backups
with rsync tend to be slow because it is still going to read the whole
directory on the target and it will do a full read on all of the files
that are still in common with the previous run to do a checksum
verification.

> 2. from Pavel's questions, he observed that BackupPC is unable to recover 
> from interrupted tar transfer. Such interruptions simply cannot happen in my 
> case. Should I switch to tar? And in the unlikely event that the transfer 
> does get interrupted, what mechanisms do I need to implement to 
> resume/recover from the failure?

Yes, tar will be faster for mailbox format where the files are all
going to be changed.  Recovery is only an issue where bandwidth limits
make the time a problem.  If you have a problem, your recovery is to
just do another run.

> 3. What is the recommended process for switching from rsync to tar - since 
> the format/attributes are reportedly incompatible? I would like to preserve 
> existing compressed backups as much as possible.

Not sure about that.  With mailbox format you aren't going to have
much pooling anyway.

-- 
  Les Mikesell
    lesmikesell AT gmail DOT com

