Subject: Re: [BackupPC-users] Block-level rsync-like hashing dd?
From: Timothy J Massey <tmassey AT obscorp DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Tue, 12 Apr 2011 16:08:17 -0400
Timothy J Massey <tmassey AT obscorp DOT com> wrote on 04/12/2011 03:48:11 PM:

> Saturn2888 <backuppc-forum AT backupcentral DOT com> wrote on 04/12/2011 12:11:49 AM:
>
> > Les, that's a pretty good idea, running two, but I cannot do that
> > with these systems sadly. It'd be really nice to not have to take
> > the machine down to do the backup though. So I guess that's my
> > question, I'm looking for a way to backup the pool over Ethernet
> > while it's running.
>
> If the pool is changing (as in, the filesystem is mounted and
> BackupPC is running), then you *must* have some sort of snapshot
> capability, and in any case, you're going to have to bounce BackupPC
> at least *briefly*.
>
> Therefore, you have two options:  1) LVM Snapshot/dd or 2) Break a
> RAID mirror while unmounted.
>
> > While it's possible to dd an lvm snapshot, that's 1TB/day which is
> > quite a huge amount of time and bandwidth consumed. Is there no
> > other solution like ddsnap?
>
> Once you've got the LVM snapshot, you can do whatever the heck you
> want to with it:  any magic tool you wish for handling the dd.


To expand on this a little further:  I am not aware of any dd-like tool that will help you avoid reading at *least* the entire active contents of the drive.  In other words, if the drive is a 1TB filesystem with 600GB of data on it, I do not know of any tool that will keep you from having to *read* at least that 600GB, and that is only if you use something like partclone (a la Clonezilla) to read only the "used" portions of the filesystem.

In the case of something like ddsnap, it's going to have to read both the entire *source* and the entire *destination* device from end to end (in order to hash and compare them), and *then* write the changed data to the destination.
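
Just to make the basic approach concrete, it looks something like this (the volume group, LV names, snapshot size, init script path and destination host below are all made up -- adjust them for your setup):

    # Bounce BackupPC just long enough to get a consistent snapshot
    /etc/init.d/backuppc stop
    lvcreate -s -L 20G -n pool_snap /dev/vg0/backuppc_pool
    /etc/init.d/backuppc start

    # dd still reads the *entire* snapshot device, end to end
    dd if=/dev/vg0/pool_snap bs=1M | \
        ssh backup2 'dd of=/dev/vg0/pool_copy bs=1M'

    lvremove -f /dev/vg0/pool_snap

Throw gzip (or ssh -C) into that pipe if the network is the slow part, but nothing in there avoids reading the whole device.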

> > I mean, since my BackupPC pool is on an
> > lvm, doing a dd itself isn't the problem. It's that it's 1TB/day or
> > more depending on my file system size. It'd be really nice to have a
> > solution to sync only the changes of that day which then, I can
> > snapshot on the other machine.


And how will the drive *know* which parts of that 1TB have changed?  There's nothing magically flagging changed sectors for you...  The only way it's going to know is by *reading* the entire 1TB (or, at best, just the occupied portion) and comparing it to the destination.
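
If you want to see why, here is a deliberately dumb sketch of what any "copy only the changed blocks" tool has to do under the hood (the device names and block size are placeholders, and a real tool would compute the destination hashes on the far end so that only hashes cross the wire):

    SRC=/dev/vg0/pool_snap
    DST=/dev/vg0/pool_copy
    BS=$((4 * 1024 * 1024))                              # 4MB blocks
    BLOCKS=$(( $(blockdev --getsize64 "$SRC") / BS ))    # ignores any partial tail block

    for ((i = 0; i < BLOCKS; i++)); do
        # Both of these reads happen for *every* block, changed or not
        s=$(dd if="$SRC" bs=$BS skip=$i count=1 2>/dev/null | md5sum)
        d=$(dd if="$DST" bs=$BS skip=$i count=1 2>/dev/null | md5sum)
        if [ "$s" != "$d" ]; then
            # Only the changed blocks get rewritten
            dd if="$SRC" of="$DST" bs=$BS skip=$i seek=$i count=1 conv=notrunc 2>/dev/null
        fi
    done

Both sides get read from end to end no matter how little actually changed; the only things you save are the writes and (if the hashing runs on the far end) the network transfer.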

Block-based hashes have one reason for existence:  to save *BANDWIDTH* between hosts.  They only help when the bandwidth between the two ends is the limiting factor.  For devices within the same computer on SATA-2, you're talking about 300MB/s of bandwidth!  Between two computers on Gigabit Ethernet, you're talking roughly 100MB/s.  Even with a single SATA hard drive on each end, you should be able to sustain at least a 20-30MB/s transfer rate.  None of those is the limiting factor for moving 1TB a day.

So, in short, with every tool I can think of, you're going to read at *least* the entire occupied portion of the drive.  So who cares if you then write that entire portion out to the destination?  While hard drive write performance is slower than read performance, you've still got plenty of bandwidth to move a measly 1TB in 24 hours:  that works out to about 12MB/sec.  If you can't sustain 12MB/sec, then you need to get better hardware.  Seriously:  no amount of software magic is going to help you then.  But it's a low number:  VIA EPIA motherboards, miserable as they are, can sustain it no problem...
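
For anyone checking that arithmetic (taking 1TB as a round 10^12 bytes):

    1,000,000 MB / (24 hr x 3,600 s/hr) = 1,000,000 / 86,400 ~= 11.6 MB/s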

Timothy J. Massey
 
Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!

http://www.OutOfTheBoxSolutions.com
tmassey AT obscorp DOT com
      22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/