Re: [BackupPC-users] BackupPC recovery from unreliable disk

2011-12-22 18:52:46
Subject: Re: [BackupPC-users] BackupPC recovery from unreliable disk
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Thu, 22 Dec 2011 18:49:04 -0500
JP Vossen wrote at about 21:50:29 -0500 on Wednesday, December 21, 2011:
 > I'm running Debian Squeeze stock backuppc-3.1.0-9 on a server and I'm
 > getting kernel messages [1] and SMART errors [2] about the WD 2TB SATA
 > disk.  Fine, I RMA'd it and have the new one...  Now what?  I know I can 
 > either 'dd' or start fresh.  But...
 > 
 > 
 > If I start fresh, I know everything will work and be valid, but I
 > lose my historical backups when I wipe the bad disk and RMA it.
 > 
 > 
 > If I 'ddrescue' BAD --> GOOD, I'll worry about the integrity of the
 > BackupPC store.  As I understand it, the incoming files are hashed and
 > stored, but the store itself is never checked (true?).  So when I do
 > backups, if an incoming file hash matches a file already in the store,
 > the incoming file is "de-duped" and dropped.  But what if the file
 > actually in the store is corrupt due to the bad disk?

If the hash of a new file matches the hash of an existing pool file,
the contents are then compared, because a hash collision is always
possible: the file hash is only a partial-file md5sum, based on the
first and last 128K slices plus the file size.
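
As a rough illustration of the scheme described above, here is a minimal
Python sketch of a partial digest plus a full compare on a digest match.
This is not BackupPC's code, and the digests it produces should not be
expected to match real pool file names; the names SLICE, partial_digest
and store, and the in-memory dict standing in for the pool, are all made
up for the example.

```python
import hashlib

SLICE = 128 * 1024  # 128K slice size, as described above


def partial_digest(data: bytes) -> str:
    """MD5 over the size plus the first and last 128K (simplified sketch)."""
    md5 = hashlib.md5()
    md5.update(str(len(data)).encode())
    md5.update(data[:SLICE])
    if len(data) > SLICE:
        # Hash the last slice, without re-hashing bytes of the first slice.
        md5.update(data[max(SLICE, len(data) - SLICE):])
    return md5.hexdigest()


def store(pool: dict, data: bytes) -> str:
    """Add data to a toy pool, de-duplicating only on a full content match."""
    digest = partial_digest(data)
    chain = pool.setdefault(digest, [])
    for existing in chain:
        if existing == data:   # same digest AND same contents
            return "linked"
    chain.append(data)         # new file, or a digest collision
    return "added"


pool = {}
a = b"A" * (3 * SLICE)
b = a[:SLICE] + b"B" * SLICE + a[2 * SLICE:]  # differs only in the middle

print(store(pool, a))                          # added
print(store(pool, a))                          # linked
print(partial_digest(a) == partial_digest(b))  # True
print(store(pool, b))                          # added
```

The last two lines show why the compare step matters: the two buffers
share a digest but differ in content, so the second one is stored
separately instead of being dropped as a duplicate.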

 > 
 > Am I correct?  If so, is there a way to have BackupPC validate that the
 > files in the pool actually match their hash and weren't mangled by the disk?

Of course, there is no guarantee that the pool files themselves are
not corrupt. Checking the files against their pool file name hash can
rule out some file corruption, but if the file size is unchanged and
the corruption is not in the first or last 128K slice, then the hash
will be unchanged and the corruption won't be detected.
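
The blind spot can be demonstrated with a small, self-contained Python
sketch. It assumes a simplified digest (size plus first and last 128K)
rather than BackupPC's exact one, works on throwaway temp files rather
than a real pool, and the helper names partial_digest_file, flip_byte
and detected are hypothetical.

```python
import hashlib
import os
import tempfile

SLICE = 128 * 1024  # 128K


def partial_digest_file(path: str) -> str:
    """Simplified digest: size + first 128K + last 128K, read via seek."""
    size = os.path.getsize(path)
    md5 = hashlib.md5(str(size).encode())
    with open(path, "rb") as f:
        md5.update(f.read(SLICE))
        if size > SLICE:
            f.seek(max(SLICE, size - SLICE))
            md5.update(f.read(SLICE))
    return md5.hexdigest()


def flip_byte(path: str, offset: int) -> None:
    """Simulate disk corruption by inverting one byte in place."""
    with open(path, "r+b") as f:
        f.seek(offset)
        byte = f.read(1)
        f.seek(offset)
        f.write(bytes([byte[0] ^ 0xFF]))


def detected(offset: int, size: int = 4 * SLICE) -> bool:
    """Return True if corruption at offset changes the partial digest."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size))
        before = partial_digest_file(path)
        flip_byte(path, offset)
        return partial_digest_file(path) != before
    finally:
        os.remove(path)


print(detected(10))              # True: inside the first slice
print(detected(2 * SLICE))       # False: middle of the file is never hashed
print(detected(4 * SLICE - 10))  # True: inside the last slice
```

Catching the middle case would need a digest over the whole file
contents, recorded while the data was still known to be good.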

That being said, I have written several routines that check the pool
for corruption relative to the partial-file md5sum hash in the pool
file name, and that fix it. Please search the archives, where I have
discussed and posted the code.

Note that there have been bugs in BackupPC itself and also in various
pool libraries (specifically on arm5 processors) that cause relatively
innocuous errors in the pool file names relative to the actual
intended partial file md5sum hash.

 > 
 > 
 > Any other solution I'm missing?
 > 
 > Thanks,
 > JP
 > ___________________________________________
 > [1] Example kernel errors:
 > 
 > Security Events for kernel
 > =-=-=-=-=-=-=-=-=-=-=-=-=-
 > kernel: [4020993.728571] end_request: I/O error, dev sda, sector 81203507
 > kernel: [4021009.712952] end_request: I/O error, dev sda, sector 81203507
 > 
 > System Events
 > =-=-=-=-=-=-=
 > kernel: [4020983.471256] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
 > kernel: [4020983.471290] ata3.00: BMDMA stat 0x25
 > kernel: [4020983.471315] ata3.00: failed command: READ DMA
 > kernel: [4020983.471347] ata3.00: cmd c8/00:18:33:11:d7/00:00:00:00:00/e4 tag 0 dma 12288 in
 > kernel: [4020983.471351]          res 51/40:07:33:11:d7/40:00:28:00:00/e4 Emask 0x9 (media error)
 > kernel: [4020983.471424] ata3.00: status: { DRDY ERR }
 > kernel: [4020983.471446] ata3.00: error: { UNC }
 > kernel: [4020983.501157] ata3.00: configured for UDMA/133
 > 
 > 
 > [2] Example SMART error:
 > 
 > Error 1704 occurred at disk power-on lifetime: 10149 hours (422 days + 21 hours)
 >    When the command that caused the error occurred, the device was active or idle.
 > 
 >    After command completion occurred, registers were:
 >    ER ST SC SN CL CH DH
 >    -- -- -- -- -- -- --
 >    40 51 40 45 66 01 e0  Error: UNC 64 sectors at LBA = 0x00016645 = 91717
 > 
 >    Commands leading to the command that caused the error were:
 >    CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 >    -- -- -- -- -- -- -- --  ----------------  --------------------
 >    c8 00 40 3f 66 01 e0 08  46d+13:36:50.242  READ DMA
 >    ec 00 00 00 00 00 a0 08  46d+13:36:50.233  IDENTIFY DEVICE
 >    ef 03 46 00 00 00 a0 08  46d+13:36:50.225  SET FEATURES [Set transfer mode]
 > 
 > ----------------------------|:::======|-------------------------------
 > JP Vossen, CISSP            |:::======|      http://bashcookbook.com/
 > My Account, My Opinions     |=========|      http://www.jpsdomain.org/
 > ----------------------------|=========|-------------------------------
 > "Microsoft Tax" = the additional hardware & yearly fees for the add-on
 > software required to protect Windows from its own poorly designed and
 > implemented self, while the overhead incidentally flattens Moore's Law.

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/