Subject: Re: [BackupPC-users] errors in cpool after e2fsck corrections
From: Matthias Meyer <matthias.meyer AT gmx DOT li>
To: backuppc-users AT lists.sourceforge DOT net
Date: Sun, 18 Jan 2009 18:17:59 +0100
Holger Parplies wrote:

> Hi,
> 
> Matthias Meyer wrote on 2009-01-18 15:33:30 +0100 [Re: [BackupPC-users]
> errors in cpool after e2fsck corrections]:
>> Johan Ehnberg wrote:
>> > Quoting Matthias Meyer <matthias.meyer AT gmx DOT li>:
>> > 
>> >> After a system crash and tons of errors in my ext3 filesystem I had
>> >> to run e2fsck.
>> >> During this I lost some GB of data in /var/lib/backuppc.
>> >> [...]
>> >> I believe the reason is that
>> >> /var/lib/backuppc/cpool/8/4/5/845a684e4a8c9fe22d11484dc13e24fc
>> >> is a directory and not a file. It was probably created during e2fsck.
> 
> no, e2fsck does not *create* directories; this is clear evidence of
> on-disk data corruption.
I had a lot of HTREE errors, multiply-claimed inodes, and unclaimed inodes.
Maybe those files had already become directories before e2fsck ran.
> 
>> >> Should I delete all directories in /var/lib/backuppc/cpool/?/?/?/*
>> >> or would BackupPC_nightly do this job?
> 
> I doubt it. BackupPC_nightly is not designed to fix broken file systems.
Ok, I will delete these directories. On your responsibility (just a joke :-)
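
A minimal sketch of how that could be done, assuming the BackupPC 3.x
cpool/X/Y/Z/<file> layout (list first, delete only after reviewing):

# Directories at the depth where only regular pool files belong:
find /var/lib/backuppc/cpool -mindepth 4 -maxdepth 4 -type d -print
# Only after reviewing that list, remove them:
find /var/lib/backuppc/cpool -mindepth 4 -maxdepth 4 -type d -exec rm -rf {} +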
> 
>> > Sorry to hear about that. I would recommend the following:
>> > - Consider all the backed up data corrupt (don't build any new backups
>> >   on it)
>> > - Start a fresh pool, saving the old one for the duration of your
>> >   normal cycle
>> > - Look for the reason for the crash/corruption and prevent it from
>> >   happening
>> [...]
>> I would believe the filesystem should be ok by now. e2fsck needed to run
>> 3 or 4 times and took more than 2 days in total. After that, lost+found
>> contained approximately 10% of my data :-( There is no chance to
>> reconstruct all of it.
>> 
>> 1) So you would recommend:
>> mv /var/lib/backuppc/cpool /var/lib/backuppc/cpool.sav
>> mkdir /var/lib/backuppc/cpool
> 
> No. The point seems to be *getting rid of the corrupt file system*. You
> don't know what exactly was corrupted on-disk. You have definite evidence
> that a lot was - and possibly still is (3 or 4 e2fscks to find all
> problems? What should a subsequent check find that a previous one
> didn't?). You can trust in everything being ok now, but you might as well
> trust in not needing your backups in the first place. You can't really
> verify it.
But if a file is backed up again, the new copy should be ok, and it should
be possible to restore that file.
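
For example (hypothetical paths - the host name, backup number, and the
f-mangled file name below are made up for illustration; compression must be
enabled, i.e. a cpool setup), a restored file can be compared against the
original byte for byte:

# Decompress one file from the newest backup and compare it to the original:
/usr/share/backuppc/bin/BackupPC_zcat \
    /var/lib/backuppc/pc/myhost/123/f%2f/fetc/fconfig | cmp - /etc/config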
> 
> The key phrase is
> 
>> > - Look for the reason for the crash/corruption and prevent it
>> > from happening
> 
> - this can likely mean exchanging the disk (cables, mainboard, memory,
> power supply ...). You don't give any details about your system crash or
> hardware setup, so there is little point in guessing what might have gone
> wrong.
Debian stable in VMware. 4 SATA disks on an Adaptec 1420A in software
RAID5 with LVM2, both managed inside the VMware guest.
Extracts from /var/log/messages:
Jan 13 17:38:04 FileServer -- MARK --
Jan 13 17:49:30 FileServer kernel: mptscsih: ioc0: attempting task abort! (sc=c15be840)
Jan 13 17:49:30 FileServer kernel: sd 0:0:1:0:
Jan 13 17:49:30 FileServer kernel:         command: cdb[0]=0x2a: 2a 00 00 00 ce 49 00 00 30 00
Jan 13 17:49:30 FileServer kernel: mptbase: ioc0: IOCStatus(0x004b): SCSI IOC Terminated
Jan 13 17:49:30 FileServer kernel: mptscsih: ioc0: task abort: SUCCESS (sc=c15be840)
Jan 13 18:18:05 FileServer -- MARK --
Jan 13 22:18:13 FileServer -- MARK --
Jan 13 22:18:14 FileServer kernel: rpc-srv/tcp: nfsd: got error -104 when sending 32900 bytes - shutting down socket
Jan 13 22:38:13 FileServer -- MARK --
Jan 13 22:38:57 FileServer shutdown[24187]: shutting down for system reboot
Jan 13 22:42:02 FileServer kernel: NFSD: starting 90-second grace period
Jan 13 23:01:52 FileServer -- MARK --
Jan 14 02:41:54 FileServer -- MARK --
Jan 14 02:49:29 FileServer kernel: mptscsih: ioc0: attempting task abort! (sc=c15c0720)
Jan 14 02:49:29 FileServer kernel: sd 0:0:3:0:
Jan 14 02:49:29 FileServer kernel:         command: cdb[0]=0x2a: 2a 00 0b 8d 0a b9 00 00 08 00
Jan 14 02:49:29 FileServer kernel: mptbase: ioc0: IOCStatus(0x004b): SCSI IOC Terminated
Jan 14 02:49:29 FileServer kernel: mptscsih: ioc0: task abort: SUCCESS (sc=c15c0720)
Jan 14 02:49:29 FileServer kernel: mptscsih: ioc0: attempting task abort! (sc=c15c0960)
Jan 14 02:49:29 FileServer kernel: sd 0:0:2:0:
Jan 14 02:49:29 FileServer kernel:         command: cdb[0]=0x2a: 2a 00 0c 63 38 91 00 00 10 00
Jan 14 02:49:29 FileServer kernel: mptbase: ioc0: IOCStatus(0x004b): SCSI IOC Terminated
Jan 14 02:49:29 FileServer kernel: mptscsih: ioc0: task abort: SUCCESS (sc=c15c0960)
Jan 14 02:49:29 FileServer kernel: mptscsih: ioc0: attempting task abort! (sc=c15c0180)
Jan 14 02:49:30 FileServer kernel: sd 0:0:1:0:
Jan 14 02:49:30 FileServer kernel:         command: cdb[0]=0x2a: 2a 00 0c 63 38 81 00 00 10 00
Jan 14 02:49:30 FileServer kernel: mptbase: ioc0: IOCStatus(0x004b): SCSI IOC Terminated
Jan 14 02:49:30 FileServer kernel: mptscsih: ioc0: task abort: SUCCESS (sc=c15c0180)
Jan 14 02:52:45 FileServer kernel: mptscsih: ioc0: attempting task abort! (sc=da8bf3e0)
Jan 14 02:52:45 FileServer kernel: sd 0:0:1:0:
Jan 14 02:52:45 FileServer kernel:         command: cdb[0]=0x2a: 2a 00 09 cc 5e b9 00 00 08 00
Jan 14 02:52:45 FileServer kernel:         command: cdb[0]=0x2a: 2a 00 09 cc 5e b9 00 00 08 00
Jan 14 02:52:45 FileServer kernel: mptbase: ioc0: IOCStatus(0x004b): SCSI IOC Terminated
Jan 14 02:52:45 FileServer kernel: mptscsih: ioc0: task abort: SUCCESS (sc=da8bf3e0)
Jan 14 03:21:54 FileServer -- MARK --
Jan 14 07:42:24 FileServer -- MARK --
Jan 14 07:48:20 FileServer kernel: mptscsih: ioc0: attempting task abort! (sc=c15c0060)
Jan 14 07:48:20 FileServer kernel: sd 0:0:2:0:
Jan 14 07:48:20 FileServer kernel:         command: cdb[0]=0x2a: 2a 00 0b 8d 23 59 00 00 08 00
Jan 14 07:48:20 FileServer kernel: mptbase: ioc0: IOCStatus(0x004b): SCSI IOC Terminated
Jan 14 07:48:20 FileServer kernel: mptscsih: ioc0: task abort: SUCCESS (sc=c15c0060)
Jan 14 08:02:24 FileServer -- MARK --
Jan 14 12:42:28 FileServer -- MARK --
Jan 14 20:23:48 FileServer syslogd 1.4.1#18: restart.
Jan 14 20:23:48 FileServer kernel: klogd 1.4.1#18, log source = /proc/kmsg started.
Jan 14 20:23:48 FileServer kernel: Linux version 2.6.18 (root AT FileServer.PrivateLAN DOT at) (gcc version 4.1.3 20070812 (prerelease) (Debian 4.1.2-15$
Jan 14 20:23:48 FileServer kernel: BIOS-provided physical RAM map:
Jan 14 20:23:48 FileServer kernel:  BIOS-e820: 0000000000000000 - 000000000009f800 (usable)

I believe it is a problem with the SATA cable. I am discussing that on a
German Debian mailing list. But if somebody has a hint for me - thanks a lot!
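
One way to test the cable theory - a minimal sketch, assuming smartmontools
is installed and the disks appear as /dev/sda through /dev/sdd (inside a
VMware guest SMART data may not be passed through at all; in that case run
it on the host). SMART attribute 199, UDMA_CRC_Error_Count, typically rises
with cable or connector problems rather than with disk defects:

for d in /dev/sd[abcd]; do
    echo "== $d =="
    # A growing raw value here points at the SATA link, not the platters.
    smartctl -A "$d" | grep -i 'UDMA_CRC'
done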
> 
> Either your backup data and history are vitally important to you, in which
> case you don't want to trust the current state of your pool file system
> for future backups, or they aren't, in which case you can get rid of them
> and save yourself future headaches. If you can avoid it, you probably
> don't want to overwrite your current pool for a while, in case you need to
> restore something. Making an archive of the last backup(s) seems unlikely
> to get every file content right, so you might need to resort to versions
> of files in older backups ...
> 
>> [...]
>> When old backups are deleted, old (maybe corrupt) files in the cpool are
>> deleted as well. So possibly corrupt files in the cpool will disappear
>> automatically during the next month.
> 
> Yes, but do you know the implementation of the ext[23] file system well
> enough to tell what will happen to possible corruption of file system
> metadata?
No, I surely don't know enough. But e2fsck now tells me that the filesystem
is all right. So I have files which claimed the wrong blocks or inodes, and
I cannot trust the content of those files. But BackupPC will verify each
newly backed-up file against the cpool, byte by byte.
So I believe BackupPC will verify my files over the next weeks :-)
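
As an additional check, one could at least find cpool files whose compressed
stream is damaged - a minimal sketch, assuming BackupPC 3.x and the Debian
install path for BackupPC_zcat (other distributions may put it elsewhere).
It will not catch a file whose content is wrong but still decompresses
cleanly; only the byte-by-byte compare that BackupPC does on pool matches
catches those:

find /var/lib/backuppc/cpool -type f | while read -r f; do
    # BackupPC_zcat fails on a damaged compressed stream.
    /usr/share/backuppc/bin/BackupPC_zcat "$f" > /dev/null 2>&1 \
        || echo "corrupt: $f"
done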
> 
> Regards,
> Holger
> 

-- 
Don't Panic


_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/