BackupPC-users

Re: [BackupPC-users] Disk space used far higher than reported pool size

2013-10-31 13:58:54
From: Timothy J Massey <tmassey AT obscorp DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Thu, 31 Oct 2013 13:51:27 -0400
"Craig O'Brien" <cobrien AT fishman DOT com> wrote on 10/31/2013 01:33:30 PM:

> > Just out of curiosity, why hadn't you already done that?!?
>
> I didn't know which host was the problem and didn't think of it.
> Although I'll readily admit it seems painfully obvious to me now. :)


Just so you're sufficiently humble...  :)

For everyone's future reference:  ALWAYS check the server error log *and* the per-host logs...  :)

> > The big question is, though, why they aren't linking.  I'd really
> > start at the bottom of the stack (the physical drives) and work your
> > way up.  Check dmesg for any hardware errors.
>
> bash-4.1$ grep -i backup /var/log/dmesg*

> bash-4.1$

Nice try, but that won't help:  you need to be looking for the sd or ata device that actually sits under the backup filesystem, not for the word "backup".

Don't bother with a grep like that.  Do a dmesg > dmesg.txt, then open dmesg.txt in vi (or whatever) and look for scary errors...  Look particularly for sda (or sdb or whatever) or ata0 (or 1 or whatever) messages, and possibly scsi messages too (yes, SATA is SCSI to Linux).

But if they're there, they should not be hard to find:  there tend to be *LOTS* of them.
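
As a first pass only (you should still read the whole file), something like the following would pull out the lines worth staring at. This is a rough sketch: the function name scan_kernel_log and the list of "scary" words are my own assumptions, not anything BackupPC or the kernel defines, so adjust them to whatever your dmesg actually contains.

```shell
# Hypothetical helper (not from the thread): print only kernel log lines that
# mention a disk (sdX), an ATA port (ataN) or scsi together with an error word.
# The word list is an assumption; extend it to match what your system logs.
scan_kernel_log() {
    grep -Ei '(^|[^a-z])(sd[a-z]+|ata[0-9]+(\.[0-9]+)?|scsi)[^a-z].*(error|fail|timeout|timed out|reset)'
}

# Usage: save the full log first, then filter the saved copy.
#   dmesg > dmesg.txt
#   scan_kernel_log < dmesg.txt
```

No output from this does not prove the hardware is healthy; it only means none of the assumed keywords showed up next to a device name.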

> bash-4.1$ grep -i backup /var/log/messages*

Mine comes back with nothing.

> messages-20131006:Sep 30 13:53:24 servername kernel: BackupPC_dump
> [15365]: segfault at a80 ip 000000310f695002 sp 00007fff438c9770
> error 4 in libperl.so[310f600000+162000]

> messages-20131006:Sep 30 13:53:27 servername abrtd: Package
> 'BackupPC' isn't signed with proper key

> messages-20131020:Oct 19 01:24:54 servername kernel: INFO: task
> BackupPC_dump:11922 blocked for more than 120 seconds.

> messages-20131020:Oct 19 01:24:54 servername kernel: BackupPC_dump D
> 0000000000000001     0 11922  10626 0x00000080

> messages-20131020:Oct 19 01:30:54 servername kernel: INFO: task
> BackupPC_dump:11922 blocked for more than 120 seconds.

> messages-20131020:Oct 19 01:30:54 servername kernel: BackupPC_dump D
> 0000000000000001     0 11922  10626 0x00000080

> messages-20131020:Oct 19 01:32:54 servername kernel: INFO: task
> BackupPC_dump:11922 blocked for more than 120 seconds.

> messages-20131020:Oct 19 01:32:54 servername kernel: BackupPC_dump D
> 0000000000000001     0 11922  10626 0x00000080

> messages-20131020:Oct 19 01:32:54 servername kernel: INFO: task
> BackupPC_nightl:18390 blocked for more than 120 seconds.

> messages-20131020:Oct 19 01:32:54 servername kernel: BackupPC_nigh D
> 0000000000000001     0 18390   1262 0x00000080

> messages-20131020:Oct 19 01:48:54 servername kernel: INFO: task
> BackupPC_dump:11922 blocked for more than 120 seconds.

> messages-20131020:Oct 19 01:48:54 servername kernel: BackupPC_dump D
> 0000000000000003     0 11922  10626 0x00000080

> messages-20131020:Oct 19 01:52:54 servername kernel: INFO: task
> BackupPC_dump:11922 blocked for more than 120 seconds.

> messages-20131020:Oct 19 01:52:54 servername kernel: BackupPC_dump D
> 0000000000000001     0 11922  10626 0x00000080

> messages-20131020:Oct 19 01:52:54 servername kernel: INFO: task
> BackupPC_nightl:18390 blocked for more than 120 seconds.

> messages-20131020:Oct 19 01:52:54 servername kernel: BackupPC_nigh D
> 0000000000000001     0 18390   1262 0x00000080

> messages-20131020:Oct 19 01:56:54 servername kernel: INFO: task
> BackupPC_dump:11922 blocked for more than 120 seconds.

> messages-20131020:Oct 19 01:56:54 servername kernel: BackupPC_dump D
> 0000000000000003     0 11922  10626 0x00000080

> messages-20131020:Oct 19 02:10:54 servername kernel: INFO: task
> BackupPC_dump:11922 blocked for more than 120 seconds.

> messages-20131020:Oct 19 02:10:54 servername kernel: BackupPC_dump D
> 0000000000000001     0 11922  10626 0x00000080

> messages-20131020:Oct 19 02:12:54 servername kernel: INFO: task
> BackupPC_dump:11922 blocked for more than 120 seconds.

> messages-20131020:Oct 19 02:12:54 servername kernel: BackupPC_dump D
> 0000000000000001     0 11922  10626 0x00000080

> messages-20131027:Oct 23 09:00:02 servername abrtd: Package
> 'BackupPC' isn't signed with proper key


I'd try Googling those:  they have no meaning for me (and my servers don't have them).

What distro are you using?  (I use CentOS/RHEL)

> > fsck the filesystem. 
>
> bash-4.1$ fsck /dev/sda1

> fsck from util-linux-ng 2.17.2
> e2fsck 1.41.12 (17-May-2010)
> /dev/sda1: clean, 20074506/2929688576 files, 2775975889/2929686016 blocks
> bash-4.1$

Definitely a good sign.

> > Did I read correctly that this is connected via NFSv4?  I sure hope
> > not...  (I'm willing to admit it's a phobia, but there's no *WAY* I
> > would trust my backup to work across NFS...)
>
> The drives are local SATA ones that I set up in a raid 5, directly
> mounted. Def not NFS. I had an unrelated drive mounted via NFS, but
> that had nothing to do with my backup system and that's probably the
> source of confusion.


md raid5?  What's the status of /proc/mdstat ?
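
If it is Linux software (md) RAID, a degraded array shows an underscore in the member map, e.g. [UU_] instead of [UUU]. A small sketch for spotting that is below; the function name md_degraded is made up for illustration, and it assumes the usual /proc/mdstat layout where the member map is the last field on the status line.

```shell
# Hypothetical helper: report md arrays whose member map contains "_"
# (a missing or failed member). Reads /proc/mdstat unless a file is given.
md_degraded() {
    awk '
        /^md[0-9]+ :/     { name = $1 }                         # remember the current array
        /\[[U_]*_[U_]*\]/ { print name " degraded: " $NF }      # e.g. [UU_]
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   cat /proc/mdstat     # read the whole thing first
#   md_degraded          # prints nothing if every member map is all U
```

If the array is hardware RAID rather than md, /proc/mdstat will not list it and this tells you nothing.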

> So the du command finished, here's the result:
>
> bash-4.1$ du -hs /backup/pool /backup/cpool /backup/pc/fileserver/*

> 1.4T    /backup/pc/fileserver/529
> 1.4T    /backup/pc/fileserver/534
> 1.4T    /backup/pc/fileserver/540
> 1.4T    /backup/pc/fileserver/544
> 1.3T    /backup/pc/fileserver/549

First, you may want to delete one or more of these to free up space.  Second, these are all roughly 5 backups apart (the numbers differ by 5, 6, 4 and 5).  5 is an odd number.  If they were fulls I would expect them to be *7* days apart, unless you have something crazy like it taking 3 days to run a full backup or something.  But I'm going to assume that those are full backups.

Next, examine the logs for those backups and find out what went wrong.  It's probably the error message that you already copied, which Jeff commented on.  How many errors are we talking about?

Find which files are causing the problem.  Is it just a few large files, or a lot of little ones?
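
One way to answer that, assuming the usual hard-link pooling (a file under pc/ that made it into the pool has a link count of 2 or more), is to list the files in one backup that have only a single link, biggest first. This is a sketch, not a BackupPC tool: the name unpooled_files is hypothetical, it needs GNU find for -printf, and a few small bookkeeping files may legitimately show up with one link. Expect it to take a while on a 1.4T tree.

```shell
# Hypothetical helper: list the largest non-empty regular files under a backup
# directory that have only one hard link, i.e. are not linked to anything else.
# Needs GNU find (-printf). $1 = directory, $2 = how many to show (default 20).
unpooled_files() {
    find "$1" -type f -links 1 -size +0 -printf '%s\t%p\n' | sort -rn | head -n "${2:-20}"
}

# Usage (path taken from the du output above):
#   unpooled_files /backup/pc/fileserver/549 50
```

The first column is the size in bytes, so a quick look tells you whether it is a handful of huge files or a pile of little ones.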

It's possible that those files have become corrupted *within* the pool and that's what's causing problems.  If it's not an underlying device/filesystem problem, then it might be the compression as Jeff mentioned.  (Reason #53 why I have *nothing* to do with compression with my backups!)  You may be able to delete these files out of the pool and BackupPC will re-create them when you do your next backup.

In short, though, it seems that your pool is corrupted.  I tend to be *VERY* conservative when it comes to my backups.  When I don't need them, they are completely valueless.  But when I need them, they are GOLD.  So right this second, while you don't need them, I would suggest biting the bullet and rebuilding the pool.  (In my book, rebuilding the pool means starting from scratch:  re-create the array, reformat the partition and reinstall BackupPC.)

Of course, I wouldn't do that without some *other* sort of backup.  But it seems you have less than 3TB of *total* data (for a single copy).  I'd buy an external drive and do a backup of each and every system to it (using some other tool such as NTBackup or Windows Server Backup for Windows, and a complete manual rsync for Linux) before I destroyed my BackupPC.

But that's me.  I'm extremely conservative with backup.

Unfortunately, now that you've localized the problem, I am unlikely to be able to help.  I have no knowledge related to the error messages you've reported, and you (and others) can operate Google as well as I can...

Tim Massey
 
Out of the Box Solutions, Inc.
Creative IT Solutions Made Simple!

http://www.OutOfTheBoxSolutions.com
tmassey AT obscorp DOT com
22108 Harper Ave.
St. Clair Shores, MI 48080
Office: (800)750-4OBS (4627)
Cell: (586)945-8796
