BackupPC-users

Re: [BackupPC-users] Stop TrachClean / Return directories backup list

2013-03-20 20:23:04
Subject: Re: [BackupPC-users] Stop TrachClean / Return directories backup list
From: Phil Kennedy <Phillip.kennedy AT yankeeairmuseum DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 20 Mar 2013 20:21:47 -0400
On 3/20/2013 7:57 PM, Adam Goryachev wrote:
> On 21/03/13 10:41, Phil Kennedy wrote:
>> Self-replying to add a little detail here;
>>
>> The system that failed is a Red Hat Enterprise system (5.9 at the
>> moment.) The system has the backuppc (3.2.0) pool living on a Promise
>> vTrak system in a Software RAID 10 (softlinked to a partition called
>> /backup). That part is fine.
>>
>> The OS drives were supposed to have been configured as a software RAID 1
>> but (and for the life of me, I cannot figure out how this could happen
>> aside from malice or gross incompetence) the secondary drive (/dev/sdb)
> gross incompetence on whose part? The sysadmin that left some time ago,
> or the sysadmin that has been responsible for the system since the
> previous admin left?

In this case, the previous admin has been gone less than a year. As that
year has progressed, I've found that a lot of the stuff he "documented"
was a stream of consciousness dumped out the day he put things together
and never updated as the system grew. The longer I go, the more ticking
time bombs I find waiting to go off.
>> apparently hadn't synced with the primary drive (/dev/sda) since August
>> of 2009. Literally everything (passwd, group, grub, fstab, the works)
>> there was almost four years old. Unfortunately for me, the primary drive
>> failed taking the current (though supposedly, mirrored) configs with it.
>> The system obviously has undergone a great deal of expansion and
>> tweaking in the interim. Thus, the config files in /etc/backuppc were
>> essentially the defaults.
>>
>> Now, the folders within the directories are there. There are directories
>> under /backup/pc/hostname/, but those directories do not show in the menu
>> when you try to browse via the web interface.
>>
>> I'll set TrashCleanSleepSec to a ridiculous number as was suggested.
>> Obviously, once I realized that the system may have been eating or had
>> eaten data, I stopped the backuppc service. There is some data currently
>> in /backup/trash (though Murphy's law says it won't be any of the more
>> important data that may be missing). If I can ID which system the data
>> came from, I can probably just move the data back under its host
>> directory, correct? Under ../trash the directories are named something
>> like 1363794493_24518_0, if I move them under ../pc/hostname/ and give
>> them a name like 100, it should show as backup 100, correct? (assuming
>> all permissions are correct?)
> That sounds right to me, technically you should also rebuild the backups
> file, but this is not required to be able to browse the backup (in my
> experience).
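
[Editor's sketch] The move described above could look roughly like this in Python. This is only an illustration under stated assumptions: the function name restore_from_trash and its dry_run flag are made up for this example and are not part of BackupPC. It assumes the backuppc service is stopped and that trash and pc live on the same filesystem (as /backup/trash and /backup/pc do here). It does not fix ownership or permissions and does not rebuild the backups file.

```python
import os
from pathlib import Path


def restore_from_trash(trash_dir, host_dir, backup_num, dry_run=True):
    """Move one trash entry back under a host directory as a numbered backup.

    Hypothetical helper, not a BackupPC tool. Returns the destination path.
    With dry_run=True (the default) nothing is moved.
    """
    src = Path(trash_dir)
    dest = Path(host_dir) / str(backup_num)
    if not src.is_dir():
        raise FileNotFoundError(f"no such trash directory: {src}")
    if dest.exists():
        # Refuse to clobber an existing backup number.
        raise FileExistsError(f"backup {backup_num} already exists: {dest}")
    if not dry_run:
        # os.rename only works within one filesystem, so the tree is
        # renamed in place rather than copied, and the hardlinks into
        # the pool are left alone.
        os.rename(src, dest)
    return dest
```

For the example in the thread, that would be something like restore_from_trash("/backup/trash/1363794493_24518_0", "/backup/pc/hostname", 100), run first with the default dry_run=True to see the target path, then again with dry_run=False as the backuppc user.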
>> Thanks for the pointers. This event has furthered my belief that
>> software RAID is crap.
> Interesting, in my opinion, it would further my belief that software
> RAID is fantastic. Hardware RAID usually uses different tools for each
> RAID controller brand (or model), which can be frustratingly difficult
> to get a proper current status from it. In addition, they often (in my
> experience) can have a failed drive without anyone becoming aware of it
> (no user close enough to hear the alarm, no alarm sounding, hard to get
> status, etc), and as always, if the controller fails you are SOL as far
> as getting a working system again.
>
> A simple cat /proc/mdstat would have shown the current status of your
> software RAID array, installing mdadm and configuring would allow it to
> automatically send you alert emails when any drive was missing from the
> array, etc.
We're using mdadm on the array holding the pool. Six months ago, that
array failed when three drives failed, two of which the hardware RAID
believed had failed at the same time. The previous admin believed
(adamantly) that firmware updates were a waste of time. In this case, a
year after the array was installed, Promise released firmware that would
have mitigated the failure we suffered.
>
> Personally, I use a complete (free open source) monitoring system
> (www.xymon.com) with a plugin which will monitor my software raid array,
> this then alerts me via SMS of any failures. It may be overkill in your
> situation, but I would strongly suggest a minimum of mdadm to send
> emails on failure, though you should also consider other failures that
> might bite you in the future (such as backuppc dying and never running a
> backup, not discovered until you need to restore something, or many
> other possible issues). IMHO, a server which is not monitored is a
> disaster that you just don't know about yet.
>
> PS, that is not to say that you will never experience a disaster just
> because you monitor a system, there is always some new way things can
> break which the monitoring system did not test for, but these are much
> more rare, and once you experience them once, you can write the
> monitoring script for it (very easy with xymon).
>
> Regards,
> Adam

Previous admin had us using monitoring with Zenoss (with SNMP hardware 
monitoring.) I'm in the process of moving us to Nagios with a much, MUCH 
deeper level of monitoring. I've inherited a mess that I was led to
believe was a finely tuned machine. Somehow I doubt my experiences are 
unique....
~Phil

