BackupPC-users

Re: [BackupPC-users] Stop TrashClean / Return directories backup list

2013-03-20 19:58:41
From: Adam Goryachev <mailinglists AT websitemanagers.com DOT au>
To: backuppc-users AT lists.sourceforge DOT net
Date: Thu, 21 Mar 2013 10:57:17 +1100
On 21/03/13 10:41, Phil Kennedy wrote:
> Self-replying to add a little detail here;
>
> The system that failed is a Red Hat Enterprise system (5.9 at the 
> moment.) The system has the backuppc (3.2.0) pool living on a Promise 
> vTrak system in a Software RAID 10 (softlinked to a partition called 
> /backup). That part is fine.
>
> The OS drives were supposed to have been configured as a software RAID 1 
> but (and for the life of me, I cannot figure out how this could happen 
> aside from malice or gross incompetence) the secondary drive (/dev/sdb) 
Gross incompetence on whose part? The sysadmin who left some time ago,
or the sysadmin who has been responsible for the system since the
previous admin left?
> apparently hadn't synced with the primary drive (/dev/sda) since August 
> of 2009. Literally everything (passwd, group, grub, fstab, the works) 
> there was almost four years old. Unfortunately for me, the primary drive 
> failed taking the current (though supposedly, mirrored) configs with it. 
> The system obviously has undergone a great deal of expansion and 
> tweaking in the interim. Thus, the config files in /etc/backuppc were 
> essentially the defaults.
>
> Now, the folders within the directories are there. There are directories 
> under /backup/pc/hostname/ but those directories do not show in the menu 
> when you try to browse via the web interface.
>
> I'll set TrashCleanSleepSec to a ridiculous number as was suggested. 
> Obviously, once I realized that the system may have been eating or had 
> eaten data, I stopped the backuppc service. There is some data currently 
> in /backup/trash (though Murphy's law says it won't be any of the more 
> important data that may be missing). If I can ID which system the data 
> came from, I can probably just move the data back under its host 
> directory, correct? Under ../trash the directories are named something 
> like 1363794493_24518_0; if I move them under ../pc/hostname/ and give 
> them a name like 100, it should show as backup 100, correct? (assuming 
> all permissions are correct?)
That sounds right to me. Technically you should also rebuild the backups
file, but this is not required to be able to browse the backup (in my
experience).
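
The move described above could be sketched roughly as follows. This is
only an illustration, not a BackupPC tool: the function name
restore_from_trash and its parameters are made up for this example, and
it assumes the trash directory and the host directory live on the same
filesystem (os.rename fails across filesystems, which is deliberate
here, since copying instead of renaming would break the pool hardlinks).
It does not touch ownership or the backups file.

```python
import os
from pathlib import Path


def restore_from_trash(trash_dir, host_dir, backup_num, dry_run=True):
    """Rename a trash directory to <host_dir>/<backup_num>.

    Hypothetical helper for illustration only. Returns the destination
    path. With dry_run=True (the default) nothing is moved.
    """
    if not isinstance(backup_num, int) or backup_num < 0:
        raise ValueError("backup_num must be a non-negative integer")

    src = Path(trash_dir)
    dest = Path(host_dir) / str(backup_num)

    if not src.is_dir():
        raise FileNotFoundError(f"no such trash directory: {src}")
    if not Path(host_dir).is_dir():
        raise FileNotFoundError(f"no such host directory: {host_dir}")
    if dest.exists():
        # Never overwrite an existing backup number.
        raise FileExistsError(f"backup already exists: {dest}")

    if not dry_run:
        # rename() stays on one filesystem, so hardlinks are preserved.
        os.rename(src, dest)
    return dest
```

Run it with the backuppc service stopped, first with dry_run=True to see
where the directory would land, for example
restore_from_trash("/backup/trash/1363794493_24518_0",
"/backup/pc/hostname", 100), and check ownership afterwards.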
> Thanks for the pointers. This event has furthered my belief that 
> software RAID is crap.
Interesting. In my opinion, it would further my belief that software
RAID is fantastic. Hardware RAID usually uses different tools for each
RAID controller brand (or model), which makes it frustratingly difficult
to get a proper current status. In addition, hardware arrays often (in my
experience) have a failed drive without anyone becoming aware of it
(no user close enough to hear the alarm, no alarm sounding, hard to get
status, etc.), and as always, if the controller fails you are SOL as far
as getting a working system again.

A simple cat /proc/mdstat would have shown the current status of your
software RAID array, and installing and configuring mdadm would allow it
to automatically send you alert emails when any drive goes missing from
the array, etc.

Personally, I use a complete (free, open source) monitoring system
(www.xymon.com) with a plugin that monitors my software RAID array and
alerts me via SMS of any failures. It may be overkill in your situation,
but I would strongly suggest, as a minimum, configuring mdadm to send
emails on failure. You should also consider other failures that might
bite you in the future (such as backuppc dying and never running a
backup, which goes undiscovered until you need to restore something, or
many other possible issues). IMHO, a server which is not monitored is a
disaster that you just don't know about yet.

PS: that is not to say that you will never experience a disaster just
because you monitor a system. There is always some new way things can
break which the monitoring system did not test for, but these are much
rarer, and once you have experienced one, you can write the monitoring
script for it (very easy with xymon).
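
For the "backuppc dying and never running a backup" case, a check could
look something like the sketch below. It is only an illustration under
assumptions: it treats the numbered directories under each host directory
(for example /backup/pc/hostname/100) as backups and uses the newest
directory modification time as a stand-in for "last backup time", which
may not match what BackupPC itself records. The function name stale_hosts
and its parameters are made up, and hooking the result into xymon or any
other monitoring system is left out.

```python
import time
from pathlib import Path


def stale_hosts(pc_root, max_age_days=2.0, now=None):
    """Return host names whose newest numbered backup dir is too old.

    Hosts with no numbered directories at all are also reported.
    Directory mtime is used as an approximation of backup time.
    """
    now = time.time() if now is None else now
    limit = max_age_days * 86400
    stale = []
    for host_dir in sorted(Path(pc_root).iterdir()):
        if not host_dir.is_dir():
            continue
        mtimes = [
            entry.stat().st_mtime
            for entry in host_dir.iterdir()
            if entry.is_dir() and entry.name.isdigit()
        ]
        if not mtimes or now - max(mtimes) > limit:
            stale.append(host_dir.name)
    return stale
```

Something like stale_hosts("/backup/pc") returning a non-empty list would
be the condition to alert on.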

Regards,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au

