BackupPC-users

Re: [BackupPC-users] Stop TrachClean / Return directories backup list

2013-03-22 13:47:56
Subject: Re: [BackupPC-users] Stop TrachClean / Return directories backup list
From: Arnold Krille <arnold AT arnoldarts DOT de>
To: <backuppc-users AT lists.sourceforge DOT net>
Date: Fri, 22 Mar 2013 18:46:15 +0100
On 2013-03-21 01:21, Phil Kennedy wrote:
> Previous admin had us using monitoring with Zenoss (with SNMP hardware
> monitoring.) I'm in the process of moving us to Nagios with a much, 
> MUCH
> deeper level of monitoring. I've inherited mess that I was led to
> believe was a finely tuned machine. Somehow I doubt my experiences are
> unique....

Been there, seen that.

Actually I am in the same position, working my way through lots of
interestingly configured machines and legacy stuff (to put it nicely).

But blaming the data loss on the software RAID, and not on the missing
monitoring/care, is just plain wrong.
As others noted, with a hardware RAID you trade a "proven" name for
far fewer ways to monitor it. Reading /proc/mdstat or the output
of "mdadm -D <raid>" works as soon as you have an ssh connection. And no
monitoring solution can claim to be usable without monitoring
/proc/mdstat. For hw-raids, on the other hand, you get a different tool for
each brand (and possibly series), a different way of alarming, and then
incompatibilities between controllers, hard disks and firmware revisions.
Once you have lost data because a hw controller only signalled trouble with
an LED on the back before failing completely, and newer controller releases
did not understand the old revision's disk format, you will be very glad
that you can simply plug your sw-raid's disks into any other Linux machine
and access the data. With the old metadata format (on a RAID1 mirror, at
least), you don't even need sw-raid support; you can simply mount the
partitions to restore your data.
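
To make the monitoring point concrete, here is a minimal sketch of a check
that reads /proc/mdstat and reports degraded arrays. It is only an
illustration, not a finished Nagios plugin: the function names
(degraded_arrays, nagios_status) and the SAMPLE text are made up for this
example, and it assumes the usual status line of the form "[2/1] [U_]",
where "_" marks a missing member. The return codes follow the Nagios
convention of 0 for OK and 2 for CRITICAL.

```python
import re

# An array header looks like "md0 : active raid1 sdb1[1] sda1[0]".
ARRAY_RE = re.compile(r"^(md\S*) : ")
# The status line carries e.g. "[2/1] [U_]"; "_" marks a missing member.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")

# Hypothetical sample, used when /proc/mdstat is not available.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1048512 blocks [2/2] [UU]

md1 : active raid1 sdb2[1](F) sda2[0]
      4192896 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of arrays whose status line shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


def nagios_status(degraded):
    """Map the result to a (return code, message) pair, Nagios style."""
    if degraded:
        return 2, "CRITICAL: degraded arrays: " + ", ".join(degraded)
    return 0, "OK: no degraded arrays found"


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on machines without md
    code, message = nagios_status(degraded_arrays(text))
    print(code, message)
```

A real plugin would also want to look at resync/recovery progress and at
members marked (F), and would exit with the return code rather than just
print it.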

Your story reminds me of one of my first all-nighters here with a
customer's server. The former admin had set up the customer's new
storage server just fine. But then we wondered why only one of the three
disks was used for the raid. Mainly because he 'didn't yet find the time
to sync'. But also because he did the setup on a broken disk with
read errors (in currently unused space), and syncing the complete disk to
the other raid members would freeze the system and abort the sync...
Something I noticed just seconds after I started the sync via ssh...

Anyway, stay calm; if in doubt, stop all automatic backups on the
machine concerned (there is a config option for that), and don't blame
human shortcomings on the software/hardware in use.

Good luck,

Arnold
