Re: [BackupPC-users] Stop TrashClean / Return directories backup list
2013-03-22 13:47:56
On 2013-03-21 01:21, Phil Kennedy wrote:
> Previous admin had us using monitoring with Zenoss (with SNMP hardware
> monitoring.) I'm in the process of moving us to Nagios with a much,
> MUCH
> deeper level of monitoring. I've inherited a mess that I was led to
> believe was a finely tuned machine. Somehow I doubt my experiences are
> unique....
Been there, seen that.
Actually I am in the same position, working my way through lots of
interestingly configured machines and legacy stuff (to put it nicely).
But blaming the data loss on the software RAID rather than on the
missing monitoring and care is just plain wrong.
As others noted, with a hardware RAID you trade a "proven" name for
far fewer monitoring options. Reading /proc/mdstat or the output
of "mdadm -D <raid>" works as soon as you have an ssh connection. And no
monitoring solution can claim to be usable without monitoring
/proc/mdstat. For hw-raids, on the other hand, you get a different tool
for each brand (and possibly series), a different way of alerting, and
then incompatibilities between controllers, hard disks and firmware
revisions.
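To illustrate how little it takes to watch a sw-raid, here is a minimal
sketch in Python that scans the text of /proc/mdstat for arrays with a
missing member. The function name degraded_arrays and the SAMPLE_MDSTAT
text are made up for this example, and it assumes the usual status line
of the form "[2/1] [U_]"; it is not a replacement for a proper
Nagios check or for "mdadm --monitor".

```python
import re

# Hypothetical sample of /proc/mdstat content, used only for illustration.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      976655 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md0 : active raid1 ...".
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line carries e.g. "[2/1] [U_]"; "_" marks a missing disk.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real machine you would pass in the contents of /proc/mdstat instead
of the sample text, and alert whenever the returned list is not empty.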
Once you have lost data because a hw-controller only signalled trouble
with an LED on the back before failing completely, and newer controller
releases did not understand the old revision's disk format, you will be
very glad that you can simply plug your sw-raid's disks into any other
Linux machine and access the data. If you use the old metadata format
on a mirror (RAID1), you don't even need sw-raid support; you can
simply mount the partitions to restore your data.
Your story reminds me of one of my first all-nighters here with a
customer's server. The former admin had set up the customer's new
storage server just fine. But then we wondered why only one of the three
disks was used for the raid. Mainly because he 'didn't yet find the time
to sync'. But also because he did the setup on a broken disk with
read errors (in currently unused space), and syncing the complete disk
to the other raid members would freeze the system and abort the sync...
Something I noticed just seconds after I started the sync via ssh...
Anyway, stay calm. If in doubt, stop all automatic backups on the
machine concerned (there is a config option for that), and don't blame
human failings on the software/hardware in use.
Good luck,
Arnold