I have a server here which I've been backing up with Amanda for some time.
It's a Linux machine, and has a load of disks on a RAID controller.
Recently, the machine lost a disk. The RAID array sorted out the problem,
but the machine has been decidedly less than stable since then - it dies
about every 4 days. I've run numerous hardware tests, and found and
corrected a parity error on one of the RAID-5 arrays, but my backups
are still not working properly. I'm getting mail from Amanda like
the one below on a nightly basis.
I'm now of the opinion that either the parity error I corrected
was not the cause of the problems and something else in the
machine has been physically damaged, or the disk failure somehow
corrupted some part of the system software that I've not yet
identified. The visible problems are the
Amanda errors, and my IMAP server (cyrus) which dumps sig 11's in
the log file from time to time. Cyrus runs a master process which
notices the dead processes and respawns them, so this isn't visible
to the users. As sig 11's are often memory related, I've run
memtest86 on the machine. After 5 hours, and 3 passes, it reported
no errors.
Are there any Amanda experts who can say, from the report below,
whether this is a problem that's likely to stem from stuffed up
software/configs, or from a hardware problem, and what hardware
problems might cause these sorts of symptoms?
Mike.
===============================================================
*** THE DUMPS DID NOT FINISH PROPERLY!
These dumps were to tape ACU-STD-15.
Tonight's dumps should go onto 1 tape: ACU-STD-1.
FAILURE AND STRANGE DUMP SUMMARY:
driver: FATAL infofile update failed (yildun,hda7)
taper: FATAL syncpipe_get: w: unexpected EOF
castor sda7 RESULTS MISSING
castor sda8 RESULTS MISSING
castor sda9 RESULTS MISSING
castor sdb1 RESULTS MISSING
castor sdb2 RESULTS MISSING
pollux sda10 RESULTS MISSING
pollux sda13 RESULTS MISSING
yildun hda5 RESULTS MISSING
yildun hda6 RESULTS MISSING
yildun hda9 RESULTS MISSING
STATISTICS:
                            Total      Full     Daily
                         --------  --------  --------
Dump Time (hrs:min)          0:04      0:00      0:00   (0:03 start)
Output Size (meg)           259.8     239.7      20.1
Original Size (meg)         259.8     239.7      20.1
Avg Compressed Size (%)        --        --        --
Tape Used (%)                 0.7       0.7       0.1   (level:#disks ...)
Filesystems Dumped             10         2         8   (1:8)
Avg Dump Rate (k/s)        2597.0    3824.1     537.9
Avg Tp Write Rate (k/s)   10087.9   82901.7     878.9
NOTES:
planner: Forcing full dump of yildun:hda6 as directed.
planner: Forcing full dump of yildun:hda7 as directed.
DUMP SUMMARY:
                                        DUMPER STATS               TAPER STATS
HOSTNAME  DISK           L  ORIG-KB   OUT-KB COMP%  MMM:SS   KB/s  MMM:SS   KB/s
-------------------------- -------------------------------------- --------------
castor    sda1           1      288      288    --    0:03   91.2    0:02  130.7
castor    sda10          1      128      128    --    0:04   32.6    0:02   65.9
castor    sda6           1       64       64    --    0:03   20.7    0:02   39.5
castor    sda7           MISSING --------------------------------------------
castor    sda8           MISSING --------------------------------------------
castor    sda9           MISSING --------------------------------------------
castor    sdb1           MISSING --------------------------------------------
castor    sdb2           MISSING --------------------------------------------
pollux    sda1           1       32       32    --    0:00  167.3    0:02   26.1
pollux    sda10          MISSING --------------------------------------------
pollux    sda11          1      736      736    --    0:01  520.7    0:04  186.3
pollux    sda12          1     6304     6304    --    0:02 2624.0    0:03 2032.7
pollux    sda13          MISSING --------------------------------------------
pollux    sda5           1      256      256    --    0:02  123.0    0:03  106.6
yildun    hda1           0     3872     3872    --    0:01 5232.7    0:03 1318.3
yildun    hda5           MISSING --------------------------------------------
yildun    hda6           MISSING --------------------------------------------
yildun    hda7           0   241600   241600    --    1:03 3807.6  FAILED ------
yildun    hda8           1    12768    12768    --    0:22  580.4    0:04 3442.9
yildun    hda9           MISSING --------------------------------------------
(brought to you by Amanda version 2.4.1p1)
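
In case it helps to see how the failures are spread across the clients, here is a rough Python sketch that tallies the RESULTS MISSING lines per host. The lines in REPORT are copied from the failure summary above. The helper name missing_by_host and the regular expression are my own assumptions about the line layout, not anything Amanda provides.

```python
import re
from collections import Counter

# Lines copied from the FAILURE AND STRANGE DUMP SUMMARY section of the report.
REPORT = """\
driver: FATAL infofile update failed (yildun,hda7)
taper: FATAL syncpipe_get: w: unexpected EOF
castor sda7 RESULTS MISSING
castor sda8 RESULTS MISSING
castor sda9 RESULTS MISSING
castor sdb1 RESULTS MISSING
castor sdb2 RESULTS MISSING
pollux sda10 RESULTS MISSING
pollux sda13 RESULTS MISSING
yildun hda5 RESULTS MISSING
yildun hda6 RESULTS MISSING
yildun hda9 RESULTS MISSING
"""

# Assumed layout: "<host> <disk> RESULTS MISSING", whitespace separated.
MISSING_LINE = re.compile(r"^(\S+)\s+(\S+)\s+RESULTS MISSING$")


def missing_by_host(report_text):
    """Hypothetical helper: count disks with missing results for each host."""
    counts = Counter()
    for line in report_text.splitlines():
        match = MISSING_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for host, count in sorted(missing_by_host(REPORT).items()):
        print(f"{host}: {count} disk(s) with missing results")
```

On the report above this gives 5 for castor, 2 for pollux and 3 for yildun, so all three clients are affected.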