Here is the scenario. I have a cluster of machines at home that I've had
Amanda working on for years. I even had to go through the protocol
conversion. It's been working well for all these years.
Last weekend I decided to move the Amanda tape/index server from an HP-UX
10.20 9000/835 to an Athlon 1.2GHz machine with a 400G holding disk.
There is one machine that I can't get to accept authentication yet (an NIS
problem, I think), but other than that amcheck runs fine. I _do_ have some
data size vs. tape size issues, since one of the things that drove me to do
this was the addition of 2 more 40G drives that are over 50% full, but I
think I've got that under control, as I am adding more tapes to the
tapecycle and doubling the dumpcycle.
However, over and above those issues, I'm seeing _a lot_ of strange
failures. Some of these are on machines that I have yet to get 2.4.3b4 to
compile on, but the example below is from a Debian GNU/Linux machine that
_is_ running 2.4.3b4:
sendbackup: debug 1 pid 8969 ruid 106 euid 106: start at Sat Jan 18 14:01:25
2003
/usr/local/amanda/libexec/sendbackup: version 2.4.3b4
parsed request as: program `DUMP'
disk `hda1'
device `hda1'
level 2
since 2003:1:18:12:50:58
options `|;auth=bsd;compress-best;index;'
sendbackup: try_socksize: send buffer size is 65536
sendbackup: time 0.000: stream_server: waiting for connection: 0.0.0.0.32793
sendbackup: time 0.000: stream_server: waiting for connection: 0.0.0.0.32794
sendbackup: time 0.000: stream_server: waiting for connection: 0.0.0.0.32795
sendbackup: time 0.000: waiting for connect on 32793, then 32794, then 32795
sendbackup: time 0.002: stream_accept: connection from 205.159.77.224.2575
sendbackup: time 0.003: stream_accept: connection from 205.159.77.224.2576
sendbackup: time 0.004: stream_accept: connection from 205.159.77.224.2577
sendbackup: time 0.004: got all connections
sendbackup: time 0.004: spawning /bin/gzip in pipeline
sendbackup: argument list: /bin/gzip --best
sendbackup-dump: time 0.005: pid 8971: /bin/gzip --best
sendbackup: time 0.061: spawning /sbin/dump in pipeline
sendbackup: argument list: dump 2usf 1048576 - /dev/hda1
sendbackup: time 0.078: started index creator: "/sbin/restore -tvf - 2>&1 | sed
-e '
s/^leaf[ ]*[0-9]*[ ]*\.//
t
/^dir[ ]/ {
s/^dir[ ]*[0-9]*[ ]*\.//
s%$%/%
t
}
d
'"
sendbackup: time 0.088: 91: normal(|): DUMP: Date of this level 2 dump: Sat
Jan 18 14:01:25 2003
sendbackup: time 0.089: 91: normal(|): DUMP: Date of last level 1 dump: Sat
Jan 18 07:50:59 2003
sendbackup: time 0.090: 91: normal(|): DUMP: Dumping /dev/hda1 (/) to
standard output
sendbackup: time 0.091: 91: normal(|): DUMP: Added inode 7 to exclude list
(resize inode)
sendbackup: time 0.264: 91: normal(|): DUMP: Label: none
sendbackup: time 0.265: 91: normal(|): DUMP: mapping (Pass I) [regular
files]
sendbackup: time 167.711: 91: normal(|): DUMP: mapping (Pass II)
[directories]
sendbackup: time 217.670: 91: normal(|): DUMP: estimated 124486 tape blocks.
sendbackup: time 217.706: 91: normal(|): DUMP: Volume 1 started with block
1 at: Sat Jan 18 14:05:03 2003
sendbackup: time 217.845: 91: normal(|): DUMP: dumping (Pass III)
[directories]
sendbackup: time 218.193: 91: normal(|): DUMP: dumping (Pass IV) [regular
files]
sendbackup: time 517.247: 91: normal(|): DUMP: 52.86% done at 219 kB/s,
finished in 0:04
sendbackup: time 817.596: 91: normal(|): DUMP: 80.75% done at 167 kB/s,
finished in 0:02
sendbackup: time 943.223: index tee cannot write [Broken pipe]
sendbackup: time 943.223: pid 8972 finish time Sat Jan 18 14:17:09 2003
sendbackup: time 943.829: 112: normal(|):
sendbackup: time 943.830: 115: strange(?): gzip: stdout: Connection reset by
peer
sendbackup: time 943.831: 115: strange(?): sendbackup: index tee cannot write
[Broken pipe]
sendbackup: time 943.832: 91: normal(|): DUMP: Broken pipe
sendbackup: time 943.833: 91: normal(|): DUMP: The ENTIRE dump is aborted.
sendbackup: time 944.265: error [compress returned 1, /sbin/dump returned 3]
sendbackup: time 944.265: pid 8969 finish time Sat Jan 18 14:17:10 2003
These failures don't seem localized to any one machine or filesystem. Can
anyone suggest what steps to take to debug this?
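In case it helps anyone compare notes, here is a rough sketch of how the
problem lines could be pulled out of a sendbackup debug file so the
failures on different clients can be lined up by elapsed time. It is only
a sketch: it assumes each log entry sits on a single line (unlike the
wrapped copy above), and the name find_problems is made up for this
example, not anything that ships with Amanda.

```python
import re

# Matches entries such as
#   sendbackup: time 943.830: 115: strange(?): gzip: stdout: ...
#   sendbackup: time 944.265: error [compress returned 1, ...]
LINE_RE = re.compile(r"^sendbackup: time (\d+\.\d+): (.*)$")


def find_problems(lines):
    """Return (elapsed_seconds, text) for each strange/error entry.

    Hypothetical helper: it flags entries tagged strange(?), entries that
    start with "error", and the "cannot write" message seen in the log.
    """
    problems = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # header lines, wrapped fragments, the sed script, etc.
        seconds = float(match.group(1))
        rest = match.group(2).strip()
        if ("strange(?)" in rest
                or rest.startswith("error")
                or "cannot write" in rest):
            problems.append((seconds, rest))
    return problems
```

Feeding it the lines of one debug file (for example via
open(path).readlines()) gives a short list of when things first went
wrong, which can then be compared against the server-side logs for the
same run.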
--
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
-- Benjamin Franklin