Amanda-Users

dumps failing recently (data write: Connection reset by peer)

2007-05-08 09:21:57
Subject: dumps failing recently (data write: Connection reset by peer)
From: Steven Settlemyre <settlemy AT asel.udel DOT edu>
To: amanda-users AT amanda DOT org
Date: Tue, 08 May 2007 09:19:51 -0400
I haven't changed my configs for months and things were running great until last week. Since last tues, none of my dailies have finished, and last night a monthly failed.

Looking through the logs I see the problem always seems to start with "data write: Connection reset by peer" and "Don't know how to send ABORT command to chunker". I'm having a hard time interpreting the logs and can't seem to find too much in the archives about this. Was wondering if someone could walk me through an explanation of the problem and how to avoid it in the future.

My monthlies run tape spanning on 3 40G tapes.

Here is the email output generated:

*** THE DUMPS DID NOT FINISH PROPERLY!

These dumps were to tape Monthly21.
The next 3 tapes Amanda expects to use are: Monthly01, Monthly02, Monthly03.
The next 3 new tapes already labelled are: Monthly19, Monthly20, Monthly22.

FAILURE AND STRANGE DUMP SUMMARY:
wagstaff /usr/local lev 1 FAILED [data write: Connection reset by peer] lollipop /files1 lev 0 FAILED [data write: Connection reset by peer] helios /files3 lev 1 FAILED [data write: Connection reset by peer]
 helios      /                             RESULTS MISSING
 helios      /files2                       RESULTS MISSING
 helios      /usr                          RESULTS MISSING
 helios      /usr/local                    RESULTS MISSING
 helios      /var                          RESULTS MISSING
 lollipop      /                             RESULTS MISSING
 lollipop      /usr                          RESULTS MISSING
 lollipop      /usr/local                    RESULTS MISSING
 wagstaff      /files3                       RESULTS MISSING
 wagstaff      /files4                       RESULTS MISSING
 wagstaff      /files5                       RESULTS MISSING
 wagstaff      /files6/vol/Voiceware         RESULTS MISSING
 wizard        /files2                       RESULTS MISSING
 snapserver  /hd/vol_mnt0/shares/TermLab   RESULTS MISSING
 snapserver  /hd/vol_mnt0/shares/bcl       RESULTS MISSING
 snapserver  /hd/vol_mnt0/shares/biochem   RESULTS MISSING
 snapserver  /hd/vol_mnt0/shares/confocal  RESULTS MISSING
 driver: FATAL Don't know how to send ABORT command to chunker
 chunker: FATAL error [bad command after RQ-MORE-DISK: "QUIT"]
 chunker: FATAL error [bad command after RQ-MORE-DISK: "QUIT"]
 chunker: FATAL error [bad command after RQ-MORE-DISK: "QUIT"]


STATISTICS:
                         Total       Full      Incr.
                       --------   --------   --------
Estimate Time (hrs:min)    0:08
Run Time (hrs:min)         1:01
Dump Time (hrs:min)        1:55       1:40       0:16
Output Size (meg)        8519.7     7729.7      790.1
Original Size (meg)     13146.3    11595.5     1550.8
Avg Compressed Size (%)    64.8       66.7       50.9   (level:#disks ...)
Filesystems Dumped           35         12         23   (1:23)
Avg Dump Rate (k/s)      1261.0     1323.3      863.1

Tape Time (hrs:min)        0:53       0:44       0:09
Tape Size (meg)          8521.6     7730.3      791.3
Tape Used (%)              21.1       19.0        2.1   (level:#disks ...)
Filesystems Taped            35         12         23   (1:23)
  (level:#chunks ...)
Chunks Taped                 35         12         23   (1:23)
Avg Tp Write Rate (k/s)  2724.3     3000.8     1433.6

USAGE BY TAPE:
 Label           Time      Size      %    Nb    Nc
 Monthly21       0:53  8726112k   21.1    35    35


FAILED AND STRANGE DUMP DETAILS:

/--  wagstaff /usr/local lev 1 FAILED [data write: Connection reset by peer]
sendbackup: start [wagstaff level 1]
sendbackup: info BACKUP=/usr/sbin/ufsdump
sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc |/usr/sbin/ufsrestore -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
|   DUMP: Writing 32 Kilobyte records
|   DUMP: Date of this level 1 dump: Tue May 08 01:11:26 2007
|   DUMP: Date of last level 0 dump: Mon Apr 30 23:54:14 2007
| DUMP: Dumping /dev/rdsk/c0t0d0s7 (wagstaff:/usr/local) to standard output.
|   DUMP: Mapping (Pass I) [regular files]
|   DUMP: Mapping (Pass II) [directories]
|   DUMP: Mapping (Pass II) [directories]
|   DUMP: Mapping (Pass II) [directories]
|   DUMP: Estimated 13585968 blocks (6633.77MB) on 0.10 tapes.
|   DUMP: Dumping (Pass III) [directories]
|   DUMP: Dumping (Pass IV) [regular files]
|   DUMP: 16.49% done, finished in 0:50
|   DUMP: 28.34% done, finished in 0:57
|   DUMP: 38.89% done, finished in 1:12
\--------

/--  lollipop /files1 lev 0 FAILED [data write: Connection reset by peer]
sendbackup: start [lollipop level 0]
sendbackup: info BACKUP=/usr/sbin/ufsdump
sendbackup: info RECOVER_CMD=/usr/bin/gzip -dc |/usr/sbin/ufsrestore -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
|   DUMP: Writing 32 Kilobyte records
|   DUMP: Date of this level 0 dump: Mon May 07 23:52:56 2007
|   DUMP: Date of last level 0 dump: the epoch
|   DUMP: Dumping /dev/rdsk/c0t2d0s2 (lollipop:/files1) to standard output.
|   DUMP: Mapping (Pass I) [regular files]
|   DUMP: Mapping (Pass II) [directories]
|   DUMP: Estimated 34371404 blocks (16782.91MB) on 0.25 tapes.
|   DUMP: Dumping (Pass III) [directories]
|   DUMP: Dumping (Pass IV) [regular files]
|   DUMP: 6.08% done, finished in 2:34
|   DUMP: 13.06% done, finished in 2:13
|   DUMP: 19.93% done, finished in 2:00
|   DUMP: 26.85% done, finished in 1:49
|   DUMP: 33.93% done, finished in 1:37
|   DUMP: 41.38% done, finished in 1:25
|   DUMP: 49.81% done, finished in 1:10
|   DUMP: 57.36% done, finished in 0:59
|   DUMP: 60.37% done, finished in 0:59
|   DUMP: 64.71% done, finished in 0:54
|   DUMP: 72.54% done, finished in 0:41
|   DUMP: 78.46% done, finished in 0:32
|   DUMP: 84.45% done, finished in 0:23
|   DUMP: 89.91% done, finished in 0:15
|   DUMP: 96.51% done, finished in 0:05
\--------

/--  helios /files3 lev 1 FAILED [data write: Connection reset by peer]
sendbackup: start [helios level 1]
sendbackup: info BACKUP=/bin/tar
sendbackup: info RECOVER_CMD=/bin/gzip -dc |/bin/tar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
\--------


NOTES:
planner: wagstaff /files6/vol/speech7 20070507 0 [dumps too big, 9328 KB, full dump delayed] planner: oz /files1 20070507 0 [dumps too big, 11716 KB, full dump delayed] planner: wagstaff /files6/vol/bvd 20070507 0 [dumps too big, 12856 KB, full dump delayed]
 planner: oz / 20070507 0 [dumps too big, 21760 KB, full dump delayed]
planner: wagstaff / 20070507 0 [dumps too big, 23105 KB, full dump delayed]
 planner: helios / 20070507 0 [dumps too big, 38583 KB, full dump delayed]
 planner: wizard / 20070507 0 [dumps too big, 44165 KB, full dump delayed]
planner: snapserver /hd/vol_mnt0/shares/surgery 20070507 0 [dumps too big, 91538 KB, full dump delayed] planner: wagstaff /usr 20070507 0 [dumps too big, 112839 KB, full dump delayed] planner: lollipop /usr 20070507 0 [dumps too big, 131680 KB, full dump delayed] planner: wizard /usr 20070507 0 [dumps too big, 162844 KB, full dump delayed] planner: martin /var 20070507 0 [dumps too big, 178985 KB, full dump delayed] planner: wizard /var 20070507 0 [dumps too big, 210323 KB, full dump delayed] planner: professor /usr/local 20070507 0 [dumps too big, 219910 KB, full dump delayed] planner: snapserver /hd/vol_mnt0/shares/NeuroGen 20070507 0 [dumps too big, 247911 KB, full dump delayed] planner: wagstaff /files6/vol/spdata1 20070507 0 [dumps too big, 358932 KB, full dump delayed] planner: lollipop /usr/local 20070507 0 [dumps too big, 436518 KB, full dump delayed] planner: wizard /usr/local 20070507 0 [dumps too big, 458028 KB, full dump delayed] planner: snapserver /hd/vol_mnt0/shares/immuno 20070507 0 [dumps too big, 494376 KB, full dump delayed]
 planner: oz /usr 20070507 0 [dumps too big, 520179 KB, full dump delayed]
planner: wagstaff /files6/vol/spdata2 20070507 0 [dumps too big, 522063 KB, full dump delayed] planner: snapserver /hd/vol_mnt0/shares/MID 20070507 0 [dumps too big, 522216 KB, full dump delayed] planner: professor /usr 20070507 0 [dumps too big, 565153 KB, full dump delayed] planner: helios /usr 20070507 0 [dumps too big, 639655 KB, full dump delayed] planner: helios /var 20070507 0 [dumps too big, 1609562 KB, full dump delayed] planner: helios /usr/local 20070507 0 [dumps too big, 1930526 KB, full dump delayed] planner: helios /files2 20070507 0 [dumps too big, 3326170 KB, full dump delayed] planner: snapserver /hd/vol_mnt0/shares/BioInf 20070507 0 [dumps too big, 4663399 KB, full dump delayed] planner: wagstaff /usr/local 20070507 0 [dumps too big, 5087313 KB, full dump delayed] planner: snapserver /hd/vol_mnt0/shares/urology 20070507 0 [dumps too big, 5786512 KB, full dump delayed] planner: helios /files3 20070507 0 [dumps too big, 8327643 KB, full dump delayed] planner: wagstaff /files1 20070507 0 [dumps too big, 11203120 KB, full dump delayed]
 taper: tape Monthly21 kb 8726112 fm 35 [OK]


DUMP SUMMARY:
DUMPER STATS TAPER STATS HOSTNAME DISK L ORIG-kB OUT-kB COMP% MMM:SS KB/s MMM:SS KB/s -------------------------- ------------------------------------------- ------------- helios / MISSING ------------------------------------------------- helios /files2 MISSING ------------------------------------------------- helios /files3 1 FAILED -------------------------------------------------- helios /usr MISSING ------------------------------------------------- helios /usr/local MISSING ------------------------------------------------- helios /var MISSING ------------------------------------------------- lollipop / MISSING ------------------------------------------------- lollipop /files1 0 FAILED -------------------------------------------------- lollipop /usr MISSING ------------------------------------------------- lollipop /usr/local MISSING ------------------------------------------------- martin /var 1 33930 15808 46.6 0:03 5776.6 0:04 4107.7
oz /           1        290         96   33.1    0:00  404.7   0:01   83.3
oz /files1     1       3640       1344   36.9    0:00 4641.9   0:04  304.1
oz /usr        1      60180      18432   30.6    0:05 3425.3   0:10 1822.2
oz /var        0     796760     339968   42.7    1:34 3632.0   3:12 1771.7
professor.as / 0 104980 46016 43.8 0:22 2099.4 0:09 4941.8 professor.as /usr 1 6590 640 9.7 0:01 449.2 0:01 534.7 professor.as /usr/local 1 1640 192 11.7 0:00 422.0 0:09 21.8 snapserver -res/BioInf 1 2380 416 17.5 0:13 27.5 0:05 90.0 snapserver -shares/MID 1 310 96 31.0 0:04 11.8 0:01 83.7 snapserver -es/MolGene 0 10 64 640.0 0:00 32.8 0:03 23.2 snapserver -s/NeuroGen 1 80 64 80.0 0:01 10.5 0:01 56.0 snapserver -es/TermLab MISSING ------------------------------------------------- snapserver -ares/admin 0 10 64 640.0 0:00 38.7 0:01 45.1 snapserver -shares/bcl MISSING ------------------------------------------------- snapserver -es/biochem MISSING ------------------------------------------------- snapserver -s/confocal MISSING ------------------------------------------------- snapserver -ares/histo 0 10 64 640.0 0:00 31.3 0:01 58.4 snapserver -res/immuno 1 3400 416 12.2 0:04 87.6 0:01 292.0 snapserver -ares/mcard 0 10 64 640.0 0:00 36.2 0:01 54.6 snapserver -ares/mysql 0 10 64 640.0 0:00 41.4 0:02 26.7 snapserver -s/neurosci 0 4421720 2413152 54.6 12:45 3155.2 10:18 3905.8 snapserver -es/surgery 1 10 64 640.0 0:00 39.4 0:07 8.6 snapserver -s/sysadmin 0 2010 2048 101.9 0:00 6036.8 0:03 586.2 snapserver -es/urology 1 160880 113472 70.5 0:19 5824.1 0:50 2275.4 wagstaff / 1 95 64 67.4 0:04 2.9 0:01 56.7 wagstaff /files1 1 824351 564832 68.5 8:59 1048.6 3:52 2436.2 wagstaff /files3 MISSING ------------------------------------------------- wagstaff /files4 MISSING ------------------------------------------------- wagstaff /files5 MISSING ------------------------------------------------- wagstaff -/Voiceware MISSING ------------------------------------------------- wagstaff -s6/vol/bvd 1 10 64 640.0 0:00 4.5 0:01 56.7 wagstaff -ol/spdata1 1 1890 416 22.0 0:09 42.3 0:01 359.9 wagstaff -ol/spdata2 1 480 160 33.3 0:02 50.8 0:04 36.5 wagstaff -ol/speech7 1 40 64 160.0 0:00 14.4 0:01 57.9 wagstaff /usr 1 31 64 206.5 0:05 0.2 0:01 55.9 wagstaff /usr/local 1 FAILED -------------------------------------------------- wizard / 1 95 64 67.4 0:04 2.9 0:02 35.1 wizard /files1 0 4576895 4086720 89.3 63:58 1064.8 14:28 4706.1 wizard /files2 MISSING ------------------------------------------------- wizard /usr 1 63 64 101.6 0:06 0.2 0:01 58.0 wizard /usr/local 1 369407 21600 5.8 2:39 135.8 2:23 151.2 wizard /var 1 118239 71840 60.8 2:38 454.1 1:22 872.8 wizard /var/log 0 110271 72032 65.3 2:10 554.3 11:28 104.7 wizard /var/mail 0 1861087 955584 51.3 18:53 843.7 4:10 3820.7

(brought to you by Amanda version 2.5.0p2)