Amanda-Users

Re: Please help... no backups since a week ago (data write: Connection reset by peer)

2007-05-10 20:15:22
Subject: Re: Please help... no backups since a week ago (data write: Connection reset by peer)
From: Gene Heskett <gene.heskett AT verizon DOT net>
To: Steven Settlemyre <settlemy AT asel.udel DOT edu>
Date: Wed, 09 May 2007 16:03:46 -0400
On Wednesday 09 May 2007, Steven Settlemyre wrote:
>Can someone please help me?
>
>Steven Settlemyre wrote:
>> I haven't changed my configs for months and things were running great
>> until last week. Since last tues, none of my dailies have finished,
>> and last night a monthly failed.
>>
>> Looking through the logs I see the problem always seems to start with
>> "data write: Connection reset by peer" and "Don't know how to send
>> ABORT command to chunker". I'm having a hard time interpreting the
>> logs and can't seem to find too much in the archives about this. Was
>> wondering if someone could walk me through an explanation of the
>> problem and how to avoid it in the future.
>>
>> My monthlies run tape spanning on 3 40G tapes.
>>
>> Here is the email output generated:
>>
>> *** THE DUMPS DID NOT FINISH PROPERLY!
>>
>> These dumps were to tape Monthly21.
>> The next 3 tapes Amanda expects to use are: Monthly01, Monthly02,
>> Monthly03.
>> The next 3 new tapes already labelled are: Monthly19, Monthly20,
>> Monthly22.
>>
>> FAILURE AND STRANGE DUMP SUMMARY:
>>  wagstaff      /usr/local                    lev 1  FAILED [data
>> write: Connection reset by peer]
>>  lollipop      /files1                       lev 0  FAILED [data
>> write: Connection reset by peer]
>>  helios      /files3                       lev 1  FAILED [data write:
>> Connection reset by peer]
>>  helios      /                             RESULTS MISSING
>>  helios      /files2                       RESULTS MISSING
>>  helios      /usr                          RESULTS MISSING
>>  helios      /usr/local                    RESULTS MISSING
>>  helios      /var                          RESULTS MISSING
>>  lollipop      /                             RESULTS MISSING
>>  lollipop      /usr                          RESULTS MISSING
>>  lollipop      /usr/local                    RESULTS MISSING
>>  wagstaff      /files3                       RESULTS MISSING
>>  wagstaff      /files4                       RESULTS MISSING
>>  wagstaff      /files5                       RESULTS MISSING
>>  wagstaff      /files6/vol/Voiceware         RESULTS MISSING
>>  wizard        /files2                       RESULTS MISSING
>>  snapserver  /hd/vol_mnt0/shares/TermLab   RESULTS MISSING
>>  snapserver  /hd/vol_mnt0/shares/bcl       RESULTS MISSING
>>  snapserver  /hd/vol_mnt0/shares/biochem   RESULTS MISSING
>>  snapserver  /hd/vol_mnt0/shares/confocal  RESULTS MISSING
>>  driver: FATAL Don't know how to send ABORT command to chunker
>>  chunker: FATAL error [bad command after RQ-MORE-DISK: "QUIT"]
>>  chunker: FATAL error [bad command after RQ-MORE-DISK: "QUIT"]
>>  chunker: FATAL error [bad command after RQ-MORE-DISK: "QUIT"]
>>
>>
>> STATISTICS:
>>                          Total       Full      Incr.
>>                        --------   --------   --------
>> Estimate Time (hrs:min)    0:08
>> Run Time (hrs:min)         1:01
>> Dump Time (hrs:min)        1:55       1:40       0:16
>> Output Size (meg)        8519.7     7729.7      790.1
>> Original Size (meg)     13146.3    11595.5     1550.8
>> Avg Compressed Size (%)    64.8       66.7       50.9   (level:#disks
>> ...)
>> Filesystems Dumped           35         12         23   (1:23)
>> Avg Dump Rate (k/s)      1261.0     1323.3      863.1
>>
>> Tape Time (hrs:min)        0:53       0:44       0:09
>> Tape Size (meg)          8521.6     7730.3      791.3
>> Tape Used (%)              21.1       19.0        2.1   (level:#disks
>> ...)
>> Filesystems Taped            35         12         23   (1:23)
>>   (level:#chunks ...)
>> Chunks Taped                 35         12         23   (1:23)
>> Avg Tp Write Rate (k/s)  2724.3     3000.8     1433.6
>>
>> USAGE BY TAPE:
>>  Label           Time      Size      %    Nb    Nc
>>  Monthly21       0:53  8726112k   21.1    35    35
>>
>>
>> FAILED AND STRANGE DUMP DETAILS:
>>
>> /--  wagstaff /usr/local lev 1 FAILED [data write: Connection reset by
>> peer]
>> sendbackup: start [wagstaff level 1]
>> sendbackup: info BACKUP=/usr/sbin/ufsdump
>> sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc
>>
>> |/usr/sbin/ufsrestore -f... -
>>
>> sendbackup: info COMPRESS_SUFFIX=.gz
>> sendbackup: info end
>>
>> |   DUMP: Writing 32 Kilobyte records
>> |   DUMP: Date of this level 1 dump: Tue May 08 01:11:26 2007
>> |   DUMP: Date of last level 0 dump: Mon Apr 30 23:54:14 2007
>> |   DUMP: Dumping /dev/rdsk/c0t0d0s7 (wagstaff:/usr/local) to standard
>>
>> output.
>>
>> |   DUMP: Mapping (Pass I) [regular files]
>> |   DUMP: Mapping (Pass II) [directories]
>> |   DUMP: Mapping (Pass II) [directories]
>> |   DUMP: Mapping (Pass II) [directories]
>> |   DUMP: Estimated 13585968 blocks (6633.77MB) on 0.10 tapes.
>> |   DUMP: Dumping (Pass III) [directories]
>> |   DUMP: Dumping (Pass IV) [regular files]
>> |   DUMP: 16.49% done, finished in 0:50
>> |   DUMP: 28.34% done, finished in 0:57
>> |   DUMP: 38.89% done, finished in 1:12
>>
>> \--------
>>
>> /--  lollipop /files1 lev 0 FAILED [data write: Connection reset by peer]
>> sendbackup: start [lollipop level 0]
>> sendbackup: info BACKUP=/usr/sbin/ufsdump
>> sendbackup: info RECOVER_CMD=/usr/bin/gzip -dc |/usr/sbin/ufsrestore
>> -f... -
>> sendbackup: info COMPRESS_SUFFIX=.gz
>> sendbackup: info end
>>
>> |   DUMP: Writing 32 Kilobyte records
>> |   DUMP: Date of this level 0 dump: Mon May 07 23:52:56 2007
>> |   DUMP: Date of last level 0 dump: the epoch
>> |   DUMP: Dumping /dev/rdsk/c0t2d0s2 (lollipop:/files1) to standard
>>
>> output.
>>
>> |   DUMP: Mapping (Pass I) [regular files]
>> |   DUMP: Mapping (Pass II) [directories]
>> |   DUMP: Estimated 34371404 blocks (16782.91MB) on 0.25 tapes.
>> |   DUMP: Dumping (Pass III) [directories]
>> |   DUMP: Dumping (Pass IV) [regular files]
>> |   DUMP: 6.08% done, finished in 2:34
>> |   DUMP: 13.06% done, finished in 2:13
>> |   DUMP: 19.93% done, finished in 2:00
>> |   DUMP: 26.85% done, finished in 1:49
>> |   DUMP: 33.93% done, finished in 1:37
>> |   DUMP: 41.38% done, finished in 1:25
>> |   DUMP: 49.81% done, finished in 1:10
>> |   DUMP: 57.36% done, finished in 0:59
>> |   DUMP: 60.37% done, finished in 0:59
>> |   DUMP: 64.71% done, finished in 0:54
>> |   DUMP: 72.54% done, finished in 0:41
>> |   DUMP: 78.46% done, finished in 0:32
>> |   DUMP: 84.45% done, finished in 0:23
>> |   DUMP: 89.91% done, finished in 0:15
>> |   DUMP: 96.51% done, finished in 0:05
>>
>> \--------
>>
>> /--  helios /files3 lev 1 FAILED [data write: Connection reset by peer]
>> sendbackup: start [helios level 1]
>> sendbackup: info BACKUP=/bin/tar
>> sendbackup: info RECOVER_CMD=/bin/gzip -dc |/bin/tar -f... -
>> sendbackup: info COMPRESS_SUFFIX=.gz
>> sendbackup: info end
>> \--------
>>
>>
>> NOTES:
>>  planner: wagstaff /files6/vol/speech7 20070507 0 [dumps too big, 9328
>> KB, full dump delayed]
>>  planner: oz /files1 20070507 0 [dumps too big, 11716 KB, full dump
>> delayed]
>>  planner: wagstaff /files6/vol/bvd 20070507 0 [dumps too big, 12856
>> KB, full dump delayed]
>>  planner: oz / 20070507 0 [dumps too big, 21760 KB, full dump delayed]
>>  planner: wagstaff / 20070507 0 [dumps too big, 23105 KB, full dump
>> delayed]
>>  planner: helios / 20070507 0 [dumps too big, 38583 KB, full dump
>> delayed]
>>  planner: wizard / 20070507 0 [dumps too big, 44165 KB, full dump
>> delayed]
>>  planner: snapserver /hd/vol_mnt0/shares/surgery 20070507 0 [dumps too
>> big, 91538 KB, full dump delayed]
>>  planner: wagstaff /usr 20070507 0 [dumps too big, 112839 KB, full
>> dump delayed]
>>  planner: lollipop /usr 20070507 0 [dumps too big, 131680 KB, full
>> dump delayed]
>>  planner: wizard /usr 20070507 0 [dumps too big, 162844 KB, full dump
>> delayed]
>>  planner: martin /var 20070507 0 [dumps too big, 178985 KB, full dump
>> delayed]
>>  planner: wizard /var 20070507 0 [dumps too big, 210323 KB, full dump
>> delayed]
>>  planner: professor /usr/local 20070507 0 [dumps too big, 219910 KB,
>> full dump delayed]
>>  planner: snapserver /hd/vol_mnt0/shares/NeuroGen 20070507 0 [dumps
>> too big, 247911 KB, full dump delayed]
>>  planner: wagstaff /files6/vol/spdata1 20070507 0 [dumps too big,
>> 358932 KB, full dump delayed]
>>  planner: lollipop /usr/local 20070507 0 [dumps too big, 436518 KB,
>> full dump delayed]
>>  planner: wizard /usr/local 20070507 0 [dumps too big, 458028 KB, full
>> dump delayed]
>>  planner: snapserver /hd/vol_mnt0/shares/immuno 20070507 0 [dumps too
>> big, 494376 KB, full dump delayed]
>>  planner: oz /usr 20070507 0 [dumps too big, 520179 KB, full dump
>> delayed]
>>  planner: wagstaff /files6/vol/spdata2 20070507 0 [dumps too big,
>> 522063 KB, full dump delayed]
>>  planner: snapserver /hd/vol_mnt0/shares/MID 20070507 0 [dumps too
>> big, 522216 KB, full dump delayed]
>>  planner: professor /usr 20070507 0 [dumps too big, 565153 KB, full
>> dump delayed]
>>  planner: helios /usr 20070507 0 [dumps too big, 639655 KB, full dump
>> delayed]
>>  planner: helios /var 20070507 0 [dumps too big, 1609562 KB, full dump
>> delayed]
>>  planner: helios /usr/local 20070507 0 [dumps too big, 1930526 KB,
>> full dump delayed]
>>  planner: helios /files2 20070507 0 [dumps too big, 3326170 KB, full
>> dump delayed]
>>  planner: snapserver /hd/vol_mnt0/shares/BioInf 20070507 0 [dumps too
>> big, 4663399 KB, full dump delayed]
>>  planner: wagstaff /usr/local 20070507 0 [dumps too big, 5087313 KB,
>> full dump delayed]
>>  planner: snapserver /hd/vol_mnt0/shares/urology 20070507 0 [dumps too
>> big, 5786512 KB, full dump delayed]
>>  planner: helios /files3 20070507 0 [dumps too big, 8327643 KB, full
>> dump delayed]
>>  planner: wagstaff /files1 20070507 0 [dumps too big, 11203120 KB,
>> full dump delayed]
>>  taper: tape Monthly21 kb 8726112 fm 35 [OK]
>>
>>
>> DUMP SUMMARY:
>>                                          DUMPER STATS
>> TAPER STATS
>> HOSTNAME     DISK        L    ORIG-kB     OUT-kB  COMP%  MMM:SS   KB/s
>> MMM:SS   KB/s
>> -------------------------- -------------------------------------------
>> -------------
>> helios /             MISSING
>> -------------------------------------------------
>> helios /files2       MISSING
>> -------------------------------------------------
>> helios /files3     1 FAILED
>> --------------------------------------------------
>> helios /usr          MISSING
>> -------------------------------------------------
>> helios /usr/local    MISSING
>> -------------------------------------------------
>> helios /var          MISSING
>> -------------------------------------------------
>> lollipop /             MISSING
>> -------------------------------------------------
>> lollipop /files1     0 FAILED
>> --------------------------------------------------
>> lollipop /usr          MISSING
>> -------------------------------------------------
>> lollipop /usr/local    MISSING
>> -------------------------------------------------
>> martin /var        1      33930      15808   46.6    0:03 5776.6
>> 0:04 4107.7
>> oz /           1        290         96   33.1    0:00  404.7   0:01
>> 83.3
>> oz /files1     1       3640       1344   36.9    0:00 4641.9   0:04
>> 304.1
>> oz /usr        1      60180      18432   30.6    0:05 3425.3   0:10
>> 1822.2
>> oz /var        0     796760     339968   42.7    1:34 3632.0   3:12
>> 1771.7
>> professor.as /           0     104980      46016   43.8    0:22
>> 2099.4   0:09 4941.8
>> professor.as /usr        1       6590        640    9.7    0:01
>> 449.2   0:01  534.7
>> professor.as /usr/local  1       1640        192   11.7    0:00
>> 422.0   0:09   21.8
>> snapserver -res/BioInf 1       2380        416   17.5    0:13   27.5
>> 0:05   90.0
>> snapserver -shares/MID 1        310         96   31.0    0:04   11.8
>> 0:01   83.7
>> snapserver -es/MolGene 0         10         64  640.0    0:00   32.8
>> 0:03   23.2
>> snapserver -s/NeuroGen 1         80         64   80.0    0:01   10.5
>> 0:01   56.0
>> snapserver -es/TermLab   MISSING
>> -------------------------------------------------
>> snapserver -ares/admin 0         10         64  640.0    0:00   38.7
>> 0:01   45.1
>> snapserver -shares/bcl   MISSING
>> -------------------------------------------------
>> snapserver -es/biochem   MISSING
>> -------------------------------------------------
>> snapserver -s/confocal   MISSING
>> -------------------------------------------------
>> snapserver -ares/histo 0         10         64  640.0    0:00   31.3
>> 0:01   58.4
>> snapserver -res/immuno 1       3400        416   12.2    0:04   87.6
>> 0:01  292.0
>> snapserver -ares/mcard 0         10         64  640.0    0:00   36.2
>> 0:01   54.6
>> snapserver -ares/mysql 0         10         64  640.0    0:00   41.4
>> 0:02   26.7
>> snapserver -s/neurosci 0    4421720    2413152   54.6   12:45 3155.2
>> 10:18 3905.8
>> snapserver -es/surgery 1         10         64  640.0    0:00   39.4
>> 0:07    8.6
>> snapserver -s/sysadmin 0       2010       2048  101.9    0:00 6036.8
>> 0:03  586.2
>> snapserver -es/urology 1     160880     113472   70.5    0:19 5824.1
>> 0:50 2275.4
>> wagstaff /           1         95         64   67.4    0:04    2.9
>> 0:01   56.7
>> wagstaff /files1     1     824351     564832   68.5    8:59 1048.6
>> 3:52 2436.2
>> wagstaff /files3       MISSING
>> -------------------------------------------------
>> wagstaff /files4       MISSING
>> -------------------------------------------------
>> wagstaff /files5       MISSING
>> -------------------------------------------------
>> wagstaff -/Voiceware   MISSING
>> -------------------------------------------------
>> wagstaff -s6/vol/bvd 1         10         64  640.0    0:00    4.5
>> 0:01   56.7
>> wagstaff -ol/spdata1 1       1890        416   22.0    0:09   42.3
>> 0:01  359.9
>> wagstaff -ol/spdata2 1        480        160   33.3    0:02   50.8
>> 0:04   36.5
>> wagstaff -ol/speech7 1         40         64  160.0    0:00   14.4
>> 0:01   57.9
>> wagstaff /usr        1         31         64  206.5    0:05    0.2
>> 0:01   55.9
>> wagstaff /usr/local  1 FAILED
>> --------------------------------------------------
>> wizard /           1         95         64   67.4    0:04    2.9
>> 0:02   35.1
>> wizard /files1     0    4576895    4086720   89.3   63:58 1064.8
>> 14:28 4706.1
>> wizard /files2       MISSING
>> -------------------------------------------------
>> wizard /usr        1         63         64  101.6    0:06    0.2
>> 0:01   58.0
>> wizard /usr/local  1     369407      21600    5.8    2:39  135.8
>> 2:23  151.2
>> wizard /var        1     118239      71840   60.8    2:38  454.1
>> 1:22  872.8
>> wizard /var/log    0     110271      72032   65.3    2:10  554.3
>> 11:28  104.7
>> wizard /var/mail   0    1861087     955584   51.3   18:53  843.7
>> 4:10 3820.7
>>
>> (brought to you by Amanda version 2.5.0p2)

This version is getting a wee bit long in the tooth but I think I'd look in 
the logs for timeout messages..  One of the reasons for a lack of reply might 
be that no one is ATM, running a duplicate of your config as deduced from the 
limited info posted.  More details might help.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
You have all the characteristics of a popular politician: a horrible voice,
bad breeding, and a vulgar manner.
                -- Aristophanes