On Thursday 10 May 2007, Steven Settlemyre wrote:
> From amdump log:
>
>This similar flow happens every time it fails. At first I was worried
>about the "not enough diskspace" message, but this is just because my
>holding disk filled up. I see the same thing happening in successful runs.
>
>driver: state time 8459.717 free kps: 37761 space: 55584 taper: idle
>idle-dumpers: 7 qlen tapeq: 0 runq: 18 roomq: 2 wakeup: 0 d
>river-idle: no-diskspace
>driver: interface-state time 8459.717 if : free 37761
>driver: hdisk-state time 8459.717 hdisk 0: free 55584 dumpers 1
>driver: result time 8459.727 from chunker3: RQ-MORE-DISK 03-00004
>find diskspace: not enough diskspace. Left with 508960 K
>find diskspace: not enough diskspace. Left with 71872 K
>find diskspace: not enough diskspace. Left with 17664 K
>find diskspace: not enough diskspace. Left with 508960 K
>driver: Don't know how to send ABORT command to chunker
>taper: DONE [idle wait: 7182.514 secs]
>chunker: error [bad command after RQ-MORE-DISK: "QUIT"]
>chunker: time 7311.263: error [bad command after RQ-MORE-DISK: "QUIT"]
>chunker: time 7311.263: pid 1129 finish time Thu May 10 02:06:02 2007
>chunker: error [bad command after RQ-MORE-DISK: "QUIT"]
>chunker: time 7888.517: error [bad command after RQ-MORE-DISK: "QUIT"]
>chunker: time 7888.517: pid 31879 finish time Thu May 10 02:06:02 2007
>chunker: error [bad command after RQ-MORE-DISK: "QUIT"]
>chunker: time 7888.485: error [bad command after RQ-MORE-DISK: "QUIT"]
>chunker: time 7888.485: pid 31880 finish time Thu May 10 02:06:02 2007
>taper: writing end marker. [VOL14 OK kb 2300352 fm 28]
>dumper: kill index command
>dumper: kill index command
>dumper: kill index command
>amdump: end at Thu May 10 02:06:02 EDT 2007
>Scanning /holding/amanda...
> 20070509234502: found Amanda directory.
>
Humm, I see no such diskspace related messages in last nights amdump.1 file
here.
How much holding disk do you have, and how much of it is reserved for
emergency incrementals? By default, its 100%, so if you don't have
that "reserved" set at all...
I think one should have enough holding disk that is not reserved to cover the
two largest disklist entries you have. In my case that would be around 15GB
of non-reserved space, but then I have about 80GB available here, more than
adequate till I fill it up. Again.. :-)
>Gene Heskett wrote:
>> On Wednesday 09 May 2007, Steven Settlemyre wrote:
>>> Can someone please help me?
>>>
>>> Steven Settlemyre wrote:
>>>> I haven't changed my configs for months and things were running great
>>>> until last week. Since last tues, none of my dailies have finished,
>>>> and last night a monthly failed.
>>>>
>>>> Looking through the logs I see the problem always seems to start with
>>>> "data write: Connection reset by peer" and "Don't know how to send
>>>> ABORT command to chunker". I'm having a hard time interpreting the
>>>> logs and can't seem to find too much in the archives about this. Was
>>>> wondering if someone could walk me through an explanation of the
>>>> problem and how to avoid it in the future.
>>>>
>>>> My monthlies run tape spanning on 3 40G tapes.
>>>>
>>>> Here is the email output generated:
>>>>
>>>> *** THE DUMPS DID NOT FINISH PROPERLY!
>>>>
>>>> These dumps were to tape Monthly21.
>>>> The next 3 tapes Amanda expects to use are: Monthly01, Monthly02,
>>>> Monthly03.
>>>> The next 3 new tapes already labelled are: Monthly19, Monthly20,
>>>> Monthly22.
>>>>
>>>> FAILURE AND STRANGE DUMP SUMMARY:
>>>> wagstaff /usr/local lev 1 FAILED [data
>>>> write: Connection reset by peer]
>>>> lollipop /files1 lev 0 FAILED [data
>>>> write: Connection reset by peer]
>>>> helios /files3 lev 1 FAILED [data write:
>>>> Connection reset by peer]
>>>> helios / RESULTS MISSING
>>>> helios /files2 RESULTS MISSING
>>>> helios /usr RESULTS MISSING
>>>> helios /usr/local RESULTS MISSING
>>>> helios /var RESULTS MISSING
>>>> lollipop / RESULTS MISSING
>>>> lollipop /usr RESULTS MISSING
>>>> lollipop /usr/local RESULTS MISSING
>>>> wagstaff /files3 RESULTS MISSING
>>>> wagstaff /files4 RESULTS MISSING
>>>> wagstaff /files5 RESULTS MISSING
>>>> wagstaff /files6/vol/Voiceware RESULTS MISSING
>>>> wizard /files2 RESULTS MISSING
>>>> snapserver /hd/vol_mnt0/shares/TermLab RESULTS MISSING
>>>> snapserver /hd/vol_mnt0/shares/bcl RESULTS MISSING
>>>> snapserver /hd/vol_mnt0/shares/biochem RESULTS MISSING
>>>> snapserver /hd/vol_mnt0/shares/confocal RESULTS MISSING
>>>> driver: FATAL Don't know how to send ABORT command to chunker
>>>> chunker: FATAL error [bad command after RQ-MORE-DISK: "QUIT"]
>>>> chunker: FATAL error [bad command after RQ-MORE-DISK: "QUIT"]
>>>> chunker: FATAL error [bad command after RQ-MORE-DISK: "QUIT"]
>>>>
>>>>
>>>> STATISTICS:
>>>> Total Full Incr.
>>>> -------- -------- --------
>>>> Estimate Time (hrs:min) 0:08
>>>> Run Time (hrs:min) 1:01
>>>> Dump Time (hrs:min) 1:55 1:40 0:16
>>>> Output Size (meg) 8519.7 7729.7 790.1
>>>> Original Size (meg) 13146.3 11595.5 1550.8
>>>> Avg Compressed Size (%) 64.8 66.7 50.9 (level:#disks
>>>> ...)
>>>> Filesystems Dumped 35 12 23 (1:23)
>>>> Avg Dump Rate (k/s) 1261.0 1323.3 863.1
>>>>
>>>> Tape Time (hrs:min) 0:53 0:44 0:09
>>>> Tape Size (meg) 8521.6 7730.3 791.3
>>>> Tape Used (%) 21.1 19.0 2.1 (level:#disks
>>>> ...)
>>>> Filesystems Taped 35 12 23 (1:23)
>>>> (level:#chunks ...)
>>>> Chunks Taped 35 12 23 (1:23)
>>>> Avg Tp Write Rate (k/s) 2724.3 3000.8 1433.6
>>>>
>>>> USAGE BY TAPE:
>>>> Label Time Size % Nb Nc
>>>> Monthly21 0:53 8726112k 21.1 35 35
>>>>
>>>>
>>>> FAILED AND STRANGE DUMP DETAILS:
>>>>
>>>> /-- wagstaff /usr/local lev 1 FAILED [data write: Connection reset by
>>>> peer]
>>>> sendbackup: start [wagstaff level 1]
>>>> sendbackup: info BACKUP=/usr/sbin/ufsdump
>>>> sendbackup: info RECOVER_CMD=/usr/local/bin/gzip -dc
>>>>
>>>> |/usr/sbin/ufsrestore -f... -
>>>>
>>>> sendbackup: info COMPRESS_SUFFIX=.gz
>>>> sendbackup: info end
>>>>
>>>> | DUMP: Writing 32 Kilobyte records
>>>> | DUMP: Date of this level 1 dump: Tue May 08 01:11:26 2007
>>>> | DUMP: Date of last level 0 dump: Mon Apr 30 23:54:14 2007
>>>> | DUMP: Dumping /dev/rdsk/c0t0d0s7 (wagstaff:/usr/local) to standard
>>>>
>>>> output.
>>>>
>>>> | DUMP: Mapping (Pass I) [regular files]
>>>> | DUMP: Mapping (Pass II) [directories]
>>>> | DUMP: Mapping (Pass II) [directories]
>>>> | DUMP: Mapping (Pass II) [directories]
>>>> | DUMP: Estimated 13585968 blocks (6633.77MB) on 0.10 tapes.
>>>> | DUMP: Dumping (Pass III) [directories]
>>>> | DUMP: Dumping (Pass IV) [regular files]
>>>> | DUMP: 16.49% done, finished in 0:50
>>>> | DUMP: 28.34% done, finished in 0:57
>>>> | DUMP: 38.89% done, finished in 1:12
>>>>
>>>> \--------
>>>>
>>>> /-- lollipop /files1 lev 0 FAILED [data write: Connection reset by
>>>> peer] sendbackup: start [lollipop level 0]
>>>> sendbackup: info BACKUP=/usr/sbin/ufsdump
>>>> sendbackup: info RECOVER_CMD=/usr/bin/gzip -dc |/usr/sbin/ufsrestore
>>>> -f... -
>>>> sendbackup: info COMPRESS_SUFFIX=.gz
>>>> sendbackup: info end
>>>>
>>>> | DUMP: Writing 32 Kilobyte records
>>>> | DUMP: Date of this level 0 dump: Mon May 07 23:52:56 2007
>>>> | DUMP: Date of last level 0 dump: the epoch
>>>> | DUMP: Dumping /dev/rdsk/c0t2d0s2 (lollipop:/files1) to standard
>>>>
>>>> output.
>>>>
>>>> | DUMP: Mapping (Pass I) [regular files]
>>>> | DUMP: Mapping (Pass II) [directories]
>>>> | DUMP: Estimated 34371404 blocks (16782.91MB) on 0.25 tapes.
>>>> | DUMP: Dumping (Pass III) [directories]
>>>> | DUMP: Dumping (Pass IV) [regular files]
>>>> | DUMP: 6.08% done, finished in 2:34
>>>> | DUMP: 13.06% done, finished in 2:13
>>>> | DUMP: 19.93% done, finished in 2:00
>>>> | DUMP: 26.85% done, finished in 1:49
>>>> | DUMP: 33.93% done, finished in 1:37
>>>> | DUMP: 41.38% done, finished in 1:25
>>>> | DUMP: 49.81% done, finished in 1:10
>>>> | DUMP: 57.36% done, finished in 0:59
>>>> | DUMP: 60.37% done, finished in 0:59
>>>> | DUMP: 64.71% done, finished in 0:54
>>>> | DUMP: 72.54% done, finished in 0:41
>>>> | DUMP: 78.46% done, finished in 0:32
>>>> | DUMP: 84.45% done, finished in 0:23
>>>> | DUMP: 89.91% done, finished in 0:15
>>>> | DUMP: 96.51% done, finished in 0:05
>>>>
>>>> \--------
>>>>
>>>> /-- helios /files3 lev 1 FAILED [data write: Connection reset by peer]
>>>> sendbackup: start [helios level 1]
>>>> sendbackup: info BACKUP=/bin/tar
>>>> sendbackup: info RECOVER_CMD=/bin/gzip -dc |/bin/tar -f... -
>>>> sendbackup: info COMPRESS_SUFFIX=.gz
>>>> sendbackup: info end
>>>> \--------
>>>>
>>>>
>>>> NOTES:
>>>> planner: wagstaff /files6/vol/speech7 20070507 0 [dumps too big, 9328
>>>> KB, full dump delayed]
>>>> planner: oz /files1 20070507 0 [dumps too big, 11716 KB, full dump
>>>> delayed]
>>>> planner: wagstaff /files6/vol/bvd 20070507 0 [dumps too big, 12856
>>>> KB, full dump delayed]
>>>> planner: oz / 20070507 0 [dumps too big, 21760 KB, full dump delayed]
>>>> planner: wagstaff / 20070507 0 [dumps too big, 23105 KB, full dump
>>>> delayed]
>>>> planner: helios / 20070507 0 [dumps too big, 38583 KB, full dump
>>>> delayed]
>>>> planner: wizard / 20070507 0 [dumps too big, 44165 KB, full dump
>>>> delayed]
>>>> planner: snapserver /hd/vol_mnt0/shares/surgery 20070507 0 [dumps too
>>>> big, 91538 KB, full dump delayed]
>>>> planner: wagstaff /usr 20070507 0 [dumps too big, 112839 KB, full
>>>> dump delayed]
>>>> planner: lollipop /usr 20070507 0 [dumps too big, 131680 KB, full
>>>> dump delayed]
>>>> planner: wizard /usr 20070507 0 [dumps too big, 162844 KB, full dump
>>>> delayed]
>>>> planner: martin /var 20070507 0 [dumps too big, 178985 KB, full dump
>>>> delayed]
>>>> planner: wizard /var 20070507 0 [dumps too big, 210323 KB, full dump
>>>> delayed]
>>>> planner: professor /usr/local 20070507 0 [dumps too big, 219910 KB,
>>>> full dump delayed]
>>>> planner: snapserver /hd/vol_mnt0/shares/NeuroGen 20070507 0 [dumps
>>>> too big, 247911 KB, full dump delayed]
>>>> planner: wagstaff /files6/vol/spdata1 20070507 0 [dumps too big,
>>>> 358932 KB, full dump delayed]
>>>> planner: lollipop /usr/local 20070507 0 [dumps too big, 436518 KB,
>>>> full dump delayed]
>>>> planner: wizard /usr/local 20070507 0 [dumps too big, 458028 KB, full
>>>> dump delayed]
>>>> planner: snapserver /hd/vol_mnt0/shares/immuno 20070507 0 [dumps too
>>>> big, 494376 KB, full dump delayed]
>>>> planner: oz /usr 20070507 0 [dumps too big, 520179 KB, full dump
>>>> delayed]
>>>> planner: wagstaff /files6/vol/spdata2 20070507 0 [dumps too big,
>>>> 522063 KB, full dump delayed]
>>>> planner: snapserver /hd/vol_mnt0/shares/MID 20070507 0 [dumps too
>>>> big, 522216 KB, full dump delayed]
>>>> planner: professor /usr 20070507 0 [dumps too big, 565153 KB, full
>>>> dump delayed]
>>>> planner: helios /usr 20070507 0 [dumps too big, 639655 KB, full dump
>>>> delayed]
>>>> planner: helios /var 20070507 0 [dumps too big, 1609562 KB, full dump
>>>> delayed]
>>>> planner: helios /usr/local 20070507 0 [dumps too big, 1930526 KB,
>>>> full dump delayed]
>>>> planner: helios /files2 20070507 0 [dumps too big, 3326170 KB, full
>>>> dump delayed]
>>>> planner: snapserver /hd/vol_mnt0/shares/BioInf 20070507 0 [dumps too
>>>> big, 4663399 KB, full dump delayed]
>>>> planner: wagstaff /usr/local 20070507 0 [dumps too big, 5087313 KB,
>>>> full dump delayed]
>>>> planner: snapserver /hd/vol_mnt0/shares/urology 20070507 0 [dumps too
>>>> big, 5786512 KB, full dump delayed]
>>>> planner: helios /files3 20070507 0 [dumps too big, 8327643 KB, full
>>>> dump delayed]
>>>> planner: wagstaff /files1 20070507 0 [dumps too big, 11203120 KB,
>>>> full dump delayed]
>>>> taper: tape Monthly21 kb 8726112 fm 35 [OK]
>>>>
>>>>
>>>> DUMP SUMMARY:
>>>> DUMPER STATS
>>>> TAPER STATS
>>>> HOSTNAME DISK L ORIG-kB OUT-kB COMP% MMM:SS KB/s
>>>> MMM:SS KB/s
>>>> -------------------------- -------------------------------------------
>>>> -------------
>>>> helios / MISSING
>>>> -------------------------------------------------
>>>> helios /files2 MISSING
>>>> -------------------------------------------------
>>>> helios /files3 1 FAILED
>>>> --------------------------------------------------
>>>> helios /usr MISSING
>>>> -------------------------------------------------
>>>> helios /usr/local MISSING
>>>> -------------------------------------------------
>>>> helios /var MISSING
>>>> -------------------------------------------------
>>>> lollipop / MISSING
>>>> -------------------------------------------------
>>>> lollipop /files1 0 FAILED
>>>> --------------------------------------------------
>>>> lollipop /usr MISSING
>>>> -------------------------------------------------
>>>> lollipop /usr/local MISSING
>>>> -------------------------------------------------
>>>> martin /var 1 33930 15808 46.6 0:03 5776.6
>>>> 0:04 4107.7
>>>> oz / 1 290 96 33.1 0:00 404.7 0:01
>>>> 83.3
>>>> oz /files1 1 3640 1344 36.9 0:00 4641.9 0:04
>>>> 304.1
>>>> oz /usr 1 60180 18432 30.6 0:05 3425.3 0:10
>>>> 1822.2
>>>> oz /var 0 796760 339968 42.7 1:34 3632.0 3:12
>>>> 1771.7
>>>> professor.as / 0 104980 46016 43.8 0:22
>>>> 2099.4 0:09 4941.8
>>>> professor.as /usr 1 6590 640 9.7 0:01
>>>> 449.2 0:01 534.7
>>>> professor.as /usr/local 1 1640 192 11.7 0:00
>>>> 422.0 0:09 21.8
>>>> snapserver -res/BioInf 1 2380 416 17.5 0:13 27.5
>>>> 0:05 90.0
>>>> snapserver -shares/MID 1 310 96 31.0 0:04 11.8
>>>> 0:01 83.7
>>>> snapserver -es/MolGene 0 10 64 640.0 0:00 32.8
>>>> 0:03 23.2
>>>> snapserver -s/NeuroGen 1 80 64 80.0 0:01 10.5
>>>> 0:01 56.0
>>>> snapserver -es/TermLab MISSING
>>>> -------------------------------------------------
>>>> snapserver -ares/admin 0 10 64 640.0 0:00 38.7
>>>> 0:01 45.1
>>>> snapserver -shares/bcl MISSING
>>>> -------------------------------------------------
>>>> snapserver -es/biochem MISSING
>>>> -------------------------------------------------
>>>> snapserver -s/confocal MISSING
>>>> -------------------------------------------------
>>>> snapserver -ares/histo 0 10 64 640.0 0:00 31.3
>>>> 0:01 58.4
>>>> snapserver -res/immuno 1 3400 416 12.2 0:04 87.6
>>>> 0:01 292.0
>>>> snapserver -ares/mcard 0 10 64 640.0 0:00 36.2
>>>> 0:01 54.6
>>>> snapserver -ares/mysql 0 10 64 640.0 0:00 41.4
>>>> 0:02 26.7
>>>> snapserver -s/neurosci 0 4421720 2413152 54.6 12:45 3155.2
>>>> 10:18 3905.8
>>>> snapserver -es/surgery 1 10 64 640.0 0:00 39.4
>>>> 0:07 8.6
>>>> snapserver -s/sysadmin 0 2010 2048 101.9 0:00 6036.8
>>>> 0:03 586.2
>>>> snapserver -es/urology 1 160880 113472 70.5 0:19 5824.1
>>>> 0:50 2275.4
>>>> wagstaff / 1 95 64 67.4 0:04 2.9
>>>> 0:01 56.7
>>>> wagstaff /files1 1 824351 564832 68.5 8:59 1048.6
>>>> 3:52 2436.2
>>>> wagstaff /files3 MISSING
>>>> -------------------------------------------------
>>>> wagstaff /files4 MISSING
>>>> -------------------------------------------------
>>>> wagstaff /files5 MISSING
>>>> -------------------------------------------------
>>>> wagstaff -/Voiceware MISSING
>>>> -------------------------------------------------
>>>> wagstaff -s6/vol/bvd 1 10 64 640.0 0:00 4.5
>>>> 0:01 56.7
>>>> wagstaff -ol/spdata1 1 1890 416 22.0 0:09 42.3
>>>> 0:01 359.9
>>>> wagstaff -ol/spdata2 1 480 160 33.3 0:02 50.8
>>>> 0:04 36.5
>>>> wagstaff -ol/speech7 1 40 64 160.0 0:00 14.4
>>>> 0:01 57.9
>>>> wagstaff /usr 1 31 64 206.5 0:05 0.2
>>>> 0:01 55.9
>>>> wagstaff /usr/local 1 FAILED
>>>> --------------------------------------------------
>>>> wizard / 1 95 64 67.4 0:04 2.9
>>>> 0:02 35.1
>>>> wizard /files1 0 4576895 4086720 89.3 63:58 1064.8
>>>> 14:28 4706.1
>>>> wizard /files2 MISSING
>>>> -------------------------------------------------
>>>> wizard /usr 1 63 64 101.6 0:06 0.2
>>>> 0:01 58.0
>>>> wizard /usr/local 1 369407 21600 5.8 2:39 135.8
>>>> 2:23 151.2
>>>> wizard /var 1 118239 71840 60.8 2:38 454.1
>>>> 1:22 872.8
>>>> wizard /var/log 0 110271 72032 65.3 2:10 554.3
>>>> 11:28 104.7
>>>> wizard /var/mail 0 1861087 955584 51.3 18:53 843.7
>>>> 4:10 3820.7
>>>>
>>>> (brought to you by Amanda version 2.5.0p2)
>>
>> This version is getting a wee bit long in the tooth but I think I'd look
>> in the logs for timeout messages.. One of the reasons for a lack of reply
>> might be that no one is ATM, running a duplicate of your config as deduced
>> from the limited info posted. More details might help.
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
The number of computer scientists in a room is inversely proportional
to the number of bugs in their code.
|