Amanda-Users

more problems with dumper/taper/flush retries failures

2005-11-09 06:51:02
Subject: more problems with dumper/taper/flush retries failures
From: Jean-Francois Malouin <Jean-Francois.Malouin AT bic.mni.mcgill DOT ca>
To: AMANDA users <amanda-users AT amanda DOT org>
Date: Wed, 9 Nov 2005 06:26:10 -0500
Amanda-2.4.5 WITH the dumper retry patch provided by Jean-Louis
Martineau on Nov 1st. 

< I'm having a LOT of problems right now with failed retries and to me
it makes no sense that amanda should not be able to dump/tape or flush
an image of a DLE weather it's in the holding disk or still sitting
remotely on a client, knowing that client-server communication is
established, the DLE can be stuffed in a tape, data timeouts are
within limits, etc. My understanding is that amanda should try very
hard to tape a DLE but right now it drops the puck fairly easily
without much fighting. My take: if a client DLE hits EOT amanda should
at least retry once more and not simply go on to the next DLE. 

OK, sorry, problem follows, excuse my venting.../> 

I have this client DLE that refuses to be taped. It's in the holdding
this however (~80GB) and trying to flush the failed dumped I get the
email output from amanda:

The dumps were flushed to tapes stk_40-conf7-000025, stk_40-conf7-000026, 
stk_40-conf7-000017.

FAILURE AND STRANGE DUMP SUMMARY:
  yorick     /data/mril/mril5/bojana lev 0 FAILED 20051107 [too many taper 
retries]

    [...blah...]

USAGE BY TAPE:
  Label                     Time      Size      %    Nb
  stk_40-conf7-000025       0:00        0k    0.0     0
  stk_40-conf7-000026       0:00        0k    0.0     0
  stk_40-conf7-000017       0:00        0k    0.0     0

NOTES:
  taper: tape stk_40-conf7-000025 kb 81303232 fm 1 writing file: No space left 
on device
  taper: retrying yorick:/data/mril/mril5/bojana.0 on new tape: [writing file: 
No space left on device]
  taper: tape stk_40-conf7-000026 kb 81267648 fm 1 writing file: No space left 
on device
  taper: retrying yorick:/data/mril/mril5/bojana.0 on new tape: [writing file: 
No space left on device]
  taper: tape stk_40-conf7-000017 kb 0 fm 0 [OK]

DUMP SUMMARY:
                                  DUMPER STATS                  TAPER STATS  
HOSTNAME DISK     L    ORIG-kB     OUT-kB COMP% MMM:SS    KB/s MMM:SS    KB/s
------------------- ------------------------------------------ --------------
yorick   -weasel1   NO FILE TO FLUSH ----------------------------------------

    [...]

yorick   -/bojana 0 FAILED --------------------------------------------------

    [...]

yorick   -a/ficbm   NO FILE TO FLUSH ----------------------------------------

(brought to you by Amanda version 2.4.5)


The amflush log file:

amflush: start at Tue Nov  8 15:48:04 EST 2005
amflush: datestamp 20051108
FLUSH yorick /data/mril/mril5/bojana 20051107 0 
/holddisk/conf7/stk_40/20051107181508/yorick._data_mril_mril5_bojana.0
ENDFLUSH
driver: pid 2539253 executable driver version 2.4.5
driver: tape size 101761024
driver: send-cmd time 0.008 to taper: START-TAPER 20051108
driver: adding holding disk 0 dir /holddisk/conf7/stk_40 size 104857600 
chunksize 2097056
reserving 104857600 out of 104857600 for degraded-mode dumps
taper: pid 2596081 executable taper version 2.4.5
taper: page size is 16384
taper: buffer size is 32768

...[taper buffer stuff cutout]

taper: buffer structures at 410c000 for 240 bytes
changer: opening pipe to: /opt/amanda/amanda7/libexec/chg-zd-mtx -info
driver: flush size 84041852
driver: start time 0.054 inparallel 10 bandwidth 1025400 diskspace 104857600 
dir OBSOLETE datestamp 20051108 driver: drain-ends tapeq FIRST big-dumpers SSSS
changer: got exit: 0 str: 66 10 1 1
changer_query: changer return was 10 1 1
changer_query: searchable = 1
changer_find: looking for stk_40-conf7-000025 changer is searchable = 1
changer_search: stk_40-conf7-000025
changer: opening pipe to: /opt/amanda/amanda7/libexec/chg-zd-mtx -search 
stk_40-conf7-000025
changer: got exit: 0 str: 62 /hw/tape/tps17d7nrnsv
taper: slot 62: date 20051024 label stk_40-conf7-000025 (exact label match)
taper: read label `stk_40-conf7-000025' date `20051024'
taper: wrote label `stk_40-conf7-000025' date `20051108'
driver: result time 100.176 from taper: TAPER-OK 
driver: state time 100.176 free kps: 1025400 space: 104857600 taper: idle 
idle-dumpers: 10 qlen tapeq: 1 runq: 0 roomq: 0 wakeup: 86400 driver-idle: 
not-idle
driver: interface-state time 100.176 if : free 1024000 if LE0: free 400 if 
LOCAL: free 1000
driver: hdisk-state time 100.176 hdisk 0: free 104857600 dumpers 0
driver: send-cmd time 100.176 to taper: FILE-WRITE 00-00001 
/holddisk/conf7/stk_40/20051107181508/yorick._data_mril_mril5_bojana.0 yorick 
UNKNOWNFEATURE /data/mril/mril5/bojana 0 20051107
driver: startaflush: FIRST yorick /data/mril/mril5/bojana 84041852 101761024
driver: state time 100.176 free kps: 1025400 space: 104857600 taper: writing 
idle-dumpers: 10 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 86400 driver-idle: 
not-idle
driver: interface-state time 100.176 if : free 1024000 if LE0: free 400 if 
LOCAL: free 1000
driver: hdisk-state time 100.176 hdisk 0: free 104857600 dumpers 0
taper: writing end marker. [stk_40-conf7-000025 ERR kb 81303232 fm 1]

    [ first failure ]

changer: opening pipe to: /opt/amanda/amanda7/libexec/chg-zd-mtx -info
changer: got exit: 0 str: 62 10 1 1
changer_query: changer return was 10 1 1
changer_query: searchable = 1
changer_find: looking for stk_40-conf7-000026 changer is searchable = 1
changer_search: stk_40-conf7-000026
changer: opening pipe to: /opt/amanda/amanda7/libexec/chg-zd-mtx -search 
stk_40-conf7-000026
changer: got exit: 0 str: 63 /hw/tape/tps17d7nrnsv
taper: slot 63: date 20051024 label stk_40-conf7-000026 (exact label match)
taper: read label `stk_40-conf7-000026' date `20051024'
taper: wrote label `stk_40-conf7-000026' date `20051108'
driver: result time 7141.322 from taper: TRY-AGAIN 00-00001 [writing file: No 
space left on device]
driver: taper-tryagain time 7141.322 disk yorick:/data/mril/mril5/bojana
driver: send-cmd time 7141.322 to taper: FILE-WRITE 00-00002 
/holddisk/conf7/stk_40/20051107181508/yorick._data_mril_mril5_bojana.0 yorick 
UNKNOWNFEATURE /data/mril/mril5/bojana 0 20051107
driver: startaflush: FIRST yorick /data/mril/mril5/bojana 84041852 101761024
driver: state time 7141.322 free kps: 1025400 space: 104857600 taper: writing 
idle-dumpers: 10 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 86400 driver-idle: 
not-idle
driver: interface-state time 7141.322 if : free 1024000 if LE0: free 400 if 
LOCAL: free 1000
driver: hdisk-state time 7141.322 hdisk 0: free 104857600 dumpers 0
taper: writing end marker. [stk_40-conf7-000026 ERR kb 81267648 fm 1]

    [ second failure ]

changer: opening pipe to: /opt/amanda/amanda7/libexec/chg-zd-mtx -info
changer: got exit: 0 str: 63 10 1 1
changer_query: changer return was 10 1 1
changer_query: searchable = 1
changer_find: looking for stk_40-conf7-000017 changer is searchable = 1
changer_search: stk_40-conf7-000017
changer: opening pipe to: /opt/amanda/amanda7/libexec/chg-zd-mtx -search 
stk_40-conf7-000017
changer: got exit: 0 str: 64 /hw/tape/tps17d7nrnsv
taper: slot 64: date 20051027 label stk_40-conf7-000017 (exact label match)
taper: read label `stk_40-conf7-000017' date `20051027'
taper: wrote label `stk_40-conf7-000017' date `20051108'
driver: result time 14436.223 from taper: TRY-AGAIN 00-00002 [writing file: No 
space left on device]
driver: taper-tryagain time 14436.223 disk yorick:/data/mril/mril5/bojana
driver: taper failed 00-00002 yorick /data/mril/mril5/bojana, too many taper 
retry
driver: state time 14436.224 free kps: 1025400 space: 104857600 taper: idle 
idle-dumpers: 10 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 86400 driver-idle: 
not-idle
driver: interface-state time 14436.224 if : free 1024000 if LE0: free 400 if 
LOCAL: free 1000
driver: hdisk-state time 14436.224 hdisk 0: free 104857600 dumpers 0
driver: QUITTING time 14436.224 telling children to quit
driver: send-cmd time 14436.224 to taper: QUIT
taper: DONE [idle wait: 0.002 secs]
taper: writing end marker. [stk_40-conf7-000017 OK kb 0 fm 0]
driver: FINISHED time 14442.947

stumped.
jf

<Prev in Thread] Current Thread [Next in Thread>