Amanda-Users

enough free holdingdisk, but chunker0: RQ-MORE-DISK fails

2006-05-08 03:27:14
Subject: enough free holdingdisk, but chunker0: RQ-MORE-DISK fails
From: Sven Rudolph <Sven_Rudolph AT drewag DOT de>
To: amanda-users AT amanda DOT org
Date: Mon, 08 May 2006 09:22:45 +0200
(Reposting.)

The follwing is with Amanda 2.5.0 (Debian package 1:2.5.0-2); Amanda
has no free tape in access (degraded mode), so the dump should go to
holdingdisk and stay there.

When I start the dump, amstatus shows:

  Using /var/log/amanda/WeeklySet1/amdump from Thu May  4 17:46:01 CEST 2006

  unixserver:/data/backup 0   192436m dumping    39690m ( 20.63%) (17:53:51)
  
  SUMMARY          part      real  estimated
                             size       size
  partition       :   1
  estimated       :   1               192436m
  flush           :   0         0m
  failed          :   0                    0m           (  0.00%)
  wait for dumping:   0                    0m           (  0.00%)
  dumping to tape :   0                    0m           (  0.00%)
  dumping         :   1     39690m    192436m ( 20.63%) ( 20.63%)
  dumped          :   0         0m         0m (  0.00%) (  0.00%)
  wait for writing:   0         0m         0m (  0.00%) (  0.00%)
  wait to flush   :   0         0m         0m (100.00%) (  0.00%)
  writing to tape :   0         0m         0m (  0.00%) (  0.00%)
  failed to tape  :   0         0m         0m (  0.00%) (  0.00%)
  taped           :   0         0m         0m (  0.00%) (  0.00%)
  12 dumpers idle : not-idle
  taper idle
  network free kps:     44000
  holding space   :   1841161m (100.00%)
   dumper0 busy   :  0:00:00  (  0.00%)
   0 dumpers busy :  0:00:00  (  0.00%)
   1 dumper busy  :  0:00:00  (  0.00%)

A minor surpise: It says that 100% holding space is still available
even though a dump to holdingdisk already runs.

Some hours later the dump fails. amstatus shows:

  Using /var/log/amanda/WeeklySet1/amdump.1 from Thu May  4 17:46:01 CEST 2006

  unixserver:/data/backup 0   192436m dumping   202058m (105.00%) (17:53:51) 
(waiting for holding disk space)

  SUMMARY          part      real  estimated
                           size       size
  partition       :   1
  estimated       :   1               192436m
  flush           :   0         0m
  failed          :   0                    0m           (  0.00%)
  wait for dumping:   0                    0m           (  0.00%)
  dumping to tape :   0                    0m           (  0.00%)
  dumping         :   1    202058m    192436m (105.00%) (105.00%)
  dumped          :   0         0m         0m (  0.00%) (  0.00%)
  wait for writing:   0         0m         0m (  0.00%) (  0.00%)
  wait to flush   :   0         0m         0m (100.00%) (  0.00%)
  writing to tape :   0         0m         0m (  0.00%) (  0.00%)
  failed to tape  :   0         0m         0m (  0.00%) (  0.00%)
  taped           :   0         0m         0m (  0.00%) (  0.00%)
  11 dumpers idle : no-dumpers
  taper idle
  network free kps:     35858
  holding space   :   1639097m ( 89.03%)
   dumper0 busy   :  8:30:45  ( 98.49%)
   0 dumpers busy :  8:07:52  ( 94.07%)            not-idle:  8:07:52  (100.00%)
   1 dumper busy  :  0:22:52  (  4.41%)          no-dumpers:  0:22:52  (100.00%)

Obviously there is still enough holdingdisk space.


So I looked into /var/log/amanda/WeeklySet1/amdump.1 (I quote some
parts here; the complete file is appended below.):

  driver: result time 31115.937 from chunker0: RQ-MORE-DISK 00-00001
  find diskspace: not enough diskspace. Left with 2412180312 K
  find diskspace: not enough diskspace. Left with 2412180312 K
  driver: Don't know how to send ABORT command to chunker
  chunker: error [bad command after RQ-MORE-DISK: "QUIT"]
  chunker: time 30645.022: error [bad command after RQ-MORE-DISK: "QUIT"]
  chunker: time 30645.022: pid 7453 finish time Fri May  5 02:24:37 2006
  taper: DONE [idle wait: 30645.029 secs]

I have no idea how the number 2412180312 K was calculated. It is more
than the size of all holdingdisks (see amstatus above, this number
seems to be correct).

Right now the workaround seems to be to split the DLE, my
second-largest DLE is below 180 GB and didn't cause such trouble.

Another observation: In this case the estimate is too low; above QUIT
occured after some successful RQ-MORE-DISK. So another workaround
might be to avoid RQ-MORE-DISK by setting a larger comprate.

        Sven


The amdump.1 file:

amdump: start at Thu May  4 17:46:01 CEST 2006
amdump: datestamp 20060504
planner: pid 4960 executable /usr/lib/amanda/planner version 2.5.0
planner: build: VERSION="Amanda-2.5.0"
planner:        BUILT_DATE="Mon Apr 3 05:06:49 PDT 2006"
planner:        BUILT_MACH="Linux rover 2.6.15-15-686 #1 SMP PREEMPT Mon Feb 6 
18:34:45 UTC 2006 i686 GNU/Linux"
driver: pid 4961 executable /usr/lib/amanda/driver version 2.5.0
driver: tape size 311296000
planner:        CC="gcc"
planner:        CONFIGURE_COMMAND="'./configure' '--prefix=/usr' 
'--bindir=/usr/sbin' '--mandir=/usr/share/man' '--libexecdir=/usr/lib/amanda' 
'--enable-shared' '--sysconfdir=/etc' '--localstatedir=/var/lib' 
'--with-gnutar-listdir=/var/lib/amanda/gnutar-lists' 
'--with-index-server=localhost' '--with-user=backup' '--with-group=backup' 
'--with-bsd-security' '--with-amandahosts' 
'--with-smbclient=/usr/bin/smbclient' '--with-debugging=/var/log/amanda' 
'--with-dumperdir=/usr/lib/amanda/dumper.d' '--with-tcpportrange=50000,50100' 
'--with-udpportrange=840,860' '--with-maxtapeblocksize=256' '--no-create' 
'--no-recursion'"
planner: paths: bindir="/usr/sbin" sbindir="/usr/sbin"
planner:        libexecdir="/usr/lib/amanda" mandir="/usr/share/man"
planner:        AMANDA_TMPDIR="/tmp/amanda"
planner:        AMANDA_DBGDIR="/var/log/amanda" CONFIG_DIR="/etc/amanda"
planner:        DEV_PREFIX="/dev/" RDEV_PREFIX="/dev/" DUMP="/sbin/dump"
planner:        RESTORE="/sbin/restore" VDUMP=UNDEF VRESTORE=UNDEF
planner:        XFSDUMP="/sbin/xfsdump" XFSRESTORE="/sbin/xfsrestore"
planner:        VXDUMP=UNDEF VXRESTORE=UNDEF
planner:        SAMBA_CLIENT="/usr/bin/smbclient" GNUTAR="/bin/tar"
planner:        COMPRESS_PATH="/bin/gzip" UNCOMPRESS_PATH="/bin/gzip"
planner:        LPRCMD="/usr/bin/lpr" MAILER="/usr/bin/Mail"
planner:        listed_incr_dir="/var/lib/amanda/gnutar-lists"
planner: defs:  DEFAULT_SERVER="localhost" DEFAULT_CONFIG="DailySet1"
planner:        DEFAULT_TAPE_SERVER="localhost"
planner:        DEFAULT_TAPE_DEVICE="null:" HAVE_MMAP HAVE_SYSVSHM
planner:        LOCKING=POSIX_FCNTL SETPGRP_VOID DEBUG_CODE
planner:        AMANDA_DEBUG_DAYS=4 BSD_SECURITY RSH_SECURITY USE_AMANDAHOSTS
planner:        CLIENT_LOGIN="backup" FORCE_USERID HAVE_GZIP
planner:        COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
planner:        COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"
driver: send-cmd time 0.002 to taper: START-TAPER 20060504
READING CONF FILES...
taper: pid 4962 executable taper version 2.5.0
driver: adding holding disk 0 dir /var/spool/amanda/9/9 size 550800356 
chunksize 1048576
driver: adding holding disk 1 dir /var/spool/amanda/8/8 size 138585904 
chunksize 1048576
taper: page size is 4096
taper: buffer size is 32768
driver: adding holding disk 2 dir /var/spool/amanda/6/6 size 65610248 chunksize 
1048576
taper: buffer[00] at 0xb7c88000
taper: buffer[01] at 0xb7c90000
taper: buffer[02] at 0xb7c98000
taper: buffer[03] at 0xb7ca0000
taper: buffer[04] at 0xb7ca8000
taper: buffer[05] at 0xb7cb0000
taper: buffer[06] at 0xb7cb8000
taper: buffer[07] at 0xb7cc0000
taper: buffer[08] at 0xb7cc8000
taper: buffer[09] at 0xb7cd0000
taper: buffer[10] at 0xb7cd8000
taper: buffer[11] at 0xb7ce0000
taper: buffer[12] at 0xb7ce8000
taper: buffer[13] at 0xb7cf0000
taper: buffer[14] at 0xb7cf8000
taper: buffer[15] at 0xb7d00000
taper: buffer[16] at 0xb7d08000
taper: buffer[17] at 0xb7d10000
taper: buffer[18] at 0xb7d18000
taper: buffer[19] at 0xb7d20000
taper: buffer structures at 0xb7d28000 for 240 bytes
driver: adding holding disk 3 dir /var/spool/amanda/5/5 size 564966916 
chunksize 1048576
driver: adding holding disk 4 dir /var/spool/amanda/4/4 size 138500524 
chunksize 1048576
driver: adding holding disk 5 dir /var/spool/amanda/3/3 size 289936068 
chunksize 1048576
planner: time 0.012: startup took 0.012 secs

SENDING FLUSHES...
ENDFLUSH

SETTING UP FOR ESTIMATES...
planner: time 0.012: setting up estimates for unixserver:/data/backup
driver: adding holding disk 6 dir /var/spool/amanda/2/2 size 125241712 
chunksize 1048576
driver: adding holding disk 7 dir /var/spool/amanda/1/1 size 11708072 chunksize 
1048576
changer: opening pipe to: /usr/lib/amanda/chg-zd-mtx -info
setup_estimate: unixserver:/data/backup: command 0, options: none    last_level 
0 next_level0 0 level_days 0    getting estimates 0 (-2) -1 (-2) -1 (-2)
planner: time 0.022: setting up estimates took 0.009 secs

GETTING ESTIMATES...
planner: time 0.034: bind_portrange2: trying port=842
planner: time 0.034: dgram_bind: socket bound to 0.0.0.0.842
reserving 18853498 out of 1885349800 for degraded-mode dumps
driver: started dumper0 pid 4970
dumper: pid 4970 executable dumper0 version 2.5.0
driver: started dumper1 pid 4971
dumper: pid 4971 executable dumper1 version 2.5.0
driver: started dumper2 pid 4975
driver: started dumper3 pid 4976
driver: started dumper4 pid 4977
dumper: pid 4975 executable dumper2 version 2.5.0
dumper: pid 4976 executable dumper3 version 2.5.0
dumper: pid 4978 executable dumper5 version 2.5.0
dumper: pid 4977 executable dumper4 version 2.5.0
driver: started dumper5 pid 4978
driver: started dumper6 pid 4979
driver: started dumper7 pid 4980
driver: started dumper8 pid 4981
driver: started dumper9 pid 4982
driver: started dumper10 pid 4983
driver: started dumper11 pid 4984
driver: start time 0.072 inparallel 12 bandwidth 44000 diskspace 1885349800 dir 
OBSOLETE datestamp 20060504 driver: drain-ends tapeq LARGESTFIT big-dumpers 
BTTTSTSTSTS
dumper: pid 4979 executable dumper6 version 2.5.0
dumper: pid 4980 executable dumper7 version 2.5.0
dumper: pid 4982 executable dumper9 version 2.5.0
dumper: pid 4984 executable dumper11 version 2.5.0
dumper: pid 4983 executable dumper10 version 2.5.0
dumper: pid 4981 executable dumper8 version 2.5.0
changer: got exit: 0 str: 4 9 1 1
changer_query: changer return was 9 1 1
changer_query: searchable = 1
changer_find: looking for WEEKLY003 changer is searchable = 1
changer_search: WEEKLY003
changer: opening pipe to: /usr/lib/amanda/chg-zd-mtx -search WEEKLY003
planner: time 0.528: got partial result for host unixserver disk /data/backup: 
0 -> -2K, -1 -> -2K, -1 -> -2K
changer: got exit: 2 str: <none> Line 221 malformed in 
/etc/amanda/changer/chg-zd-mtx-barcodes: Set1-1-10  
changer: opening pipe to: /usr/lib/amanda/chg-zd-mtx -slot current
changer: got exit: 0 str: 4 /dev/overland-sdlt600-tape
changer: opening pipe to: /usr/lib/amanda/chg-zd-mtx -slot next
changer: got exit: 0 str: 5 /dev/overland-sdlt600-tape
changer: opening pipe to: /usr/lib/amanda/chg-zd-mtx -slot next
planner: time 92.411: got partial result for host unixserver disk /data/backup: 
0 -> 232420180K, -1 -> -2K, -1 -> -2K
planner: time 92.412: got result for host unixserver disk /data/backup: 0 -> 
232420180K, -1 -> -2K, -1 -> -2K
planner: time 92.412: getting estimates took 92.389 secs
FAILED QUEUE: empty
DONE QUEUE:
  0: unixserver      /data/backup

ANALYZING ESTIMATES...
pondering unixserver:/data/backup... next_level0 0 last_level 0 (due for level 
0) (picking inclevel for degraded mode)   picklev: last night 0, so tonight 
level 1
(no inc estimate)
  curr level 0 size 197055206 total size 197055302 total_lev0 197055206 
balanced-lev0size 39411041
INITIAL SCHEDULE (size 197055302):
  unixserver /data/backup pri 1 lev 0 size 197055206

DELAYING DUMPS IF NEEDED, total_size 197055302, tape length 3735552000 mark 0
  delay: Total size now 197055302.

PROMOTING DUMPS IF NEEDED, total_lev0 197055206, balanced_size 39411041...
planner: time 92.412: analysis took 0.000 secs

GENERATING SCHEDULE:
--------
DUMP unixserver fffffeff9ffe7f0000 /data/backup 20060504 1 0 1970:1:1:0:0:0 
197055206 24200
--------
changer: got exit: 0 str: 6 /dev/overland-sdlt600-tape
changer: opening pipe to: /usr/lib/amanda/chg-zd-mtx -slot next
changer: got exit: 0 str: 7 /dev/overland-sdlt600-tape
changer: opening pipe to: /usr/lib/amanda/chg-zd-mtx -slot next
changer: got exit: 0 str: 8 /dev/overland-sdlt600-tape
changer: opening pipe to: /usr/lib/amanda/chg-zd-mtx -slot next
changer: got exit: 0 str: 9 /dev/overland-sdlt600-tape
changer: opening pipe to: /usr/lib/amanda/chg-zd-mtx -slot next
changer: got exit: 0 str: 1 /dev/overland-sdlt600-tape
changer: opening pipe to: /usr/lib/amanda/chg-zd-mtx -slot next
changer: got exit: 0 str: 2 /dev/overland-sdlt600-tape
changer: opening pipe to: /usr/lib/amanda/chg-zd-mtx -slot next
changer: got exit: 0 str: 3 /dev/overland-sdlt600-tape
slot 4: read label `Set1-Apr-04', date `20060430'
slot 5: read label `Set1-Apr-05', date `20060501'
slot 6: read label `Set1-Apr-06', date `20060501'
slot 7: read label `Set1-Apr-07', date `20060501'
slot 8: read label `Set1-Apr-08', date `20060501'
slot 9: read label `Set1-Apr-09', date `20060501'
slot 1: read label `Set1-Apr-01', date `20060430'
slot 2: read label `Set1-Apr-02', date `20060430'
slot 3: read label `Set1-Apr-03', date `20060430'

driver: result time 470.926 from taper: TAPE-ERROR [slot 4: read label 
`Set1-Apr-04', date `20060430'
slot 5: read label `Set1-Apr-05', date `20060501'
slot 6: read label `Set1-Apr-06', date `20060501'
slot 7: read label `Set1-Apr-07', date `20060501'
slot 8: read label `Set1-Apr-08', date `20060501'
slot 9: read label `Set1-Apr-09', date `20060501'
slot 1: read label `Set1-Apr-01', date `20060430'
slot 2: read label `Set1-Apr-02', date `20060430'
slot 3: read label `Set1-Apr-03', date `20060430'
]
driver: state time 470.926 free kps: 44000 space: 1885349800 taper: idle 
idle-dumpers: 12 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: not-idle
driver: interface-state time 470.926 if : free 44000
driver: hdisk-state time 470.926 hdisk 0: free 550800356 dumpers 0 hdisk 1: 
free 138585904 dumpers 0 hdisk 2: free 65610248 dumpers 0 hdisk 3: free 
564966916 dumpers 0 hdisk 4: free 138500524 dumpers 0 hdisk 5: free 289936068 
dumpers 0 hdisk 6: free 125241712 dumpers 0 hdisk 7: free 11708072 dumpers 0
driver: flush size 0
dump of driver schedule before start degraded mode:
--------
  unixserver                /data/backup              lv 0 t 24200 s 197055238 
p 1
--------
dump of driver schedule after start degraded mode:
--------
  unixserver                /data/backup              lv 0 t 24200 s 197055238 
p 1
--------
driver: started chunker0 pid 7453
driver: send-cmd time 470.927 to chunker0: PORT-WRITE 00-00001 
/var/spool/amanda/5/5/20060504174601/unixserver._data_backup.0 unixserver 
fffffeff9ffe7f0000 /data/backup 0 1970:1:1:0:0:0 1048576 GNUTAR 197061280 
|;auth=BSD;compress-fast;no-record;index;
chunker: pid 7453 executable chunker0 version 2.5.0
chunker: try_socksize: receive buffer size is 65536
chunker: bind_portrange2: trying port=50072
chunker: stream_server: waiting for connection: 0.0.0.0.50072
driver: result time 470.932 from chunker0: PORT 50072
driver: send-cmd time 470.932 to dumper0: PORT-DUMP 00-00001 50072 unixserver 
fffffeff9ffe7f0000 /data/backup NODEVICE 0 1970:1:1:0:0:0 GNUTAR 
|;auth=BSD;compress-fast;no-record;index;
dumper: bind_portrange2: trying port=50013
dumper: stream_client: connected to 127.0.0.1.50072
dumper: stream_client: our side is 0.0.0.0.50013
chunker: stream_accept: connection from 127.0.0.1.50013
dumper: try_socksize: send buffer size is 65536
chunker: try_socksize: receive buffer size is 32768
dumper: bind_portrange2: trying port=840
dumper: dgram_bind: socket bound to 0.0.0.0.840
dumper: bind_portrange2: trying port=50013
dumper: bind_portrange2: trying port=50014
dumper: stream_client: connected to 172.16.10.206.50033
dumper: stream_client: our side is 0.0.0.0.50014
dumper: try_socksize: send buffer size is 65536
dumper: try_socksize: receive buffer size is 65536
dumper: bind_portrange2: trying port=50013
dumper: bind_portrange2: trying port=50014
dumper: bind_portrange2: trying port=50015
dumper: stream_client: connected to 172.16.10.206.50034
dumper: stream_client: our side is 0.0.0.0.50015
dumper: try_socksize: send buffer size is 65536
dumper: try_socksize: receive buffer size is 65536
dumper: bind_portrange2: trying port=50013
dumper: bind_portrange2: trying port=50014
dumper: bind_portrange2: trying port=50015
dumper: bind_portrange2: trying port=50016
dumper: stream_client: connected to 172.16.10.206.50035
dumper: stream_client: our side is 0.0.0.0.50016
dumper: try_socksize: send buffer size is 65536
dumper: try_socksize: receive buffer size is 65536
driver: state time 29742.946 free kps: 35858 space: 1688288520 taper: DOWN 
idle-dumpers: 11 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: 
no-dumpers
driver: interface-state time 29742.946 if : free 35858
driver: hdisk-state time 29742.946 hdisk 0: free 550800356 dumpers 0 hdisk 1: 
free 138585904 dumpers 0 hdisk 2: free 65610248 dumpers 0 hdisk 3: free 
367905636 dumpers 1 hdisk 4: free 138500524 dumpers 0 hdisk 5: free 289936068 
dumpers 0 hdisk 6: free 125241712 dumpers 0 hdisk 7: free 11708072 dumpers 0
driver: result time 29742.963 from chunker0: RQ-MORE-DISK 00-00001
driver: send-cmd time 29742.963 to chunker0: CONTINUE 00-00001 
/var/spool/amanda/5/5/20060504174601/unixserver._data_backup.0 1048576 9853088
driver: state time 31115.921 free kps: 35858 space: 1678435432 taper: DOWN 
idle-dumpers: 11 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: 
no-dumpers
driver: interface-state time 31115.921 if : free 35858
driver: hdisk-state time 31115.921 hdisk 0: free 550800356 dumpers 0 hdisk 1: 
free 138585904 dumpers 0 hdisk 2: free 65610248 dumpers 0 hdisk 3: free 
358052548 dumpers 1 hdisk 4: free 138500524 dumpers 0 hdisk 5: free 289936068 
dumpers 0 hdisk 6: free 125241712 dumpers 0 hdisk 7: free 11708072 dumpers 0
driver: result time 31115.937 from chunker0: RQ-MORE-DISK 00-00001
find diskspace: not enough diskspace. Left with 2412180312 K
find diskspace: not enough diskspace. Left with 2412180312 K
driver: Don't know how to send ABORT command to chunker
chunker: error [bad command after RQ-MORE-DISK: "QUIT"]
chunker: time 30645.022: error [bad command after RQ-MORE-DISK: "QUIT"]
chunker: time 30645.022: pid 7453 finish time Fri May  5 02:24:37 2006
taper: DONE [idle wait: 30645.029 secs]
dumper: kill index command
amdump: end at Fri May  5 02:24:37 CEST 2006
lp: nothing to print
amreport: printer command failed: /usr/bin/lpr
Scanning /var/spool/amanda/9/9...
  20060504065747: found Amanda directory.
  20060504070720: found Amanda directory.
  20060504174601: found Amanda directory.
  tmp: skipping cruft directory, perhaps you should delete it.
Scanning /var/spool/amanda/8/8...
  20060504065747: found Amanda directory.
  20060504070720: found Amanda directory.
  20060504174601: found Amanda directory.
Scanning /var/spool/amanda/6/6...
  20060504065747: found Amanda directory.
  20060504070720: found Amanda directory.
  20060504174601: found Amanda directory.
Scanning /var/spool/amanda/5/5...
  20060430010625: found Amanda directory.
  20060504065747: found Amanda directory.
  20060504070720: found Amanda directory.
  20060504174601: found Amanda directory.
Scanning /var/spool/amanda/4/4...
  20060504065747: found Amanda directory.
  20060504070720: found Amanda directory.
  20060504174601: found Amanda directory.
Scanning /var/spool/amanda/3/3...
  20060504065747: found Amanda directory.
  20060504070720: found Amanda directory.
  20060504174601: found Amanda directory.
Scanning /var/spool/amanda/2/2...
  20060430010625: found Amanda directory.
  20060504065747: found Amanda directory.
  20060504070720: found Amanda directory.
  20060504174601: found Amanda directory.
Scanning /var/spool/amanda/1/1...
  20060504065747: found Amanda directory.
  20060504070720: found Amanda directory.
  20060504174601: found Amanda directory.




<Prev in Thread] Current Thread [Next in Thread>