Amanda-Users

Re: Release of amanda-2.5.0b2

2006-02-13 17:03:53
From: Kevin Till <kevin.till AT zmanda DOT com>
To: amanda-users AT amanda DOT org
Date: Mon, 13 Feb 2006 13:58:09 -0800
Josef Wolf wrote:
On Tue, Feb 07, 2006 at 08:58:30PM +0100, Josef Wolf wrote:

Below, I tried to amfetchdump host.do.main:/m/b.  Only full dumps were
done of this DLE.  This DLE is available seven times on the tapes:

  lv  dumpdate  chunks on tape
a. 0   20060204  VOL08:7,  VOL08:8,  VOL08:9, VOL09:1, VOL09:1
b. 0   20060204  VOL10:1,  VOL10:2,  VOL10:3, VOL10:4, VOL10:5
c. 0   20060205  VOL10:10, VOL10:11, VOL01:1, VOL01:2, VOL01:3
d. 0   20060205  VOL02:7,  VOL02:8,  VOL02:9, VOL03:1, VOL03:2
e. 0   20060205  VOL04:1
f. 0   20060207  VOL04:7,  write aborted due to full tape.
g. 0   20060207  VOL05:1

Tapings a..d were done with tape_splitsize=500mb
Tapings b, d and e were done by autoflush because of bug#1425436.
All dumps are compressed.
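
As a cross-check of the inventory above, here is a minimal Python sketch. The DUMPS structure and the function name tapes_per_dumpdate are illustrative assumptions, not part of Amanda. It lists, newest dumpdate first, the tapes that hold either the first or the last intact taping of each dumpdate, skipping the aborted taping f.

```python
# Inventory of the level-0 tapings of host.do.main:/m/b, copied from the
# table above. "intact" is False only for taping f (aborted on a full tape).
DUMPS = [
    # (id, dumpdate, tapes in order of first use, intact)
    ("a", "20060204", ["VOL08", "VOL09"], True),
    ("b", "20060204", ["VOL10"], True),
    ("c", "20060205", ["VOL10", "VOL01"], True),
    ("d", "20060205", ["VOL02", "VOL03"], True),
    ("e", "20060205", ["VOL04"], True),
    ("f", "20060207", ["VOL04"], False),
    ("g", "20060207", ["VOL05"], True),
]


def tapes_per_dumpdate(dumps, pick):
    """Return the tapes needed, newest dumpdate first.

    For every dumpdate one intact taping is chosen: the first one in
    table order if pick == "first", otherwise the last one.
    """
    by_date = {}
    for _id, date, tapes, intact in dumps:
        if intact:
            by_date.setdefault(date, []).append(tapes)

    order = []
    for date in sorted(by_date, reverse=True):
        chosen = by_date[date][0] if pick == "first" else by_date[date][-1]
        for tape in chosen:
            if tape not in order:
                order.append(tape)
    return order


print(tapes_per_dumpdate(DUMPS, "last"))
print(tapes_per_dumpdate(DUMPS, "first"))
```

This only models the table; it says nothing about how amfetchdump actually selects tapes.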

A transcript is attached below.  I see several problems here:

1. VOL05:1 (this is the newest non-broken available dump) is _not_
  considered for retrieval at all.

2. Instead, amfetchdump _tries_ to get the (broken) VOL04:7.

3. But instead of VOL04:7 it gets the (older) VOL04:1.  There seems to be
  no attempt to further search for VOL04:7

4. The order of tapes seems to be weird.  I would have expected
     VOL05 VOL02 VOL03 VOL10         (how they were scheduled)
  or VOL05 VOL04 VOL10               (last available for every dumpdate)
  or VOL05 VOL10 VOL01 VOL08 VOL09   (first available for every dumpdate)
  or some such.

5. When trying to append the second chunk to the first one, amfetchdump
  fails with "Bad file descriptor".  The resulting dump (uncompressed)
  is 527620009 bytes long.

6. The next problem is with amrecover, but it seems to be closely related
  to the "Bad file descriptor" problem.  Unfortunately, I don't have a
  transcript for this problem, because the system crashed.  Here's the
  description:

  When I tried to retrieve the DLE listed in line c above
  with amrecover, the system (Athlon 1800+, 500MB RAM, 2G swap,
  suse-10.0) froze, but vterm switching and pinging from a different
  host still worked.  This reminds me of overcommitment caused by memory hogs.

  After the reboot, I noticed the following file in the slot directory
  of the vtape directory:

   -rw-------   1 amanda disk 527630347 Feb  7 07:52 info

  Notice that the length is almost the same as in 5.  This file starts with
  the following contents:

position 0
AMANDA: FILE 20060205 raven.wolf.local /m/b  lev 0 comp .gz program /bin/tar
To restore, position tape at start of file and run:
        dd if=<tape> bs=32k skip=1 |     /usr/bin/gzip -dc |   /bin/tar -f... -


  Notice the first line "position 0" which seems to be the original
  contents of the info file.  At position 32779 (that is,
  strlen("position 0\n")+32kb) starts a tar file which turns out to be
  the first chunk of the dump I tried to restore.

  This looks like amrecover writes the dump to the wrong file descriptor.
  The error message from amfetchdump looks as if amfetchdump has a similar
  problem.
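
The offset arithmetic in point 6 can be written out as a minimal Python sketch. The names BLOCK_SIZE, STRAY_PREFIX and extract_payload are illustrative assumptions; the 32 KiB header block size is taken from the "bs=32k skip=1" hint in the file header quoted above.

```python
BLOCK_SIZE = 32 * 1024          # one Amanda header block, cf. "bs=32k skip=1"
STRAY_PREFIX = b"position 0\n"  # original contents of the info file

# Offset inside the overwritten info file at which the dump data begins.
payload_offset = len(STRAY_PREFIX) + BLOCK_SIZE
print(payload_offset)  # 32779


def extract_payload(src, dst, offset=payload_offset, bufsize=1 << 20):
    """Copy everything after `offset` from binary file object src to dst."""
    src.seek(offset)
    while True:
        chunk = src.read(bufsize)
        if not chunk:
            break
        dst.write(chunk)
```

To look at the misplaced data, extract_payload could be called with the info file opened in "rb" mode as src and a scratch file opened in "wb" mode as dst.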


Here is the transcript:

host:/ # amfetchdump ppc host.do.main /m/b
5 tape(s) needed for restoration
changer: got exit: 0 str: 4 10 1 1
The following tapes are needed: VOL04 VOL02 VOL10 VOL01 VOL03
Press enter when ready

Looking for tape VOL04...
changer: got exit: 0 str: 4 10 1 1
changer_query: changer return was 10 1 1
changer_query: searchable = 1
changer_find: looking for VOL04 changer is searchable = 1
changer_search: VOL04
changer: got exit: 0 str: 4 file:/m/amchanger/ppc
amfetchdump: slot 4: date 20060207 label VOL04 (exact label match)
Scanning VOL04 (slot 4)
amfetchdump:   1: restoring FILE: date 20060205 host host.do.main disk /m/b lev 0 comp .gz program /bin/tar
amfetchdump: Search of VOL04 complete
Looking for tape VOL02...
changer: got exit: 0 str: 4 10 1 1
changer_query: changer return was 10 1 1
changer_query: searchable = 1
changer_find: looking for VOL02 changer is searchable = 1
changer_search: VOL02
changer: got exit: 0 str: 2 file:/m/amchanger/ppc
amfetchdump: slot 2: date 20060206 label VOL02 (exact label match)
Scanning VOL02 (slot 2)
amfetchdump:   7: restoring split dumpfile: date 20060205 host host.do.main disk /m/b part 1/5 lev 0 comp .gz program /bin/tar
amfetchdump:   8: restoring split dumpfile: date 20060205 host host.do.main disk /m/b part 2/5 lev 0 comp .gz program /bin/tar
amfetchdump:      appending to host.do.main._m_b.20060205.0.1
restore: write error: Bad file descriptor

gzip: stdin: unexpected end of file
host:/ #


Hello!

Are there no opinions about those problems?  I think at least points
5. and 6. are serious problems.  Opinions?

Josef,

can you make sure you have restore-src/restore.c revision 1.19 or above?
One fix went in with r1.19, which resolved a file descriptor problem.
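
A minimal sketch for checking the revision, assuming your copy of restore.c carries an expanded CVS $Id$ keyword of the usual form "$Id: file,v REV DATE TIME AUTHOR STATE $" (whether it does is an assumption). The function names are illustrative, and the sample line uses a placeholder date and author.

```python
import re


def cvs_revision(text):
    """Return the revision from an expanded CVS $Id$ keyword as a tuple of
    ints, or None if no such keyword is found."""
    m = re.search(r"\$Id: [^$]*?,v (\d+(?:\.\d+)+) ", text)
    return tuple(int(p) for p in m.group(1).split(".")) if m else None


def has_fix(text, minimum=(1, 19)):
    """True if the file's revision is at least `minimum`."""
    rev = cvs_revision(text)
    return rev is not None and rev >= minimum


# Hypothetical header line; the date and author are placeholders.
sample = "/* $Id: restore.c,v 1.19 2006/01/01 00:00:00 someone Exp $ */"
print(has_fix(sample))
```

Revisions are compared as integer tuples because 1.9 is older than 1.19, which a string or float comparison would get wrong.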



--
Thank you!
Kevin Till

Amanda documentation: http://wiki.zmanda.com
Amanda forums:        http://forums.zmanda.com