Amanda-Users

Re: amrestore problem, headers ok but no data

2004-12-30 15:00:57
Subject: Re: amrestore problem, headers ok but no data
From: Brian Cuttler <brian AT wadsworth DOT org>
To: Brian Cuttler <brian AT wadsworth DOT org>
Date: Thu, 30 Dec 2004 14:55:37 -0500
Gene,
Stefan,

* I'm gonna remove a lot of text from the middle of this.

On Wednesday 29 December 2004 16:29, Brian Cuttler wrote:
## Amanda    2.4.4p1-20030716
## SGI/IRIX  6.5.19m
## mtx       1.3.8
## 8 Slot jukebox with SDLT 320 (Quantum)
## gnutar    1.13.25
## 
## I'm planning to dump 2-3 times per week but have been starting the
## process manually (via the # at command) and have seen dumps in only
## 50 hours, down from hundreds when I initially started the server up.
## 
## By all accounts everything is working properly, some tuning is still
## in order but there are no apparent errors. Or where not until I
##  tried to test the file restore process.
## 
## I am able to "see" the DLEs on tape but I'm unable to retried any
##  data, either from the dump or the tar partitions.
## 
## samar 144# mt -f /dev/sdlt2 rewind
## samar 145# /usr/local/sbin/amrestore /dev/sdlt2 samar /usr5/dtaylor
## amrestore:   0: skipping start of tape: date 20041227 label SAMAR23
## amrestore:   1: skipping samar._usr1.20041227.1
## amrestore:   2: skipping samar._usr5_lalor.20041227.1
## amrestore:   3: skipping samar._usr5_amanda.20041227.0
## amrestore:   4: restoring samar._usr5_dtaylor.20041227.1
## amrestore: read error: I/O error
## 
## gzip: stdin: unexpected end of file
## 
## 
## I am able to use amrestore with the "-r" option which results in
## files like this
## 
## 
## samar 77# more samar._usr5_lalor.20041227.1
## AMANDA: FILE 20041227 samar /usr5/lalor lev 1 comp .gz program
##  /usr/local/sbin/gnutar To restore, position tape at start of file
##  and run:
##         dd if=<tape##  bs=32k skip=1 | /usr/sbin/gzip -dc |
##  usr/local/sbin/gnutar -f... -
## 
## 
## where the rest of the file is empty. I should be more explicite.
## 
## samar 82# ls -las samar._usr5_lalor.20041227.1
##   64 -rw-r-----   1 root sys 32768 Dec 29 15:36
##  samar._usr5_lalor.20041227.1
## 
## samar 79# cat -evt samar._usr5_lalor.20041227.1 | head
## AMANDA: FILE 20041227 samar /usr5/lalor lev 1 comp .gz program
##  /usr/local/sbin/g nutar$
## To restore, position tape at start of file and run:$
## ^Idd if=<tape##  bs=32k skip=1 | /usr/sbin/gzip -dc |
##  usr/local/sbin/gnutar -f... -$
## ^L$
## ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
## ^@^@
##  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
## @^@^@
##  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
## @^@^@
##  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
## @^@^@
## 
## The file headers are there, I just don't know where the data has
##  gotten to. I certainly see multiple "chunksize" files in the work
##  area when I'm running amdump.
## 
## I've otherwise tested the tape drive with the native version of #
##  tar and have been able to write and recover data. Besides the
##  amrestore listing is out there.
## 
## # mt commands seem to run progressively longer as I read the tape,
##  I'm apparently writing lots of something between EOF marks, I'm
##  just hoping that its actual data.
## 
## # ls -las  /dev/sdlt2
## lrwxr-xr-x    1 root  sys  20 Nov 22 14:45 /dev/sdlt2 -## 
##  /dev/rmt/tps1d4nrnsv

> The two things I'd question are: what version of tar?  We've found 
> that tar-1.13-19 and 1.13-25.  We've had a report of 1.14 not working 
> a week or 2 back up the log.  And earlier versions than 1.13-19 were 
> mistakenly putting a number of some kind at the start of the header, 
> which wasn't compatible with manda.

Confirm, I am using GNU tar 1.13.25

samar 35# /usr/local/sbin/gnutar --version                                      
tar (GNU tar) 1.13.25

> And I'd observe that amanda generally uses a fixed length record.

I can switch to a fixed length device by removing the trailing 'v'
in the device name. It hasn't been an issue on other systems though
I admit that this SDLT on IRIX is unique at my site.

> Back when I was running real tapes, I found that a 32kilobyte record 
> was measureable faster than the default of 512 bytes.  But, if 
> everything isn't set to the same record length, the whole thing blew 
> up for me.  Is there any way you can make it run fixed length records 
> on that lashup?

Block size on tape. Where is that set, is this a function of the
underlying DD command within Amanda or is this elsewhere ?

> Generallly speaking, this record length, and the state of the hardware 
> compressor are both encoded in a pre-header on the tape, and once the 
> tape has been written badly, it must be fully rewound, the settings 
> changed, a new label written, followed by sufficient junk data 
> from /dev/zero to force the drive to flush its buffers, at that 
> point, and only at that point, will the drive rewrite that hidden 
> header on the tape with your new settings.

Compression. We are not using HW compression, I know its a no-no with
SW compression and also a 'c' would be included in the device name if
I'd selected the HW compression feature of the device.

I'd wondered what the underlying mechanism was. These tapes have all
been labeled on this device using amlabel... Perhaps not... We had a
jukebox failure (caused by user error) and its possible that this is
not the same physical tape drive. However it is the same host and same
device selection as the original.

> That said, I didn't relabel the tapes, but read out the existing label 
> to a scratch file with dd, rewound the tape, changed the settings and 
> dd'd that scratch file back to the drive followed by 100000 records 
> from /dev/zero, all with dd.

> That was about 5 minutes faster per tape and didn't mess up amanda's 
> view of her world with amlabel.  I also put those mt commands to set 
> the record size in my rc.local file so they were done at boot time, 
> and it worked IF there was a tape in the drive at the time.  There 
> must be for the mt stuff to work correctly.

> mt has the commands required to set the record size used.

I'd hate to have to that another 25 times (and whenever I replaced
volumes). How long did it take to sus out the sequence ?

## ## I am able to "see" the DLEs on tape but I'm unable to retried any
## ##  data, either from the dump or the tar partitions.
## ## 
## ## samar 144# mt -f /dev/sdlt2 rewind
## ## samar 145# /usr/local/sbin/amrestore /dev/sdlt2 samar /usr5/dtaylor
## ## amrestore:   0: skipping start of tape: date 20041227 label SAMAR23
## ## amrestore:   1: skipping samar._usr1.20041227.1
## ## amrestore:   2: skipping samar._usr5_lalor.20041227.1
## ## amrestore:   3: skipping samar._usr5_amanda.20041227.0
## ## amrestore:   4: restoring samar._usr5_dtaylor.20041227.1
## ## amrestore: read error: I/O error
## ## 
## ## gzip: stdin: unexpected end of file

> And you don't get a file with this command?
> I once had this behavior on one of my machines, just have to remember...

> You could also try to specify the blocksize with the amrestore-option
> "-b", maybe this is not detected correctly.

> Do you use DUMP or TAR for samar._usr5_dtaylor?

The /usr5 stuff is all TAR the root and /usr1 are both DUMP.

Note - I had the same result when trying to retrieve DLEs that 
where written with dump. Whatever the problem is, its common to
both, suggesting as you both noted, tape/block issues, rather
than, say, TAR issues.

## ## I am able to use amrestore with the "-r" option which results in
## ## files like this
## ## 
## ## 
## ## samar 77# more samar._usr5_lalor.20041227.1
## ## AMANDA: FILE 20041227 samar /usr5/lalor lev 1 comp .gz program
## ##  /usr/local/sbin/gnutar To restore, position tape at start of file
## ##  and run:
## ##         dd if=<tape##  bs=32k skip=1 | /usr/sbin/gzip -dc |
## ##  usr/local/sbin/gnutar -f... -
## ## 
## ## 
## ## where the rest of the file is empty.

> Tried this with one of my holdingdisk-chunks, the "rest" is not empty here.

No, not normally. Have a new test below that I will append.

## ##  I should be more explicite.
## ## 
## ## samar 82# ls -las samar._usr5_lalor.20041227.1
## ##   64 -rw-r-----   1 root sys 32768 Dec 29 15:36
## ##  samar._usr5_lalor.20041227.1

> This one is only 32768 kB in size ... a header.

Exactly.

## ## 
## ## samar 79# cat -evt samar._usr5_lalor.20041227.1 | head
## ## AMANDA: FILE 20041227 samar /usr5/lalor lev 1 comp .gz program
## ##  /usr/local/sbin/g nutar$
## ## To restore, position tape at start of file and run:$
## ## ^Idd if=<tape##  bs=32k skip=1 | /usr/sbin/gzip -dc |
## ##  usr/local/sbin/gnutar -f... -$
## ## ^L$
## ## ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
## ## ^@^@
## ##  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
## ## @^@^@
## ##  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
## ## @^@^@
## ##  ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
## ## @^@^@
## ## 
## ## The file headers are there, I just don't know where the data has
## ##  gotten to. I certainly see multiple "chunksize" files in the work
## ##  area when I'm running amdump.

> Are the chunks still there? This would explain why the headers have
> made it to the tape, while the data (or parts of it) is still in the
> holdingdisk. (This is also a lev1-dump, so the question is, if there
> were any changes at all?)

The amdump report was totally normal and straight forward. The work
area was empty at the end of the run.

> What did the REPORT-mail say?

Totally normal - I will include it below.

New test, I started another amanda run last night. At this moment I have
two DLEs both partially written to work area.

Notes

1) the first 32k block of data is the instruction for unpacking as
   seen above.
2) this is the 1st chunk of DLE /usr5/joy, which Amanda set to level 0
3) DLE backup via TAR

# dd if=samar._usr5_joy.0.tmp skip=1 bs=32k | gzip -dc | /usr/local/sbin/gnutar 
-tf -
./
./batch_130/
./batch_140/
./eukaryotes/
./eukaryotes/80s/
./eukaryotes/80s/40s/
./eukaryotes/80s/60s/
./eukaryotes/80s_cell/
./eukaryotes/EF2_AlF4/
./eukaryotes/EF2_AlF4/alf4_new04/
./eukaryotes/EF2_AlF4/alf4_new04/micro/
./eukaryotes/EF2_AlF4/alf4_new04/power/
./eukaryotes/EF2_AlF4/final/
./eukaryotes/EF2_AlF4/good/
./eukaryotes/EF2_AlF4/lalor/
./eukaryotes/EF2_AlF4/lalor/alf4/

  MANY LINES REMOVED FROM THIS EMAIL

./eukaryotes/EF2_AlF4/alf4_new04/micro/raw006.tif
./eukaryotes/EF2_AlF4/alf4_new04/micro/raw007.tif
./eukaryotes/EF2_AlF4/alf4_new04/micro/raw008.tif
./eukaryotes/EF2_AlF4/alf4_new04/micro/raw009.tif
./eukaryotes/EF2_AlF4/alf4_new04/micro/raw010.tif
32767+0 records in
32767+0 records out

gzip: stdin: unexpected end of file
/usr/local/sbin/gnutar: Unexpected EOF in archive
/usr/local/sbin/gnutar: Error is not recoverable: exiting now

All in all pretty good, remember this is only the 1st chunk of
what is already 22 chunks and counting. Chunks are 1 GByte in size.

For comparison... I get the unpacking instructions here and there 
seems to be lots of non-null data. Unlike the file I pulled off of
the tape yesterday.

samar 61# cat samar._usr5_joy.0.tmp | head
AMANDA: FILE 20041229 samar /usr5/joy lev 0 comp .gz program 
/usr/local/sbin/gnutar
CONT_FILENAME=/usr5/dumps/amanda/20041229/samar._usr5_joy.0.1
To restore, position tape at start of file and run:
        dd if=<tape> bs=32k skip=1 | /usr/sbin/gzip -dc |/usr/local/sbin/gnutar 
-f... -

                     
§Û«=s¡¹ábIJ.Iª*ûÛÏøE½¨*-JtVj-»xrï8Øسìø⿾øäÿÝÝmîöÛí»»»uÿ7þ÷nÙ®Nÿ{ÿïåÅr·ìVwênýânY­Ýÿòå'·¬~÷ñWþËyrñ/ïþý/÷ñý÷ß¼þðU
mïVÛíÆÊ¡ôË?|õÏËòë×o¾ÿ·7øþ>üæÅ¿zõÃ׿~¹¬ïþ¹é¯,÷þýݯ?¾øòû·¿ÿî×?¼üÕ×o^|ùÛï[óºÿùðñÃÿ¹ry²~¹?~õ~¿úÍë¯ß¼þú/Þ¿úí¯^½ù~ýÍ«÷¿ýæáñ»·_ÔGNïÞ¾|¿\Þª÷Ç·ß½yýñ¬yú÷õïwû¯gAÿzøó¯w¾sñÕ¯Ý_¿y5þ±ÿâÕï¿ë«»DßÜÿãwò«UÊß¼ÿX<ñ«÷§lwúÇ»ñíB©þ±ë/Æ?_|ý÷ãÇ/>þð±¾
                                                               ëÎqw_ͯßý¶:^ÿýýÛ×


That is all I have at the moment - I will try a recovery from this new
amdump run as soon as it completes. Probably not before I leave today.

Output from that amdump run that wrote this specific tape.

These dumps were to tape SAMAR23.
The next 7 tapes Amanda expects to used are: a new tape, a new tape, SAMAR01,
+SAMAR02, SAMAR03, SAMAR04, SAMAR05.

FAILURE AND STRANGE DUMP SUMMARY:
  samar      /usr5/liw lev 2 STRANGE 
  samar      /usr5/leith lev 0 STRANGE
  samar      /usr5/skaur lev 0 STRANGE

STATISTICS:
                          Total       Full      Daily
                        --------   --------   --------
Estimate Time (hrs:min)    1:25
Run Time (hrs:min)        32:33
Dump Time (hrs:min)       49:29      47:20       2:08
Output Size (meg)      126436.8   125731.7      705.1
Original Size (meg)    192304.9   191245.7     1059.2
Avg Compressed Size (%)    65.7       65.7       66.6   (level:#disks ...)
Filesystems Dumped           14          7          7   (1:6 2:1)
Avg Dump Rate (k/s)       726.9      755.5       93.7   

Tape Time (hrs:min)       14:34      14:30       0:04
Tape Size (meg)        126436.8   125731.7      705.1
Tape Used (%)              79.0       78.6        0.4   (level:#disks ...)
Filesystems Taped            14          7          7   (1:6 2:1)
Avg Tp Write Rate (k/s)  2469.2     2466.8     2993.4   

USAGE BY TAPE:
  Label         Time      Size      %    Nb
  SAMAR23      14:34  126436.8   79.0    14
  
FAILED AND STRANGE DUMP DETAILS:

/-- samar      /usr5/liw lev 2 STRANGE
sendbackup: start [samar:/usr5/liw level 2]
sendbackup: info BACKUP=/usr/local/sbin/gnutar
sendbackup: info RECOVER_CMD=/usr/sbin/gzip -dc |/usr/local/sbin/gnutar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
? gtar: ./tmrnanew_refttk/30Sfq.tmk: Warning: Cannot stat: No such file or
+directory
? gtar: ./tmrnanew_refttk/30Sttk.ttk: Warning: Cannot stat: No such file or
+directory
? gtar: ./tmrnanew_refttk/50Sfq.tmk: Warning: Cannot stat: No such file or
+directory
? gtar: ./tmrnanew_refttk/50Sttk.ttk: Warning: Cannot stat: No such file or
+directory
? gtar: ./tmrnanew_refttk/LOG.spi: Warning: Cannot stat: No such file or
+directory
? gtar: ./tmrnanew_refttk/bpr05fq_win_tmRNA.tmk: Warning: Cannot stat: No such
+file or directory
? gtar: ./tmrnanew_refttk/50Sttk.ttk: Warning: Cannot stat: No such file or
+directory
? gtar: ./tmrnanew_refttk/LOG.spi: Warning: Cannot stat: No such file or
+directory
? gtar: ./tmrnanew_refttk/bpr05fq_win_tmRNA.tmk: Warning: Cannot stat: No such
+file or directory
? gtar: ./tmrnanew_refttk/mask50fqar.ttk: Warning: Cannot stat: No such file or
+directory
? gtar: ./tmrnanew_refttk/results.spi.0: Warning: Cannot stat: No such file or
+directory
? gtar: ./tmrnanew_refttk/tmRNA.tmk: Warning: Cannot stat: No such file or
+directory
? gtar: ./tmrnanew_refttk/tmRNAfq.tmk: Warning: Cannot stat: No such file or
+directory
| Total bytes written: 884971520 (844MB, 173kB/s)
sendbackup: size 864230
sendbackup: end
\--------

/-- samar      /usr5/leith lev 0 STRANGE
sendbackup: start [samar:/usr5/leith level 0]
sendbackup: info BACKUP=/usr/local/sbin/gnutar
sendbackup: info RECOVER_CMD=/usr/sbin/gzip -dc |/usr/local/sbin/gnutar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
? gtar: ./refinetest/final/vft01.tcr: Warning: Cannot stat: No such file or
+directory
? gtar: ./refinetest/local/refproj01_001.tcr: Warning: Cannot stat: No such file
+or directory
? gtar: ./refinetest/work/selvoea_01.tcr: Warning: Cannot stat: No such file or
+directory
? gtar: ./refinetest/work/vol_02_002.tcr: Warning: Cannot stat: No such file or
+directory
? gtar: ./refinetest/work/vol_02_002_even.tcr: Warning: Cannot stat: No such
+file or directory
? gtar: ./refinetest/work/vol_02_002_odd.tcr: Warning: Cannot stat: No such file
+or directory
? gtar: ./refinetest/work/vol_02_003.tcr: Warning: Cannot stat: No such file or
+directory

? gtar: ./refinetest/work/vol_02_003_even.tcr: Warning: Cannot stat: No such
+file or directory
? gtar: ./refinetest/work/vol_02_003_odd.tcr: Warning: Cannot stat: No such file
+or directory
| Total bytes written: 20916613120 (19GB, 663kB/s)
sendbackup: size 20426380
sendbackup: end
\--------

/-- samar      /usr5/skaur lev 0 STRANGE
sendbackup: start [samar:/usr5/skaur level 0]
sendbackup: info BACKUP=/usr/local/sbin/gnutar
sendbackup: info RECOVER_CMD=/usr/sbin/gzip -dc |/usr/local/sbin/gnutar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
? gtar: ./stc/Refinement/local/refproj01_017.stc: Warning: Cannot stat: No such
+file or directory
| Total bytes written: 63006955520 (59GB, 1.3MB/s)
sendbackup: size 61530230
sendbackup: end
\--------

NOTES:
  planner: Incremental of samar:/usr5/hxgao bumped to level 3.
  planner: Full dump of samar:/usr5/skaur promoted from 7 days ahead.
  planner: Full dump of samar:/usr5/leith promoted from 7 days ahead.
  planner: Full dump of samar:/usr5/hxgao promoted from 1 day ahead.
  planner: Full dump of samar:/usr5/bimal promoted from 7 days ahead.
  planner: Full dump of samar:/usr5/tapu promoted from 7 days ahead.
  planner: Full dump of samar:/usr5/amanda promoted from 7 days ahead.
  planner: Full dump of samar:/ promoted from 7 days ahead.
  taper: tape SAMAR23 kb 129471872 fm 14 [OK] 
  
DUMP SUMMARY:
                                     DUMPER STATS                TAPER STATS
HOSTNAME DISK           L   ORIG-KB    OUT-KB COMP% MMM:SS   KB/s MMM:SS    KB/s
----------------------- --------------------------------------- --------------  
samar  /              0  19728372  16073751  81.5 464:36  576.6  27:30  9740.0  
samar  /usr1          1       197        26  13.2   1:08    0.4   0:01    25.8  
samar  /usr5/allengr  1    133520     19195  14.4  12:43   25.2   0:03  7672.3  
samar  /usr5/amanda   0     43640     11408  26.1   0:11 1001.8   0:02  7441.9  
samar  /usr5/bimal    0  20516220   3598138  17.5 275:14  217.9  23:43  2528.7  
samar  /usr5/dtaylor  1      8930      1437  16.1   6:36    3.6   0:01  1375.4  
samar  /usr5/hxgao    0  70691184  47347912  67.0 722:02 1092.9 356:44  2212.1  
samar  /usr5/joy      1     65180     51618  79.2  15:05   57.1   0:14  3662.2  
samar  /usr5/lalor    1      3460       539  15.6   2:26    3.7   0:01   522.2  
samar  /usr5/leith    0  20426380  14072654  68.9 513:14  457.0  19:41 11912.8  
samar  /usr5/liw      2    864230    647793  75.0  83:19  129.6   3:40  2938.5  
samar  /usr5/ninggao  1      9060      1428  15.8   7:12    3.3   0:01  1318.2  
samar  /usr5/skaur    0  61530232  45823552  74.5 761:18 1003.2 419:24  1821.0  
samar  /usr5/tapu     0   2899590   1821810  62.8 103:36  293.1  22:49  1330.7  

(brought to you by Amanda version 2.4.4p1-20030716)



                                                thank you,

                                                Brian
---
   Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773