Amanda-Users

RE: 2.4.5: dumping directly to tape failure: no retry?

2005-11-01 15:05:03
Subject: RE: 2.4.5: dumping directly to tape failure: no retry?
From: "Scott R. Burns" <Scott.Burns AT Netcontech DOT Com>
To: "Jean-Francois Malouin" <Jean-Francois.Malouin AT bic.mni.mcgill DOT ca>, "AMANDA users" <amanda-users AT amanda DOT org>
Date: Tue, 1 Nov 2005 14:55:28 -0500
I observed this same issue with:

2.4.4p4
NetBSD/i386
HP DAT24*6

My last DLE was > than the remaining tape, but would fit entirely on the
next empty tape in the changer as it was the only remaining DLE. It was
reported as retrying but never did. In this case my holding space was < the
compressed size of this DLE but it would fit onto tape.

Ultimately I found enough drive space to reserve for a holding space > than
this DLE size but I would like to gain that back if this issue could be
fixed.

Search the archives for detailed reporting on the problem by me.

Scott...

Scott R. Burns
NETCON Technologies Incorporated
Suite 135 - 4474 Blakie Road
London, Ontario, Canada
N6L 1G7
Voice: +1.519.652.0401
Fax: +1.519.652.9275
Web: www.netcontech.com

-----Original Message-----
From: owner-amanda-users AT amanda DOT org
[mailto:owner-amanda-users AT amanda DOT org]On Behalf Of Jean-Francois Malouin
Sent: Tuesday, November 01, 2005 11:37 AM
To: AMANDA users
Subject: 2.4.5: dumping directly to tape failure: no retry?


Hi,

[ BTW, I noticed a thread addressing this question yesterday or so
  but I don't want to hijack it... ]

I just upgraded to 2.4.5 and noticed that if a DLE is sent directly to
tape it is not retried if it hits EOT (a full dump that won't fit in
the holdding space). I have runtapes = 10 and never run into this
problem before, or at least with the previous version I was running,
2.4.4p3-20040805. OS: irix-6.5.x with an STK L80 library and LTO1
Ultriums.

Here is some excerpts form the amdump logfile. In the following look
at yorick:/data/mril/mril2. It's the one that failed.

To me it looks like it was being dumped directly to tape label
stk_40-conf2-000028, it hit EOT and was not retried. amanda
just went on with the other DLEs in her schedule.

----
amdump logfile:
----

taper: slot 11: date 20050925 label stk_40-conf2-000028 (exact label match)
taper: read label `stk_40-conf2-000028' date `20050925'
taper: wrote label `stk_40-conf2-000028' date `20051101'
driver: result time 98.468 from taper: TAPER-OK

    [...]

FAILED QUEUE: empty
DONE QUEUE:

    [...]

 10: yorick     /data/mril/mril2

    [...]

ANALYZING ESTIMATES...

    [...]

GENERATING SCHEDULE:
--------
DUMP yorick fffffeff9ffe7f /data/mril/mril2 20051101 2 0
1970:1:1:0:0:0 958730408789 3 2005:10:28:10:51:39 95873040 281565

    [ a full dump of ~92GB is scheduled ]
--------

    [...]

driver: send-cmd time 15078.866 to dumper0: FILE-DUMP 01-00029
/holddisk/conf2/stk_40/20051101031508/shadow._.1 shadow
fffffeff9ffe0f / NODEVICE 1 2005:10:30:12:33:6 2097056 DUMP 265696
|;bsd-auth;index;

    [...]

driver: finished-cmd time 15241.837 taper wrote shadow:/
driver: dumping yorick:/data/mril/mril2 directly to tape
driver: send-cmd time 15241.838 to taper: PORT-WRITE 00-00031 yorick
fffffeff9ffe7f /data/mril/mril2 0 20051101
driver: result time 15241.842 from taper: PORT 16614
driver: send-cmd time 15241.842 to dumper0: PORT-DUMP 00-00031 16614
yorick fffffeff9ffe7f /data/mril/mril2 NODEVICE 0 1970:1:1:0:0:0 DUMP
|;bsd-auth;index;

   [ Here EOT is encountered, next tape is loaded
     and then 3 DLEs are dumped to tape but not the one
    that hit EOT ]

taper: writing end marker. [stk_40-conf2-000028 ERR kb 103821568 fm 16]
changer: opening pipe to: /opt/amanda/amanda2/libexec/chg-zd-mtx -info
changer: got exit: 0 str: 11 10 1 1
changer_query: changer return was 10 1 1
changer_query: searchable = 1
changer_find: looking for stk_40-conf2-000029 changer is searchable = 1
changer_search: stk_40-conf2-000029
changer: opening pipe to: /opt/amanda/amanda2/libexec/chg-zd-mtx
-search stk_40-conf2-000029
changer: got exit: 0 str: 12 /hw/tape/tps12d2nrnsv
taper: slot 12: date 20050926 label stk_40-conf2-000029 (exact label match)
taper: read label `stk_40-conf2-000029' date `20050926'
taper: wrote label `stk_40-conf2-000029' date `20051101'
dumper: driver: result time 23806.848 from dumper0: FAILED 00-00031
["data write : Broken pipe"]
driver: result time 23806.849 from taper: TRY-AGAIN 00-00031 [writing
file: No space left on device]
driver: error time 23806.888 serial gen mismatch
driver: dumping yorick:/data/noel/noel1 directly to tape
driver: send-cmd time 23806.888 to taper: PORT-WRITE 00-00032 yorick
fffffeff9ffe7f /data/noel/noel1 1 20051101

    [...]

driver: result time 23806.891 from taper: PORT 16475
driver: send-cmd time 23806.891 to dumper0: PORT-DUMP 00-00032 16475
yorick fffffeff9ffe7f /data/noel/noel1 NODEVICE 1 2005:10:30:15:52:23 DUMP
|;bsd-auth;index;
driver: state time 23806.891 free kps: 1023791 space: 52428800 taper:
writing idle-dumpers: 9 qlen tapeq: 0 runq: 2 roomq: 0 wakeup: 86400
driver-idle: not-idle
driver: interface-state time 23806.891 if : free 1022391 if LE0: free
400 if LOCAL: free 1000
driver: hdisk-state time 23806.891 hdisk 0: free 52428800 dumpers 0
dumper: time 8564.873taper: s: streamtream_accept: connection from
_client127.0.0.1: connected to .16650127.0.0.1taper: t.16475
ry_socksize: dumper: receive stream_clientbuffer size is : our side is
32768
0.0.0.0.16650

    [...]

driver: result time 23815.032 from dumper0: DONE 00-00032 37 36 8 [sec
7.917 kb 36 kps 4.5 orig-kb 37]
taper: reader-side: got label stk_40-conf2-000029 filenum 1
driver: result time 23819.646 from taper: DONE 00-00032
stk_40-conf2-000029 1 [sec 12.748 kb 37 kps 2.9
{wr: writers 3 rdwait 5.444 wrwait 4.611 filemark 2.691} ]
driver: error time 23819.646 serial gen mismatch
driver: dumping yorick:/data/neuroling/neuroling1 directly to tape

    [...]

taper: DONE [idle wait: 13960.445 secs]
taper: writing end marker. [stk_40-conf2-000029 OK kb 352 fm 3]
driver: FINISHED time 23883.755
amdump: end at Tue Nov  1 09:53:11 EST 2005

I hit a bug or a feature?


TIA
jf