Re: 2.4.5: dumping directly to tape failure: no retry?

* Jon LaBadie <jon AT jgcomp DOT com> [20051101 16:02]:
> On Tue, Nov 01, 2005 at 02:55:28PM -0500, Scott R. Burns wrote:
> > I observed this same issue with:
> > 
> > 2.4.4p4
> > NetBSD/i386
> > HP DAT24*6
> > 
> > My last DLE was > than the remaining tape, but would fit entirely on the
> > next empty tape in the changer as it was the only remaining DLE. It was
> > reported as retrying but never did. In this case my holding space was < the
> > compressed size of this DLE but it would fit onto tape.
> > 
> 
> So it was writing direct to tape.  And after the initial failure
> did not retry.  Sounds like the same situation Jeff Allison
> describes in the thread "spanning tapes".
> 
> > Ultimately I found enough drive space to reserve for a holding space > than
> > this DLE size but I would like to gain that back if this issue could be
> > fixed.
> 
> I wonder what is the best approach to a "failed" dump.  Note, it is
> not a failed taping such as would occur if the taping were coming
> from the holding disk.  The reason for the dump failure was tape
> related, but the dump did fail.
> 
> Is it reasonable to restart the entire dump of the DLE?
> Are there any other "dump" failures that are retried?

To the first question: yes, as long as the dump takes less than
dtimeout, as I explain below.

About the 2nd one, I'll just say something about dump failures that are
*not* retried. I've seen dump failures because of holding disk
attrition: there was plenty of space at the beginning of the backup
run, but since I have many amanda configurations competing for holding
disk resources, if the disk fills up, the DLE that was migrating to
tape will fail and there won't be a retry. I'm not quite sure how such
a situation should be dealt with, as I know that my setup is far from
being 'standard'. That said, maybe something like 'well, looks like
/holddisk is full, so let's retry with a dump direct to tape' would
make sense?
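
To make that idea concrete, here is a rough sketch of the decision in
Python. It is not Amanda code and does not reflect how the driver
actually schedules dumps; the function, its parameters, and the Action
names are all made up for illustration, and time_limit merely stands in
for the dtimeout-style bound discussed in this thread.

```python
from enum import Enum


class Action(Enum):
    RETRY_VIA_HOLDING = "retry via holding disk"
    RETRY_DIRECT_TO_TAPE = "retry direct to tape"
    GIVE_UP = "give up"


def next_action(dle_size, holding_free, tape_free,
                expected_duration, time_limit):
    """Pick what to do with a DLE whose dump just failed.

    Sizes share one unit (e.g. MB); durations are in seconds.
    All names here are hypothetical, not Amanda internals.
    """
    # Never retry a dump that is expected to exceed the time bound.
    if expected_duration >= time_limit:
        return Action.GIVE_UP
    # Prefer the holding disk when the DLE fits there.
    if dle_size <= holding_free:
        return Action.RETRY_VIA_HOLDING
    # Holding disk too full: fall back to writing straight to tape,
    # provided the DLE fits on the (next) tape.
    if dle_size <= tape_free:
        return Action.RETRY_DIRECT_TO_TAPE
    return Action.GIVE_UP
```

With a full holding disk, a fresh tape, and a 3hr dump under a 4hr
bound, this picks the direct-to-tape retry; a dump expected to run past
the bound is not retried at all.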

> 
> GH pointed out that Jeff's total dump time was over 15 hours.
> As the failed DLE was the only large (and I think only remote)
> DLE, most of those 15 hours were spent on the one dump.  Should
> another 15 hours be expended on the retry of a failed dump?
> 
> Just soliciting opinions.

I believe this should be dealt with by setting a proper/meaningful
value for dtimeout. In the case I presented above the dump took
something like 3hr before failing, only because amanda had no way of
knowing which DLEs had made it to tape and how much tape capacity had
been used before the failed one. I certainly would want to see a retry
in a case like this, because the 'failed' DLE *can* fit on a tape and
amanda should try as hard as she can to write the DLE to tape, as long
as it takes less than dtimeout.

jf

> 
> -- 
> Jon H. LaBadie                  jon AT jgcomp DOT com
>  JG Computing
>  4455 Province Line Road        (609) 252-0159
>  Princeton, NJ  08540-4322      (609) 683-7220 (fax)