Re: sendbackup file renaming failure

On Thu, 26 Aug 2004, Paul Bijnens wrote:

> Mitch Collinsworth wrote:
>
> > Today one of my dumps failed in an unusual manner.
> >
> > After the dump finished and the index was completed, I received the
> > following error:
> >
> > error [renaming
> > /var/adm/amanda/gnutar-lists/usda01afs:usda01_b_.*.backup_0.new
> > to /var/adm/amanda/gnutar-lists/usda01afs:usda01_b_.*.backup_0:
> > No such file or directory]
>
> I guess the '*' in the message above is a placeholder for the real
> name?  Or is this the real filename?
>
> Even then, where is the '.backup' coming from?  My gnutar-lists are
> named:  hostname_disk_0  (with the last 0 indicating the level,
> and any '/' in the hostname or disk is replaced with a '_').
> There is no place for a '.*.backup' in that scheme.

The disklist entry is:
usda01  afs:usda01/b/.*.backup  nocomp-user

So it follows the format you describe.  The last 0 does indeed indicate
a level 0 dump.
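
A minimal sketch of how that file name appears to be built, inferred only
from the disklist entry and the path in the error message.  The function
name and the default directory argument are made up for illustration;
this is not amanda's actual code.

```python
def gnutar_list_name(host: str, disk: str, level: int,
                     list_dir: str = "/var/adm/amanda/gnutar-lists") -> str:
    """Build a gnutar-list path from host, disk and dump level.

    Assumption: every '/' in the disk name becomes '_', the result is
    appended directly to the host name, and '_<level>' is added last.
    """
    sanitized_disk = disk.replace("/", "_")
    return f"{list_dir}/{host}{sanitized_disk}_{level}"


# The DLE from this thread, at level 0.
path = gnutar_list_name("usda01", "afs:usda01/b/.*.backup", 0)
print(path)
print(path + ".new")  # the temporary name that is renamed after the dump
```

So the '.*.backup' part comes straight from the disk name in the DLE.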


> Or does that have something to do with some afs modifications, as
> the name of the host suggests?

The only modifications we made to amanda itself were 1) in selfcheck, to
not try to test accessibility of a gnutar directory name beginning with
"afs", and 2) gnutar itself is replaced with a wrapper script that
examines the DLE and chooses whether to run gnutar or an AFS command to
produce the actual backup file.  ".*.backup" is meaningful to the wrapper
script.
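
Roughly, the decision the wrapper makes looks like the sketch below.  This
is an illustration only, not our actual script: the function name and the
returned labels are hypothetical, and the real wrapper goes on to run
either gnutar or the AFS command.

```python
def choose_backend(dle_directory: str) -> str:
    """Pick which program should produce the backup for a DLE.

    Assumption: a directory name beginning with "afs" marks an AFS
    entry; everything else is handed to the real gnutar.
    """
    if dle_directory.startswith("afs"):
        return "afs"      # hypothetical label: run the AFS dump command
    return "gnutar"       # hypothetical label: run the real gnutar


print(choose_backend("afs:usda01/b/.*.backup"))
print(choose_backend("/home"))
```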

This has all been working fine for a couple of years now.  There have been
no recent changes.  And despite this error on partition b, partitions a,
c, and d ran just fine on this host.


> > Somehow this was sufficient for the dump to "fail".  Therefore it was
> > deleted from the holding disk rather than copied to tape.  [Grr..!]
>
> The difference between success and failure of a backup is indeed a gray
> area, instead of a sharp line.  This one is just on the boundary.
> The next dump would have trouble anyway because it wouldn't have a
> gnutar-list to base its incremental dumps on.  That would result in
> a full dump, which is indeed a good fallback in that case.

Certainly a better fallback than throwing up hands and putting nothing
on tape!  :-)


> > In roughly 5 years of using amanda I have never seen this happen before.
>
> Me neither.
>
> >
> > This is 2.4.3b3.
>
> Try an upgrade to 2.4.4p3, at least on that client.

I am planning to do this globally some time 'soon'.  But that doesn't
explain why this has been working fine for many months and suddenly
messed up yesterday.

There is one possibly interesting thing about this one partition.  It
has been growing recently, enough that I have several times had to bump
up dtimeout just for it.  The problem there is that the program we
substituted for indexing AFS dumps seems particularly slow on large
dumps containing many, many small files.  I had to raise dtimeout again
just the day before this happened.  Is it somehow possible that we ran
into a corner case where it didn't time out while the index was being
created, but got so close that the timer ran out just after it finished,
and whoever checks the timer started some cleanup processing that zapped
this file?  And meanwhile whoever actually reports the timeout error did
not do so, because the dump did in fact finish successfully?  Sounds a
bit crazy, but it's all I can think of so I may as well ask.  :-)
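
To make the hypothesis concrete, here is a small simulation of the
sequence I have in mind.  It does not use any amanda code, and the file
and function names are invented for the example; it only shows that a
rename attempted after something else has removed the ".new" file fails
with the same kind of "no such file" error.

```python
import os
import tempfile


def finish_listing(new_path: str, final_path: str) -> str:
    """Rename the temporary listing into place, reporting a missing source."""
    try:
        os.rename(new_path, final_path)
        return "renamed"
    except FileNotFoundError as exc:
        reason = os.strerror(exc.errno)
        return f"error [renaming {new_path} to {final_path}: {reason}]"


def simulate(cleanup_first: bool) -> str:
    """Write a '.new' listing, optionally 'clean it up', then rename it."""
    with tempfile.TemporaryDirectory() as workdir:
        final_path = os.path.join(workdir, "listing_0")
        new_path = final_path + ".new"
        with open(new_path, "w") as handle:
            handle.write("incremental state\n")
        if cleanup_first:
            # Stand-in for the suspected timeout cleanup removing the file.
            os.remove(new_path)
        return finish_listing(new_path, final_path)


print(simulate(cleanup_first=False))
print(simulate(cleanup_first=True))
```

Whether anything in amanda's timeout handling actually removes that file
is exactly what I don't know.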

-Mitch