Amanda-Users

Re: error redirecting stderr to fd 51

2006-05-03 14:01:01
Subject: Re: error redirecting stderr to fd 51
From: Sean Walmsley <sean AT fpp.nuclearsafetysolutions DOT com>
To: amanda-users AT amanda DOT org
Date: Wed, 3 May 2006 13:51:58 -0400 (EDT)
> > 1) Can anybody tell me what the "error redirecting stderr to fd 51: Bad file
> > number" means? I googled the message but found nothing.
> 
> The program amandad opens data/mesg/index on file descriptors 50/51/52
> and then execs sendbackup, and sendbackup connects those
> file descriptors to the needed streams.  The message file descriptor
> needs to be connected to the stderr of gnutar, but when it tries, sendbackup
> notices that file descriptor 51 is not valid (not open?).
> Really weird.
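
For anyone trying to picture the failure described in the quoted text: "Bad file number" is, as far as I know, the Solaris wording for EBADF, which dup2() returns when the source descriptor is not open. The following is a minimal Python sketch of that failure mode only. It is not Amanda code; redirect_fd and first_closed_fd are hypothetical helper names, and a scratch pipe stands in for stderr so nothing real gets redirected.

```python
import errno
import os


def redirect_fd(src_fd, dst_fd):
    """Make dst_fd refer to the same open file as src_fd.

    Returns None on success, or the errno value if dup2() fails.
    """
    try:
        os.dup2(src_fd, dst_fd)
    except OSError as exc:
        return exc.errno
    return None


def first_closed_fd(start=50):
    """Return the lowest descriptor number >= start that is not open."""
    fd = start
    while True:
        try:
            os.fstat(fd)          # fails with EBADF if fd is not open
        except OSError:
            return fd
        fd += 1


if __name__ == "__main__":
    # Scratch pipe used as the redirect target, so real stderr is untouched.
    read_end, write_end = os.pipe()
    # Stands in for a "mesg" descriptor that was never opened (assumption).
    closed = first_closed_fd()
    err = redirect_fd(closed, write_end)
    print(errno.errorcode.get(err), os.strerror(err) if err else "ok")
```

If the parent never opened (or already closed) the descriptor before the exec, the child sees exactly this kind of error when it tries to use it.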

I believe that we may be running into a similar problem using
2.5.0p1 on Solaris (server and clients). Unfortunately, in our
case we do *NOT* get any warnings about stderr redirection.
Instead, some of our dump files contain:

1) a normal Amanda header
2) an ASCII list of files that looks suspiciously like the
   output of the index command
   
rather than the expected header followed by a gnu tar file. Most
of the dump files are okay; only the odd few seem to have this
problem.
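
One rough way to spot such a file on the holding disk is to look at what follows the header. The sketch below is in Python and rests on assumptions that should be checked against your own setup: that the Amanda header occupies the first 32 KiB of the dump file, and that the payload is either an uncompressed tar stream (which carries the "ustar" magic at offset 257 of its first block) or gzip output. classify_payload and HEADER_SIZE are hypothetical names, not part of Amanda.

```python
HEADER_SIZE = 32 * 1024   # assumed size of the Amanda header block


def classify_payload(path, header_size=HEADER_SIZE):
    """Guess what follows the header: 'gzip', 'tar', or 'unknown'."""
    with open(path, "rb") as f:
        f.seek(header_size)
        block = f.read(512)           # one tar block is 512 bytes
    if block[:2] == b"\x1f\x8b":      # gzip magic number
        return "gzip"
    # Both GNU and POSIX tar headers start their magic field with "ustar".
    if len(block) >= 262 and block[257:262] == b"ustar":
        return "tar"
    return "unknown"                  # e.g. a plain ASCII file listing
```

A result of "unknown" for a dump that was supposed to be an uncompressed gnu tar backup would be worth a closer look; it does not prove corruption by itself.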

When this happens, the index file for this volume is empty.

Obviously, this is a problem from a recovery point of view :-( !

My (unconfirmed) guess is that somehow the data/mesg/index file
descriptors are getting mixed up and that the index output is
ending up where the data should be. The code mentions "scheduling"
various operations on these file descriptors, so perhaps there
is a race condition somewhere?

!! WARNING !!

The only visible symptom of this issue on our platform is that
amverify reports "End-of-Information detected" (rather than
"End-of-Tape detected.") and exits before checking all of the
files on the tape. There are no 

In my opinion, this is a serious issue since:

1) it results in a corrupted backup for the volumes affected
2) it is not accompanied by a clear error message, i.e. you
   could easily miss the problem if your list of volumes is
   long or you don't happen to notice the difference between
   "End-of-Tape" and "End-of-Information".
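
Since the report itself says "No errors found!", one workaround is to count the "Checked" lines and compare them with the number of dumps you expect on the tape. Here is a minimal Python sketch of that idea. It assumes the report uses the "Checked ..." and "End-of-... detected." line formats seen in our output; summarize_amverify and looks_truncated are hypothetical names, and expected_count has to come from your own records.

```python
def summarize_amverify(report):
    """Return (list of checked dump names, 'tape' / 'information' / None)."""
    checked = []
    ending = None
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("Checked "):
            checked.append(line[len("Checked "):])
        elif line.startswith("End-of-Information detected"):
            ending = "information"
        elif line.startswith("End-of-Tape detected"):
            ending = "tape"
    return checked, ending


def looks_truncated(report, expected_count):
    """True if fewer dumps were checked than expected on the tape."""
    checked, _ending = summarize_amverify(report)
    return len(checked) < expected_count
```

The ending marker is returned separately so that an "information" ending can be flagged for review, since on our platform it was the only visible hint that something was wrong.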

I've included below a sample output from amverify for an
affected backup:

--------------------------------------------------
Subject: m1 AMANDA VERIFY REPORT FOR MBK1_02

Tapes:  MBK1_02
No errors found!

amverify m1
Mon May  1 15:39:35 EDT 2006

Loading 2 slot...
Using device /dev/rmt/0n
Volume MBK1_02, Date 20060330
Checked megawatt._net_ghoncho_vol06.20060330.1
Checked megawatt._net_ghoss_vol06.20060330.1
Checked megawatt._.20060330.1
Checked megawatt._vol02.20060330.1
Checked megawatt._net_ghoncho_vol05.20060330.0
Checked megawatt._net_ghoss_vol01.20060330.0
Checked megawatt._vol01.20060330.0
End-of-Information detected.

(NOTE: ~20 filesystems after megawatt._vol01.20060330.0
aren't listed, despite the fact that they are on the
tape. The dump file for the volume after megawatt._vol01.20060330.0
is corrupt and contains only a header and a list of the files
that should have been backed up)

--------------------------------------------------



=================================================================
Sean Walmsley                 sean AT fpp.nuclearsafetysolutions DOT com
Nuclear Safety Solutions Ltd.  416-592-4608 (V)  416-592-5528 (F)
700 University Ave M/S H04 J19, Toronto, Ontario, M5G 1X6, CANADA

