Amanda-Users

Re: Backup issues with OpenBSD 4.5 machines

2009-09-04 19:04:02
Subject: Re: Backup issues with OpenBSD 4.5 machines
From: Nathan Stratton Treadway <nathanst AT ontko DOT com>
To: Michael Burk <burkml AT gmail DOT com>, "Dustin J. Mitchell" <dustin AT zmanda DOT com>
Date: Fri, 4 Sep 2009 18:40:55 -0400
On Fri, Sep 04, 2009 at 17:13:19 -0400, Dustin J. Mitchell wrote:
> On Fri, Sep 4, 2009 at 4:57 PM, Michael Burk<burkml AT gmail DOT com> wrote:
> > Thanks again for your help. Here's the output of the test prog:
> 
> Looks just like it does locally.  If the test had managed to reproduce
> this failure, then I would have expected to see
>   write: Resource temporarily unavailable
> 
> That means something deeper is going on here.

I wonder if more data needs to get sent down the pipe before the
unexpected "Resource temporarily unavailable" result is returned?
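
One way to check that idea might be a small C sketch along these lines.
It is only an illustration, not the test program from this thread:
bytes_until_eagain() is a hypothetical helper that deliberately sets
O_NONBLOCK on the write end of a fresh pipe and keeps writing while
nobody reads. On a descriptor that really is non-blocking, the early
writes succeed, and EAGAIN ("Resource temporarily unavailable") appears
only once the pipe buffer is full, so a test that sends just a little
data would never see it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of Amanda): fill a non-blocking pipe.
 * Returns the number of bytes accepted before write() failed with
 * EAGAIN, or -1 on any other failure. */
long bytes_until_eagain(size_t chunk_size)
{
    int fds[2];
    char chunk[4096];
    long total = 0;
    int flags;

    assert(chunk_size > 0 && chunk_size <= sizeof(chunk));

    if (pipe(fds) < 0)
        return -1;

    /* Set O_NONBLOCK on purpose so the failure mode is known to be real. */
    flags = fcntl(fds[1], F_GETFL, 0);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    memset(chunk, 'x', sizeof(chunk));
    for (;;) {
        ssize_t n = write(fds[1], chunk, chunk_size);
        if (n > 0) {
            total += n;          /* the pipe still had room */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;            /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;               /* buffer is full: the error in question */
        total = -1;              /* anything else is unexpected */
        break;
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

Calling bytes_until_eagain(512) from a test program's main() and
printing the result would show how much data a pipe on the OpenBSD
client accepts before the error can appear at all.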


> > And here's the output after the patch change (again with the 0831 snapshot):
> 
> And this shows me that, as far as the kernel is concerned, O_NONBLOCK
> is not set on the file descriptor even on the first call.
> 
> Since my test program hasn't replicated the error, I think we should
> explore a bit more before going to the openbsd lists.  What I'd like
> to put together is a patch to amandad and sendbackup that posts a
> debug message about every dup(), dup2(), pipe(), fcntl(), and close()
> operation that they perform.  I'll whip that up and send it along in a
> few minutes.

If I have followed this thread correctly, the tests seem to show that
the kernel is treating the file descriptor as if O_NONBLOCK were set even
when it's not... but that it reverts to the correct behavior once
"fcntl(XX, F_GETFL, 0)" is called.

If just running an F_GETFL operation changes the behavior of a file
handle, that seems to point to a kernel bug of some kind.... But the
tracing you are talking about, combined with moving the F_GETFL
operation to different stages of the program flow, may help narrow down
which point in the "file-descriptor gymnastics" is triggering that bug.
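
As a rough sketch of what such tracing could look like (this is not the
actual patch, and both function names below are hypothetical), a wrapper
around dup2() might log each call and make the F_GETFL query optional.
Keeping the query optional matters here, since F_GETFL is the very
operation that seems to change the behavior.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel whether O_NONBLOCK is set.
 * Returns 1 if set, 0 if clear, -1 if F_GETFL itself failed. */
int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Hypothetical wrapper: perform dup2() and log it to stderr.
 * If probe is nonzero, also log the O_NONBLOCK state of the new
 * descriptor, which means an F_GETFL call happens at this point. */
int traced_dup2(const char *where, int oldfd, int newfd, int probe)
{
    int rc, saved_errno;

    assert(where != NULL);

    rc = dup2(oldfd, newfd);
    saved_errno = errno;         /* fprintf() may overwrite errno */

    fprintf(stderr, "%s: dup2(%d, %d) = %d\n", where, oldfd, newfd, rc);
    if (probe && rc >= 0)
        fprintf(stderr, "%s: fd %d O_NONBLOCK=%d\n",
                where, rc, fd_is_nonblocking(rc));

    errno = saved_errno;         /* callers still see dup2()'s errno */
    return rc;
}
```

Similar wrappers for pipe(), fcntl(), and close() would follow the same
pattern, with the probe switched on at one stage per test run.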

For example, if you found that adding
  fcntl(3, F_GETFL, 0); 
right after the 
  dup2(input, 3);
line in sendbackup.c:start_index() behaved differently than
adding 
  fcntl(input, F_GETFL, 0); 
in that same spot, that would tend to point to something going wrong
in dup2....
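
To make the comparison concrete, here is a minimal sketch of the
experiment, written as a standalone helper rather than as a change to
sendbackup.c. The name dup2_with_probe() and the probe_mode values are
made up for illustration. After dup2(), both descriptors should refer to
the same open file description and share its status flags, so on a
correct kernel it should not matter which one F_GETFL is called on.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Which descriptor, if any, gets the extra F_GETFL call. */
enum probe_mode { PROBE_NONE, PROBE_NEW_FD, PROBE_OLD_FD };

/* Hypothetical helper: dup2() input onto newfd, then optionally issue
 * a throwaway F_GETFL on one of the two descriptors.
 * Returns newfd on success, -1 if dup2() failed. */
int dup2_with_probe(int input, int newfd, enum probe_mode mode)
{
    /* The experiment needs two distinct descriptors. */
    assert(input != newfd);

    if (dup2(input, newfd) < 0)
        return -1;

    switch (mode) {
    case PROBE_NEW_FD:
        (void)fcntl(newfd, F_GETFL, 0);   /* like fcntl(3, F_GETFL, 0) */
        break;
    case PROBE_OLD_FD:
        (void)fcntl(input, F_GETFL, 0);   /* like fcntl(input, F_GETFL, 0) */
        break;
    case PROBE_NONE:
        break;                            /* baseline: no extra call */
    }
    return newfd;
}
```

Running the backup once per mode and noting in which runs the
"Resource temporarily unavailable" error still shows up would give the
comparison described above.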


                                                Nathan



----------------------------------------------------------------------------
Nathan Stratton Treadway  -  nathanst AT ontko DOT com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
