Amanda-Users

Re: Backup issues with OpenBSD 4.5 machines

2009-08-21 14:03:39
Subject: Re: Backup issues with OpenBSD 4.5 machines
From: stan <stanb AT panix DOT com>
To: John Hein <jhein AT timing DOT com>
Date: Fri, 21 Aug 2009 13:56:39 -0400
On Fri, Aug 21, 2009 at 09:57:36AM -0600, John Hein wrote:
> stan wrote at 10:56 -0400 on Aug 21, 2009:
>  > OK here is the latest on this saga :-)
>  > 
>  > On one of the OpenBSD 4.5 machines I have built 2.5.0p1, and was able to
>  > back this machine up successfully (using classic UDP based authentication)
>  > 
>  > On another of them, I built 2.5.2p1. The first attempt to back this machine
>  > up failed. I checked the log files, and found they were having issues
>  > because /etc/amdates was missing. I corrected that, and started a 2nd
>  > backup run. (Remember amcheck reports all is well with this machine). I 
>  > got the following from amstatus when I attempted to back up this machine.
>  > Also remember, one of the tests I ran with a 2.6.1 client was to connect a
>  > test machine directly to the client, using a crossover cable to eliminate
>  > any firewall, or router type issues.
>  > 
>  > I am attaching what I think is the amandad debug file associated with this
>  > failure.
>  > 
>  > Can anyone suggest what I can do to further troubleshoot this?
>  > 
>  > pb48:wd0f                     1  dumper: [could not connect DATA stream:
>  > can't connect stream to pb48.meadwestvaco.com port 11996: Connection
>  > refused] (10:37:27)
>  > 
>    .
>    .
>    .
>  > amandad: time 30.019: stream_accept: timeout after 30 seconds
>  > amandad: time 30.019: security_stream_seterr(0x86b67000, can't accept new stream connection: No such file or directory)
>  > amandad: time 30.019: stream 0 accept failed: unknown protocol error
>  > amandad: time 30.019: security_stream_close(0x86b67000)
>  > amandad: time 60.027: stream_accept: timeout after 30 seconds
>  > amandad: time 60.027: security_stream_seterr(0x81212000, can't accept new stream connection: No such file or directory)
>  > amandad: time 60.027: stream 1 accept failed: unknown protocol error
>  > amandad: time 60.027: security_stream_close(0x81212000)
>  > amandad: time 90.035: stream_accept: timeout after 30 seconds
>  > amandad: time 90.036: security_stream_seterr(0x84877000, can't accept new stream connection: No such file or directory)
>  > amandad: time 90.036: stream 2 accept failed: unknown protocol error
>  > amandad: time 90.036: security_stream_close(0x84877000)
>  > amandad: time 90.036: security_close(handle=0x81bbf800, driver=0x298a9240 (BSD))
>  > amandad: time 120.044: pid 17702 finish time Fri Aug 21 10:39:27 2009
> 
> For some reason the socket is not getting marked ready for read.
> select(2) is timing out waiting.  Firewall setup perhaps?
> 
> This bit of code in 2.5.2p1's common-src/stream.c is where
> the failure is happening for you...
> 
OK, I reproduced the failure with only a crossover cable between the test
client and the Amanda Master:

192.168.1.2:wd0f 0  dumper: [could not connect DATA stream: can't connect
stream to 192.168.1.2 port 24376: Connection refused] (13:48:23)

Note the 192.168.1.2 address :-)
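
For anyone who wants to check the port by hand: "Connection refused"
normally means the TCP connect was answered with a reset, i.e. nothing was
listening on that port at that moment (or a packet filter actively rejected
it). Below is a minimal C sketch of such a probe. probe_port() is a
hypothetical name used only for illustration; it is not part of Amanda.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not an Amanda function): try a TCP connect
 * to ip:port. Returns 0 on success, otherwise the errno value from
 * the failing call (e.g. ECONNREFUSED when nothing is listening). */
int probe_port(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd, err = 0;

    assert(ip != NULL);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return EINVAL;              /* not a dotted-quad IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;                /* save before close() can change it */
    close(fd);
    return err;
}
```

Calling probe_port("192.168.1.2", 24376) from the server while the dump is
starting would show whether the refusal can be reproduced outside Amanda.
The port differed between the two failures quoted in this thread, so the
number has to be taken from the current dumper message.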

This is with a 2.5.2p1 client on OpenBSD 4.5. 2.5.0p1 works on this same
machine/OS/network configuration.

So, it appears to me that this must be because of something that changed
between 2.5.0p1 and 2.5.2p1. And we have a pretty good idea where in the
code this is failing. So can anyone enlighten me as to what changed in this
area between those 2 versions?
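
The amandad log above shows the client side of this: it waits for the
server to connect, and select(2) times out after 30 seconds. The general
accept-with-timeout pattern can be sketched as follows. This is a
simplified illustration, not the code from common-src/stream.c;
accept_with_timeout() and listen_loopback() are hypothetical names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: listen on an ephemeral loopback TCP port.
 * Returns the listening fd and stores the chosen port, or -1 on error. */
int listen_loopback(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Hypothetical helper, not Amanda's stream_accept(): wait up to
 * timeout_sec for a pending connection on listen_fd. Returns the
 * connected fd, or -1 with errno set (ETIMEDOUT if select(2)
 * expired with nothing to accept). */
int accept_with_timeout(int listen_fd, int timeout_sec)
{
    fd_set readset;
    struct timeval tv;
    int n;

    assert(listen_fd >= 0 && listen_fd < FD_SETSIZE);

    do {
        /* select() may modify both arguments, so rebuild them on
         * every pass; a retry after EINTR restarts the full timeout. */
        FD_ZERO(&readset);
        FD_SET(listen_fd, &readset);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        n = select(listen_fd + 1, &readset, NULL, NULL, &tv);
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -1;                  /* select() itself failed */
    if (n == 0) {
        errno = ETIMEDOUT;          /* nobody connected in time */
        return -1;
    }
    return accept(listen_fd, NULL, NULL);
}
```

If the listening socket never becomes readable, the timeout branch is the
one taken, which would match the "stream_accept: timeout after 30 seconds"
lines in the debug file.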


-- 
One of the main causes of the fall of the roman empire was that, lacking
zero, they had no way to indicate successful termination of their C
programs.