Amanda-Users

RE: remote client failures

2004-09-22 19:32:22
Subject: RE: remote client failures
From: donald.ritchey AT exeloncorp DOT com
To: kelbley AT cs.unm DOT edu, amanda-users AT amanda DOT org
Date: Wed, 22 Sep 2004 18:28:37 -0500
George:

See if someone in your Network team changed anything in your firewall that
would affect timeouts for UDP traffic.  It appears to me that amandad is
not seeing the ACK messages in time and is timing out.  Someone may have
shortened a timeout value in the firewall or fat-fingered a setting that
affects your connection.

You may have already checked these items; if so, sorry to duplicate your
efforts.  Otherwise, look at any network changes made between the two
servers.
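
One way to check for a shortened UDP timeout is to send a datagram across
the firewall and see whether a deliberately delayed reply still gets back.
The following is only a rough sketch, not part of Amanda: the function names
delayed_echo_server and probe, the port, and the delays are all made up for
illustration, and it assumes you can run Python on both hosts and open a
test UDP port between them.

```python
import socket
import threading
import time


def delayed_echo_server(sock, delay, count=1):
    """Echo `count` datagrams back to their senders, each after `delay` seconds.

    The sleep stands in for the quiet period during which a firewall
    might expire its UDP state entry.
    """
    for _ in range(count):
        data, addr = sock.recvfrom(1024)
        time.sleep(delay)
        sock.sendto(data, addr)


def probe(host, port, wait):
    """Send one datagram; return True if the echo arrives within `wait` seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(wait)
        s.sendto(b"probe", (host, port))
        try:
            data, _ = s.recvfrom(1024)
        except socket.timeout:
            return False
        return data == b"probe"
```

Run delayed_echo_server on one host with a bound UDP socket and call probe
from the other, raising the delay step by step (with a wait somewhat longer
than the delay).  If replies stop arriving beyond some delay, that delay is a
rough hint at the firewall's UDP idle timeout.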

Best wishes,

Donald L. (Don) Ritchey
Information Technology
Exelon Corporation


-----Original Message-----
From: George Kelbley [mailto:kelbley AT cs.unm DOT edu]
Sent: Wednesday, September 22, 2004 11:20 AM
To: amanda-users AT amanda DOT org
Subject: remote client failures


I am having problems backing up a client in a different building (on a 
different subnet).  The strange thing is this worked for over a year and 
suddenly started failing.  It does not appear to be an ACL or firewall 
issue, because sometimes it will work.  I set up a separate config just 
for this host so I could test, and find that I can kick off amdump, it 
appears to start on the client (I get the debug files), but at some 
point it dies and I get an amandad debug file with the following at the end:
amandad: time 126.413: dgram_recv: timeout after 10 seconds
amandad: time 126.413: waiting for ack: timeout, retrying
amandad: time 136.413: dgram_recv: timeout after 10 seconds
amandad: time 136.413: waiting for ack: timeout, retrying
amandad: time 146.413: dgram_recv: timeout after 10 seconds
amandad: time 146.413: waiting for ack: timeout, retrying
amandad: time 156.413: dgram_recv: timeout after 10 seconds
amandad: time 156.413: waiting for ack: timeout, retrying
amandad: time 166.413: dgram_recv: timeout after 10 seconds
amandad: time 166.413: waiting for ack: timeout, giving up!
amandad: time 166.413: pid 6926 finish time Wed Sep 22 10:04:29 2004
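
Because the failure is intermittent, it may help to summarize each amandad
debug file and compare failing runs with good ones.  The following is a
minimal sketch that assumes the debug lines keep the "amandad: time N: message"
shape shown above; the function name summarize and the returned keys are
invented for illustration and are not part of Amanda.

```python
import re

# Matches lines such as:
#   amandad: time 126.413: dgram_recv: timeout after 10 seconds
LINE = re.compile(r"^amandad: time (\d+\.\d+): (.*)$")


def summarize(log_text):
    """Count dgram_recv timeouts and note whether amandad gave up."""
    timeouts = []
    gave_up = False
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip anything that is not a timestamped amandad line
        stamp, msg = float(m.group(1)), m.group(2)
        if msg.startswith("dgram_recv: timeout"):
            timeouts.append(stamp)
        elif msg.startswith("waiting for ack: timeout, giving up"):
            gave_up = True
    return {
        "timeouts": len(timeouts),
        "first": timeouts[0] if timeouts else None,
        "last": timeouts[-1] if timeouts else None,
        "gave_up": gave_up,
    }
```

Calling summarize(open(path).read()) on each debug file gives the time of the
first timeout, which can then be lined up against firewall logs or against the
point in the run where the client goes quiet.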

If I stop the processes on the server, and run amcleanup, and restart 
amdump, with _no_ changes to anything, the dump will complete normally, 
most of the time.  Other times it will fail with the same type of output 
in the debug file and I'll have to repeat.

The client and server are both running Linux (Debian testing) with Amanda 2.4.4.p3.

Needless to say, the intermittency of this is making troubleshooting 
difficult.

The clients on the same subnet as the server back up normally.

-- 
George Kelbley                  System Support Group    
Computer Science Department     University of New Mexico
505-277-6502                    Fax: 505-277-6927



