Amanda-Users

Re: hosts timing out on amdump but not amcheck

2003-03-07 17:07:57
Subject: Re: hosts timing out on amdump but not amcheck
From: Richard Morse <remorse AT partners DOT org>
To: "justin m. clayton" <justincl AT u.washington DOT edu>
Date: Fri, 7 Mar 2003 15:42:16 -0500
Regarding the timeouts, look in the logs on the client for sendsize*debug (I think) and see how long it takes to get the estimate for each of the drives. Then check that your etimeout is large enough to cover that.
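
For reference, a rough sketch of the setting in question, in amanda.conf on the server. As far as I know, etimeout is the number of seconds the planner will wait for size estimates, counted per disk on a given client. The value below is only an illustration (an assumption, not a recommendation); size it from the times you actually see in the sendsize debug files.

```
# amanda.conf on the backup server
# NOTE: 1800 is an illustrative value, not taken from this thread
etimeout 1800    # seconds to wait for estimates, per disk on a client
```

If the estimates for the slowest client take longer than this allows, amdump can report the host as timed out even though amcheck, which does not run estimates, passes.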

HTH,
Ricky


On Friday, March 7, 2003, at 12:20  PM, justin m. clayton wrote:

Thanks for the help. Unfortunately, this has not proved to be the solution to my problem. Suddenly, however, one of the machines (still on autoneg, btw) began working, without any warning or me touching it. All others are still timing out on amdump (though amcheck still works).

--Justin

On Thu, 13 Feb 2003, Amanda Admin wrote:

Justin,

This sounds like the symptoms others on the amanda list have attributed to
half/full duplex network interface and/or switch problems.

I seem to recall a posting just recently saying that (the built-in eth
interface on ??) Solaris had a particular affinity towards incorrect duplex
detection. Maybe a search of the archives on this topic will turn up some
details.

Doug

-----Original Message-----
From: owner-amanda-users AT amanda DOT org
[mailto:owner-amanda-users AT amanda DOT org]On Behalf Of justin m. clayton
Sent: Thursday, February 13, 2003 2:37 PM
To: Joshua Baker-LePain
Cc: amanda-users AT amanda DOT org
Subject: Re: hosts timing out on amdump but not amcheck


On Thu, 13 Feb 2003, Joshua Baker-LePain wrote:

On Thu, 13 Feb 2003 at 9:15am, justin m. clayton wrote

First of all, thanks to all who helped me track down my NAK problems from
last week. Having fixed that, all backup hosts pass amcheck with flying
colors. However, when it comes time for the amdump, my log report claims
"Request to <host> timed out" when I return the following morning.
However, if I run amcheck again, no hosts report problems. This has been
going on for a number of days now. I am getting "Read error at byte
0...:Bad file number" on some hosts (via /tmp/amanda/sendsize.*), and some
are reporting "amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying" (via /tmp/amanda/amandad.*).
Strangely, though, the symptom is the same for all machines.

What OS/distro?  Are there firewalls in the way?

The clients are Solaris 8, the server is stable Debian Linux, both using 2.4.2p2. No firewalls in the way. This configuration has worked in the past.

Justin Clayton
VLSI Research System Administrator
University of Washington
Electrical Engineering Dept
justincl AT u.washington DOT edu
206/543.2523  EE/CSE 307E



