Amanda-Users

amandad: dgram_recv: timeout

2003-01-02 11:42:26
Subject: amandad: dgram_recv: timeout
From: David Raistrick <drais AT wow.atlasta DOT net>
To: amanda-users AT amanda DOT org
Date: Thu, 2 Jan 2003 08:07:00 -0800 (PST)

Hey folks.

I've been trying to solve a problem with amanda for the past few
months.  Until yesterday it affected only one client (out of ~10
servers).  Now it's two!

Examples from the dump report:
  newww.gta. / lev 0 FAILED [Request to newww.gta.com timed out.]
  bento.gta. / lev 0 FAILED [Request to bento.gta.com timed out.]


bento has had the problem longer.  amcheck reports no errors with
bento.  Today, though, amcheck DOES report an error with newww:
WARNING: newww.gta.com: selfcheck request timed out.  Host down?

Even though selfcheck...debug seems fine:
/tmp/amanda%# more selfcheck.20030102105821.debug 
selfcheck: debug 1 pid 61064 ruid 2 euid 2 start time Thu Jan  2 10:58:21 2003
/usr/local/libexec/amanda/selfcheck: version 2.4.3b2
selfcheck: checking disk /var
selfcheck: device /var
selfcheck: OK
selfcheck: checking disk /usr
selfcheck: device /usr
selfcheck: OK
selfcheck: checking disk /home
selfcheck: device /home
selfcheck: OK
selfcheck: checking disk /
selfcheck: device /
selfcheck: OK
selfcheck: pid 61064 finish time Thu Jan  2 10:58:21 2003

(ran it twice to be sure..same result, same report.)

The amandad..debug for this ends with:

amandad: It's not an ack
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, giving up!
amandad: pid 61063 finish time Thu Jan  2 10:59:11 2003
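
For readers unfamiliar with the pattern in the log above: amandad sends a
datagram, waits a fixed time for an acknowledgement, and retries a few
times before giving up.  Below is a minimal Python sketch of that kind of
send-and-wait loop.  It is not Amanda's code; the function name, the
b"ACK" prefix check, and the default retry count are assumptions made for
illustration.  Only the 10-second timeout and the log wording come from
the debug output.

```python
import socket


def send_and_wait_for_ack(sock, payload, peer, timeout=10.0, max_tries=5):
    """Send payload to peer and wait for an ACK datagram, retrying on timeout.

    Hypothetical helper: returns (acked, log_lines).
    """
    log = []
    sock.settimeout(timeout)
    for attempt in range(1, max_tries + 1):
        sock.sendto(payload, peer)
        try:
            data, _ = sock.recvfrom(65535)
        except socket.timeout:
            log.append(f"dgram_recv: timeout after {timeout:g} seconds")
            if attempt < max_tries:
                log.append("waiting for ack: timeout, retrying")
            else:
                log.append("waiting for ack: timeout, giving up!")
            continue
        if data.startswith(b"ACK"):  # assumed ACK format, for illustration only
            return True, log
        log.append("It's not an ack")
    return False, log
```

If every attempt times out, as it does here, that suggests the reply or
the returning ACK is being lost between the two hosts, which would be
consistent with selfcheck itself finishing with OK.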

---

The amandad..debug on the client when amdump runs is similar:

<clip>

amandad: sending REP packet:
----
Amanda 2.4 REP HANDLE 009-80350808 SEQ 1041408009
OPTIONS maxdumps=1;
/ 0 SIZE 46800
/ 1 SIZE 46800
/home 0 SIZE 547240
/home 1 SIZE 547240
/usr 0 SIZE 5118390
/usr 1 SIZE 5119120
/usr 2 SIZE 5119120
/var 0 SIZE 179050
/var 1 SIZE 179050
----

amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, giving up!
amandad: pid 13671 finish time Wed Jan  1 03:01:23 2003
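
The REP packet in this log carries one size estimate per disk and dump
level.  Here is a small Python sketch that collects those lines into a
dictionary, assuming each estimate line has the form
"<disk> <level> SIZE <number>" as shown above.  The function name is
hypothetical, and the unit of the number is not stated in the log.

```python
def parse_estimates(rep_body):
    """Map disk -> {level: size} from the SIZE lines of a REP packet body."""
    estimates = {}
    for line in rep_body.splitlines():
        parts = line.split()
        # Keep only lines shaped like "/usr 0 SIZE 5118390"; skip headers/options.
        if len(parts) == 4 and parts[2] == "SIZE":
            disk, level, _, size = parts
            estimates.setdefault(disk, {})[int(level)] = int(size)
    return estimates
```

Comparing the level 0 figures across runs is a quick way to confirm that
the client finished its estimates and that only the delivery of the reply
failed.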

-------


Help! I'm open to any and all suggestions.

FWIW, bento and the amanda server are on the same ethernet switch.  newww
and the amanda server are separated by a firewall, which is, and has been,
correctly configured: two other servers on the same network as newww
still back up correctly.

If you need any specific information from me, let me know and I can
provide it.  I'm not yet sure what will help you folks help me. :)

thanks.

...david





---
david raistrick
drais AT atlasta DOT net                http://www.expita.com/nomime.html


