Amanda-Users

Re: hosts timing out on amdump but not amcheck

2003-02-14 05:10:15
Subject: Re: hosts timing out on amdump but not amcheck
From: Paul Bijnens <paul.bijnens AT xplanation DOT com>
To: amanda-users AT amanda DOT org
Date: Fri, 14 Feb 2003 09:57:02 +0100
justin m. clayton wrote:
> On Thu, 13 Feb 2003, Joshua Baker-LePain wrote:
>
>> On Thu, 13 Feb 2003 at 9:15am, justin m. clayton wrote
>>
>>> First of all, thanks to all who helped me track down my NAK problems from
>>> last week. Having fixed that, all backup hosts pass amcheck with flying
>>> colors. However, when it comes time for the amdump, my log report claims
>>> "Request to <host> timed out" when I return the following morning.
>>> However, if I run amcheck again, no hosts report problems. This has been
>>> going on for a number of days now. I am getting "Read error at byte
>>> 0...:Bad file number" on some hosts (via /tmp/amanda/sendsize.*), and some
>>> are reporting "amandad: dgram_recv: timeout after 10 seconds
>>> amandad: waiting for ack: timeout, retrying" (via /tmp/amanda/amandad.*).
>>> Strangely, though, the symptom is the same for all machines.
>>
>> What OS/distro?  Are there firewalls in the way?
>
> The clients are Solaris 8, the server is stable Debian linux, both using
> 2.4.2p2. No firewalls in the way. This configuration has worked in the
> past.

For the record, I also had that problem with errors like "Bad file number" and "dgram_recv: timeout" just after I upgraded to 2.4.3. I had the problem for 3 consecutive nights, each night on different hosts.

I then recompiled and reinstalled the Amanda software, configured with "--with-debugging". Strangely enough, the problem has never happened again since then. I'm still running with debugging enabled. I have a mix of Solaris 8, Solaris 2.7, Solaris 2.6, SunOS 4.1.4 (yeekes), Linux (Slackware, Red Hat).
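
If the timeouts hit different hosts each night, it can help to tally which debug files on a client contain the messages quoted above. The following is only a rough sketch in Python, not part of Amanda: the function names are made up for illustration, and it assumes the debug files live in /tmp/amanda and are named sendsize.* and amandad.* as mentioned in this thread.

```python
import glob
import os
from collections import Counter

# Substrings taken from the error messages quoted in this thread.
MESSAGES = {
    "bad_file_number": "Bad file number",
    "dgram_timeout": "dgram_recv: timeout",
    "ack_timeout": "waiting for ack: timeout",
}


def scan_lines(lines):
    """Count how many lines contain each known message (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        for name, text in MESSAGES.items():
            if text in line:
                counts[name] += 1
    return counts


def scan_debug_dir(directory="/tmp/amanda"):
    """Return {path: Counter} for debug files with at least one match.

    Assumption: files are named sendsize.* and amandad.* in `directory`.
    """
    results = {}
    for pattern in ("sendsize.*", "amandad.*"):
        for path in sorted(glob.glob(os.path.join(directory, pattern))):
            if not os.path.isfile(path):
                continue
            # Debug files may contain odd bytes; do not fail on them.
            with open(path, "r", errors="replace") as handle:
                counts = scan_lines(handle)
            if counts:
                results[path] = counts
    return results


if __name__ == "__main__":
    for path, counts in scan_debug_dir().items():
        summary = ", ".join(f"{k}={v}" for k, v in sorted(counts.items()))
        print(f"{path}: {summary}")
```

Run on each client the morning after a failed amdump, this prints one line per debug file that contains any of the three messages, which makes it easier to compare hosts from night to night.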



> Justin Clayton
> VLSI Research System Administrator
> University of Washington
> Electrical Engineering Dept
> justincl AT u.washington DOT edu
> 206/543.2523  EE/CSE 307E



--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************


