Amanda-Users

Re: got FAILED for no apparent reason

2005-09-01 05:24:15
Subject: Re: got FAILED for no apparent reason
From: Rodrigo Ventura <yoda AT isr.ist.utl DOT pt>
To: Paul Bijnens <paul.bijnens AT xplanation DOT com>
Date: Thu, 01 Sep 2005 10:06:35 +0100
It happened again. This time the timeout was on localhost! There is no
firewall involved, both in this case as well as in the last (mail from
July 12th).

FAILURE AND STRANGE DUMP SUMMARY:
  omni       //new/E$ lev 0 FAILED [Estimate timeout from omni]
  omni       //new/C$ lev 0 FAILED [Estimate timeout from omni]
  omni       /var/spool/imap/user/uz lev 0 FAILED [Estimate timeout from omni]
  omni       /var/spool/imap/user/nt lev 0 FAILED [Estimate timeout from omni]
[...]

The timeout setting at amanda.conf is 300 (the default, I
believe). There is comment stating that:
# a positive number will be multiplied by the number of filesystems on
# each host; a negative number will be taken as an absolute total time-out.
# The default is 5 minutes per filesystem.

Let's look at the debug files:

-----[amandad.20050901030010000.debug]----------------------------------------
[...]
//new/C$ 2 SIZE 874751
//new/E$ 0 SIZE 7786744
//new/E$ 1 SIZE 467994
//new/E$ 2 SIZE 467994
----

amandad: time 14935.917: dgram_recv: timeout after 10 seconds
amandad: time 14935.917: waiting for ack: timeout, retrying
amandad: time 14945.917: dgram_recv: timeout after 10 seconds
amandad: time 14945.917: waiting for ack: timeout, retrying
amandad: time 14955.917: dgram_recv: timeout after 10 seconds
amandad: time 14955.917: waiting for ack: timeout, retrying
amandad: time 14965.917: dgram_recv: timeout after 10 seconds
amandad: time 14965.917: waiting for ack: timeout, retrying
amandad: time 14975.917: dgram_recv: timeout after 10 seconds
amandad: time 14975.917: waiting for ack: timeout, giving up!
amandad: time 14975.971: pid 1524 finish time Thu Sep  1 07:09:46 2005
                                                                                
                                                                 
-----[sendsize.20050901030010.debug]----------------------------------------
[...]
sendsize[22886]: estimate time for //new/E$ level 2: 0.422
sendsize[22886]: estimate size for //new/E$ level 2: 467994 KB
sendsize[22886]: time 14925.824: waiting for /usr/bin/smbclient "//new/E$" child
sendsize[22886]: time 14925.824: after /usr/bin/smbclient "//new/E$" wait
sendsize[22886]: time 14925.824: done with amname '//new/E$', dirname 
'//new/E$', spindle -1
sendsize[1525]: time 14925.827: child 22886 terminated normally
sendsize: time 14925.834: pid 1525 finish time Thu Sep  1 07:08:56 2005



What can we gather from this? sendsize finishes 50 seconds before
amandad; the amandad timeouts start 10 seconds before sendize
finishes. What is happening here?

Cheers,

Rodrigo

-- 

*** Rodrigo Martins de Matos Ventura <yoda AT isr.ist.utl DOT pt>
***  Web page: http://www.isr.ist.utl.pt/~yoda
***   Teaching Assistant and PhD Student at ISR:
***    Instituto de Sistemas e Robotica, Polo de Lisboa
***     Instituto Superior Tecnico, Lisboa, PORTUGAL
*** PGP fingerprint = 0119 AD13 9EEE 264A 3F10  31D3 89B3 C6C4 60C6 4585

<Prev in Thread] Current Thread [Next in Thread>