Amanda-Users

RE: Data Timeout

2006-06-01 04:25:50
Subject: RE: Data Timeout
From: "Paul Duncan" <Paul.Duncan AT yolus DOT com>
To: "Paul Bijnens" <paul.bijnens AT xplanation DOT com>
Date: Thu, 1 Jun 2006 09:15:28 +0100
Paul

Thanks for the help.  We have changed disklist so that the filesystem
which was giving data timeouts is now backed up using client-side
compression.  This appears to have fixed the problem.  I have answered
your questions below and will troubleshoot this further if the problem
recurs.

> When a dumper needs more diskspace on the holdingdisk than it 
> reserved in the beginning, it asks driver with a command 
> "RQ-MORE-DISK"? Do you see that string in the logfile?

Yes; this entry appears four times:

driver: result time 4573.354 from dumper2: RQ-MORE-DISK 03-00029
driver: result time 4828.225 from dumper2: RQ-MORE-DISK 03-00039
driver: result time 4829.323 from dumper2: RQ-MORE-DISK 03-00039
driver: result time 12208.703 from dumper0: RQ-MORE-DISK 00-00001

Dumper 0 is the dumper for the filesystem that failed.  What does this
tell me?
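
For anyone searching their own amdump file for these entries, here is a
minimal Python sketch.  The regular expression is inferred only from the
log lines quoted in this thread, not from the Amanda source, and the
function name find_results is made up for illustration.

```python
import re

# Pattern inferred from the "driver: result time ..." lines quoted in this
# thread (an assumption, not taken from the Amanda source).
RESULT_RE = re.compile(
    r"driver: result time (?P<time>\d+\.\d+) from (?P<dumper>dumper\d+): "
    r"(?P<cmd>[A-Z-]+) (?P<serial>\d+-\d+)"
)

def find_results(lines, command="RQ-MORE-DISK"):
    """Return (time, dumper, serial) for each driver result line with `command`."""
    hits = []
    for line in lines:
        m = RESULT_RE.search(line)
        if m and m.group("cmd") == command:
            hits.append((float(m.group("time")), m.group("dumper"), m.group("serial")))
    return hits

# Lines copied from this thread.
SAMPLE_LINES = [
    "driver: result time 4573.354 from dumper2: RQ-MORE-DISK 03-00029",
    "driver: result time 4828.225 from dumper2: RQ-MORE-DISK 03-00039",
    "driver: result time 4829.323 from dumper2: RQ-MORE-DISK 03-00039",
    "driver: result time 12208.703 from dumper0: RQ-MORE-DISK 00-00001",
    "driver: result time 17127.984 from dumper0: FAILED 01-00054 [data timeout]",
]

for when, dumper, serial in find_results(SAMPLE_LINES):
    print(f"{when:>10.3f}s  {dumper}  {serial}")
```

Passing command="FAILED" instead lists the failure entries, which makes it
easy to compare the time of the last RQ-MORE-DISK request from a dumper
with the time at which that dumper reported the data timeout.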

> I would have a look in the sendbackup.*.debug file on the 
> client and see if some warning/error message is in there, and 
> verify that the client was still running at that time.

I have not been able to find this file.
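
In case it helps others look for it, below is a rough Python sketch that
searches a few directories for files matching the sendbackup.*.debug
pattern mentioned above.  The directory list is an assumption: the debug
directory is chosen when Amanda is built, /tmp/amanda is a commonly seen
location, and /var/log/amanda is only a guess, so adjust it for your
installation.  The function name find_debug_files is hypothetical.

```python
import glob
import os

# Candidate debug directories. These are assumptions; the real location
# depends on how Amanda was built on the client.
CANDIDATE_DIRS = ["/tmp/amanda", "/var/log/amanda"]

def find_debug_files(dirs=CANDIDATE_DIRS, pattern="sendbackup.*.debug"):
    """Return matching debug files under `dirs`, oldest first."""
    found = []
    for d in dirs:
        # "**" with recursive=True also matches files directly inside d.
        found.extend(glob.glob(os.path.join(d, "**", pattern), recursive=True))
    return sorted(found, key=os.path.getmtime)

if __name__ == "__main__":
    for path in find_debug_files():
        print(path)
```

If nothing is printed, the files may have been cleaned up already or the
client may write them somewhere else.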

> Could this also be just another symptom of the problem described here:
> 
> http://wiki.zmanda.com/index.php/Amdump:_mesg_read:_Connection_reset_by_peer
> 

Thanks for the link, but it does not relate to my problem: the filesystem
being backed up is on the Amanda server itself, so there is no firewall
between client and server.  Also, the error message that identifies that
problem does not appear in my amdump log file.

Yours,
 
Paul Duncan
Yolus Ltd.


 

> -----Original Message-----
> From: Paul Bijnens [mailto:paul.bijnens AT xplanation DOT com] 
> Sent: 31 May 2006 15:53
> To: Paul Duncan
> Cc: amanda-users AT amanda DOT org
> Subject: Re: Data Timeout
> 
> On 2006-05-30 10:30, Paul Duncan wrote:
> > Hello,
> >  
> > One of our filesystems is failing to get backed up and I am interested
> > in trying to ascertain why.  The report entry is:
> >  
> > compaqdev2 /export/home lev 0 FAILED [data timeout]
> > 
> > In the amdump file I see the following suspicious entries.  I get a 
> > series of "driver-idle: no-diskspace" entries which span 3 hours:
> > 
> > driver: state time 1737.214 free kps: 254081 space: 2164824 taper: idle idle-dumpers: 1 qlen tapeq: 0 runq: 26 roomq: 0 wakeup: 15 driver-idle: no-diskspace
> > 
> > driver: state time 12480.205 free kps: 286720 space: 15482922 taper: writing idle-dumpers: 5 qlen tapeq: 6 runq: 2 roomq: 0 wakeup: 86400 driver-idle: no-diskspace
> 
> The above means that Amanda did not start up another dumper 
> because the holdingdisk had all space reserved by other dumpers.
> 
> When a dumper needs more diskspace on the holdingdisk than it 
> reserved in the beginning, it asks driver with a command 
> "RQ-MORE-DISK"? Do you see that string in the logfile?
> 
> > 
> > Then the filesystem dump fails over an hour later:
> > 
> > driver: result time 17127.984 from dumper0: FAILED 01-00054 [data timeout]
> 
> I would have a look in the sendbackup.*.debug file on the 
> client and see if some warning/error message is in there, and 
> verify that the client was still running at that time.
> 
> Could this also be just another symptom of the problem described here:
> 
> http://wiki.zmanda.com/index.php/Amdump:_mesg_read:_Connection_reset_by_peer
> 
> 
> > Why does the data timeout occur over an hour after disk space becomes
> > available?  The dtimeout parameter in amanda.conf is set to 1800.
> 
> I think it is because the datatimeout is not triggered by the 
> diskspace but by something else.
> 
> 




