Amanda-Users

Re: Failed Backups

2003-06-06 11:25:08
Subject: Re: Failed Backups
From: Chris Gordon <chris AT theory14 DOT net>
To: smw_purdue <smw_purdue AT yahoo DOT com>, amanda-users AT amanda DOT org
Date: Fri, 6 Jun 2003 11:23:43 -0400
Steve, 


On Wed, Jun 04, 2003 at 02:29:20PM -0000, smw_purdue wrote:
> Chris,
> 
> I'm having the same problem using a similar configuration of backups
> to disk without any holding disks.  Every time Amanda drops into
> degraded mode it's because an error occurred with one of the clients
> (usually a timeout, indicating that a client system was unavailable).
>  I would suspect that there's a bug in the code that puts Amanda into
> degraded mode on more errors than just a tape error.  Notice in your
> log that you have an "unknown response" from gilgamesh.  This error
> was probably what kicked Amanda into degraded mode.

That is exactly what appears to be happening.  I configured a holding
disk in an attempt to eliminate that as a possible cause. In my case,
the problem is intermittent, with everything working fine for some time
and then I get a failure.  The failure may affect some file systems on a
given host or most/all of the backup run.

Today, I had two file systems fail again on gilgamesh, and I began
checking the various logs for issues.  What I found in
"sendbackup.lotsofnumbers.debug" is:

---[ begin ]---
sendbackup: time 0.002: stream_server: waiting for connection: 0.0.0.0.1496
sendbackup: time 0.002: stream_server: waiting for connection: 0.0.0.0.1497
sendbackup: time 0.002: stream_server: waiting for connection: 0.0.0.0.1498
sendbackup: time 0.003: waiting for connect on 1496, then 1497, then 1498
sendbackup: time 29.996: stream_accept: timeout after 30 seconds
sendbackup: time 29.996: timeout on data port 1496
sendbackup: time 59.996: stream_accept: timeout after 30 seconds
sendbackup: time 59.996: timeout on mesg port 1497
sendbackup: time 89.996: stream_accept: timeout after 30 seconds
sendbackup: time 89.996: timeout on index port 1498
sendbackup: time 89.996: pid 5263 finish time Fri Jun  6 00:47:44 2003
---[ end ]---
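
For anyone comparing several of these debug files, here is a rough
sketch in Python (not part of Amanda) that pulls the timed-out streams
out of a log like the one above.  The function name find_port_timeouts
and the regular expression are assumptions based only on the three
"timeout on ... port" lines shown here; other Amanda versions may word
these lines differently.

```python
import re

# Assumed line shape, taken from the excerpt above, e.g.
# "sendbackup: time 29.996: timeout on data port 1496"
TIMEOUT_RE = re.compile(r"time\s+(\d+\.\d+):\s+timeout on (\w+) port (\d+)")


def find_port_timeouts(log_text):
    """Return (seconds, stream_name, port) for each port timeout line."""
    results = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            seconds, stream, port = match.groups()
            results.append((float(seconds), stream, int(port)))
    return results


# Lines copied from the debug file quoted above.
SAMPLE = """\
sendbackup: time 29.996: stream_accept: timeout after 30 seconds
sendbackup: time 29.996: timeout on data port 1496
sendbackup: time 59.996: stream_accept: timeout after 30 seconds
sendbackup: time 59.996: timeout on mesg port 1497
sendbackup: time 89.996: stream_accept: timeout after 30 seconds
sendbackup: time 89.996: timeout on index port 1498
"""

if __name__ == "__main__":
    for seconds, stream, port in find_port_timeouts(SAMPLE):
        print(f"{stream} stream on port {port} timed out at {seconds}s")
```

Running it over each sendbackup debug file from a failed night would
show whether the same three streams time out every time, which might
help with finding a correlation.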

> Anybody out there have time to debug the source?  I may take a look at
> it but time is at a premium right now... (when isn't it???).

Anyone have any ideas?  This only happens occasionally and I haven't
yet been able to draw a correlation.

Thanks,
Chris
