Amanda-Users

[Fwd: Sendsize Timeout Errors]

2007-01-08 12:03:07
Subject: [Fwd: Sendsize Timeout Errors]
From: Sean Connors <Sean.Connors AT argonst DOT com>
To: amanda-users AT amanda DOT org, amanda-hackers AT amanda DOT org
Date: Mon, 08 Jan 2007 10:43:40 -0500
An update:

I've been working on this since the original email - backups are everything, you know. Thanks in part to Mr. John Hein and Bill Nolf (a guy here where I work), I decided to run the same backup scheme using ufsdump, and that seems to have corrected the problem. Note that I am using 2.5.0p2. For versions above that, Kevin Till tells us that bsdtcp auth now runs exclusively over TCP. That should rule out any UDP socket size issues I might be having, I'd imagine.

I originally used gtar (v1.13). Although I've not been able to confirm it yet, I suspect there is a socket size issue with UDP. Once I can safely say my backups are operational, I can test this theory. I don't have a test bench, so I can't test at leisure.

John Hein's note stated:

Sounds similar to the issue I saw in the past when I started backing
up lots of data.  This was on FreeBSD, but it turned out that the udp
socket was not using its max size.

See these messages and patch:

http://article.gmane.org/gmane.comp.archivers.amanda.devel/1148/match=message+long
http://article.gmane.org/gmane.comp.archivers.amanda.devel/1152/match=message+long
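
The socket-size theory in John's note can be checked on a given host without touching Amanda. Below is a minimal sketch in Python. It is not Amanda code, and the function name probe_udp_sndbuf and the 64 KiB request are illustrative assumptions. It reports the default send buffer of a fresh UDP socket and the size the kernel actually grants when a larger one is requested:

```python
import socket


def probe_udp_sndbuf(requested):
    """Return (default, granted) SO_SNDBUF sizes in bytes for a new UDP socket.

    'requested' is the send buffer size to ask the kernel for. The kernel
    may cap, round, or otherwise adjust the value, so 'granted' is read
    back with getsockopt rather than assumed.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        default = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        except OSError:
            # Some systems reject a request above their configured maximum.
            pass
        granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return default, granted


if __name__ == "__main__":
    default, granted = probe_udp_sndbuf(64 * 1024)  # 64 KiB is an arbitrary example
    print("default SO_SNDBUF:", default)
    print("granted SO_SNDBUF:", granted)
```

If the default is well below what the system allows, a program that never asks for a larger buffer is limited to the default. That would be consistent with the behaviour John describes, though it does not confirm it.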

Best to all,

Sean

-------- Original Message ----------

Hi all,

My first post here. I have gone through every resource I know of to answer this question, but the results have been very limited. Forgive me if this question has been posed and answered before, but I cannot find the answer.

My issue is a sendsize timeout error. When it happens, amstatus shows the final filesystem as "getting estimate", and it'll hang there for days. The only error comes out of the amandad.xxx log (in /tmp/amanda), and it is:


amandad: time 21599.605: /usr/local/libexec/sendsize timed out waiting for REP data
amandad: time 21599.605: sending NAK pkt:
<<<<<
ERROR timeout on reply pipe

amandad: time 21605.615: pid 17898 finish time Thu Jan 4 20:40:06 2007

That's it. All other logs basically say OK to everything. Does anyone know anything about this? Is this something that has been seen before?
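
For anyone wanting to check their own amandad debug files for the same symptom, here is a minimal sketch in Python that picks out the timeout line. It relies only on the line format shown in the excerpt above. The function name find_timeouts is hypothetical, and reading the "time" field as elapsed seconds is an assumption:

```python
import re

# Matches lines such as:
#   amandad: time 21599.605: /usr/local/libexec/sendsize timed out waiting for REP data
TIMEOUT_RE = re.compile(
    r"^amandad: time (?P<secs>\d+(?:\.\d+)?): "
    r"(?P<msg>.*timed out waiting for REP data)$"
)


def find_timeouts(lines):
    """Return a list of (time_value, message) for each timeout line found."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            hits.append((float(match.group("secs")), match.group("msg")))
    return hits


# Sample lines copied from the log excerpt in this message.
sample = [
    "amandad: time 21599.605: /usr/local/libexec/sendsize timed out waiting for REP data",
    "amandad: time 21599.605: sending NAK pkt:",
    "ERROR timeout on reply pipe",
    "amandad: time 21605.615: pid 17898 finish time Thu Jan 4 20:40:06 2007",
]

if __name__ == "__main__":
    for secs, msg in find_timeouts(sample):
        # Assumption: the time field is in seconds.
        print(f"{secs:.3f} s (~{secs / 3600:.2f} h): {msg}")
```

If the time field really is in seconds, 21599.605 is just under six hours, which would mean the estimate ran for about that long before amandad gave up.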

My environment is:
Two servers: one is the Amanda server, and one is the Amanda client. Workstations are not backed up. Attached to the Amanda server is an Apple X-Serve RAID (RAID 5); the largest partitions are 250 GB each.
The greatest amount of data on a single partition is 100 GB.
Note that the failure only began when I installed the Apple X-Serve RAID and began using very large partitions. I am leaning toward an issue with gtar trying to calculate the backup size on 100+ GB worth of data.

Amanda Server:
SunFire V240
Solaris 8 (patched to 02/06)
Amanda 2.5.0 (presently - error exists up to 2.5.1p2)
gtar (dumper in use) is version 1.13.1

Dumptype uses:
GNUTAR
compress server fast (client fast for client dumptype)
holdingdisk yes

Presently, I am testing the config below because of what I think may be a gtar issue. As yet I have no data:
Dumptype:
<same as above except>
compress none
estimate calcsize

Anything anyone can contribute to this issue will be greatly appreciated.

Sean

Sean Connors
Systems Administrator
ArgonST



