Amanda-Users

Disk was stranded on waitq, but estimates OK.. Meh? (server 2.5.2p1 - client 2.4.4p3)

2007-11-21 08:54:53
From: Francis Galiegue <fg AT one2team DOT com>
To: amanda-users AT amanda DOT org
Date: Wed, 21 Nov 2007 14:47:18 +0100
[Resend, this time without the attachments. It looks like the mailing list 
handler refuses messages with attachments, although if so I never received a 
bounce saying so. Please tell me how I should send them: in the mail body 
itself? They're not small...]

Hello list,

First, the setup:

* the server is a RHEL4.x-based machine running amanda version 2.5.2p1, from 
an RPM I rebuilt from the RHEL5 2.5.0p2 source RPM, dropping all patches 
(pristine source);
* the failing client is also RHEL4.x-based, with the amanda{,-client} RPMs 
from the distribution itself, version 2.4.4p3.

Well, when I say "failing", in fact it has only failed twice in the last two 
weeks, but failed nonetheless.

The server uses vtapes. The amanda.conf for this configuration is attached. 
Yes, I know, it's messy and deserves a huge cleanup, but I haven't had the 
time for this yet :(

Also attached are the amdump and log files of the failure, in a tar file. From 
the amdump mail report:

----
FAILURE AND STRANGE DUMP SUMMARY:
  chaos.olympe.o2t  /var/exports/platform       lev 0  FAILED [hmm, disk was stranded on waitq]
  chaos.olympe.o2t  /var/exports/taz            lev 0  FAILED [hmm, disk was stranded on waitq]
  chaos.olympe.o2t  /var/exports/devel          lev 0  FAILED [hmm, disk was stranded on waitq]
[etc etc]
----
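
For anyone who wants to tally such a summary per host, here is a minimal Python sketch. It is only an illustration: the function name `parse_failures` and the regular expression are my own assumptions based on the three lines quoted above, not anything provided by Amanda, and other report layouts may not match.

```python
import re
from collections import defaultdict

# Pattern for one summary entry: host, disk, dump level, bracketed reason.
# Assumption: derived only from the lines quoted in this mail.
ENTRY = re.compile(
    r"^\s*(?P<host>\S+)\s+(?P<disk>\S+)\s+lev\s+(?P<level>\d+)"
    r"\s+FAILED\s+\[(?P<reason>[^\]]*)\]"
)

def parse_failures(report_text):
    """Return {host: [(disk, level, reason), ...]} for FAILED entries."""
    # Mail clients may wrap long lines, so rejoin a line whose "[" is
    # still open with the line that follows it.
    joined = []
    for line in report_text.splitlines():
        if joined and "[" in joined[-1] and "]" not in joined[-1]:
            joined[-1] = joined[-1].rstrip() + " " + line.strip()
        else:
            joined.append(line)

    failures = defaultdict(list)
    for line in joined:
        m = ENTRY.match(line)
        if m:
            failures[m["host"]].append(
                (m["disk"], int(m["level"]), m["reason"])
            )
    return dict(failures)

# Sample text copied from the report excerpt, including the mail wrapping.
SAMPLE = """FAILURE AND STRANGE DUMP SUMMARY:
  chaos.olympe.o2t  /var/exports/platform       lev 0  FAILED [hmm, disk was
stranded on waitq]
  chaos.olympe.o2t  /var/exports/taz            lev 0  FAILED [hmm, disk was
stranded on waitq]
  chaos.olympe.o2t  /var/exports/devel          lev 0  FAILED [hmm, disk was
stranded on waitq]
"""

for host, entries in parse_failures(SAMPLE).items():
    print(host, "has", len(entries), "failed disk(s)")
```

Grouping by host makes it easy to see at a glance that every failed disk belongs to the same client.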

The thing is, the backup starts at 12:45am. I received the amdump mail report 
at... 8:30am this morning!

On the client side, the sendsize log files (not attached) are there and look 
OK. But THERE ARE NO SENDBACKUP FILES AT ALL! Now that puzzles me.
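
A quick way to repeat that check on a client is sketched below in Python. It rests on assumptions: that the client writes its debug files into a single directory and that each file name starts with the program name followed by a dot. Both the helper name `count_debug_files` and that naming convention are hypothetical here, so adjust them to what the client actually produces.

```python
import os

def count_debug_files(debug_dir, programs=("sendsize", "sendbackup")):
    """Count files in debug_dir whose names start with '<program>.'.

    Assumption: debug files are named after the program that wrote them;
    this is not verified against any particular Amanda build.
    """
    counts = {program: 0 for program in programs}
    for name in os.listdir(debug_dir):
        for program in programs:
            if name.startswith(program + "."):
                counts[program] += 1
    return counts
```

Calling `count_debug_files()` with the client's debug directory (the path depends on how the client was built) returns a dictionary of counts. A non-zero `sendsize` count next to a zero `sendbackup` count would match what I see: the estimate phase ran, but no dump was ever started on the client.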

Also, both times this host failed to back up, it was with this same 
configuration. There's another configuration doing full backups each time 
which has never failed. Previously, the server ran 2.4.4p3 with a DAT 
changer and I never saw this problem (but since then the server fried 
and we decided to use vtapes instead).

This same backup configuration also backs up five other machines, all having 
the same OS/amanda version setup and none have failed so far.

And finally, I reran amdump this morning for this host only: success...

Do you know what could have happened? I don't have the faintest idea... Any 
hint appreciated... I can send more logs if needed.

Thanks,
-- 
Francis Galiegue, fg AT one2team DOT com - Systems engineer
[NOTICE: CHANGE OF CONTACT DETAILS!]
One2team - 42 Av. Raymond Poincaré - 75116 PARIS CEDEX
+33683877875, +33178945552