Amanda-Users

Re: Disk was stranded on waitq, but estimates OK.. Meh? (server 2.5.2p1 - client 2.4.4p3)

2007-11-21 14:14:22
From: Jean-Louis Martineau <martineau AT zmanda DOT com>
To: Francis Galiegue <fg AT one2team DOT com>
Date: Wed, 21 Nov 2007 14:07:47 -0500
You should never get a "hmm, disk was stranded on waitq" error message.
Post the complete amdump.X log file from the server and the complete amanda.*.debug and sendsize.*.debug files from the client.

Jean-Louis

Francis Galiegue wrote:
[Resend, this time without the attachments. It looks like the mailing list handler rejects messages with attachments, but if so, I never got a bounce message saying so. Please tell me how I should send them: inline in the mail itself? They're not small...]

Hello list,

First, the setup:

* the server is an RHEL4.x based machine, with amanda version 2.5.2p1, an RPM I made from the 2.5.0p2 source RPM from RHEL5 and recompiled, dropping all patches (pristine source);
* the failing client is also RHEL4.x based, with the amanda{,-client} RPMs from the distribution itself, version 2.4.4p3.

Well, when I say "failing", in fact it has only failed twice in the last two weeks, but failed nonetheless.

The server uses vtapes. Amanda.conf for this configuration is attached. Yes, I know, it's messy and deserves a huge cleanup, but I haven't had the time for this yet :(

Attached are also the amdump and log file of the failure in a tar file. From the amdump mail report:

----
FAILURE AND STRANGE DUMP SUMMARY:
chaos.olympe.o2t /var/exports/platform lev 0 FAILED [hmm, disk was stranded on waitq]
chaos.olympe.o2t /var/exports/taz lev 0 FAILED [hmm, disk was stranded on waitq]
chaos.olympe.o2t /var/exports/devel lev 0 FAILED [hmm, disk was stranded on waitq]
[etc etc]
----
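
For anyone who wants to pull the affected disks out of a long report, here is a minimal sketch. It assumes the failure lines follow the "host disk lev N FAILED [reason]" layout shown above. The function names (parse_failures, stranded_disks) are made up for illustration and are not part of Amanda.

```python
import re

# Matches report lines shaped like:
#   host /disk lev 0 FAILED [reason]
# This layout is taken from the mail report excerpt; other Amanda
# versions may format the summary differently (assumption).
FAILED_LINE = re.compile(
    r"^\s*(?P<host>\S+)\s+(?P<disk>\S+)\s+lev\s+(?P<level>\d+)"
    r"\s+FAILED\s+\[(?P<reason>.*)\]\s*$"
)


def parse_failures(report_text):
    """Return (host, disk, level, reason) tuples for each FAILED line."""
    failures = []
    for line in report_text.splitlines():
        m = FAILED_LINE.match(line)
        if m:
            failures.append(
                (m.group("host"), m.group("disk"),
                 int(m.group("level")), m.group("reason"))
            )
    return failures


def stranded_disks(report_text):
    """Return (host, disk) pairs whose failure reason mentions the waitq."""
    return [
        (host, disk)
        for host, disk, _level, reason in parse_failures(report_text)
        if "stranded on waitq" in reason
    ]


# Sample input copied from the report excerpt.
sample = """\
chaos.olympe.o2t /var/exports/platform lev 0 FAILED [hmm, disk was stranded on waitq]
chaos.olympe.o2t /var/exports/taz lev 0 FAILED [hmm, disk was stranded on waitq]
chaos.olympe.o2t /var/exports/devel lev 0 FAILED [hmm, disk was stranded on waitq]
"""

if __name__ == "__main__":
    for host, disk in stranded_disks(sample):
        print(host, disk)
```

Feeding it the saved mail report text gives one (host, disk) pair per stranded disk, which makes it easier to compare against the disklist.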

The thing is, the backup starts at 12:45am. I received the amdump mail report at... 8:30am this morning!

On the client side, the sendsize log files (not attached) are there and look OK. But THERE ARE NO SENDBACKUP FILES AT ALL! Now that puzzles me.

Also, both times this host failed to back up, it was with this same configuration. There's another configuration doing full backups each time which has never failed. Before, the server was 2.4.4p3 and used a DAT changer, and I had never seen this problem (but since then the server fried and we decided to use vtapes instead).

This same backup configuration also backs up five other machines, all with the same OS/amanda version setup, and none of them has failed so far.

And finally, I reran amdump this morning for this host only: success...

Do you know what could have happened? I don't have the faintest idea... Any hint appreciated... I can send more logs if needed.

Thanks,