[Bacula-users] Re-read of last block OK, but block numbers differ

2008-11-27 12:22:01
From: Allan Black <Allan.Black AT btconnect DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 27 Nov 2008 17:19:21 +0000

Hi, all,

Can anyone help me analyse the problem here? On more than
one occasion, a DDS3 drive has produced the following when it
reaches the end of a tape:

23-Nov 22:03 gershwin-dir JobId 506: Using Device "DDS3-0"
23-Nov 22:06 gershwin-sd JobId 506: End of Volume "MainCatalog-004" at 57:4963 on device "DDS3-0" (/dev/rmt/1cbn). Write of 64512 bytes got 0.
23-Nov 22:06 gershwin-sd JobId 506: Error: Re-read of last block OK, but block numbers differ. Last block=4963 Current block=4963.
23-Nov 22:06 gershwin-sd JobId 506: End of medium on Volume "MainCatalog-004" Bytes=28,148,843,520 Blocks=436,334 at 23-Nov-2008 22:06.
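
As a quick sanity check on the figures in that last log line, the sketch below divides the reported byte count by the block size. It assumes every block on the volume is 64512 bytes (the size in the failed write message), which the log does not actually state, and the variable names are only illustrative.

```python
# Figures copied from the job log above.
reported_bytes = 28_148_843_520
reported_blocks = 436_334
block_size = 64_512  # assumption: every block on the volume is this size

# How many whole blocks does the byte count correspond to?
whole_blocks, remainder = divmod(reported_bytes, block_size)
extra_blocks = whole_blocks - reported_blocks

print(whole_blocks, remainder, extra_blocks)  # 436335 0 1
```

Under that assumption the byte count is an exact multiple of the block size and corresponds to one block more than the reported Blocks figure. The log alone does not say whether that extra block is the disputed one or just a quirk of how the counters are kept.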

Having checked the SD source, I believe what is happening is
that the SD tried to write block 4963 to the tape, but got EOM
(End Of Medium) back from the drive. It then re-read the last
block successfully, but the block number it found was not the
one it expected.

Because of the EOM, the SD assumed that it had failed to write
block 4963. When it re-read the last block on the tape, it expected
the block number (which is in the block header) to be 4962, hence
the error message. However, I think that the drive successfully
wrote the block, but also returned a SCSI sense indicating that the
tape was now full.
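
To make that reasoning concrete, here is a minimal sketch of the two drive behaviours and the check I think the SD is making. It is a toy model of my reading of the logic, not Bacula's actual code; the function names and the `drive_commits_block` flag are hypothetical.

```python
def last_block_after_eom(next_block: int, drive_commits_block: bool) -> int:
    """Return the number of the last block on tape after a write that hit EOM.

    The drive reports EOM in both cases; drive_commits_block (hypothetical)
    says whether the data reached the tape before the error came back.
    """
    return next_block if drive_commits_block else next_block - 1


def reread_check_passes(last_block_on_tape: int, failed_block: int) -> bool:
    """Assumed check: the last block on tape should be the one BEFORE
    the block whose write was reported as failed."""
    return last_block_on_tape == failed_block - 1


for commits in (False, True):
    last = last_block_after_eom(4963, commits)
    print(commits, last, reread_check_passes(last, 4963))
```

With `drive_commits_block=False` the last block is 4962 and the check passes. With `True` the last block is 4963, the same number as the block reported as failed, and the check fails, which would match "Last block=4963 Current block=4963" in the log.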

I put that tape back in the drive and read it with bls -k, getting
this:

[...]
Block: 4961 size=64512
Block: 4962 size=64512
Block: 4963 size=64512
26-Nov 13:44 bls JobId 0: End of file 58 on device "dds3" (/dev/rmt/1cbn), Volume "MainCatalog-004"

Which makes me think that block 4963 has indeed been written to the
tape.
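
For anyone wanting to repeat this check on a longer listing, a small sketch like the following could pick the block numbers out of saved bls -k output. It assumes the "Block: N size=M" line format shown above; the helper name and the sample text are just for illustration, and it only looks at the tail of one file on the tape.

```python
import re

# Matches lines such as "Block: 4963 size=64512" from the listing above.
BLOCK_LINE = re.compile(r"^Block:\s+(\d+)\s+size=(\d+)$")


def parse_blocks(text):
    """Return (block_number, size) pairs for every block line in text."""
    blocks = []
    for line in text.splitlines():
        match = BLOCK_LINE.match(line.strip())
        if match:
            blocks.append((int(match.group(1)), int(match.group(2))))
    return blocks


sample = """\
Block: 4961 size=64512
Block: 4962 size=64512
Block: 4963 size=64512
"""

numbers = [number for number, _size in parse_blocks(sample)]
last_block = numbers[-1]
# True when each block number is exactly one more than the previous one.
contiguous = all(b - a == 1 for a, b in zip(numbers, numbers[1:]))

print(last_block, contiguous)  # 4963 True
```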

However, I also think that this catalog backup has actually been
corrupted: I think the SD would write the data from block 4963
again at the beginning of the new tape, which means that if I
restored that particular backup, I would find a block of data repeated.
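
The suspected effect on a restore can be sketched like this. It is only a toy model: the payload strings are made up, real volumes also carry label and session blocks, and I have not confirmed that the SD really rewrites the block on the next tape.

```python
def restore_stream(volumes):
    """Concatenate the block payloads of each volume, in volume order."""
    return [payload for volume in volumes for payload in volume]


def adjacent_duplicates(stream):
    """Return the indexes where a payload repeats the one just before it."""
    return [i for i in range(1, len(stream)) if stream[i] == stream[i - 1]]


# Hypothetical payloads: the drive kept block 4963 on the first tape, and
# the SD (assumed) wrote the same data again at the start of the second.
first_volume = ["data-4961", "data-4962", "data-4963"]
second_volume = ["data-4963", "data-4964"]

stream = restore_stream([first_volume, second_volume])
print(adjacent_duplicates(stream))  # [3]
```

If the drive had not kept the block, the first volume would end at "data-4962" and the list of duplicates would be empty.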

[Since this was "only" a catalog backup, it was no problem to run it
again manually, so I do actually have a good catalog backup!]

I do not believe this is a Solaris or SCSI problem; DLT drives (and
autochangers) work perfectly on the same card (although on a different
segment of the bus). I suspect (yuck) I may have to experiment with
the configuration switches on the drive.

Has anyone come across a similar situation before and, if so, is
able to point me in the correct direction to debug it? In particular,
does anyone have any experience of how a SCSI drive is supposed to
behave at EOM?

Bacula 2.4.3
Solaris 10 x86
HP C1557A DDS-3 autoloader

Allan
