Re: [Bacula-users] btape fill failure on HP LTO6/4 drives

2017-01-28 08:04:25
Subject: Re: [Bacula-users] btape fill failure on HP LTO6/4 drives
From: Kern Sibbald <kern AT sibbald DOT com>
To: Allan Black <Allan.Black AT btconnect DOT com>, bacula-users AT lists.sourceforge DOT net
Date: Sat, 28 Jan 2017 14:03:14 +0100
Very interesting. It seems we do not run a full fill test on real tape
drives very often, nor do we run it on alternative systems like
Solaris. Too bad Solaris did not run the full Bacula regression tests :-)

Kern

On 01/28/2017 01:00 PM, Allan Black wrote:
> On 09/05/14 14:09, Allan Black wrote:
>> On 01/04/14 10:26, Roberts, Ben wrote:
>>>> It appears that the OS tape driver does not properly
>>>> implement back space record after an EOT.  This is a defect of the
>>>> operating system driver, but it is not fatal for Bacula.
>>> Indeed I was seeing the same failure to backspace over EOT error in the job logs:
>>> End of Volume "GSA784L6" at 3910:8005 on device "drive-1-tapestore1" (/dev/rmt/1mbn). Write of 64512 bytes got 0.
>>> Error: Backspace record at EOT failed. ERR=I/O error
>>> End of medium on Volume "GSA784L6" Bytes=3,909,704,343,552 Blocks=60,604,295 at 28-Mar-2014 21:28
>> I am getting exactly the same symptoms, also under Solaris 11, except this time
>> with an LTO2 drive. The drive worked perfectly under Solaris 10, though, and I
>> only started seeing this after upgrading to 11.1.
>>
>> The btape fill/m test gave me this at the end of the first tape:
>>
>> Wrote block=3160000, file,blk=204,13499 VolBytes=203,857,855,488 rate=20.62 MB/s
>> 08-May 16:59 btape JobId 0: End of Volume "TestVolume1" at 204:15112 on device
>> "lto" (/dev/rmt/4cbn). Write of 64512 bytes got 0.
>> 08-May 16:59 btape JobId 0: Error: Backspace record at EOT failed. ERR=I/O error
>> btape: btape.c:2702 Last block at: 204:15111 this_dev_block_num=15112
>> btape: btape.c:2737 End of tape 204:-1. Volume Bytes=203,961,913,344. Write rate
>> = 20.60 MB/s
>> 08-May 16:59 btape JobId 0: End of medium on Volume "TestVolume1"
>> Bytes=203,961,913,344 Blocks=3,161,612 at 08-May-2014 16:59.
> This is quite an old thread, but I'm reviving it because I have something significant
> to add - I'm pleased to report that BSR over EOT is now working after an upgrade to
> Solaris 11.3.
>
>> I have difficulty believing that the Solaris mtio and/or st modules fail to
>> handle EOT properly,
> I now believe it, though :-)
>
> A while ago, when researching this, I came across a document on support.oracle.com.
> (Doc ID 1919928.1 if you want to go looking for it!) It wasn't really the same problem,
> but it did contain a statement that struck a chord with me:
>
> "Solaris 11 changed the way initial tape position is determined. st now uses Long Form
> Read Position to determine the tapes position."
>
> No **** man :-)
> In other words, Sun/Oracle rewrote a large part of the st driver, and broke it.
>
> The btape fill test would write to the tape OK with "Backward Space Record = no", but
> the unfill test would get an I/O error trying to read the end of tape 1. I believe the
> data were all written correctly but btape couldn't position the tape properly for unfill
> due to problems with the Solaris 11 st driver.
>
> I suspect that Bacula under early Solaris 11 wouldn't be able to recover some data from
> a backup, if that data happened to be near the end of a tape. Restoring an entire backup
> would possibly be OK though.
>
> [ufsdump/ufsrestore work fine with multiple tapes because ufsrestore doesn't seek]
>
> I can't be sure exactly where the bug appeared and disappeared, but it's a good bet it's
> been there since Solaris 11 was first released. Based on the Solaris versions I have
> used, I can say this much:
>
> Solaris 10 - Works as expected
> Solaris 11.1.0.24.2 - BSR over EOT causes I/O error
> Solaris 11.1.18.5.0 - BSR over EOT causes I/O error
> Solaris 11.2.8.4.0 - BSR over EOT causes I/O error
> Solaris 11.3.13.4.0 - Works as expected
>
> So it was fixed somewhere between 11.2.8 and 11.3.13 - now I wish I had upgraded to
> Solaris 11.3 months ago!
>
> Allan
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
>
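
A quick sanity check on the figures in the two quoted logs, as a minimal Python sketch. It only divides the reported volume byte counts by the 64512-byte block size from the "Write of 64512 bytes got 0" lines; the names BLOCK_SIZE, LOGS and check are made up for illustration and are not part of Bacula or btape.

```python
# Figures copied from the two logs quoted in this thread.
BLOCK_SIZE = 64512  # from "Write of 64512 bytes got 0."

LOGS = {
    "btape TestVolume1": {"bytes": 203_961_913_344, "blocks": 3_161_612},
    "job GSA784L6": {"bytes": 3_909_704_343_552, "blocks": 60_604_295},
}


def check(volume_bytes, reported_blocks, block_size=BLOCK_SIZE):
    """Return (blocks implied by the byte count, leftover bytes,
    implied blocks minus reported blocks)."""
    implied, leftover = divmod(volume_bytes, block_size)
    return implied, leftover, implied - reported_blocks


for name, log in LOGS.items():
    implied, leftover, delta = check(log["bytes"], log["blocks"])
    print(f"{name}: implied={implied:,} leftover={leftover} delta={delta}")
```

For the btape run the byte count is exactly Blocks x 64512, with nothing left over. For the GSA784L6 job log the byte count is also a whole number of blocks, but one block more than the reported block count; the thread does not say why, so take that as an observation only.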

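The version table in Allan's message can be turned into a small lookup that brackets where the behaviour changed. This is only a sketch in Python: the data are the five observations from the table, and the names OBSERVATIONS, version_key and status_changes are hypothetical helpers, not anything shipped with Bacula or Solaris.

```python
# Observations copied from the version table in the quoted message:
# (Solaris version, True if BSR over EOT works as expected).
OBSERVATIONS = [
    ("10", True),
    ("11.1.0.24.2", False),
    ("11.1.18.5.0", False),
    ("11.2.8.4.0", False),
    ("11.3.13.4.0", True),
]


def version_key(version):
    """Turn '11.2.8.4.0' into (11, 2, 8, 4, 0) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def status_changes(observations):
    """Return (older, newer) version pairs between which the result flipped."""
    ordered = sorted(observations, key=lambda obs: version_key(obs[0]))
    changes = []
    for (prev_ver, prev_ok), (next_ver, next_ok) in zip(ordered, ordered[1:]):
        if prev_ok != next_ok:
            changes.append((prev_ver, next_ver))
    return changes


for older, newer in status_changes(OBSERVATIONS):
    print(f"behaviour changed between {older} and {newer}")
```

With these five data points it reports one change between 10 and 11.1.0.24.2 (where the I/O error first shows up) and one between 11.2.8.4.0 and 11.3.13.4.0 (where it goes away). It cannot narrow the window any further than the versions that were actually tested.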
