Bacula-users

Re: [Bacula-users] Tape MTEOM error with Dell TL2000 (IBM TS3100)

2009-05-12 08:41:24
From: yvan <yvan AT skywalker.is-a-chef DOT com>
Date: Tue, 12 May 2009 14:36:42 +0200
Hey!

Unfortunately I already tried that trick :). The firmware has been
upgraded for both the drive and the autochanger, and it changed nothing
at all.

I launched a batch of backups, and it seems that the further the drive
has to seek on the tape to reach the append position, the more
error-prone it is. The first 5 jobs on a tape are usually all right, but
after that it becomes unpredictable.

Furthermore, I've noticed that I can update the tape status from "Error"
back to "Append", and about 50% of the time the job waiting for a new
tape then starts successfully. I have a pack of 16 tapes, and it seems
to happen with all of them. I now have one tape that is stuck at 100 GB
and can't be written any further, and one that has successfully reached
130 GB (each test job is 12 GB).
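
A minimal sketch of that status reset, for anyone who wants to script it.
It assumes the console program is installed as "bconsole" and accepts
commands on standard input; the volume name DAILY01 is hypothetical. The
helper only builds the "update volume=... volstatus=..." console command;
actually sending it is a separate step you call yourself.

```python
import subprocess


def volstatus_command(volume: str, status: str = "Append") -> str:
    """Build the console command that sets a volume's status."""
    if not volume or any(ch.isspace() for ch in volume):
        raise ValueError("volume name must be non-empty and contain no whitespace")
    return f"update volume={volume} volstatus={status}"


def run_in_bconsole(command: str, bconsole: str = "bconsole") -> str:
    """Pipe one command into the console (assumed to be on PATH) and return its output."""
    result = subprocess.run(
        [bconsole],
        input=command + "\nquit\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # DAILY01 is a hypothetical volume name; this only prints the command.
    print(volstatus_command("DAILY01"))
```

Since the reset only helps about half the time here, this is a stopgap
and not a fix for the positioning error itself.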

I really like Bacula and how it manages pools, volumes, schedules and
clients, and I'll try to stay with it. I'm now trying to set up a
workaround that uses each tape only once: write only full backups, once
a week, because that machine is a dedicated backup server with loads of
disk space, and incrementals are done with "rsync" to disk. Then I can
maybe just dump the local filesystem to 2 tapes (I need 1.5 TB for every
full backup, so that means 2x LTO4 800 GB-1.6 TB).
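
The tape count above is just a ceiling division. A small sketch, assuming
the native (uncompressed) LTO-4 capacity of 800 GB is what counts, since
the 1.6 TB figure depends on how well the data compresses:

```python
import math

LTO4_NATIVE_GB = 800  # native capacity; 1.6 TB is the compressed figure


def tapes_needed(backup_gb: float, tape_gb: float = LTO4_NATIVE_GB) -> int:
    """Number of tapes required to hold one full backup."""
    return math.ceil(backup_gb / tape_gb)


if __name__ == "__main__":
    print(tapes_needed(1500))  # 1.5 TB full backup -> 2
```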

For Hayden: "cap" is a command of the "btape" utility that shows every
configuration option set in the storage config file for that device.
Just type "cap" in btape...
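
Thomas mentions below that cap showed "append" on one run and "!append"
on the next. A rough sketch for comparing two such outputs, assuming the
flags are printed as whitespace-separated words and that a leading "!"
means the flag is off (the sample lines are made up for illustration, not
real btape output):

```python
from typing import Dict


def parse_flags(line: str) -> Dict[str, bool]:
    """Map each flag name to True, or to False when it is prefixed with '!'."""
    flags = {}
    for token in line.split():
        flags[token.lstrip("!").upper()] = not token.startswith("!")
    return flags


def changed_flags(before: str, after: str) -> Dict[str, bool]:
    """Return the flags whose value differs between two lines, with the new value."""
    old, new = parse_flags(before), parse_flags(after)
    return {name: value for name, value in new.items() if old.get(name) != value}


if __name__ == "__main__":
    # Hypothetical status lines from two consecutive "cap" runs.
    print(changed_flags("OPENED TAPE APPEND !READ", "OPENED TAPE !APPEND !READ"))
```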

Regards
Yvan Broccard

Thomas Bennett wrote:
> It might help to update your firmware if it is not current.
>
> While setting up Bacula for the first time with RHEL 5.2 and the TL2000 I ran
> btape and then cap, which showed the tape was append.  I tried to run the test
> and it failed.  I ran cap again and it showed !append.  This happened every
> time I restarted btape without unmounting the tape.  Also, it appears that
> Yosemite Backup partitioned my tapes.  I would run btape erase and it would
> run for just a couple of seconds.  Then I would immediately run it again and
> it would take over an hour.  After trying several other possible solutions I
> went to the Dell site and found there was new firmware and a diagnostics and
> firmware upgrade tool
>
> ITDT - DELL Customer Release
>
> which can be downloaded at
>
> http://support.dell.com/support/downloads/index.aspx?c=ca&l=en&s=gen&SystemID=PWV_TL2000
>
> just choose your OS and Language.  I updated the TL-2000 firmware and ran the 
> ITDT diagnostics which wrote to tapes with no problem.  After that, the test 
> in btape ran fine.  Then I had an issue with my backups running but after 
> upgrading to PostgreSQL 8.4 backups ran fine.
>
>
> Here are my confs if that might help.
>
> From bacula-dir.conf
>
> #Definition of LTO-4 storage
> Storage {
>   Name = TL2000
>   Address = 192.168.0.1         # N.B. Use a fully qualified name here
>   SDPort = 9103
>   Password = "xxxxxxxxx"     # password for Storage daemon
>   Device = TL2000                 # must be same as Device in Storage daemon
>   Media Type = LTO-4            # must be same as MediaType in Storage daemon
>   Autochanger = yes             # enable for autochanger device
> }
>
> From bacula-sd.conf
>
> Autochanger{
>   Name = TL2000
>   Device = LTO-4
>   Changer Device = /dev/sg4
>   Changer Command = "/etc/bacula/mtx-changer %c %o %S %a %d"
> }
>
> Device{
>   Name = LTO-4
>   Drive Index = 0
>   Autochanger = yes
>   Device Type = Tape
>   Media Type = LTO-4
>   LabelMedia = yes
>   Archive Device = /dev/nst0
>   Always Open = yes
>   AutomaticMount = yes
>   RemovableMedia = yes
>   RandomAccess = no
> }
>
>
> By the way, there seems to be some disagreement between btape and bacula-sd 
> when using the -t switch for testing
>
> using 
> Storage {                             # definition of myself
>   Name = xyz-sd
>   WorkingDirectory = "/var/bacula/working"
>   Pid Directory = "/var/run"
>   Maximum Concurrent Jobs = 20
>   SDAddresses = {ip={
>          addr = 192.168.0.1; port= 9103}
>   }
> }
>
> btape complains about SDAddresses format.
>
> Thomas
>
>
>
>
>
> On Friday 08 May 2009 03:24:56 yvan wrote:
>   
>> Hi,
>> here is the configuration file.
>> Yesterday I blanked 2 tapes, and assigned one to the pool "Daily" and
>> one to the pool "Weekly". I ran btape fill and test with success.
>> I then started full backup jobs of 12 GB in a loop. I got this result:
>>
>> Job A Daily, EOD 12GB ok
>> Job B Weekly, EOD 12GB ok
>> Job A Daily, EOD 24GB ok
>> Job B Weekly, EOD 24GB ok
>> Job A Daily, EOD 36GB ok
>> Job B Weekly, EOD 36GB ok
>> Job A Daily,  Starting, unloading tape from JOB B and i got the "MTEOM"
>> error at this point. (Could not append to end of data)
>>
>> 08-May 09:16 setmseblx0007-sd JobId 181: Error: Unable to position to
>> end of data on device "IBMLTO4" (/dev/nst0): ERR=dev.c:1354 ioctl MTFSF
>> error on "IBMLTO4" (/dev/nst0). ERR=Input/output error.
>>
>> This morning I updated the status of that volume from the console,
>> putting it to append again, and the job started and finished successfully !
>> There is a "sleep 60" after the load in my mtx-changer script, and a
>> "sleep 180" after the unload. Should be enough no ? when I do it
>> manually with "mtx" it takes something like 15 seconds.
>>
>> Now the 4th "Job B" sent me the MTEOM error as well at the same position
>> on the tape.
>>
>> Here is the configuration file :
>>
>> Device {
>>         Name = IBMLTO4
>>         Media Type = LTO4
>>         Device Type = tape
>>         Archive Device  = /dev/nst0
>>         Autochanger = yes
>>         #Changer Device = /dev/sg3
>>         #Alert Command = "sh -c 'tapeinfo -f %c | grep -i tapealert'"
>>         Label Type = IBM
>>         Check Labels = yes
>>         LabelMedia = No;
>>         Random Access = No;
>>         RandomAccess = No;                  # which one is the correct syntax ?
>>         AutomaticMount = yes;               # when device opened, read it
>>         RemovableMedia = yes;
>>         AlwaysOpen = yes
>>         Requires Mount = no;
>>
>>         #Maximum Job Spool Size = 2G     # default is unlimited
>>         #Spool Directory = "/export/shared/spool"
>>
>>         # things to try:
>>         TWO EOF = No            # tried but append test fails. default is No
>>         #Offline On Unmount = no                # default is No
>>         Hardware End of Medium = no             # default is Yes
>>         BSF at EOM = No;                        # default is No
>>         #Backward Space Record = no             # default is Yes for tape device
>>         #Backward Space File = no               #
>>         # Use MTIOCGET = Yes                    # only need to no on some ***BSD systems
>>         Fast Forward Space File = no            # tried, not better, got MTFSF error
>>                                 # This line required if above HEOM is set to "No"
>>         Volume Poll Interval = 300      # Poll the drive to seek the status
>>         }
>>
>> Autochanger {
>>   Name = TL2000
>>   Device = IBMLTO4
>>   Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d"
>>   Changer Device = /dev/sg3  # already set in the device resource. both are possibles
>> }
>>
>> Thank you all for your help !
>>
>> Hayden Katzenellenbogen wrote:
>>     
>>> Yvan,
>>>
>>> Could you paste a copy of your bacula-sd.conf. The device and auto
>>> changer sections.
>>>
>>> I have found that if I load the tape into the drive then run the fill
>>> test it will not give the WEOF error, but when it loads the second tape
>>> it will give the WEOF error.
>>>
>>> If I have any other tape in the drive before I start and btape loads tape
>>> one for the fill test I get the WEOF error on both the first and second
>>> tape.
>>>
>>> The btape test runs 100% including the append test. Here is a snippet of
>>> the last fill test I ran. I had already loaded tape 1 into the drive
>>> before start. I also did an erase on both tapes using John's two mt
>>> commands.
>>>
>>> Also would it make a difference that I am running this on Ubuntu 8.04
>>> LTS and using a Fibre Channel drive?
>>>
>>> H
>>>
>>> root@archive:~/bacula/etc# ../bin/btape -c bacula-sd.conf /dev/nst0
>>> Tape block granularity is 1024 bytes.
>>> btape: butil.c:285 Using device: "/dev/nst0" for writing.
>>> 05-May 15:40 btape JobId 0: 3301 Issuing autochanger "loaded? drive 0" command.
>>> 05-May 15:40 btape JobId 0: 3302 Autochanger "loaded? drive 0", result is Slot 1.
>>> btape: btape.c:383 open device "Drive-1" (/dev/nst0): OK
>>> *fill
>>>
>>> This command simulates Bacula writing to a tape.
>>> It requires either one or two blank tapes, which it
>>> will label and write.
>>>
>>> If you have an autochanger configured, it will use
>>> the tapes that are in slots 1 and 2, otherwise, you will
>>> be prompted to insert the tapes when necessary.
>>>
>>> It will print a status approximately
>>> every 322 MB, and write an EOF every 3.2 GB.  If you have
>>> selected the simple test option, after writing the first tape
>>> it will rewind it and re-read the last block written.
>>>
>>> If you have selected the multiple tape test, when the first tape
>>> fills, it will ask for a second, and after writing a few more
>>> blocks, it will stop.  Then it will begin re-reading the
>>> two tapes.
>>>
>>> This may take a long time -- hours! ...
>>>
>>> Do you want to run the simplified test (s) with one tape
>>> or the complete multiple tape (m) test: (s/m) m
>>> Multiple tape test selected.
>>> Wrote Volume label for volume "TestVolume1".
>>> Wrote Start of Session label.
>>> 15:44:05 Begin writing Bacula records to first tape ...
>>> Wrote blk_block=5000, dev_blk_num=4999 VolBytes=322,495,488 rate=80623.9 KB/s
>>> Wrote blk_block=10000, dev_blk_num=9999 VolBytes=645,055,488 rate=92150.8 KB/s
>>> Wrote blk_block=15000, dev_blk_num=14999 VolBytes=967,615,488 rate=96761.5 KB/s
>>> Wrote blk_block=20000, dev_blk_num=4499 VolBytes=1,290,175,488 rate=86011.7 KB/s
>>> Wrote blk_block=25000, dev_blk_num=9499 VolBytes=1,612,735,488 rate=76796.9 KB/s
>>> Wrote blk_block=30000, dev_blk_num=14499 VolBytes=1,935,295,488 rate=80637.3 KB/s
>>>
>>>
>>> Wrote blk_block=13055000, dev_blk_num=15500 VolBytes=842,204,095,488 rate=70625.1 KB/s
>>> 19:02:52 Flush block, write EOF
>>> Wrote blk_block=13060000, dev_blk_num=4000 VolBytes=842,526,655,488 rate=70598.9 KB/s
>>> Wrote blk_block=13065000, dev_blk_num=9000 VolBytes=842,849,215,488 rate=70608.1 KB/s
>>> Wrote blk_block=13070000, dev_blk_num=14000 VolBytes=843,171,775,488 rate=70611.5 KB/s
>>> Wrote blk_block=13075000, dev_blk_num=3500 VolBytes=843,494,335,488 rate=70603.0 KB/s
>>> Wrote blk_block=13080000, dev_blk_num=8500 VolBytes=843,816,895,488 rate=70612.3 KB/s
>>> 05-May 19:03 btape JobId 0: End of Volume "TestVolume1" at 1226:13010 on device "Drive-1" (/dev/nst0). Write of 64512 bytes got -1.
>>> 05-May 19:03 btape JobId 0: Re-read of last block succeeded.
>>> btape: btape.c:2360 Last block at: 1226:13009 this_dev_block_num=13010
>>> btape: btape.c:2394 End of tape 1226:0. VolumeCapacity=844,107,844,608. Write rate = 70595.3 KB/s
>>> 05-May 19:03 btape JobId 0: End of medium on Volume "TestVolume1" Bytes=844,107,844,608 Blocks=13,084,509 at 05-May-2009 19:03.
>>> 05-May 19:03 btape JobId 0: 3307 Issuing autochanger "unload slot 1, drive 0" command.
>>> 05-May 19:04 btape JobId 0: 3301 Issuing autochanger "loaded? drive 0" command.
>>> 05-May 19:04 btape JobId 0: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
>>> 05-May 19:04 btape JobId 0: 3304 Issuing autochanger "load slot 2, drive 0" command.
>>> 05-May 19:04 btape JobId 0: 3305 Autochanger "load slot 2, drive 0", status is OK.
>>> 05-May 19:04 btape: Fatal Error at dev.c:1705 because:
>>> dev.c:1704 Attempt to WEOF on non-appendable Volume
>>> Wrote Volume label for volume "TestVolume2".
>>> 05-May 19:04 btape JobId 0: Wrote label to prelabeled Volume "TestVolume2" on device "Drive-1" (/dev/nst0)
>>> 05-May 19:04 btape JobId 0: New volume "TestVolume2" mounted on device "Drive-1" (/dev/nst0) at 05-May-2009 19:04.
>>> Done writing 0 records ...
>>> Wrote End of Session label.
>>> Wrote state file last_block_num1=13009 last_block_num2=11
>>> Wrote End of Session label.
>>> Wrote state file last_block_num1=13009 last_block_num2=11
>>>
>>>
>>> 19:04:42 Done filling tapes at 0:13. Now beginning re-read of first tape ...
>>> 05-May 19:04 btape JobId 0: 3307 Issuing autochanger "unload slot 2, drive 0" command.
>>> 05-May 19:05 btape JobId 0: 3304 Issuing autochanger "load slot 1, drive 0" command.
>>> 05-May 19:05 btape JobId 0: 3305 Autochanger "load slot 1, drive 0", status is OK.
>>> 05-May 19:05 btape JobId 0: Ready to read from volume "TestVolume1" on device "Drive-1" (/dev/nst0).
>>> Rewinding.
>>> Reading the first 10000 records from 0:0.
>>> 10000 records read now at 1:5084
>>> Reposition from 1:5084 to 1226:13009
>>> Reading block 13009.
>>>
>>> The last block of the first tape matches.
>>>
>>> 05-May 19:06 btape JobId 0: 3307 Issuing autochanger "unload slot 1, drive 0" command.
>>> 05-May 19:07 btape JobId 0: 3304 Issuing autochanger "load slot 2, drive 0" command.
>>> 05-May 19:07 btape JobId 0: 3305 Autochanger "load slot 2, drive 0", status is OK.
>>> 05-May 19:07 btape JobId 0: Ready to read from volume "TestVolume2" on device "Drive-1" (/dev/nst0).
>>> Reposition from 0:0 to 0:1
>>> Reading block 1.
>>>
>>> The first block on the second tape matches.
>>>
>>> Reposition from 0:2 to 0:11
>>> Reading block 11.
>>>
>>> The last block on the second tape matches. Test succeeded.
>>>
>>> *
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: yvan [mailto:yvan AT skywalker.is-a-chef DOT com]
>>> Sent: Thursday, May 07, 2009 6:15 AM
>>> To: Win Htin
>>> Cc: bacula-users AT lists.sourceforge DOT net
>>> Subject: Re: [Bacula-users] Tape MTEOM error with Dell TL2000 (IBM >
>>> TS3100)
>>>
>>> Hi !
>>>
>>> Yes, doing that at the moment. It takes a long time to reset all my
>>> tapes, erase them, and start some tests on them .... it takes days ...
>>> More to come soon ...
>>>
>>> By the way, what is the best way to erase a tape? I tried a
>>> "dd if=/dev/zero of=/dev/st0" but I had to stop it after 24h (maybe I
>>> should use a bigger block size to increase speed?)
>>> "mt -f /dev/st0 erase" gives me an error after a few seconds:
>>> Input/output error...
>>>
>>> Le 05.05.2009 14:12, Win Htin a écrit :
>>>       
>>>> Did you erase the tapes before re-running the backups?
>>>>
>>>> I would recommend first to completely erase the tape(s), run "btape"
>>>> to make sure everything is working fine and then start testing the
>>>> actual backups. Capture the output while running "btape" and go
>>>> through it line by line to make sure you don't have even a single
>>>> error.
>>>>
>>>> BTW, I forgot to mention I'm running Bacula version 2.2.6 on RHEL4 and
>>>> 2.4.3 on RHEL5.2.
>>>>
>>>> HTH,
>>>> Win
>>>>
>>>>         
>>>>> Message: 7
>>>>> Date: Mon, 04 May 2009 13:13:45 +0200
>>>>> From: yvan<yvan AT skywalker.is-a-chef DOT com>
>>>>> Subject: Re: [Bacula-users] Tape MTEOM error with Dell TL2000 (IBM
>>>>>         TS3100)
>>>>> To: bacula-users AT lists.sourceforge DOT net
>>>>> Message-ID:<49FECDE9.8050204 AT skywalker.is-a-chef DOT com>
>>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>>
>>>>> Hi,
>>>>>
>>>>> thank you for the advice. We have very similar configurations, and I'm
>>>>> sure I'm not too far from the solution. I tried to use your settings in
>>>>> the storage daemon config file, and the "mt" options as well, which I
>>>>> hadn't tried so far.
>>>>>
>>>>> But, as soon as I put those settings, I have another error
>>>>>
>>>>>         Hardware End of Medium = No     # default is Yes
>>>>>         Fast Forward Space File = No    # This line required if above
>>>>>
>>>>> which is :
>>>>> 3-May 02:20 setmseblx0007-sd JobId 170: Error: Unable to position to
>>>>> end of data on device "IBMLTO4" (/dev/nst0): ERR=dev.c:1354 ioctl MTFSF
>>>>> error on "IBMLTO4" (/dev/nst0). ERR=Input/output error.
>>>>>
>>>>> Strange that it works for you then ... I issued all the "mt" commands
>>>>> you wrote. tapeinfo gives me the same information as you have, but I
>>>>> still get either the MTFSF or the MTEOM error ...
>>>>>
>>>>> Regards
>>>>> Yvan Broccard
>>>>>           
>>> -------------------------------------------------------------------------
>>> ----- The NEW KODAK i700 Series Scanners deliver under ANY circumstances!
>>> Your production scanning environment may not be a perfect world - but
>>> thanks to Kodak, there's a perfect scanner to get the job done! With the
>>> NEW KODAK i700 Series Scanner you'll get full speed at 300 dpi even with
>>> all image processing features enabled. http://p.sf.net/sfu/kodak-com
>>> _______________________________________________
>>> Bacula-users mailing list
>>> Bacula-users AT lists.sourceforge DOT net
>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>>       
>>     
>
>   
