Bacula-users

Re: [Bacula-users] ML6000 does not work al of a sudden

2012-05-06 05:52:17
Subject: Re: [Bacula-users] ML6000 does not work al of a sudden
From: "Satmarean, Mihai" <mihai.satmarean AT trixter DOT de>
To: Tilman Schmidt <t.schmidt AT phoenixsoftware DOT de>
Date: Sun, 6 May 2012 11:47:36 +0200 (CEST)
Hi all,

After following Tilman's advice, the backup went ok.

Thanks a lot.



From: "Mihai Satmarean" <mihai.satmarean AT trixter DOT de>
To: "Tilman Schmidt" <t.schmidt AT phoenixsoftware DOT de>
Cc: bacula-users AT lists.sourceforge DOT net
Sent: Friday, May 4, 2012 9:57:14 AM
Subject: Re: [Bacula-users] ML6000 does not work al of a sudden

Hi Tilman,

Thanks a lot for your reply,
you were right about the spool file, I did searched for files there and I found a file of approx 725 Gb dated 22.02.2012 which is also the date when our problems begin.

I'll erase that and start a new job.

I need to explain to you a bit in detail what happened:

The tape drive had an error that kept coming:

Name: T050 - TapeAlert 31 - Hardware B
Ticket Number: 18           State: UnOpened
Creation Date: 05-May-2011 11:39:08           Severity: 2
Last Occurrence: 05-May-2011 11:39:08           
Description:
Tape drive internal hardware problem detected.
Technical Details
Fault code generic     0
Fault code specific     22
Element type generic     UDS-3 Sled
Element Location     [0,2]
Drive Sled Serial Number     1310022196
Partition     library_a
Fault description     All Tape Alerts posted-> 31


Also Calibration error and such.

So after 3 weeks of diagnosis with Dell, we got another tape drive.


But the new tape drive did not write tapes also (the latest errors in the first post are from the second tape drive and most probably is the spool quota as you pointed out)
and here is the output of btape for the new tape drive

#>btape /dev/rmt/0bn
Tape block granularity is 1024 bytes.
btape: butil.c:284 Using device: "/dev/rmt/0bn" for writing.
23-Apr 14:54 btape JobId 0: 3301 Issuing autochanger "loaded? drive 0" command.
test
23-Apr 14:54 btape JobId 0: 3302 Autochanger "loaded? drive 0", result is Slot 47.
btape: btape.c:476 open device "LTO4-01" (/dev/rmt/0bn): OK
*
=== Write, rewind, and re-read test ===

I'm going to write 10000 records and an EOF
then write 10000 records and an EOF, then rewind,
and re-read the data to verify that it is correct.

This is an *essential* feature ...

btape: btape.c:1148 Wrote 10000 blocks of 64412 bytes.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1164 Wrote 10000 blocks of 64412 bytes.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1206 Rewind OK.
10000 blocks re-read correctly.
Got EOF on tape.
10000 blocks re-read correctly.
=== Test Succeeded. End Write, rewind, and re-read test ===

btape: btape.c:1274 Block position test
btape: btape.c:1286 Rewind OK.
Reposition to file:block 0:4
Block 5 re-read correctly.
Reposition to file:block 0:200
Block 201 re-read correctly.
Reposition to file:block 0:9999
Block 10000 re-read correctly.
Reposition to file:block 1:0
Block 10001 re-read correctly.
Reposition to file:block 1:600
Block 10601 re-read correctly.
Reposition to file:block 1:9999
Block 20000 re-read correctly.
=== Test Succeeded. End Write, rewind, and re-read test ===

=== Append files test ===

This test is essential to Bacula.

I'm going to write one record  in file 0,
                   two records in file 1,
             and three records in file 2

btape: btape.c:578 Rewound "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:476 open device "LTO4-01" (/dev/rmt/0bn): OK
btape: btape.c:578 Rewound "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1418 Now moving to end of medium.
btape: btape.c:626 dev.c:956 ioctl MTEOM error on "LTO4-01" (/dev/rmt/0bn). ERR=I/O error.
We should be in file 3. I am at file 0. This is NOT correct!!!!

Append test failed. Attempting again.
Setting "Hardware End of Medium = no
    and "Fast Forward Space File = no
and retrying append test.

=== Append files test ===

This test is essential to Bacula.

I'm going to write one record  in file 0,
                   two records in file 1,
             and three records in file 2

btape: btape.c:578 Rewound "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:476 open device "LTO4-01" (/dev/rmt/0bn): OK
btape: btape.c:578 Rewound "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1418 Now moving to end of medium.
btape: btape.c:629 Moved to end of medium.
We should be in file 3. I am at file 3. This is correct!

Now the important part, I am going to attempt to append to the tape.

btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:578 Rewound "LTO4-01" (/dev/rmt/0bn)
Done appending, there should be no I/O errors

Doing Bacula scan of blocks:
1 block of 64448 bytes in file 1
End of File mark.
2 blocks of 64448 bytes in file 2
End of File mark.
3 blocks of 64448 bytes in file 3
End of File mark.
Total files=3, blocks=6, bytes = 386,688
End scanning the tape.
We should be in file 4. I am at file 3. This is NOT correct!!!!

It looks like the append failed. Attempting again.
Setting "BSF at EOM = yes" and retrying append test.

=== Append files test ===

This test is essential to Bacula.

I'm going to write one record  in file 0,
                   two records in file 1,
             and three records in file 2

btape: btape.c:578 Rewound "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:476 open device "LTO4-01" (/dev/rmt/0bn): OK
btape: btape.c:578 Rewound "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1418 Now moving to end of medium.
btape: btape.c:629 Moved to end of medium.
We should be in file 3. I am at file 3. This is correct!

Now the important part, I am going to attempt to append to the tape.

btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:578 Rewound "LTO4-01" (/dev/rmt/0bn)
Done appending, there should be no I/O errors

Doing Bacula scan of blocks:
1 block of 64448 bytes in file 1
End of File mark.
2 blocks of 64448 bytes in file 2
End of File mark.
3 blocks of 64448 bytes in file 3
End of File mark.
1 block of 64448 bytes in file 4
End of File mark.
Total files=4, blocks=7, bytes = 451,136
End scanning the tape.
We should be in file 4. I am at file 4. This is correct!

It looks like the test worked this time, please add:

    Hardware End of Medium = No
    Fast Forward Space File = No
    BSF at EOM = yes

to your Device resource in the Storage conf file.

The above Bacula scan should have output identical to what follows.
Please double check it ...
=== Sample correct output ===
1 block of 64448 bytes in file 1
End of File mark.
2 blocks of 64448 bytes in file 2
End of File mark.
3 blocks of 64448 bytes in file 3
End of File mark.
1 block of 64448 bytes in file 4
End of File mark.
Total files=4, blocks=7, bytes = 451,136
=== End sample correct output ===

If the above scan output is not identical to the
sample output, you MUST correct the problem
or Bacula will not be able to write multiple Jobs to 
the tape.

Skipping read backwards test because BSR turned off.

=== Forward space files test ===

This test is essential to Bacula.

I'm going to write five files then test forward spacing

btape: btape.c:578 Rewound "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1905 Wrote one record of 64412 bytes.
btape: btape.c:1907 Wrote block to device.
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:608 Wrote 1 EOF to "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:578 Rewound "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1632 Now forward spacing 1 file.
We should be in file 1. I am at file 1. This is correct!
btape: btape.c:1644 Now forward spacing 2 files.
We should be in file 3. I am at file 3. This is correct!
btape: btape.c:578 Rewound "LTO4-01" (/dev/rmt/0bn)
btape: btape.c:1657 Now forward spacing 4 files.
We should be in file 4. I am at file 4. This is correct!

btape: btape.c:1675 Now forward spacing 1 more file.
We should be in file 5. I am at file 5. This is correct!

=== End Forward space files test ===

Ah, I see you have an autochanger configured.
To test the autochanger you must have a blank tape
 that I can write on in Slot 1.

Do you wish to continue with the Autochanger test? (y/n): n


So, from here I presumed that it has something to do with not finding the end of the tape.

I'll keep you posted.
Thanks again


Mihai Satmarean
System Administrator

phone:
mobile:  
fax:
mail:
web:
+49 89 287018 0  
 
+49 89 287018 18  
mihai.satmarean AT trixter DOT de 
http://www.trixter.de

TRIXTER Film GmbH
Amalienstrasse 71
80799 Munich, Germany

TRIXTER FILM GMBH Gesellschaft mit beschraenkter Haftung, Sitz Muenchen, Registergericht Muenchen, HRB 125141
TRIXTER PRODUCTIONS GMBH & CO. KG: Kommanditgesellschaft, Sitz Muenchen, Registergericht Muenchen, HRA 80561
Persoenlich haftender Gesellschafter: TRIXTER VERWALTUNGS GMBH, Sitz Muenchen, Registergericht Muenchen, HRB 144097



From: "Tilman Schmidt" <t.schmidt AT phoenixsoftware DOT de>
To: bacula-users AT lists.sourceforge DOT net
Sent: Thursday, May 3, 2012 6:35:40 PM
Subject: Re: [Bacula-users] ML6000 does not work al of a sudden

I'm neither familiar with the ML6000 nor with Solaris, but in my
experience, it's generally a good idea to take error messages literally,
starting with the very first one. So:

> 27-Feb 19:54 de001bs002-sd JobId 732: Fatal error: Error writing data to spool file. ERR=Disc quota exceeded

Find the location and maximum size of spool files in your Bacula
configuration, then check whether your filesystem quota settings for
that location allow the Bacula Storage Daemon to write that much, and
whether there are old spool files lying around and eating up the quota.

> 27-Feb 19:54 de001bs002-sd JobId 732: Writing spooled data to Volume. Despooling 835,324,011,949 bytes ...
> 27-Feb 19:54 de001bs002-sd JobId 732: Despooling elapsed time = 00:00:01, Transfer rate = 835.3 G Bytes/second
> 27-Feb 19:55 de001bs002-sd JobId 732: Fatal error: Fatal despooling error.
> 27-Feb 19:55 de001bs002-fd JobId 732: Error: bsock.c:393 Write error sending 65536 bytes to Storage daemon:de001bs002:9103: ERR=Broken pipe
> 27-Feb 19:55 de001bs002-fd JobId 732: Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe

Those are inherited errors caused by the disk quota problem above. The
tape drive probably wasn't even involved up to then.

> 28-Mar 20:26 de001bs002-sd JobId 736: Volume "000329" previously written, moving to end of data.
> 28-Mar 20:27 de001bs002-sd JobId 736: Error: Unable to position to end of data on device "LTO4-01" (/dev/rmt/0bn): ERR=dev.c:956 ioctl MTEOM error on "LTO4-01" (/dev/rmt/0bn). ERR=I/O error.

A whole month later. That looks like a completely unrelated problem,
possibly a defective tape. Bacula handles that quite gracefully:

> 28-Mar 20:27 de001bs002-sd JobId 736: Marking Volume "000329" in Error in Catalog.
> 28-Mar 20:27 de001bs002-sd JobId 736: 3307 Issuing autochanger "unload slot 72, drive 0" command.
> 28-Mar 20:28 de001bs002-dir JobId 736: There are no more Jobs associated with Volume "000257". Marking it purged.
> 28-Mar 20:28 de001bs002-dir JobId 736: All records pruned from Volume "000257"; marking it "Purged"
> 28-Mar 20:28 de001bs002-dir JobId 736: Recycled volume "000257"

It marks the tape as bad and grabs the next one. Problem solved.

> 29-Mar 00:08 de001bs002-sd JobId 736: Fatal error: Error writing data to spool file. ERR=Disc quota exceeded

Again your disk quota problem. Nothing to do with the tape drive.

> Finally we did replaced the tape drive. Update tape library and drive firmware and Bacula to latest version
> But still not working:

Unsurprising. Nothing above points to the tape drive being the cause of
your problem.

> Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci1077,137@0/fp@0,0/tape@w500308c09c25b005,0 (st0):
> Apr 30 15:40:52 de001bs002      Error for Command: rezero/rewind           Error Level: Fatal
> Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice]        Requested Block: 0                         Error Block: 0
> Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice]        Vendor: IBM                                Serial Number:            
> Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice]        Sense Key: Not_Ready
> Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice]        ASC: 0x3a (medium not present), ASCQ: 0x0, FRU: 0x30

Which operation did that occur with? "Medium not present" sounds like
"no tape in the drive" to me. What did Bacula log for that incident?

> and here is the log from Bacula itself

That's *not* the log of the same incident. Judging by the timestamps
it's from the following day.

> 01-May 09:47 de001bs002-sd JobId 752: Fatal error: Error writing data to spool file. ERR=Disc quota exceeded

Still the disk quota problem.

> Seems that the problem is that bacula cannot find the end of the tape no more.

Not at all. Rather, it seems that your disk spool area is simply full.

HTH
Tilman

--
Tilman Schmidt
Phoenix Software GmbH
Bonn, Germany

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users