Re: [Bacula-users] ML6000 does not work al of a sudden

2012-05-03 16:23:59
Subject: Re: [Bacula-users] ML6000 does not work al of a sudden
From: Tilman Schmidt <t.schmidt AT phoenixsoftware DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 03 May 2012 18:35:40 +0200
I'm neither familiar with the ML6000 nor with Solaris, but in my
experience, it's generally a good idea to take error messages literally,
starting with the very first one. So:

> 27-Feb 19:54 de001bs002-sd JobId 732: Fatal error: Error writing data to 
> spool file. ERR=Disc quota exceeded

Find the location and maximum size of spool files in your Bacula
configuration, then check whether your filesystem quota settings for
that location allow the Bacula Storage Daemon to write that much, and
whether there are old spool files lying around and eating up the quota.
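
For the second point, something like the following sketch could give a quick picture. It lists leftover spool files in one directory with their size and age. The directory you pass in is whatever your Storage Daemon configuration names as its spool location, and the ".spool" suffix is only my assumption about how the files are named, so compare it with what you actually find there. It reports free space on the filesystem, not the quota itself; for that you still need your system's own quota tools.

```python
import os
import shutil
import time


def spool_usage(spool_dir, suffix=".spool"):
    """Return (files, total_bytes, free_bytes) for one spool directory.

    files is a list of (name, size_bytes, age_days), oldest first.
    The suffix is an assumption about spool file naming; adjust as needed.
    """
    now = time.time()
    files = []
    with os.scandir(spool_dir) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(suffix):
                st = entry.stat()
                age_days = (now - st.st_mtime) / 86400
                files.append((entry.name, st.st_size, age_days))
    # Oldest files first: those are the most likely leftovers.
    files.sort(key=lambda f: f[2], reverse=True)
    total = sum(size for _, size, _ in files)
    # Free space on the filesystem; this does NOT reflect per-user quotas.
    free = shutil.disk_usage(spool_dir).free
    return files, total, free
```

Calling spool_usage() on your spool directory and comparing the total with the maximum spool size from your configuration should show whether stale files are eating the space.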

> 27-Feb 19:54 de001bs002-sd JobId 732: Writing spooled data to Volume. 
> Despooling 835,324,011,949 bytes ...
> 27-Feb 19:54 de001bs002-sd JobId 732: Despooling elapsed time = 00:00:01, 
> Transfer rate = 835.3 G Bytes/second
> 27-Feb 19:55 de001bs002-sd JobId 732: Fatal error: Fatal despooling error.
> 27-Feb 19:55 de001bs002-fd JobId 732: Error: bsock.c:393 Write error sending 
> 65536 bytes to Storage daemon:de001bs002:9103: ERR=Broken pipe
> 27-Feb 19:55 de001bs002-fd JobId 732: Fatal error: backup.c:1024 Network send 
> error to SD. ERR=Broken pipe

Those are follow-on errors caused by the disk quota problem above. The
tape drive probably wasn't even involved up to that point.
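
The reported despooling figures are worth a quick sanity check. The sketch below does the arithmetic, assuming the drive really is an LTO-4 (I'm only guessing that from the device name "LTO4-01") running at its native rate of 120 MB/s and ignoring compression. The function names are made up for illustration.

```python
# Native (uncompressed) LTO-4 transfer rate in bytes per second.
# Assumption: the drive is an LTO-4, guessed from the device name.
LTO4_NATIVE_BYTES_PER_SEC = 120_000_000


def min_despool_seconds(n_bytes, drive_rate=LTO4_NATIVE_BYTES_PER_SEC):
    """Shortest time the drive would need to write n_bytes at drive_rate."""
    return n_bytes / drive_rate


def reported_rate(n_bytes, elapsed_seconds):
    """Transfer rate implied by a logged byte count and elapsed time."""
    return n_bytes / elapsed_seconds


spooled = 835_324_011_949  # byte count from the log above
seconds = min_despool_seconds(spooled)
print(f"minimum time at native rate: {seconds / 3600:.1f} hours")
print(f"logged rate is {reported_rate(spooled, 1) / LTO4_NATIVE_BYTES_PER_SEC:.0f}x the native rate")
```

Under those assumptions the drive would need close to two hours for that amount of data, so a despooling time of one second means essentially nothing was written to tape.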

> 28-Mar 20:26 de001bs002-sd JobId 736: Volume "000329" previously written, 
> moving to end of data.
> 28-Mar 20:27 de001bs002-sd JobId 736: Error: Unable to position to end of 
> data on device "LTO4-01" (/dev/rmt/0bn): ERR=dev.c:956 ioctl MTEOM error on 
> "LTO4-01" (/dev/rmt/0bn). ERR=I/O error.

That is a whole month later. It looks like a completely unrelated problem,
possibly a defective tape. Bacula handles that quite gracefully:

> 28-Mar 20:27 de001bs002-sd JobId 736: Marking Volume "000329" in Error in 
> Catalog.
> 28-Mar 20:27 de001bs002-sd JobId 736: 3307 Issuing autochanger "unload slot 
> 72, drive 0" command.
> 28-Mar 20:28 de001bs002-dir JobId 736: There are no more Jobs associated with 
> Volume "000257". Marking it purged.
> 28-Mar 20:28 de001bs002-dir JobId 736: All records pruned from Volume 
> "000257"; marking it "Purged" 
> 28-Mar 20:28 de001bs002-dir JobId 736: Recycled volume "000257" 

It marks the tape as bad and grabs the next one. Problem solved.

> 29-Mar 00:08 de001bs002-sd JobId 736: Fatal error: Error writing data to 
> spool file. ERR=Disc quota exceeded

Again your disk quota problem. Nothing to do with the tape drive.

> Finally we did replaced the tape drive. Update tape library and drive 
> firmware and Bacula to latest version
> But still not working:

Unsurprising. Nothing above points to the tape drive being the cause of
your problem.

> Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci8086,3410@9/pci1077,137@0/fp@0,0/tape@w500308c09c25b005,0 (st0):
> Apr 30 15:40:52 de001bs002      Error for Command: rezero/rewind           
> Error Level: Fatal
> Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice]        Requested 
> Block: 0                         Error Block: 0
> Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice]        Vendor: IBM   
>                              Serial Number:             
> Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice]        Sense Key: 
> Not_Ready
> Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice]        ASC: 0x3a 
> (medium not present), ASCQ: 0x0, FRU: 0x30

During which operation did that occur? "Medium not present" sounds like
"no tape in the drive" to me. What did Bacula log for that incident?

> and here is the log from Bacula itself

That's *not* the log of the same incident. Judging by the timestamps
it's from the following day.
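
To put a number on it, the two timestamp formats can be compared as in the sketch below. Neither format records the year, so 2012 is an assumption taken from the date of this thread, and the month names are assumed to parse under an English or C locale. The helper names are hypothetical.

```python
from datetime import datetime

YEAR = 2012  # assumption: neither log format records the year


def parse_syslog(ts, year=YEAR):
    """Parse a syslog-style timestamp such as 'Apr 30 15:40:52'."""
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def parse_bacula(ts, year=YEAR):
    """Parse a Bacula job log timestamp such as '01-May 09:47'."""
    return datetime.strptime(f"{year} {ts}", "%Y %d-%b %H:%M")


def gap_between(syslog_ts, bacula_ts, year=YEAR):
    """Time from the syslog entry to the Bacula entry."""
    return parse_bacula(bacula_ts, year) - parse_syslog(syslog_ts, year)


gap = gap_between("Apr 30 15:40:52", "01-May 09:47")
print(gap)  # 18:06:08
```

So the kernel warning and the Bacula message are about 18 hours apart and can hardly describe the same event.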

> 01-May 09:47 de001bs002-sd JobId 752: Fatal error: Error writing data to 
> spool file. ERR=Disc quota exceeded

Still the disk quota problem.

> Seems that the problem is that bacula cannot find the end of the tape no more.

Not at all. Rather, it seems that your disk spool area is simply full.
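
If you want to apply the "first error first" approach to a longer job log, a rough sketch like this could pull out the first fatal error per job. It assumes one message per line (not wrapped by a mail client as in the quotes above), lines in chronological order, and the "DD-Mon HH:MM daemon JobId N: message" layout seen in your excerpts. The function name is made up for illustration.

```python
import re

# Layout assumed from the quoted excerpts: "27-Feb 19:54 name-sd JobId 732: ..."
LINE_RE = re.compile(
    r"^(?P<when>\d{2}-\w{3} \d{2}:\d{2}) (?P<daemon>\S+) "
    r"JobId (?P<jobid>\d+): (?P<msg>.*)$"
)
FATAL_PREFIX = "Fatal error:"


def first_fatal_per_job(lines):
    """Map each JobId to (when, daemon, message) of its first fatal error."""
    first = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a job message in the assumed layout
        msg = m.group("msg")
        if msg.startswith(FATAL_PREFIX):
            text = msg[len(FATAL_PREFIX):].strip()
            # setdefault keeps only the earliest fatal error for each job
            first.setdefault(
                int(m.group("jobid")),
                (m.group("when"), m.group("daemon"), text),
            )
    return first


sample = [
    "27-Feb 19:54 de001bs002-sd JobId 732: Fatal error: Error writing data to spool file. ERR=Disc quota exceeded",
    "27-Feb 19:55 de001bs002-sd JobId 732: Fatal error: Fatal despooling error.",
    "27-Feb 19:55 de001bs002-fd JobId 732: Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe",
]
print(first_fatal_per_job(sample)[732][2])
```

For the three fatal errors of job 732 quoted above, this returns the spool file write error, which is the one to fix first.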

HTH
Tilman

-- 
Tilman Schmidt
Phoenix Software GmbH
Bonn, Germany
