Bacula-users

[Bacula-users] ML6000 does not work al of a sudden

2012-05-03 11:07:22
Subject: [Bacula-users] ML6000 does not work al of a sudden
From: "Satmarean, Mihai" <mihai.satmarean AT trixter DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 3 May 2012 17:03:45 +0200 (CEST)

Hi all,

please help me track the cause of the problem.

Short story is that we used successfully Bacula for years until that day when this error appeared in the messages:
- Server OpenIndiana 147
- Bacula 5.0.3
- Tape Library Dell ML6000 connected by FC

27-Feb 19:54 de001bs002-sd JobId 732: Fatal error: Error writing data to spool file. ERR=Disc quota exceeded
27-Feb 19:54 de001bs002-sd JobId 732: Writing spooled data to Volume. Despooling 835,324,011,949 bytes ...
27-Feb 19:54 de001bs002-sd JobId 732: Despooling elapsed time = 00:00:01, Transfer rate = 835.3 G Bytes/second
27-Feb 19:55 de001bs002-sd JobId 732: Fatal error: Fatal despooling error.
27-Feb 19:55 de001bs002-fd JobId 732: Error: bsock.c:393 Write error sending 65536 bytes to Storage daemon:de001bs002:9103: ERR=Broken pipe
27-Feb 19:55 de001bs002-fd JobId 732: Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe
27-Feb 19:55 de001bs002-dir JobId 732: Error: Bacula de001bs002-dir 5.0.3 (04Aug10): 27-Feb-2012 19:55:21
  Build OS:               i386-pc-solaris2.11 solaris 5.11
  JobId:                  732
  Job:                    BACKUP_A.2012-02-27_17.09.00_03
  Backup Level:           Full
  Client:                 "de001bs002-fd" 5.0.3 (04Aug10) i386-pc-solaris2.11,solaris,5.11
  FileSet:                "BACKUP_A" 2011-11-16 12:28:39
  Pool:                   "BACKUP_A" (From Job resource)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "LTO4-01" (From Job resource)
  Scheduled time:         27-Feb-2012 17:08:56
  Start time:             27-Feb-2012 17:09:02
  End time:               27-Feb-2012 19:55:21
  Elapsed time:           2 hours 46 mins 19 secs
  Priority:               10
  FD Files Written:       210,602
  SD Files Written:       210,602
  FD Bytes Written:       834,493,153,036 (834.4 GB)
  SD Bytes Written:       834,546,185,158 (834.5 GB)
  Rate:                   83624.9 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         000329
  Volume Session Id:      1
  Volume Session Time:    1330355979
  Last Volume Bytes:      64,512 (64.51 KB)
  Non-fatal FD errors:    1
  SD Errors:              1
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

Then
At another try:

Connecting to Director de001bs002:9101
1000 OK: de001bs002-dir Version: 5.0.3 (04 August 2010)
Enter a period to cancel a command.
*messages
28-Mar 20:26 de001bs002-sd JobId 736: 3304 Issuing autochanger "load slot 72, drive 0" command.
28-Mar 20:26 de001bs002-sd JobId 736: 3305 Autochanger "load slot 72, drive 0", status is OK.
28-Mar 20:26 de001bs002-sd JobId 736: Volume "000329" previously written, moving to end of data.
28-Mar 20:27 de001bs002-sd JobId 736: Error: Unable to position to end of data on device "LTO4-01" (/dev/rmt/0bn): ERR=dev.c:956 ioctl MTEOM error on "LTO4-01" (/dev/rmt/0bn). ERR=I/O error.

28-Mar 20:27 de001bs002-sd JobId 736: Marking Volume "000329" in Error in Catalog.
28-Mar 20:27 de001bs002-sd JobId 736: 3307 Issuing autochanger "unload slot 72, drive 0" command.
28-Mar 20:28 de001bs002-dir JobId 736: There are no more Jobs associated with Volume "000257". Marking it purged.
28-Mar 20:28 de001bs002-dir JobId 736: All records pruned from Volume "000257"; marking it "Purged" 
28-Mar 20:28 de001bs002-dir JobId 736: Recycled volume "000257" 
28-Mar 20:28 de001bs002-sd JobId 736: 3301 Issuing autochanger "loaded? drive 0" command.
28-Mar 20:28 de001bs002-sd JobId 736: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
28-Mar 20:28 de001bs002-sd JobId 736: 3304 Issuing autochanger "load slot 49, drive 0" command.
28-Mar 20:29 de001bs002-sd JobId 736: 3305 Autochanger "load slot 49, drive 0", status is OK.
28-Mar 20:29 de001bs002-sd JobId 736: Recycled volume "000257" on device "LTO4-01" (/dev/rmt/0bn), all previous data lost.
28-Mar 20:29 de001bs002-sd JobId 736: Spooling data ...
29-Mar 00:08 de001bs002-sd JobId 736: Fatal error: Error writing data to spool file. ERR=Disc quota exceeded
29-Mar 00:08 de001bs002-sd JobId 736: Writing spooled data to Volume. Despooling 835,324,140,997 bytes ...
29-Mar 00:08 de001bs002-sd JobId 736: Despooling elapsed time = 00:00:01, Transfer rate = 835.3 G Bytes/second
29-Mar 00:09 de001bs002-sd JobId 736: Fatal error: Fatal despooling error.
29-Mar 00:09 de001bs002-fd JobId 736: Error: bsock.c:393 Write error sending 65536 bytes to Storage daemon:de001bs002:9103: ERR=Broken pipe
29-Mar 00:09 de001bs002-fd JobId 736: Fatal error: backup.c:1024 Network send error to SD. ERR=Broken pipe
29-Mar 00:10 de001bs002-dir JobId 736: Error: Bacula de001bs002-dir 5.0.3 (04Aug10): 29-Mar-2012 00:10:02
  Build OS:               i386-pc-solaris2.11 solaris 5.11
  JobId:                  736
  Job:                    BACKUP_A.2012-03-28_20.24.51_17
  Backup Level:           Full
  Client:                 "de001bs002-fd" 5.0.3 (04Aug10) i386-pc-solaris2.11,solaris,5.11
  FileSet:                "BACKUP_A" 2011-11-16 12:28:39
  Pool:                   "BACKUP_A" (From Job resource)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "LTO4-01" (From Job resource)
  Scheduled time:         28-Mar-2012 20:24:47
  Start time:             28-Mar-2012 20:24:53
  End time:               29-Mar-2012 00:10:02
  Elapsed time:           3 hours 45 mins 9 secs
  Priority:               10
  FD Files Written:       210,602
  SD Files Written:       210,602
  FD Bytes Written:       834,493,284,108 (834.4 GB)
  SD Bytes Written:       834,546,316,230 (834.5 GB)
  Rate:                   61773.1 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         000257
  Volume Session Id:      5
  Volume Session Time:    1330355979
  Last Volume Bytes:      64,512 (64.51 KB)
  Non-fatal FD errors:    1
  SD Errors:              1
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

Finally we did replaced the tape drive. Update tape library and drive firmware and Bacula to latest version

But still not working:

This is the log from /var/adm/messages

Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci1077,137@0/fp@0,0/tape@w500308c09c25b005,0 (st0): Apr 30 15:40:52 de001bs002 Error for Command: rezero/rewind Error Level: Fatal Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0 Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice] Vendor: IBM Serial Number: Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice] Sense Key: Not_Ready Apr 30 15:40:52 de001bs002 scsi: [ID 107833 kern.notice] ASC: 0x3a (medium not present), ASCQ: 0x0, FRU: 0x30 Apr 30 15:41:11 de001bs002 bacula-sd[8286]: [ID 702911 daemon.error] bsock.c:339 Socket has errors=1 on call to client:10.11.25.2:36643 Apr 30 16:01:16 de001bs002 bacula-dir[8297]: [ID 699366 daemon.error] Shutting down Bacula service: de001bs002-dir ... Apr 30 16:01:16 de001bs002 bacula-fd[8291]: [ID 699366 daemon.error] Shutting down Bacula service: de001bs002-fd ... Apr 30 16:01:16 de001bs002 bacula-sd[8286]: [ID 699366 daemon.error] Shutting down Bacula service: de001bs002-sd ... Apr 30 16:40:15 de001bs002 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci1077,137@0/fp@0,0/tape@w500308c09c25b005,0 (st0): Apr 30 16:40:15 de001bs002 Error for Command: rezero/rewind Error Level: Fatal Apr 30 16:40:15 de001bs002 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0 Apr 30 16:40:15 de001bs002 scsi: [ID 107833 kern.notice] Vendor: IBM Serial Number: Apr 30 16:40:15 de001bs002 scsi: [ID 107833 kern.notice] Sense Key: Not_Ready Apr 30 16:40:15 de001bs002 scsi: [ID 107833 kern.notice] ASC: 0x3a (medium not present), ASCQ: 0x0, FRU: 0x30


and here is the log from Bacula itself

Connecting to Director de001bs002:9101 1000 OK: de001bs002-dir Version: 5.2.6 (21 February 2012) Enter a period to cancel a command. *messages 01-May 09:47 de001bs002-sd JobId 752: Fatal error: Error writing data to spool file. ERR=Disc quota exceeded 01-May 09:47 de001bs002-sd JobId 752: Writing spooled data to Volume. Despooling 835,324,266,196 bytes ... 01-May 09:47 de001bs002-sd JobId 752: Despooling elapsed time = 00:00:01, Transfer rate = 835.3 G Bytes/second 01-May 09:52 de001bs002-sd JobId 752: Fatal error: Fatal despooling error. 01-May 09:52 de001bs002-fd JobId 752: Error: bsock.c:389 Write error sending 65536 bytes to Storage daemon:de001bs002:9103: ERR=Broken pipe 01-May 09:52 de001bs002-fd JobId 752: Fatal error: backup.c:1190 Network send error to SD. ERR=Broken pipe 01-May 09:52 de001bs002-dir JobId 752: Error: Bacula de001bs002-dir 5.2.6 (21Feb12): Build OS: i386-pc-solaris2.11 solaris 5.11 JobId: 752 Job: BACKUP_A.2012-04-30_20.13.36_03 Backup Level: Full Client: "de001bs002-fd" 5.2.6 (21Feb12) i386-pc-solaris2.11,solaris,5.11 FileSet: "BACKUP_A" 2011-11-16 12:28:39 Pool: "BACKUP_A" (From Job resource) Catalog: "MyCatalog" (From Client resource) Storage: "LTO4-01" (From Job resource) Scheduled time: 30-Apr-2012 20:13:11 Start time: 30-Apr-2012 20:13:38 End time: 01-May-2012 09:52:58 Elapsed time: 13 hours 39 mins 20 secs Priority: 10 FD Files Written: 210,602 SD Files Written: 210,602 FD Bytes Written: 834,492,956,428 (834.4 GB) SD Bytes Written: 834,546,409,754 (834.5 GB) Rate: 16975.0 KB/s Software Compression: None VSS: no Encryption: no Accurate: no Volume name(s): Volume Session Id: 1 Volume Session Time: 1335809582 Last Volume Bytes: 64,512 (64.51 KB) Non-fatal FD errors: 1 SD Errors: 1 FD termination status: Error SD termination status: Error Termination: *** Backup Error ***


I am quite new to Bacula,
I did search the Internet for two months now and did not find a fix for this.

Seems that the problem is that bacula cannot find the end of the tape no more.


Please help me in this matter
Thank you

Mihai
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users