Bacula-users

[Bacula-users] Need help with Bacula losing track of tapes

2008-10-30 23:16:01
Subject: [Bacula-users] Need help with Bacula losing track of tapes
From: "Mingus Dew" <shon.stephens AT gmail DOT com>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Thu, 30 Oct 2008 23:12:25 -0400
I'm not entirely sure if its a Bacula specific issue, but I could use some advice (as usual)

I'm running Bacula 2.4.2 on Solaris 10_x68 with and Exabyte LTO-3 Tape libary. What is happening is that at some point an error occurs to the device and then Bacula can no longer load or even determine the correct tape to load for subsequent jobs. My only solution has been to stop Bacula and remove the bacula-dir.state file. If I don't do this, no matter how many times I update slots I can't get Bacula to locate the right slots.

This is the log output

2008-10-16 22:00:00mt-back4.director JobId 14221: Start Backup JobId 14221, Job=NAS_Weekly_Tape.2008-10-16_22.
00.40
2008-10-16 22:00:00mt-back4.storage JobId 14221: 3307 Issuing autochanger "unload slot 1, drive 0" command.
2008-10-16 22:01:45mt-back4.director JobId 14221: Using Device "Ultrium-TD3"
2008-10-16 22:01:45mt-back4.storage JobId 14221: 3301 Issuing autochanger "loaded? drive 0" command.
2008-10-16 22:01:47mt-back4.storage JobId 14221: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
2008-10-16 22:01:47mt-back4.storage JobId 14221: 3304 Issuing autochanger "load slot 7, drive 0" command.
2008-10-16 22:04:06mt-back4.storage JobId 14221: 3305 Autochanger "load slot 7, drive 0", status is OK.
2008-10-16 22:04:06mt-back4.storage JobId 14221: Volume "A00043" previously written, moving to end of data.
2008-10-16 22:05:32mt-back4.storage JobId 14221: Ready to append to end of Volume "A00043" at file=663.
2008-10-17 00:16:36mt-back4.storage JobId 14221: End of Volume "A00043" at 1048:9866 on device "Ultrium-TD3" (/dev/rmt
/0cbn). Write of 64512 bytes got 0.
2008-10-17 00:16:39mt-back4.storage JobId 14221: Re-read of last block succeeded.
2008-10-17 00:16:39mt-back4.storage JobId 14221: End of medium on Volume "A00043" Bytes=1,047,677,137,920 Blocks=16,24
0,034 at 17-Oct-2008 00:16.
2008-10-17 00:16:39mt-back4.storage JobId 14221: 3307 Issuing autochanger "unload slot 7, drive 0" command.
2008-10-17 00:18:17mt-back4.director JobId 14221: There are no more Jobs associated with Volume "A00046". Marking it p
urged.
2008-10-17 00:18:17mt-back4.director JobId 14221: All records pruned from Volume "A00046"; marking it "Purged"
2008-10-17 00:18:17mt-back4.director JobId 14221: Recycled volume "A00046"
2008-10-17 00:18:17mt-back4.storage JobId 14221: 3301 Issuing autochanger "loaded? drive 0" command.
2008-10-17 00:18:20mt-back4.storage JobId 14221: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
2008-10-17 00:18:20mt-back4.storage JobId 14221: 3304 Issuing autochanger "load slot 10, drive 0" command.
2008-10-17 00:20:39mt-back4.storage JobId 14221: 3305 Autochanger "load slot 10, drive 0", status is OK.
2008-10-17 00:20:40mt-back4.storage JobId 14221: Recycled volume "A00046" on device "Ultrium-TD3" (/dev/rmt/0cbn), all
 previous data lost.
2008-10-17 00:20:40mt-back4.storage JobId 14221: New volume "A00046" mounted on device "Ultrium-TD3" (/dev/rmt/0cbn) a
t 17-Oct-2008 00:20.
2008-10-17 05:42:21mt-back4.storage JobId 14221: End of Volume "A00046" at 727:0 on device "Ultrium-TD3" (/dev/rmt/0cb
n). Write of 64512 bytes got 0.
2008-10-17 05:42:25mt-back4.storage JobId 14221: Error: Backspace record at EOT failed. ERR=I/O error
2008-10-17 05:42:25mt-back4.storage JobId 14221: End of medium on Volume "A00046" Bytes=726,953,472,000 Blocks=11,268,
499 at 17-Oct-2008 05:42.
2008-10-17 05:42:25mt-back4.storage JobId 14221: 3307 Issuing autochanger "unload slot 10, drive 0" command.
2008-10-17 05:44:08mt-back4.director JobId 14221: There are no more Jobs associated with Volume "B00043". Marking it p
urged.
2008-10-17 05:44:09mt-back4.director JobId 14221: All records pruned from Volume "B00043"; marking it "Purged"
2008-10-17 05:44:09mt-back4.director JobId 14221: Recycled volume "B00043"
2008-10-17 05:44:09mt-back4.storage JobId 14221: 3301 Issuing autochanger "loaded? drive 0" command.
2008-10-17 05:44:11mt-back4.storage JobId 14221: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
2008-10-17 05:44:11mt-back4.storage JobId 14221: 3304 Issuing autochanger "load slot 19, drive 0" command.
2008-10-17 05:46:35mt-back4.storage JobId 14221: 3305 Autochanger "load slot 19, drive 0", status is OK.
2008-10-17 05:46:35mt-back4.storage JobId 14221: Recycled volume "B00043" on device "Ultrium-TD3" (/dev/rmt/0cbn), all
 previous data lost.
2008-10-17 05:46:35mt-back4.storage JobId 14221: New volume "B00043" mounted on device "Ultrium-TD3" (/dev/rmt/0cbn) a
t 17-Oct-2008 05:46.
2008-10-17 10:06:31mt-back4.storage JobId 14221: End of Volume "B00043" at 928:8964 on device "Ultrium-TD3" (/dev/rmt/
0cbn). Write of 64512 bytes got 0.
2008-10-17 10:06:34mt-back4.storage JobId 14221: Re-read of last block succeeded.
2008-10-17 10:06:34mt-back4.storage JobId 14221: End of medium on Volume "B00043" Bytes=928,518,893,568 Blocks=14,392,
963 at 17-Oct-2008 10:06.
2008-10-17 10:06:34mt-back4.storage JobId 14221: 3307 Issuing autochanger "unload slot 19, drive 0" command.
2008-10-17 10:12:06mt-back4.storage JobId 14221: 3995 Bad autochanger "unload slot 19, drive 0": ERR=Child exited with
 code 1
Results=Unloading Data Transfer Element into Storage Element 19...mtx: Request Sense: Long Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 81 to 19 Failed

2008-10-17 10:12:06mt-back4.director JobId 14221: There are no more Jobs associated with Volume "B00044". Marking it p
urged.
2008-10-17 10:12:06mt-back4.director JobId 14221: All records pruned from Volume "B00044"; marking it "Purged"
2008-10-17 10:12:06mt-back4.director JobId 14221: Recycled volume "B00044"
2008-10-17 10:12:06mt-back4.storage JobId 14221: 3301 Issuing autochanger "loaded? drive 0" command.
2008-10-17 10:12:12mt-back4.storage JobId 14221: 3991 Bad autochanger "loaded? drive 0" command: ERR=Child exited with
 code 1.
Results=mtx: Request Sense: Long Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Aborted Command
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 29
mtx: Request Sense: Additional Sense Qualifier = 07
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
INQUIRY command Failed

2008-10-17 10:12:12mt-back4.storage JobId 14221: 3301 Issuing autochanger "loaded? drive 0" command.
2008-10-17 10:12:13mt-back4.storage JobId 14221: 3991 Bad autochanger "loaded? drive 0" command: ERR=Child exited with
 code 1.
Results=mtx: Request Sense: Long Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Not Ready
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 3E
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
INQUIRY command Failed

2008-10-17 10:12:13mt-back4.storage JobId 14221: 3304 Issuing autochanger "load slot 18, drive 0" command.
2008-10-17 10:16:20mt-back4.storage JobId 14221: Fatal error: 3992 Bad autochanger "load slot 18, drive 0": ERR=Child
exited with code 1.
Results=mtx: Request Sense: Long Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Not Ready
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 3E
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
INQUIRY command Failed

2008-10-17 10:16:20mt-back4.storage JobId 14221: Job write elapsed time = 12:02:37, Transfer rate = 47.04 M bytes/seco
nd
2008-10-17 10:16:20mt-back4.mentora JobId 14221: Fatal error: backup.c:892 Network send error to SD. ERR=Broken pipe
2008-10-17 10:16:20mt-back4.mentora JobId 14221: Error: bsock.c:306 Write error sending 65536 bytes to Storage daemon:
mt-back4.mentora.:9103: ERR=Broken pipe
2008-10-17 10:16:32mt-back4.director JobId 14221: Error: Bacula mt-back4.director 2.4.2 (26Jul08): 17-Oct-2008 10:16:3
2
  Build OS:               i386-pc-solaris2.10 solaris 5.10
  JobId:                  14221
  Job:                    NAS_Weekly_Tape.2008-10-16_22.00.40
  Backup Level:           Full
  Client:                 "mt-back4" 2.4.2 (26Jul08) i386-pc-solaris2.10,solaris,5.10
  FileSet:                "NAS_Files" 2008-06-12 16:00:56
  Pool:                   "Tapes" (From Job resource)
  Storage:                "Exabyte_224" (From Pool resource)
  Scheduled time:         16-Oct-2008 22:00:00
  Start time:             16-Oct-2008 22:01:45
  End time:               17-Oct-2008 10:16:32
  Elapsed time:           12 hours 14 mins 47 secs
  Priority:               10
  FD Files Written:       851,657
  SD Files Written:       851,657
  FD Bytes Written:       2,039,356,838,887 (2.039 TB)
  SD Bytes Written:       2,039,544,213,715 (2.039 TB)
  Rate:                   46257.6 KB/s
  Software Compression:   None
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         A00043|A00046|B00043
  Volume Session Id:      203
  Volume Session Time:    1223923211
  Last Volume Bytes:      1 (1 B)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

Once this happens, other jobs begin having tape failures. This one in particular is strange to me because the tape that Bacula marked as "Recycle" should have then been made appendable.

2008-10-17 10:16:34mt-back4.director JobId 14261: Start Backup JobId 14261, Job=OracleRMAN_Weekly_Tape.2008-10-17_
10.00.20
2008-10-17 10:16:35mt-back4.director JobId 14261: There are no more Jobs associated with Volume "B00039". Marking it p
urged.
2008-10-17 10:16:35mt-back4.director JobId 14261: All records pruned from Volume "B00039"; marking it "Purged"
2008-10-17 10:16:35mt-back4.director JobId 14261: Recycled volume "B00039"
2008-10-17 10:16:35mt-back4.director JobId 14261: Using Device "Ultrium-TD3"
2008-10-17 10:16:38mt-back4.storage JobId 14261: Job OracleRMAN_Weekly_Tape.2008-10-17_10.00.20 waiting. Cannot fi
nd any appendable volumes.
Please use the "label"  command to create a new Volume for:
    Storage:      "Ultrium-TD3" (/dev/rmt/0cbn)
    Pool:         TSN_Tapes
    Media type:   LTO-3
2008-10-17 10:32:51mt-back4.storage JobId 14261: Job OracleRMAN_Weekly_Tape.2008-10-17_10.00.20 marked to be cance
led.
2008-10-17 10:32:51mt-back4.storage JobId 14261: Job OracleRMAN_Weekly_Tape.2008-10-17_10.00.20 canceled while wai
ting for mount on Storage Device ""Ultrium-TD3" (/dev/rmt/0cbn)".
2008-10-17 10:32:51mt-back4.director JobId 14261: Bacula mt-back4.director 2.4.2 (26Jul08): 17-Oct-2008 10:32:51
  Build OS:               i386-pc-solaris2.10 solaris 5.10
  JobId:                  14261
  Job:                    OracleRMAN_Weekly_Tape.2008-10-17_10.00.20
  Backup Level:           Full
  Client:                 "adm9" 2.2.8 (26Jan08) i386-pc-solaris2.8,solaris,5.8
  FileSet:                "Oracle_RMAN" 2008-03-28 07:20:53
  Pool:                   "Tapes" (From Job resource)
  Storage:                "Exabyte_224" (From Pool resource)
  Scheduled time:         17-Oct-2008 10:00:00
  Start time:             17-Oct-2008 10:16:38
  End time:               17-Oct-2008 10:32:51
  Elapsed time:           16 mins 13 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):        
  Volume Session Id:      243
  Volume Session Time:    1223923211
  Last Volume Bytes:      1 (1 B)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Canceled
  SD termination status:  Canceled
  Termination:            Backup Canceled

This is where I cancel the scheduled tape jobs. Stop bacula. Remove the bacula-dir.state file. Start the backups again and everything runs smooth. I'm just not sure why this happens so frequently.

Thanks,
Shon
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
<Prev in Thread] Current Thread [Next in Thread>
  • [Bacula-users] Need help with Bacula losing track of tapes, Mingus Dew <=