[Bacula-users] Need help with Bacula losing track of tapes
2008-10-30 23:16:01
I'm not entirely sure if its a Bacula specific issue, but I could use some advice (as usual)
I'm running Bacula 2.4.2 on Solaris 10_x68 with and Exabyte LTO-3 Tape libary. What is happening is that at some point an error occurs to the device and then Bacula can no longer load or even determine the correct tape to load for subsequent jobs. My only solution has been to stop Bacula and remove the bacula-dir.state file. If I don't do this, no matter how many times I update slots I can't get Bacula to locate the right slots.
This is the log output
2008-10-16 22:00:00mt-back4.director JobId 14221: Start Backup JobId 14221, Job=NAS_Weekly_Tape.2008-10-16_22. 00.40 2008-10-16 22:00:00mt-back4.storage JobId 14221: 3307 Issuing autochanger "unload slot 1, drive 0" command.
2008-10-16 22:01:45mt-back4.director JobId 14221: Using Device "Ultrium-TD3" 2008-10-16 22:01:45mt-back4.storage JobId 14221: 3301 Issuing autochanger "loaded? drive 0" command. 2008-10-16 22:01:47mt-back4.storage JobId 14221: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
2008-10-16 22:01:47mt-back4.storage JobId 14221: 3304 Issuing autochanger "load slot 7, drive 0" command. 2008-10-16 22:04:06mt-back4.storage JobId 14221: 3305 Autochanger "load slot 7, drive 0", status is OK.
2008-10-16 22:04:06mt-back4.storage JobId 14221: Volume "A00043" previously written, moving to end of data. 2008-10-16 22:05:32mt-back4.storage JobId 14221: Ready to append to end of Volume "A00043" at file=663.
2008-10-17 00:16:36mt-back4.storage JobId 14221: End of Volume "A00043" at 1048:9866 on device "Ultrium-TD3" (/dev/rmt /0cbn). Write of 64512 bytes got 0. 2008-10-17 00:16:39mt-back4.storage JobId 14221: Re-read of last block succeeded.
2008-10-17 00:16:39mt-back4.storage JobId 14221: End of medium on Volume "A00043" Bytes=1,047,677,137,920 Blocks=16,24 0,034 at 17-Oct-2008 00:16. 2008-10-17 00:16:39mt-back4.storage JobId 14221: 3307 Issuing autochanger "unload slot 7, drive 0" command.
2008-10-17 00:18:17mt-back4.director JobId 14221: There are no more Jobs associated with Volume "A00046". Marking it p urged. 2008-10-17 00:18:17mt-back4.director JobId 14221: All records pruned from Volume "A00046"; marking it "Purged"
2008-10-17 00:18:17mt-back4.director JobId 14221: Recycled volume "A00046" 2008-10-17 00:18:17mt-back4.storage JobId 14221: 3301 Issuing autochanger "loaded? drive 0" command. 2008-10-17 00:18:20mt-back4.storage JobId 14221: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
2008-10-17 00:18:20mt-back4.storage JobId 14221: 3304 Issuing autochanger "load slot 10, drive 0" command. 2008-10-17 00:20:39mt-back4.storage JobId 14221: 3305 Autochanger "load slot 10, drive 0", status is OK.
2008-10-17 00:20:40mt-back4.storage JobId 14221: Recycled volume "A00046" on device "Ultrium-TD3" (/dev/rmt/0cbn), all previous data lost. 2008-10-17 00:20:40mt-back4.storage JobId 14221: New volume "A00046" mounted on device "Ultrium-TD3" (/dev/rmt/0cbn) a
t 17-Oct-2008 00:20. 2008-10-17 05:42:21mt-back4.storage JobId 14221: End of Volume "A00046" at 727:0 on device "Ultrium-TD3" (/dev/rmt/0cb n). Write of 64512 bytes got 0. 2008-10-17 05:42:25mt-back4.storage JobId 14221: Error: Backspace record at EOT failed. ERR=I/O error
2008-10-17 05:42:25mt-back4.storage JobId 14221: End of medium on Volume "A00046" Bytes=726,953,472,000 Blocks=11,268, 499 at 17-Oct-2008 05:42. 2008-10-17 05:42:25mt-back4.storage JobId 14221: 3307 Issuing autochanger "unload slot 10, drive 0" command.
2008-10-17 05:44:08mt-back4.director JobId 14221: There are no more Jobs associated with Volume "B00043". Marking it p urged. 2008-10-17 05:44:09mt-back4.director JobId 14221: All records pruned from Volume "B00043"; marking it "Purged"
2008-10-17 05:44:09mt-back4.director JobId 14221: Recycled volume "B00043" 2008-10-17 05:44:09mt-back4.storage JobId 14221: 3301 Issuing autochanger "loaded? drive 0" command. 2008-10-17 05:44:11mt-back4.storage JobId 14221: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
2008-10-17 05:44:11mt-back4.storage JobId 14221: 3304 Issuing autochanger "load slot 19, drive 0" command. 2008-10-17 05:46:35mt-back4.storage JobId 14221: 3305 Autochanger "load slot 19, drive 0", status is OK.
2008-10-17 05:46:35mt-back4.storage JobId 14221: Recycled volume "B00043" on device "Ultrium-TD3" (/dev/rmt/0cbn), all previous data lost. 2008-10-17 05:46:35mt-back4.storage JobId 14221: New volume "B00043" mounted on device "Ultrium-TD3" (/dev/rmt/0cbn) a
t 17-Oct-2008 05:46. 2008-10-17 10:06:31mt-back4.storage JobId 14221: End of Volume "B00043" at 928:8964 on device "Ultrium-TD3" (/dev/rmt/ 0cbn). Write of 64512 bytes got 0. 2008-10-17 10:06:34mt-back4.storage JobId 14221: Re-read of last block succeeded.
2008-10-17 10:06:34mt-back4.storage JobId 14221: End of medium on Volume "B00043" Bytes=928,518,893,568 Blocks=14,392, 963 at 17-Oct-2008 10:06. 2008-10-17 10:06:34mt-back4.storage JobId 14221: 3307 Issuing autochanger "unload slot 19, drive 0" command.
2008-10-17 10:12:06mt-back4.storage JobId 14221: 3995 Bad autochanger "unload slot 19, drive 0": ERR=Child exited with code 1 Results=Unloading Data Transfer Element into Storage Element 19...mtx: Request Sense: Long Report=yes
mtx: Request Sense: Valid Residual=no mtx: Request Sense: Error Code=0 (Unknown?!) mtx: Request Sense: Sense Key=No Sense mtx: Request Sense: FileMark=no mtx: Request Sense: EOM=no mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00 mtx: Request Sense: Additional Sense Qualifier = 00 mtx: Request Sense: BPV=no mtx: Request Sense: Error in CDB=no mtx: Request Sense: SKSV=no MOVE MEDIUM from Element Address 81 to 19 Failed
2008-10-17 10:12:06mt-back4.director JobId 14221: There are no more Jobs associated with Volume "B00044". Marking it p urged. 2008-10-17 10:12:06mt-back4.director JobId 14221: All records pruned from Volume "B00044"; marking it "Purged"
2008-10-17 10:12:06mt-back4.director JobId 14221: Recycled volume "B00044" 2008-10-17 10:12:06mt-back4.storage JobId 14221: 3301 Issuing autochanger "loaded? drive 0" command. 2008-10-17 10:12:12mt-back4.storage JobId 14221: 3991 Bad autochanger "loaded? drive 0" command: ERR=Child exited with
code 1. Results=mtx: Request Sense: Long Report=yes mtx: Request Sense: Valid Residual=no mtx: Request Sense: Error Code=70 (Current) mtx: Request Sense: Sense Key=Aborted Command mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no mtx: Request Sense: ILI=no mtx: Request Sense: Additional Sense Code = 29 mtx: Request Sense: Additional Sense Qualifier = 07 mtx: Request Sense: BPV=no mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no INQUIRY command Failed
2008-10-17 10:12:12mt-back4.storage JobId 14221: 3301 Issuing autochanger "loaded? drive 0" command. 2008-10-17 10:12:13mt-back4.storage JobId 14221: 3991 Bad autochanger "loaded? drive 0" command: ERR=Child exited with
code 1. Results=mtx: Request Sense: Long Report=yes mtx: Request Sense: Valid Residual=no mtx: Request Sense: Error Code=70 (Current) mtx: Request Sense: Sense Key=Not Ready mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no mtx: Request Sense: ILI=no mtx: Request Sense: Additional Sense Code = 3E mtx: Request Sense: Additional Sense Qualifier = 00 mtx: Request Sense: BPV=no mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no INQUIRY command Failed
2008-10-17 10:12:13mt-back4.storage JobId 14221: 3304 Issuing autochanger "load slot 18, drive 0" command. 2008-10-17 10:16:20mt-back4.storage JobId 14221: Fatal error: 3992 Bad autochanger "load slot 18, drive 0": ERR=Child
exited with code 1. Results=mtx: Request Sense: Long Report=yes mtx: Request Sense: Valid Residual=no mtx: Request Sense: Error Code=70 (Current) mtx: Request Sense: Sense Key=Not Ready mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no mtx: Request Sense: ILI=no mtx: Request Sense: Additional Sense Code = 3E mtx: Request Sense: Additional Sense Qualifier = 00 mtx: Request Sense: BPV=no mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no INQUIRY command Failed
2008-10-17 10:16:20mt-back4.storage JobId 14221: Job write elapsed time = 12:02:37, Transfer rate = 47.04 M bytes/seco nd 2008-10-17 10:16:20mt-back4.mentora JobId 14221: Fatal error: backup.c:892 Network send error to SD. ERR=Broken pipe
2008-10-17 10:16:20mt-back4.mentora JobId 14221: Error: bsock.c:306 Write error sending 65536 bytes to Storage daemon: mt-back4.mentora.:9103: ERR=Broken pipe 2008-10-17 10:16:32mt-back4.director JobId 14221: Error: Bacula mt-back4.director 2.4.2 (26Jul08): 17-Oct-2008 10:16:3
2 Build OS: i386-pc-solaris2.10 solaris 5.10 JobId: 14221 Job: NAS_Weekly_Tape.2008-10-16_22.00.40 Backup Level: Full Client: "mt-back4" 2.4.2 (26Jul08) i386-pc-solaris2.10,solaris,5.10
FileSet: "NAS_Files" 2008-06-12 16:00:56 Pool: "Tapes" (From Job resource) Storage: "Exabyte_224" (From Pool resource) Scheduled time: 16-Oct-2008 22:00:00
Start time: 16-Oct-2008 22:01:45 End time: 17-Oct-2008 10:16:32 Elapsed time: 12 hours 14 mins 47 secs Priority: 10 FD Files Written: 851,657
SD Files Written: 851,657 FD Bytes Written: 2,039,356,838,887 (2.039 TB) SD Bytes Written: 2,039,544,213,715 (2.039 TB) Rate: 46257.6 KB/s Software Compression: None
VSS: no Storage Encryption: no Volume name(s): A00043|A00046|B00043 Volume Session Id: 203 Volume Session Time: 1223923211 Last Volume Bytes: 1 (1 B)
Non-fatal FD errors: 1 SD Errors: 0 FD termination status: Error SD termination status: Error Termination: *** Backup Error ***
Once this happens, other jobs begin having tape failures. This one in particular is strange to me because the tape that Bacula marked as "Recycle" should have then been made appendable.
2008-10-17 10:16:34mt-back4.director JobId 14261: Start Backup JobId 14261, Job=OracleRMAN_Weekly_Tape.2008-10-17_ 10.00.20 2008-10-17 10:16:35mt-back4.director JobId 14261: There are no more Jobs associated with Volume "B00039". Marking it p
urged. 2008-10-17 10:16:35mt-back4.director JobId 14261: All records pruned from Volume "B00039"; marking it "Purged" 2008-10-17 10:16:35mt-back4.director JobId 14261: Recycled volume "B00039"
2008-10-17 10:16:35mt-back4.director JobId 14261: Using Device "Ultrium-TD3" 2008-10-17 10:16:38mt-back4.storage JobId 14261: Job OracleRMAN_Weekly_Tape.2008-10-17_10.00.20 waiting. Cannot fi nd any appendable volumes.
Please use the "label" command to create a new Volume for: Storage: "Ultrium-TD3" (/dev/rmt/0cbn) Pool: TSN_Tapes Media type: LTO-3 2008-10-17 10:32:51mt-back4.storage JobId 14261: Job OracleRMAN_Weekly_Tape.2008-10-17_10.00.20 marked to be cance
led. 2008-10-17 10:32:51mt-back4.storage JobId 14261: Job OracleRMAN_Weekly_Tape.2008-10-17_10.00.20 canceled while wai ting for mount on Storage Device ""Ultrium-TD3" (/dev/rmt/0cbn)". 2008-10-17 10:32:51mt-back4.director JobId 14261: Bacula mt-back4.director 2.4.2 (26Jul08): 17-Oct-2008 10:32:51
Build OS: i386-pc-solaris2.10 solaris 5.10 JobId: 14261 Job: OracleRMAN_Weekly_Tape.2008-10-17_10.00.20 Backup Level: Full Client: "adm9" 2.2.8 (26Jan08) i386-pc-solaris2.8,solaris,5.8
FileSet: "Oracle_RMAN" 2008-03-28 07:20:53 Pool: "Tapes" (From Job resource) Storage: "Exabyte_224" (From Pool resource) Scheduled time: 17-Oct-2008 10:00:00
Start time: 17-Oct-2008 10:16:38 End time: 17-Oct-2008 10:32:51 Elapsed time: 16 mins 13 secs Priority: 10 FD Files Written: 0 SD Files Written: 0
FD Bytes Written: 0 (0 B) SD Bytes Written: 0 (0 B) Rate: 0.0 KB/s Software Compression: None VSS: no Storage Encryption: no Volume name(s):
Volume Session Id: 243 Volume Session Time: 1223923211 Last Volume Bytes: 1 (1 B) Non-fatal FD errors: 0 SD Errors: 0 FD termination status: Canceled SD termination status: Canceled
Termination: Backup Canceled
This is where I cancel the scheduled tape jobs. Stop bacula. Remove the bacula-dir.state file. Start the backups again and everything runs smooth. I'm just not sure why this happens so frequently.
Thanks, Shon
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
|
<Prev in Thread] |
Current Thread |
[Next in Thread> |
- [Bacula-users] Need help with Bacula losing track of tapes,
Mingus Dew <=
|
|
|