Re: [Bacula-users] Tape Jobs failing since upgrade to 2.4.0
2008-08-04 12:39:57
On Mon, Aug 4, 2008 at 12:35 PM, Mingus Dew <shon.stephens AT gmail DOT com>
wrote:
> All,
>
> I have been having problems with my tape jobs failing since upgrading
> to 2.4.0. I am running Bacula 2.4.0, compiled from source, on the Solaris
> 10_x86 platform. My tape drive is a SCSI-attached Exabyte Magnum 224 Autoloader
> with LTO-3 drive. Previously I rarely had issues with tape jobs beyond the
> occasional volume replacement.
>
> Initially I thought I might need to increase the timeout for drive
> responses. I did so in the mtx-changer script. However, I am not sure this is
> the problem. I am having multiple issues now. The first is problems like
> this:
>
> 02-Aug 19:03 adm8 JobId 9820: Fatal error: job.c:1811 Bad response to Append
> Data command. Wanted 3000 OK data , got 3903 Error append data
>
This can mean the tape has reached its end.
>
> mt-back4.storage JobId 9820: Fatal error: 3992 Bad autochanger "load slot
> 13, drive 0": ERR=Child died from signal 15: Terminated.
> Results=mtx: Request Sense: Long Report=yes
> mtx: Request Sense: Valid Residual=no
> mtx: Request Sense: Error Code=0 (Unknown?!)
> mtx: Request Sense: Sense Key=No Sense
> mtx: Request Sense: FileMark=no
> mtx: Request Sense: EOM=no
> mtx: Request Sense: ILI=no
> mtx: Request Sense: Additional Sense Code = 00
> mtx: Request Sense: Additional Sense Qualifier = 00
> mtx: Request Sense: BPV=no
> mtx: Request Sense: Error in CDB=no
> mtx: Request Sense: SKSV=no
> MOVE MEDIUM from Element Address 13 to 81 Failed Program killed by Bacula
> watchdog (timeout)
>
This can happen if the changer operation took more than 5 minutes, so
Bacula killed it. There is a Bacula configuration option to extend
this time; I believe it is Maximum Changer Wait in the storage daemon's
Device resource, or something similar.
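
Before raising the limit, it may be worth timing a load by hand to see how
close it gets to 5 minutes. A rough Python sketch, assuming mtx is on the
PATH; the /dev/changer path in the comment is a placeholder, not your real
device, and 300 seconds simply mirrors the 5 minutes mentioned above:

```python
import subprocess
import time

# Assumption: 300 s mirrors the 5-minute limit mentioned in this reply.
CHANGER_WAIT_SECONDS = 300


def time_command(argv, limit=CHANGER_WAIT_SECONDS):
    """Run argv to completion and return (elapsed_seconds, exceeded_limit)."""
    start = time.monotonic()
    subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    elapsed = time.monotonic() - start
    return elapsed, elapsed > limit


# Example call (not executed here). Slot 13 and drive 0 come from the
# failed job's log; /dev/changer is a hypothetical device path:
#
#   elapsed, too_slow = time_command(
#       ["mtx", "-f", "/dev/changer", "load", "13", "0"])
#   print(f"load took {elapsed:.1f}s; over limit: {too_slow}")
```

If the measured time is nowhere near the limit, the timeout is probably a
symptom of something else rather than a slow drive.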
> First, I'm not sure why this would start timing out now, when it's been
> running properly for over a year. Secondly, I know that Element Address 13
> is the slot that Bacula wanted to load, but I don't know what destination 81
> is. Usually the tape drive is Data Element 0. I've run mtx manually to load,
> unload tapes and run mt to check tape status and have done all the testing
> in the manual that I did when initially configuring the drive. All these
> tests performed as expected. I also recorded the response time of the drive
> for loading/forwarding/rewinding/unloading tapes, and made sure that
> mtx-changer was configured to account for these times with a margin for
> error.
>
> I am also now experiencing problems with Bacula not picking tapes from
> Pools correctly. The job is waiting for an "Appendable Volume":
>
> 04-Aug 11:40 mt-back4.storage JobId 9877: Job
> Oracle_Weekly_Tape.2008-08-03_08.00.05 waiting. Cannot find any appendable
> volumes.
> Please use the "label" command to create a new Volume for:
> Storage: "Ultrium-TD3" (/dev/rmt/0cbn)
> Pool: Oracle_Tapes
> Media type: LTO-3
>
> However, when I query for which tapes Bacula thinks are in the Changer
> (and they are in the changer), I see that the Oracle_Tapes pool has 3
> volumes that are in "Append" status...
>
> Choose a query (1-16): 15
> +---------+------------+---------+-------------+------+--------------+-----------+-----------+
> | MediaId | VolumeName | GB      | Storage     | Slot | Pool         | MediaType | VolStatus |
> +---------+------------+---------+-------------+------+--------------+-----------+-----------+
> |       1 | A00001     |    0.00 | Exabyte_224 |    1 | Full_Tapes   | LTO-3     | Recycle   |
> |       2 | A00002     |    0.00 | Exabyte_224 |    2 | Incr_Tapes   | LTO-3     | Recycle   |
> |       3 | A00003     | 1067.91 | Exabyte_224 |    3 | Incr_Tapes   | LTO-3     | Full      |
> |       4 | A00004     |  427.60 | Exabyte_224 |    4 | Dump_Tapes   | LTO-3     | Full      |
> |       5 | A00005     |  626.40 | Exabyte_224 |    5 | Oracle_Tapes | LTO-3     | Full      |
> |       6 | A00006     |    0.00 | Exabyte_224 |    6 | Full_Tapes   | LTO-3     | Recycle   |
> |       7 | A00007     |  735.95 | Exabyte_224 |    7 | Full_Tapes   | LTO-3     | Append    |
> |       8 | A00008     |   57.18 | Exabyte_224 |    8 | Dump_Tapes   | LTO-3     | Append    |
> |       9 | A00009     |    0.00 | Exabyte_224 |    9 | Full_Tapes   | LTO-3     | Recycle   |
> |      10 | A00010     |    0.00 | Exabyte_224 |   10 | Full_Tapes   | LTO-3     | Recycle   |
> |      11 | A00011     |  972.31 | Exabyte_224 |   11 | Full_Tapes   | LTO-3     | Full      |
> |      12 | A00012     | 1055.86 | Exabyte_224 |   12 | Incr_Tapes   | LTO-3     | Full      |
> |      95 | B00001     |  297.14 | Exabyte_224 |   13 | Diff_Tapes   | LTO-3     | Append    |
> |      94 | B00002     |  551.02 | Exabyte_224 |   14 | Incr_Tapes   | LTO-3     | Append    |
> |      98 | B00003     |  499.80 | Exabyte_224 |   15 | Diff_Tapes   | LTO-3     | Full      |
> |      97 | B00004     |  413.32 | Exabyte_224 |   16 | Diff_Tapes   | LTO-3     | Full      |
> |      96 | B00005     |    0.00 | Exabyte_224 |   17 | Diff_Tapes   | LTO-3     | Recycle   |
> |     101 | B00006     |    0.00 | Exabyte_224 |   18 | Oracle_Tapes | LTO-3     | Append    |
> |     100 | B00007     |    0.00 | Exabyte_224 |   19 | Oracle_Tapes | LTO-3     | Append    |
> |      99 | B00008     |   36.32 | Exabyte_224 |   20 | Oracle_Tapes | LTO-3     | Append    |
> +---------+------------+---------+-------------+------+--------------+-----------+-----------+
>
> I have absolutely no idea why this is happening and any help or advice
> is very much appreciated.
>
I am not sure of this one.
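
One thing that can at least be checked mechanically is which volumes the
listing shows as appendable for the pool. A minimal Python sketch, assuming
the query output is pasted with one table row per line; the function name is
only illustrative, and the sample rows are copied from the listing above:

```python
def appendable_volumes(listing, pool):
    """Return VolumeNames in `pool` whose VolStatus is Append."""
    names = []
    for line in listing.splitlines():
        # Drop any mail quote prefix, then split on the table's column bars.
        cells = [c.strip() for c in line.lstrip("> ").strip().strip("|").split("|")]
        if len(cells) != 8 or cells[0] == "MediaId":
            continue  # separator, header, or unrelated line
        _, name, _, _, _, row_pool, _, status = cells
        if row_pool == pool and status == "Append":
            names.append(name)
    return names


SAMPLE = """\
| MediaId | VolumeName | GB | Storage | Slot | Pool | MediaType | VolStatus |
| 5 | A00005 | 626.40 | Exabyte_224 | 5 | Oracle_Tapes | LTO-3 | Full |
| 8 | A00008 | 57.18 | Exabyte_224 | 8 | Dump_Tapes | LTO-3 | Append |
| 101 | B00006 | 0.00 | Exabyte_224 | 18 | Oracle_Tapes | LTO-3 | Append |
| 100 | B00007 | 0.00 | Exabyte_224 | 19 | Oracle_Tapes | LTO-3 | Append |
| 99 | B00008 | 36.32 | Exabyte_224 | 20 | Oracle_Tapes | LTO-3 | Append |
"""

print(appendable_volumes(SAMPLE, "Oracle_Tapes"))
# prints ['B00006', 'B00007', 'B00008']
```

If this prints volumes while the job still waits, the cause lies somewhere
the listing does not show.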
John