Re: [Bacula-users] Tape Jobs failing since upgrade to 2.4.0

2008-08-04 13:03:33
Subject: Re: [Bacula-users] Tape Jobs failing since upgrade to 2.4.0
From: "Mingus Dew" <shon.stephens AT gmail DOT com>
To: "John Drescher" <drescherjm AT gmail DOT com>
Date: Mon, 4 Aug 2008 13:03:26 -0400


On Mon, Aug 4, 2008 at 12:39 PM, John Drescher <drescherjm AT gmail DOT com> wrote:
On Mon, Aug 4, 2008 at 12:35 PM, Mingus Dew <shon.stephens AT gmail DOT com> wrote:
> All,
>
>      I have been having problems with my tape jobs failing since upgrading
> to 2.4.0. I am running Bacula 2.4.0, compiled from source, on the Solaris 10
> x86 platform. My tape drive is a SCSI-attached Exabyte Magnum 224 Autoloader
> with an LTO-3 drive. Previously I rarely had issues with tape jobs beyond the
> occasional volume replacement.
>
>      Initially I thought I might need to increase the timeout for drive
> responses. I did so in the mtx-changer script. However, I am not sure this is
> the problem. I am having multiple issues now. The first is errors like
> this:
>
> 02-Aug 19:03 adm8 JobId 9820: Fatal error: job.c:1811 Bad response to Append
> Data command. Wanted 3000 OK data , got 3903 Error append data
>
This can mean the tape is at the end.

If the tape is at the end, why does Bacula fail the job instead of trying to get another tape from the pool? Am I missing something, or do I not have it configured correctly? This would seem to be a simple matter of "end of tape, rewind, load another tape."


>
> mt-back4.storage JobId 9820: Fatal error: 3992 Bad autochanger "load slot
> 13, drive 0": ERR=Child died from signal 15: Terminated.
> Results=mtx: Request Sense: Long Report=yes
> mtx: Request Sense: Valid Residual=no
> mtx: Request Sense: Error Code=0 (Unknown?!)
> mtx: Request Sense: Sense Key=No Sense
> mtx: Request Sense: FileMark=no
> mtx: Request Sense: EOM=no
> mtx: Request Sense: ILI=no
> mtx: Request Sense: Additional Sense Code = 00
> mtx: Request Sense: Additional Sense Qualifier = 00
> mtx: Request Sense: BPV=no
> mtx: Request Sense: Error in CDB=no
> mtx: Request Sense: SKSV=no
> MOVE MEDIUM from Element Address 13 to 81 Failed Program killed by Bacula
> watchdog (timeout)
>
This can happen if it took more than 5 minutes to unload the tape, so
Bacula killed the operation. There are Bacula configuration options to
extend this time. I believe it is maximum changer time or something similar.

Thank you. I'm going to look into setting these options in my Storage configuration.
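For illustration only: a minimal Python sketch of the watchdog behavior John describes, in which a changer command that runs past a time limit is sent SIGTERM. That matches the "Child died from signal 15: Terminated" in the log above. `run_with_watchdog` is a hypothetical helper, not Bacula's implementation. On the Bacula side, the manual documents a `Maximum Changer Wait` directive in the Storage daemon's Device resource (default 5 minutes), which appears to be the option John has in mind.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s):
    """Run cmd and wait up to timeout_s seconds for it to exit.

    If the deadline passes, send the child SIGTERM (what Popen.terminate()
    does on POSIX) and reap it. Returns (returncode, timed_out); on POSIX a
    child that dies from SIGTERM reports returncode -15.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        proc.terminate()          # SIGTERM on POSIX
        return proc.wait(), True  # reap the child and collect its exit status


if __name__ == "__main__":
    # Stand-in for a slow changer command: a child that sleeps past the limit.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_watchdog(slow, timeout_s=1))  # on POSIX: (-15, True)
```

The one-second limit is only for the demo; the limit discussed in this thread is 5 minutes, and raising it only helps if the autochanger does eventually finish the move.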



>      First, I'm not sure why this would start timing out now, when it's been
> running properly for over a year. Second, I know that Element Address 13
> is the slot that Bacula wanted to load, but I don't know what destination 81
> is. Usually the tape drive is Data Element 0. I've run mtx manually to load
> and unload tapes, run mt to check tape status, and done all the testing
> in the manual that I did when initially configuring the drive. All these
> tests performed as expected. I also recorded the response time of the drive
> for loading/forwarding/rewinding/unloading tapes, and made sure that
> mtx-changer was configured to account for these times with a margin for
> error.
>
>      I am also now experiencing problems with Bacula picking
> tapes from Pools correctly. The job is waiting for an "Appendable Volume":
>
> 04-Aug 11:40 mt-back4.storage JobId 9877: Job
> Oracle_Weekly_Tape.2008-08-03_08.00.05 waiting. Cannot find any appendable
> volumes.
> Please use the "label"  command to create a new Volume for:
>     Storage:      "Ultrium-TD3" (/dev/rmt/0cbn)
>     Pool:         Oracle_Tapes
>     Media type:   LTO-3
>
>      However, when I query which tapes Bacula thinks are in the changer
> (and they are in the changer), I see that the Oracle_Tapes pool has 3
> volumes in "Append" status:
>
> Choose a query (1-16): 15
> +---------+------------+-----------+-------------+------+---------------+-----------+-----------+
> | MediaId | VolumeName | GB        | Storage     | Slot | Pool          | MediaType | VolStatus |
> +---------+------------+-----------+-------------+------+---------------+-----------+-----------+
> |       1 | A00001     | 0.00      | Exabyte_224 |    1 | Full_Tapes    | LTO-3     | Recycle   |
> |       2 | A00002     | 0.00      | Exabyte_224 |    2 | Incr_Tapes    | LTO-3     | Recycle   |
> |       3 | A00003     | 1067.91   | Exabyte_224 |    3 | Incr_Tapes    | LTO-3     | Full      |
> |       4 | A00004     | 427.60    | Exabyte_224 |    4 | Dump_Tapes    | LTO-3     | Full      |
> |       5 | A00005     | 626.40    | Exabyte_224 |    5 | Oracle_Tapes  | LTO-3     | Full      |
> |       6 | A00006     | 0.00      | Exabyte_224 |    6 | Full_Tapes    | LTO-3     | Recycle   |
> |       7 | A00007     | 735.95    | Exabyte_224 |    7 | Full_Tapes    | LTO-3     | Append    |
> |       8 | A00008     | 57.18     | Exabyte_224 |    8 | Dump_Tapes    | LTO-3     | Append    |
> |       9 | A00009     | 0.00      | Exabyte_224 |    9 | Full_Tapes    | LTO-3     | Recycle   |
> |      10 | A00010     | 0.00      | Exabyte_224 |   10 | Full_Tapes    | LTO-3     | Recycle   |
> |      11 | A00011     | 972.31    | Exabyte_224 |   11 | Full_Tapes    | LTO-3     | Full      |
> |      12 | A00012     | 1055.86   | Exabyte_224 |   12 | Incr_Tapes    | LTO-3     | Full      |
> |      95 | B00001     | 297.14    | Exabyte_224 |   13 | Diff_Tapes    | LTO-3     | Append    |
> |      94 | B00002     | 551.02    | Exabyte_224 |   14 | Incr_Tapes    | LTO-3     | Append    |
> |      98 | B00003     | 499.80    | Exabyte_224 |   15 | Diff_Tapes    | LTO-3     | Full      |
> |      97 | B00004     | 413.32    | Exabyte_224 |   16 | Diff_Tapes    | LTO-3     | Full      |
> |      96 | B00005     | 0.00      | Exabyte_224 |   17 | Diff_Tapes    | LTO-3     | Recycle   |
> |     101 | B00006     | 0.00      | Exabyte_224 |   18 | Oracle_Tapes  | LTO-3     | Append    |
> |     100 | B00007     | 0.00      | Exabyte_224 |   19 | Oracle_Tapes  | LTO-3     | Append    |
> |      99 | B00008     | 36.32     | Exabyte_224 |   20 | Oracle_Tapes  | LTO-3     | Append    |
> +---------+------------+-----------+-------------+------+---------------+-----------+-----------+
>
>      I have absolutely no idea why this is happening and any help or advice
> is very much appreciated.
>
I am not sure about this one.

I cancelled that job, tried to run another tape job, and the same thing happened. Bacula does not seem to know of ANY appendable volumes. Is there a way to tell if Bacula has invalidated the changer slot?
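To make the pool question concrete, here is a small Python sketch that applies by hand the checks visible in the query output (pool, media type, and VolStatus) to the Oracle_Tapes rows plus one other row from the table above. `Volume` and `appendable` are hypothetical names, not Bacula code. Three volumes pass, which suggests they are being rejected on something the query does not display. As I understand the Bacula manual, one candidate is the catalog's changer slot information, since exceeding Maximum Changer Wait causes Bacula to invalidate the slot number stored in the catalog.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    media_id: int
    name: str
    slot: int
    pool: str
    media_type: str
    status: str


# Rows copied from the query 15 output above: every Oracle_Tapes volume plus
# one Append volume from another pool. Storage is Exabyte_224 for all of them.
CATALOG = [
    Volume(5, "A00005", 5, "Oracle_Tapes", "LTO-3", "Full"),
    Volume(7, "A00007", 7, "Full_Tapes", "LTO-3", "Append"),
    Volume(101, "B00006", 18, "Oracle_Tapes", "LTO-3", "Append"),
    Volume(100, "B00007", 19, "Oracle_Tapes", "LTO-3", "Append"),
    Volume(99, "B00008", 20, "Oracle_Tapes", "LTO-3", "Append"),
]


def appendable(volumes, pool, media_type):
    """Names of volumes matching the job's pool and media type in Append status.

    Hypothetical helper: it checks only the columns the query shows. Bacula's
    own selection also consults catalog fields that are not displayed here.
    """
    return [v.name for v in volumes
            if v.pool == pool and v.media_type == media_type and v.status == "Append"]


print(appendable(CATALOG, "Oracle_Tapes", "LTO-3"))  # ['B00006', 'B00007', 'B00008']
```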




John

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users