[Bacula-users] Tape Jobs failing since upgrade to 2.4.0

2008-08-04 12:35:21
From: "Mingus Dew" <shon.stephens AT gmail DOT com>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Mon, 4 Aug 2008 12:35:12 -0400
All,

     I have been having problems with my tape jobs failing since upgrading to 2.4.0. I am running Bacula 2.4.0, compiled from source, on Solaris 10 x86. My tape drive is a SCSI-attached Exabyte Magnum 224 autoloader with an LTO-3 drive. Previously I rarely had issues with tape jobs beyond the occasional volume replacement.

     Initially I thought I might need to increase the timeout for drive responses, so I raised it in the mtx-changer script. However, I am not sure this is the problem, and I am now having multiple issues. The first is errors like this:

02-Aug 19:03 adm8 JobId 9820: Fatal error: job.c:1811 Bad response to Append Data command. Wanted 3000 OK data , got 3903 Error append data

mt-back4.storage JobId 9820: Fatal error: 3992 Bad autochanger "load slot 13, drive 0": ERR=Child died from signal 15: Terminated.
Results=mtx: Request Sense: Long Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 13 to 81 Failed Program killed by Bacula watchdog (timeout)

     First, I'm not sure why this would start timing out now, when it's been running properly for over a year. Second, I know that Element Address 13 is the slot that Bacula wanted to load, but I don't know what destination 81 is. Usually the tape drive is Data Element 0. I've run mtx manually to load and unload tapes, run mt to check tape status, and repeated all the tests from the manual that I did when initially configuring the drive. All these tests performed as expected. I also recorded the response time of the drive for loading, forwarding, rewinding, and unloading tapes, and made sure that mtx-changer was configured to account for these times with a margin for error.
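     For anyone wanting to repeat the timing measurement, here is a minimal sketch of how it could be scripted. This is not what I actually ran; the helper names time_command and suggested_timeout, the 1.5 margin, and the /dev/changer path in the comment are all hypothetical, so substitute your own changer device.

```python
import subprocess
import time


def time_command(argv, timeout_s):
    """Run argv and return (elapsed_seconds, return_code).

    return_code is None if the command did not finish within timeout_s.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
        return_code = completed.returncode
    except subprocess.TimeoutExpired:
        return_code = None
    return time.monotonic() - start, return_code


def suggested_timeout(samples, margin=1.5):
    """Return the slowest observed time scaled by a safety margin (assumed 1.5)."""
    return max(samples) * margin


# Example use (hypothetical device path, adjust for your system):
#   elapsed, rc = time_command(["mtx", "-f", "/dev/changer", "load", "13", "0"], 600)
#   elapsed2, rc2 = time_command(["mtx", "-f", "/dev/changer", "unload", "13", "0"], 600)
#   print(suggested_timeout([elapsed, elapsed2]))
```

     Running each operation several times and feeding the slowest results into suggested_timeout gives a figure to compare against the wait values configured in mtx-changer.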

     I am also now experiencing problems with Bacula picking tapes from Pools correctly. The job is waiting for an "Appendable Volume":

04-Aug 11:40 mt-back4.storage JobId 9877: Job Oracle_Weekly_Tape.2008-08-03_08.00.05 waiting. Cannot find any appendable volumes.
Please use the "label"  command to create a new Volume for:
    Storage:      "Ultrium-TD3" (/dev/rmt/0cbn)
    Pool:         Oracle_Tapes
    Media type:   LTO-3

     However, when I query for the tapes Bacula thinks are in the changer (and they are in the changer), I see that the Oracle_Tapes pool has 3 volumes in "Append" status:

Choose a query (1-16): 15
+---------+------------+-----------+-------------+------+---------------+-----------+-----------+
| MediaId | VolumeName | GB        | Storage     | Slot | Pool          | MediaType | VolStatus |
+---------+------------+-----------+-------------+------+---------------+-----------+-----------+
|       1 | A00001     | 0.00      | Exabyte_224 |    1 | Full_Tapes    | LTO-3     | Recycle   |
|       2 | A00002     | 0.00      | Exabyte_224 |    2 | Incr_Tapes    | LTO-3     | Recycle   |
|       3 | A00003     | 1067.91   | Exabyte_224 |    3 | Incr_Tapes    | LTO-3     | Full      |
|       4 | A00004     | 427.60    | Exabyte_224 |    4 | Dump_Tapes    | LTO-3     | Full      |
|       5 | A00005     | 626.40    | Exabyte_224 |    5 | Oracle_Tapes  | LTO-3     | Full      |
|       6 | A00006     | 0.00      | Exabyte_224 |    6 | Full_Tapes    | LTO-3     | Recycle   |
|       7 | A00007     | 735.95    | Exabyte_224 |    7 | Full_Tapes    | LTO-3     | Append    |
|       8 | A00008     | 57.18     | Exabyte_224 |    8 | Dump_Tapes    | LTO-3     | Append    |
|       9 | A00009     | 0.00      | Exabyte_224 |    9 | Full_Tapes    | LTO-3     | Recycle   |
|      10 | A00010     | 0.00      | Exabyte_224 |   10 | Full_Tapes    | LTO-3     | Recycle   |
|      11 | A00011     | 972.31    | Exabyte_224 |   11 | Full_Tapes    | LTO-3     | Full      |
|      12 | A00012     | 1055.86   | Exabyte_224 |   12 | Incr_Tapes    | LTO-3     | Full      |
|      95 | B00001     | 297.14    | Exabyte_224 |   13 | Diff_Tapes    | LTO-3     | Append    |
|      94 | B00002     | 551.02    | Exabyte_224 |   14 | Incr_Tapes    | LTO-3     | Append    |
|      98 | B00003     | 499.80    | Exabyte_224 |   15 | Diff_Tapes    | LTO-3     | Full      |
|      97 | B00004     | 413.32    | Exabyte_224 |   16 | Diff_Tapes    | LTO-3     | Full      |
|      96 | B00005     | 0.00      | Exabyte_224 |   17 | Diff_Tapes    | LTO-3     | Recycle   |
|     101 | B00006     | 0.00      | Exabyte_224 |   18 | Oracle_Tapes  | LTO-3     | Append    |
|     100 | B00007     | 0.00      | Exabyte_224 |   19 | Oracle_Tapes  | LTO-3     | Append    |
|      99 | B00008     | 36.32     | Exabyte_224 |   20 | Oracle_Tapes  | LTO-3     | Append    |
+---------+------------+-----------+-------------+------+---------------+-----------+-----------+
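
     To double-check the query output above without reading it by eye, a small sketch like the following could filter it by pool and status. The function names parse_table and appendable are my own (hypothetical), the parser just assumes the pipe-delimited layout shown above, and SAMPLE is a few rows copied from that output.

```python
def parse_table(text):
    """Parse a pipe-delimited console table into a list of dicts.

    Border lines (starting with '+') and blank lines are skipped;
    the first '|' row is taken as the header.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def appendable(rows, pool):
    """Return volume names in the given pool whose VolStatus is Append."""
    return [
        row["VolumeName"]
        for row in rows
        if row["Pool"] == pool and row["VolStatus"] == "Append"
    ]


# A few rows copied from the query output above.
SAMPLE = """
| MediaId | VolumeName | GB     | Storage     | Slot | Pool         | MediaType | VolStatus |
|       5 | A00005     | 626.40 | Exabyte_224 |    5 | Oracle_Tapes | LTO-3     | Full      |
|       7 | A00007     | 735.95 | Exabyte_224 |    7 | Full_Tapes   | LTO-3     | Append    |
|     101 | B00006     | 0.00   | Exabyte_224 |   18 | Oracle_Tapes | LTO-3     | Append    |
|     100 | B00007     | 0.00   | Exabyte_224 |   19 | Oracle_Tapes | LTO-3     | Append    |
|      99 | B00008     | 36.32  | Exabyte_224 |   20 | Oracle_Tapes | LTO-3     | Append    |
"""

oracle_append = appendable(parse_table(SAMPLE), "Oracle_Tapes")
print(oracle_append)
```

     On the sample rows this lists B00006, B00007 and B00008, which are the three Oracle_Tapes volumes I would expect the job to be able to use.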

     I have absolutely no idea why this is happening and any help or advice is very much appreciated.

Thank you,
Shon



