Bacula-users

[Bacula-users] 2.4.0 with recycle_patch recycled wrong volume

2008-07-07 06:14:11
Subject: [Bacula-users] 2.4.0 with recycle_patch recycled wrong volume
From: Thomas <thomas AT ic3s DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 07 Jul 2008 12:13:52 +0200
Hi,

today i found some strange errors in the bacula log.

it starts with the recycling of volume FUL112,
which was the oldest tape in the pool and the right choice for the next jobs:

2008-07-05 04:18:13backup-dir JobId 46360: There are no more Jobs associated 
with Volume "FUL112". Marking it purged.
2008-07-05 04:18:13backup-dir JobId 46360: All records pruned from Volume 
"FUL112"; marking it "Purged"
2008-07-05 04:18:13backup-dir JobId 46360: Recycled volume "FUL112"
2008-07-05 04:18:13backup-sd-bb JobId 46360: 3301 Issuing autochanger "loaded? 
drive 0" command.
2008-07-05 04:18:13backup-sd-bb JobId 46360: 3302 Autochanger "loaded? drive 
0", result: nothing loaded.
2008-07-05 04:18:13backup-sd-bb JobId 46360: 3304 Issuing autochanger "load 
slot 23, drive 0" command.
2008-07-05 04:19:22backup-sd-bb JobId 46360: 3305 Autochanger "load slot 23, 
drive 0", status is OK.
2008-07-05 04:19:33backup-sd-bb JobId 46360: Recycled volume "FUL112" on device 
"JBBB-1" (/dev/nst0), all previous data lost.
2008-07-05 04:19:33backup-sd-bb JobId 46360: New volume "FUL112" mounted on 
device "JBBB-1" (/dev/nst0) at 05-Jul-2008 04:19.

now some jobs are running on FUL112

...
   Volume name(s):         FUL114|FUL112
...
   Volume name(s):         FUL112
...
   Volume name(s):         FUL112
...

Bacula unloads FUL112 to run some incremental jobs:

2008-07-05 10:34:34backup-dir JobId 46387: Using Device "JBBB-1"
2008-07-05 10:34:37backup-sd-bb JobId 46387: 3301 Issuing autochanger "loaded? 
drive 0" command.
2008-07-05 10:34:37backup-sd-bb JobId 46387: 3302 Autochanger "loaded? drive 
0", result: nothing loaded.
2008-07-05 10:34:37backup-sd-bb JobId 46387: 3304 Issuing autochanger "load 
slot 1, drive 0" command.
2008-07-05 10:34:58backup-sd-bb JobId 46387: 3305 Autochanger "load slot 1, 
drive 0", status is OK.
2008-07-05 10:35:08backup-sd-bb JobId 46387: Volume "INC007" previously 
written, moving to end of data.
2008-07-05 10:38:58backup-sd-bb JobId 46387: Ready to append to end of Volume 
"INC007" at file=186.


the next full backup starts,
bacula tries to load the tape FUL112 but this fails with MTEOM
(i think this is ok and can happen with a defective volume)
the failed unload command is strange, i have never seen that before:

2008-07-05 13:56:50backup-dir JobId 46368: Using Device "JBBB-1"
2008-07-05 13:56:50backup-sd-bb JobId 46368: 3301 Issuing autochanger "loaded? 
drive 0" command.
2008-07-05 13:56:50backup-sd-bb JobId 46368: 3302 Autochanger "loaded? drive 
0", result: nothing loaded.
2008-07-05 13:56:50backup-sd-bb JobId 46368: 3304 Issuing autochanger "load 
slot 23, drive 0" command.
2008-07-05 13:57:59backup-sd-bb JobId 46368: 3305 Autochanger "load slot 23, 
drive 0", status is OK.
2008-07-05 13:58:10backup-sd-bb JobId 46368: Volume "FUL112" previously 
written, moving to end of data.
2008-07-05 17:51:31backup-sd-bb JobId 46368: Error: Unable to position to end 
of data on device "JBBB-1" (/dev/nst0): ERR=dev.c:895 ioctl MTEOM error on 
"JBBB-1" (/dev/nst0). ERR=Eingabe-/Ausgabefehler.

2008-07-05 17:51:31backup-sd-bb JobId 46368: Marking Volume "FUL112" in Error 
in Catalog.
2008-07-05 17:51:31backup-sd-bb JobId 46368: 3307 Issuing autochanger "unload 
slot 23, drive 0" command.
2008-07-05 17:51:34backup-sd-bb JobId 46368: 3995 Bad autochanger "unload slot 
23, drive 0": ERR=Child exited with code 1
Results=Unloading Data Transfer Element into Storage Element 23...mtx: Request 
Sense: Long Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Illegal Request
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 53
mtx: Request Sense: Additional Sense Qualifier = 02
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 480 to 54 Failed

FUL112 fails and the next tape, FUL115, will be used:

2008-07-05 17:51:53backup-dir JobId 46368: There are no more Jobs associated 
with Volume "FUL115". Marking it purged.
2008-07-05 17:51:53backup-dir JobId 46368: All records pruned from Volume 
"FUL115"; marking it "Purged"
2008-07-05 17:51:53backup-dir JobId 46368: Recycled volume "FUL115"
2008-07-05 17:51:53backup-sd-bb JobId 46368: 3301 Issuing autochanger "loaded? 
drive 0" command.
2008-07-05 17:51:53backup-sd-bb JobId 46368: 3302 Autochanger "loaded? drive 
0", result is Slot 23.
2008-07-05 17:51:53backup-sd-bb JobId 46368: 3307 Issuing autochanger "unload 
slot 23, drive 0" command.
2008-07-05 17:51:56backup-sd-bb JobId 46368: 3995 Bad autochanger "unload slot 
23, drive 0": ERR=Child exited with code 1
Results=Unloading Data Transfer Element into Storage Element 23...mtx: Request 
Sense: Long Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Illegal Request
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 53
mtx: Request Sense: Additional Sense Qualifier = 02
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 480 to 54 Failed

2008-07-05 17:51:56backup-sd-bb JobId 46368: 3301 Issuing autochanger "loaded? 
drive 0" command.
2008-07-05 17:51:56backup-sd-bb JobId 46368: 3302 Autochanger "loaded? drive 
0", result is Slot 23.
2008-07-05 17:52:02backup-sd-bb JobId 46368: Recycled volume "FUL112" on device 
"JBBB-1" (/dev/nst0), all previous data lost.

bacula still can't unload FUL112, so why not using this tape again? :)

now 2 jobs runs successful on tape FUL112.

after that some other jobs were cancelled because of their configuration:
2008-07-05 17:52:13backup-dir JobId 46370: Fatal error: Job canceled because 
max start delay time exceeded.
2008-07-05 17:52:14backup-dir JobId 46371: Fatal error: Job canceled because 
max start delay time exceeded.
2008-07-05 17:52:14backup-dir JobId 46372: Fatal error: Job canceled because 
max start delay time exceeded.
2008-07-05 17:52:14backup-dir JobId 46373: Fatal error: Job canceled because 
max start delay time exceeded.

the next incremental job starts,
bacula can now unload FUL112 without any problems:

2008-07-05 17:52:16backup-sd-bb JobId 46384: 3307 Issuing autochanger "unload 
slot 23, drive 0" command.
2008-07-05 17:52:42backup-dir JobId 46384: Using Device "JBBB-1"
2008-07-05 17:52:42backup-sd-bb JobId 46384: 3301 Issuing autochanger "loaded? 
drive 0" command.
2008-07-05 17:52:42backup-sd-bb JobId 46384: 3302 Autochanger "loaded? drive 
0", result: nothing loaded.
2008-07-05 17:52:42backup-sd-bb JobId 46384: 3304 Issuing autochanger "load 
slot 1, drive 0" command.
2008-07-05 17:53:03backup-sd-bb JobId 46384: 3305 Autochanger "load slot 1, 
drive 0", status is OK.
2008-07-05 17:53:13backup-sd-bb JobId 46384: Volume "INC007" previously 
written, moving to end of data.
2008-07-05 17:56:26backup-sd-bb JobId 46384: Ready to append to end of Volume 
"INC007" at file=330.


till now the only problem was that FUL112 was recycled twice.
a lot if jobs are successful and a few are cancelled.


two hours later i tried to restart the cancelled jobs:
*run job=jobname Level=Full Pool=FUL Priority=1 yes


2008-07-05 19:54:11backup-dir JobId 46394: Start Backup JobId 46394, 
Job=jobname.2008-07-05_19.54.31
2008-07-05 19:54:16backup-dir JobId 46394: Using Device "JBBB-1"
2008-07-05 19:54:16backup-sd-bb JobId 46394: Invalid slot=0 defined in catalog 
for Volume "FUL112" on "JBBB-1" (/dev/nst0). Manual load may be required.
2008-07-05 19:54:16backup-sd-bb JobId 46394: 3301 Issuing autochanger "loaded? 
drive 0" command.
2008-07-05 19:54:16backup-sd-bb JobId 46394: 3302 Autochanger "loaded? drive 
0", result: nothing loaded.
2008-07-05 19:54:16backup-sd-bb JobId 46394: 3301 Issuing autochanger "loaded? 
drive 0" command.
2008-07-05 19:54:16backup-sd-bb JobId 46394: 3302 Autochanger "loaded? drive 
0", result: nothing loaded.
2008-07-05 19:54:16backup-sd-bb JobId 46394: Error: Could not open device 
"JBBB-1" (/dev/nst0): ERR=dev.c:432 Unable to open device "JBBB-1" (/dev/nst0): 
ERR=Kein Medium gefunden

2008-07-05 19:54:16backup-sd-bb JobId 46394: Fatal error: Could not ready 
device "JBBB-1" (/dev/nst0) for append.
2008-07-05 19:54:16devstar-fd JobId 46394: Fatal error: job.c:1811 Bad response 
to Append Data command. Wanted 3000 OK data
, got 3903 Error append data

2008-07-05 19:54:16backup-dir JobId 46394: Error: Bacula backup-dir 2.4.0 
(04Jun08): 05-Jul-2008 19:54:16
   Build OS:               x86_64-unknown-linux-gnu debian 4.0
   JobId:                  46394
   Backup Level:           Full
   Pool:                   "FUL" (From User input)
   Storage:                "JBBB" (From Job resource)
   Scheduled time:         05-Jul-2008 19:54:09
   Start time:             05-Jul-2008 19:54:16
   End time:               05-Jul-2008 19:54:16
   Elapsed time:           0 secs
   Priority:               1
   FD Files Written:       0
   SD Files Written:       0
   FD Bytes Written:       0 (0 B)
   SD Bytes Written:       0 (0 B)
   Rate:                   0.0 KB/s
   Software Compression:   None
   VSS:                    no
   Storage Encryption:     no
   Volume name(s):
   Volume Session Id:      218
   Volume Session Time:    1214824689
   Last Volume Bytes:      5,999,616 (5.999 MB)
   Non-fatal FD errors:    0
   SD Errors:              0
   FD termination status:  Error
   SD termination status:  Error
   Termination:            *** Backup Error ***

why has this happen?


after sending some commands (multiple "release storage" and "update slots"
via my self written admin gui, so i could not see the console output)
i was able to restart the jobs and they run successful on FUL112.

the recycled patch fix the bug #1106, so this one must be something else.


Thomas



-- 
[:O]###[O:]

-------------------------------------------------------------------------
Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
Studies have shown that voting for your favorite open source project,
along with a healthy diet, reduces your potential for chronic lameness
and boredom. Vote Now at http://www.sourceforge.net/community/cca08
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>
  • [Bacula-users] 2.4.0 with recycle_patch recycled wrong volume, Thomas <=