Bacula-users

Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?

2012-07-05 14:37:59
Subject: Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?
From: Stephen Thompson <stephen AT seismo.berkeley DOT edu>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 05 Jul 2012 11:35:15 -0700

Hello again,

Here's something even stranger...  Another Full job logs that it's 
written to a volume in the Full pool (FB0956), but then the status 
output of the job lists a volume in the Incremental pool (IM0093).  This 
Incremental volume was never even mentioned in the log as a volume to 
which the job despooled.


22-Jun 20:00 <DIR> JobId 242323: Start Backup JobId 242323, 
Job=<JOB>.2012-06-22_20.00.02_06
22-Jun 20:01 <DIR> JobId 242323: Using Device "SL500-Drive-1"
22-Jun 20:06 <SD> JobId 242323: 3301 Issuing autochanger "loaded? drive 
1" command.
22-Jun 20:06 <SD> JobId 242323: 3302 Autochanger "loaded? drive 1", 
result: nothing loaded.
22-Jun 20:06 <SD> JobId 242323: 3301 Issuing autochanger "loaded? drive 
1" command.
22-Jun 20:06 <SD> JobId 242323: 3302 Autochanger "loaded? drive 1", 
result: nothing loaded.
22-Jun 20:06 <SD> JobId 242323: 3304 Issuing autochanger "load slot 138, 
drive 1" command.
22-Jun 20:07 <SD> JobId 242323: 3305 Autochanger "load slot 138, drive 
1", status is OK.
22-Jun 20:07 <SD> JobId 242323: Volume "FB0956" previously written, 
moving to end of data.
22-Jun 20:08 <SD> JobId 242323: Ready to append to end of Volume 
"FB0956" at file=4.
22-Jun 20:08 <SD> JobId 242323: Spooling data ...
23-Jun 00:01 <SD> JobId 242323: Job write elapsed time = 03:53:01, 
Transfer rate = 10.80 M Bytes/second
23-Jun 00:01 <SD> JobId 242323: Committing spooled data to Volume 
"FB0956". Despooling 151,089,481,092 bytes ...
23-Jun 01:28 <SD> JobId 242323: Despooling elapsed time = 01:27:07, 
Transfer rate = 28.90 M Bytes/second
23-Jun 01:28 <SD> JobId 242323: Sending spooled attrs to the Director. 
Despooling 99,242,108 bytes ...
23-Jun 01:30 <DIR> JobId 242323: Bacula <DIR> 5.2.9 (11Jun12):

   Build OS:               x86_64-unknown-linux-gnu redhat Enterprise 
release
   JobId:                  242323
   Job:                    <JOB>.2012-06-22_20.00.02_06
   Backup Level:           Full
   Client:                 "<FD>" 5.2.6 (21Feb12) 
x86_64-unknown-linux-gnu,redhat,
   FileSet:                "<FS>" 2012-01-22 20:00:03
   Pool:                   "Full-Pool-2012-06" (From Job resource)
   Catalog:                "MyCatalog" (From Client resource)
   Storage:                "SL500-changer" (From Job resource)
   Scheduled time:         22-Jun-2012 20:00:02
   Start time:             22-Jun-2012 20:01:29
   End time:               23-Jun-2012 01:30:15
   Elapsed time:           5 hours 28 mins 46 secs
   Priority:               10
   FD Files Written:       337,622
   SD Files Written:       337,622
   FD Bytes Written:       150,974,593,977 (150.9 GB)
   SD Bytes Written:       151,023,845,596 (151.0 GB)
   Rate:                   7653.6 KB/s
   Software Compression:   None
   VSS:                    no
   Encryption:             no
   Accurate:               yes
   Volume name(s):         IM0093
   Volume Session Id:      9
   Volume Session Time:    1340291913
   Last Volume Bytes:      723,076,936,704 (723.0 GB)
   Non-fatal FD errors:    0
   SD Errors:              0
   FD termination status:  OK
   SD termination status:  OK
   Termination:            Backup OK

23-Jun 01:30 <DIR> JobId 242323: Begin pruning Jobs older than 10 years .
23-Jun 01:30 <DIR> JobId 242323: No Jobs found to prune.
23-Jun 01:30 <DIR> JobId 242323: Begin pruning Files.
23-Jun 01:30 <DIR> JobId 242323: No Files found to prune.
23-Jun 01:30 <DIR> JobId 242323: End auto prune.        



Note:  FB0956 is in the Full pool, IM0093 in the Incremental.


The vast majority of our jobs are being successful, but when something 
like this happens, I lost all faith that I even have the backups I think 
I have!!!

I just tried a test restore of this particular job, and it did in fact 
use IM0093 to restore from.  Ugh.

Stephen





On 07/05/2012 10:46 AM, Stephen Thompson wrote:
>
>
> Hello,
>
> Running 5.2.9, though I believe we've seen this sporadically in earlier
> versions.  Jobs are using volumes that are in pools to which they have
> not been assigned.
>
> This is likely a bug as I don't see anything peculiar about our
> configuration.  We are using a tape library with 2 drives, both set to
> autoselect. The library contains volumes that are properly assigned
> (i.e. database entries for volumes look fine) to various pools,
> including a Full pool and an Incremental pool.
>
> Twice in the past week, Full jobs which specify the use of the Full pool
> using jobdefs are using volumes from the Incremental pool.  I haven't
> narrowed down all the details, but I believe it's if the Incremental
> volume is already loaded in a drive when the Full job in question is
> launched.
>
>
> Example:
>
> 01-Jul 22:00 <DIR> JobId 244098: Fatal error: JobId 243957 already
> running. Duplicate job not allowed.
> 01-Jul 20:03 <DIR> JobId 243957: Start Backup JobId 243957,
> Job=<JOB>.2012-07-01_20.00.04_03
> 01-Jul 20:04 <DIR> JobId 243957: Using Device "SL500-Drive-0"
> 01-Jul 20:42 <SD> JobId 243957: Spooling data ...
> 02-Jul 14:01 <SD> JobId 243957: Job write elapsed time = 17:19:00,
> Transfer rate = 33.39 M Bytes/second
> 02-Jul 14:01 <SD> JobId 243957: Committing spooled data to Volume
> "FB0161". Despooling 2,082,572,994,002 bytes ...
> 02-Jul 15:38 <SD> JobId 243957: End of Volume "IM0094" at 462:1559 on
> device "SL500-Drive-0" (/dev/SL500-Drive-0). Write of 262144 bytes got -1.
> 02-Jul 15:38 <SD> JobId 243957: Re-read of last block succeeded.
> 02-Jul 15:38 <SD> JobId 243957: End of medium on Volume "IM0094"
> Bytes=924,097,352,704 Blocks=3,525,155 at 02-Jul-2012 15:38.
> 02-Jul 15:38 <SD> JobId 243957: 3307 Issuing autochanger "unload slot
> 148, drive 0" command.
> 02-Jul 15:39 <SD> JobId 243957: 3301 Issuing autochanger "loaded? drive
> 0" command.
> 02-Jul 15:39 <SD> JobId 243957: 3302 Autochanger "loaded? drive 0",
> result: nothing loaded.
> 02-Jul 15:39 <SD> JobId 243957: 3301 Issuing autochanger "loaded? drive
> 0" command.
> 02-Jul 15:39 <SD> JobId 243957: 3302 Autochanger "loaded? drive 0",
> result: nothing loaded.
> 02-Jul 15:39 <SD> JobId 243957: 3301 Issuing autochanger "loaded? drive
> 1" command.
> 02-Jul 15:39 <SD> JobId 243957: 3302 Autochanger "loaded? drive 1",
> result: nothing loaded.
> 02-Jul 15:39 <SD> JobId 243957: 3304 Issuing autochanger "load slot 6,
> drive 0" command.
> 02-Jul 15:40 <SD> JobId 243957: 3305 Autochanger "load slot 6, drive 0",
> status is OK.
> 02-Jul 15:40 <SD> JobId 243957: Volume "FB0161" previously written,
> moving to end of data.
> 02-Jul 15:40 <SD> JobId 243957: Ready to append to end of Volume
> "FB0161" at file=7.
> 02-Jul 15:40 <SD> JobId 243957: New volume "FB0161" mounted on device
> "SL500-Drive-0" (/dev/SL500-Drive-0) at 02-Jul-2012 15:40.
> 02-Jul 18:19 <SD> JobId 243957: End of Volume "FB0161" at 274:5411 on
> device "SL500-Drive-0" (/dev/SL500-Drive-0). Write of 262144 bytes got -1.
> 02-Jul 18:19 <SD> JobId 243957: Re-read of last block succeeded.
> 02-Jul 18:19 <SD> JobId 243957: End of medium on Volume "FB0161"
> Bytes=535,390,861,312 Blocks=2,042,367 at 02-Jul-2012 18:19.
> 02-Jul 18:19 <SD> JobId 243957: 3307 Issuing autochanger "unload slot 6,
> drive 0" command.
> 02-Jul 18:20 <SD> JobId 243957: 3301 Issuing autochanger "loaded? drive
> 0" command.
> 02-Jul 18:20 <SD> JobId 243957: 3302 Autochanger "loaded? drive 0",
> result: nothing loaded.
> 02-Jul 18:20 <SD> JobId 243957: 3301 Issuing autochanger "loaded? drive
> 0" command.
> 02-Jul 18:20 <SD> JobId 243957: 3302 Autochanger "loaded? drive 0",
> result: nothing loaded.
> 02-Jul 18:20 <SD> JobId 243957: 3301 Issuing autochanger "loaded? drive
> 1" command.
> 02-Jul 18:20 <SD> JobId 243957: 3302 Autochanger "loaded? drive 1",
> result: nothing loaded.
> 02-Jul 18:20 <SD> JobId 243957: 3304 Issuing autochanger "load slot 4,
> drive 0" command.
> 02-Jul 18:21 <SD> JobId 243957: 3305 Autochanger "load slot 4, drive 0",
> status is OK.
> 02-Jul 18:21 <SD> JobId 243957: Volume "FB0158" previously written,
> moving to end of data.
> 02-Jul 18:21 <SD> JobId 243957: Ready to append to end of Volume
> "FB0158" at file=1.
> 02-Jul 18:21 <SD> JobId 243957: New volume "FB0158" mounted on device
> "SL500-Drive-0" (/dev/SL500-Drive-0) at 02-Jul-2012 18:21.
> 02-Jul 21:14 <SD> JobId 243957: End of Volume "FB0158" at 274:1785 on
> device "SL500-Drive-0" (/dev/SL500-Drive-0). Write of 262144 bytes got -1.
> 02-Jul 21:14 <SD> JobId 243957: Re-read of last block succeeded.
> 02-Jul 21:14 <SD> JobId 243957: End of medium on Volume "FB0158"
> Bytes=546,439,698,432 Blocks=2,084,507 at 02-Jul-2012 21:14.
> 02-Jul 21:14 <SD> JobId 243957: 3307 Issui02-Jul 22:00 <DIR> JobId
> 244271: Fatal error: JobId 244133 already running. Duplicate job not
> allowed.
> 02-Jul 22:00 <DIR> JobId 244268: Fatal error: JobId 243957 already
> running. Duplicate job not allowed.
> 02-Jul 21:16 <DIR> JobId 243957: Recycled volume "FB0162"
> 02-Jul 21:16 <SD> JobId 243957: 3301 Issuing autochanger "loaded? drive
> 0" command.
> 02-Jul 21:16 <SD> JobId 243957: 3302 Autochanger "loaded? drive 0",
> result: nothing loaded.
> 02-Jul 21:16 <SD> JobId 243957: 3301 Issuing autochanger "loaded? drive
> 0" command.
> 02-Jul 21:16 <SD> JobId 243957: 3302 Autochanger "loaded? drive 0",
> result: nothing loaded.
> 02-Jul 21:16 <SD> JobId 243957: 3304 Issuing autochanger "load slot 7,
> drive 0" command.
> 02-Jul 21:16 <SD> JobId 243957: 3305 Autochanger "load slot 7, drive 0",
> status is OK.
> 02-Jul 21:16 <SD> JobId 243957: Recycled volume "FB0162" on device
> "SL500-Drive-0" (/dev/SL500-Drive-0), all previous data lost.
> 02-Jul 21:16 <SD> JobId 243957: New volume "FB0162" mounted on device
> "SL500-Drive-0" (/dev/SL500-Drive-0) at 02-Jul-2012 21:16.
> 03-Jul 00:36 <SD> JobId 243957: End of Volume "FB0162" at 295:4268 on
> device "SL500-Drive-0" (/dev/SL500-Drive-0). Write of 262144 bytes got -1.
> 03-Jul 00:37 <SD> JobId 243957: Re-read of last block succeeded.
> 03-Jul 00:37 <SD> JobId 243957: End of medium on Volume "FB0162"
> Bytes=591,088,321,536 Blocks=2,254,823 at 03-Jul-2012 00:37.
> 03-Jul 00:37 <SD> JobId 243957: 3307 Issuing autochanger "unload slot 7,
> drive 0" command.
> 03-Jul 00:38 <DIR> JobId 243957: Recycled volume "FB0164"
> 03-Jul 00:38 <SD> JobId 243957: 3301 Issuing autochanger "loaded? drive
> 0" command.
> 03-Jul 00:38 <SD> JobId 243957: 3302 Autochanger "loaded? drive 0",
> result: nothing loaded.
> 03-Jul 00:38 <SD> JobId 243957: 3301 Issuing autochanger "loaded? drive
> 0" command.
> 03-Jul 00:38 <SD> JobId 243957: 3302 Autochanger "loaded? drive 0",
> result: nothing loaded.
> 03-Jul 00:38 <SD> JobId 243957: 3304 Issuing autochanger "load slot 9,
> drive 0" command.
> 03-Jul 00:39 <SD> JobId 243957: 3305 Autochanger "load slot 9, drive 0",
> status is OK.
> 03-Jul 00:39 <SD> JobId 243957: Recycled volume "FB0164" on device
> "SL500-Drive-0" (/dev/SL500-Drive-0), all previous data lost.
> 03-Jul 00:39 <SD> JobId 243957: New volume "FB0164" mounted on device
> "SL500-Drive-0" (/dev/SL500-Drive-0) at 03-Jul-2012 00:39.
> 03-Jul 01:08 <SD> JobId 243957: Despooling elapsed time = 10:58:19,
> Transfer rate = 52.72 M Bytes/second
> 03-Jul 01:08 <SD> JobId 243957: Sending spooled attrs to the Director.
> Despooling 2,500,634,015 bytes ...
> 03-Jul 01:31 <DIR> JobId 243957: Bacula <DIR> 5.2.9 (11Jun12):
>
>     Build OS:               x86_64-unknown-linux-gnu redhat Enterprise
> release
>     JobId:                  243957
>     Job:                    <JOB>.2012-07-01_20.00.04_03
>     Backup Level:           Full
>     Client:                 "<FD>" 5.2.6 (21Feb12)
> i386-pc-solaris2.10,solaris,5.10
>     FileSet:                "<FS>" 2012-02-01 20:00:23
>     Pool:                   "Full-Pool" (From Job resource)
>     Catalog:                "MyCatalog" (From Client resource)
>     Storage:                "SL500-changer" (From Job resource)
>     Scheduled time:         01-Jul-2012 20:00:04
>     Start time:             01-Jul-2012 20:04:02
>     End time:               03-Jul-2012 01:31:35
>     Elapsed time:           1 day 5 hours 27 mins 33 secs
>     Priority:               10
>     FD Files Written:       6,915,330
>     SD Files Written:       6,915,330
>     FD Bytes Written:       2,080,113,652,089 (2.080 TB)
>     SD Bytes Written:       2,081,613,855,190 (2.081 TB)
>     Rate:                   19613.9 KB/s
>     Software Compression:   None
>     VSS:                    no
>     Encryption:             no
>     Accurate:               yes
>     Volume name(s):         IM0094|FB0161|FB0158|FB0162|FB0164
>     Volume Session Id:      147
>     Volume Session Time:    1340291913
>     Last Volume Bytes:      96,966,779,904 (96.96 GB)
>     Non-fatal FD errors:    0
>     SD Errors:              0
>     FD termination status:  OK
>     SD termination status:  OK
>     Termination:            Backup OK
>
> 03-Jul 01:31 <DIR> JobId 243957: Begin pruning Jobs older than 10 years .
> 03-Jul 01:31 <DIR> JobId 243957: No Jobs found to prune.
> 03-Jul 01:31 <DIR> JobId 243957: Begin pruning Files.
> 03-Jul 01:31 <DIR> JobId 243957: No Files found to prune.
> 03-Jul 01:31 <DIR> JobId 243957: End auto prune
>
>
> Note: FB0161|FB0158|FB0162|FB0164 are all in Full-Pool, whereas the
> first tape used in the job, IM0094 is in Incremental-Pool.
>
>
> Anyone have ideas why it would be using a volume from a pool to which
> the job has not been associated?
>
> thanks,
> Stephen
>


-- 
Stephen Thompson               Berkeley Seismological Laboratory
stephen AT seismo.berkeley DOT edu    215 McCone Hall # 4760
404.538.7077 (phone)           University of California, Berkeley
510.643.5811 (fax)             Berkeley, CA 94720-4760



------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users