Bacula-users

Re: [Bacula-users] Requests for tape volumes in use by another job and drive

2013-05-21 13:47:54
Subject: Re: [Bacula-users] Requests for tape volumes in use by another job and drive
From: Bill Arlofski <waa-bacula AT revpol DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Tue, 21 May 2013 13:41:02 -0400
On 05/20/13 14:29, Kern Sibbald wrote:
> Hello,
> 
> On 05/17/2013 06:34 PM, Clark, Patricia A. wrote:
>> I use spooling.
>>
>> Patti
>>
>> From: Simone Caronni <negativo17 AT gmail DOT com<mailto:negativo17 AT gmail 
>> DOT com>>
>> Date: Friday, May 17, 2013 8:27 AM
>> To: "Clark, Patricia A." <clarkpa AT ornl DOT gov<mailto:clarkpa AT ornl DOT 
>> gov>>
>> Cc: "bacula-users AT lists.sourceforge DOT net<mailto:bacula-users AT 
>> lists.sourceforge DOT net>" <bacula-users AT lists.sourceforge DOT 
>> net<mailto:bacula-users AT lists.sourceforge DOT net>>
>> Subject: Re: [Bacula-users] Requests for tape volumes in use by another job 
>> and drive
>>
>> Hello,
>>
>> are you using spooling on your jobs? Or writing directly to the storage 
>> device?
>>
>> Regards,
>> --Simone
>>
>>
>>
>> On 17 May 2013 14:05, Clark, Patricia A. <clarkpa AT ornl DOT 
>> gov<mailto:clarkpa AT ornl DOT gov>> wrote:
>> I am using the latest version, 5.2.13-9, from the epel-bacula repository.
> 
>> There are a number of long reported issues that seem to be particularly 
>> acute in this release,
> Can you be specific and list the "number of long reported issues"?
>> most notably the insistence on a job requesting a volume that is already in 
>> use by another backup job.  Beyond the usual question on when might we see 
>> this fixed, is a more current question, is there any work around to prevent 
>> it that can be suggested?
> Is there a bug report open on this particular issue?
> 
> I ask because I am not aware of any specific issues of this nature in 
> 5.2.13.
> 
> Best regards,
> Kern


Hi Kern,

I am not sure if this is the same issue that Patti was reporting, abut I think
it may be at least similar.  I have been experiencing this for quite some time
now, but never thought it was worth writing - mostly because it was a little
confusing to describe accurately and it was not causing jobs to fail.

Patti's post and your reply got me to start thinking again and to
re-investigate.  Hope this _is_ related and I am not hijacking a thread :)


In my case, it is not holding up jobs for very long (~ 8mins), but I get
notified via our helpdesk every night on this one particular job due to the
"intervention necessary" email that is triggered.

I am using a two-drive configuration with vchanger.

Here's part of the the log file for last night's run of the job which starts
at 2:45am:

--[snip]--
21-May 02:45 bacula-dir JobId 20778: Start Backup JobId 20778,
Job=Zimbra.2013-05-21_02.45.00_46
21-May 02:45 bacula-dir JobId 20778: Using Device "eSATA-0" to write.
21-May 02:45 bacula-sd JobId 20778: 3307 Issuing autochanger "unload slot 14,
drive 0" command.
21-May 02:48 bacula-sd JobId 20778: Warning: Volume "c0_0008_0015" wanted on
"eSATA-0" (/var/lib/bacula/vchanger/c0/0/drive0) is in use by device "eSATA-1"
(/var/lib/bacula/vchanger/c0/1/drive1)
21-May 02:48 bacula-sd JobId 20778: Warning: mount.c:219 Open device "eSATA-0"
(/var/lib/bacula/vchanger/c0/0/drive0) Volume "c0_0008_0015" failed:
ERR=dev.c:575 Could not open: /var/lib/bacula/vchanger/c0/0/drive0, ERR=No
such file or directory

21-May 02:51 bacula-sd JobId 20778: Warning: Volume "c0_0008_0015" wanted on
"eSATA-0" (/var/lib/bacula/vchanger/c0/0/drive0) is in use by device "eSATA-1"
(/var/lib/bacula/vchanger/c0/1/drive1)
21-May 02:51 bacula-sd JobId 20778: Please mount append Volume "c0_0008_0015"
or label a new one for:
    Job:          Zimbra.2013-05-21_02.45.00_46
    Storage:      "eSATA-0" (/var/lib/bacula/vchanger/c0/0/drive0)
    Pool:         Offsite-eSATA
    Media type:   Offsite-File
21-May 02:56 bacula-sd JobId 20778: Warning: mount.c:219 Open device "eSATA-0"
(/var/lib/bacula/vchanger/c0/0/drive0) Volume "c0_0008_0015" failed:
ERR=dev.c:575 Could not open: /var/lib/bacula/vchanger/c0/0/drive0, ERR=No
such file or directory

21-May 02:56 bacula-sd JobId 20778: 3304 Issuing autochanger "load slot 15,
drive 0" command.
21-May 02:56 bacula-sd JobId 20778: 3305 Autochanger "load slot 15, drive 0",
status is OK.
21-May 02:56 bacula-sd JobId 20778: Volume "c0_0008_0015" previously written,
moving to end of data.
21-May 02:56 bacula-sd JobId 20778: Ready to append to end of Volume
"c0_0008_0015" size=4643639201
21-May 03:29 bacula-sd JobId 20778: User defined maximum volume capacity
10,737,418,240 exceeded on device "eSATA-0"
(/var/lib/bacula/vchanger/c0/0/drive0).
21-May 03:29 bacula-sd JobId 20778: End of medium on Volume "c0_0008_0015"
Bytes=10,737,377,613 Blocks=166,442 at 21-May-2013 03:29.
21-May 03:29 bacula-sd JobId 20778: 3307 Issuing autochanger "unload slot 15,
drive 0" command.
21-May 03:29 bacula-dir JobId 20778: Recycled volume "c0_0008_0018"
21-May 03:29 bacula-dir JobId 20778: Using Volume "c0_0008_0018" from
'Scratch' pool.
--[snip]--

...And the job continues fine to completion, swapping volumes as necessary and
not sending any "intervention necessary" notifications.


All prior jobs used volume c0_0008_0015 which was initially recycled and
loaded into drive 1 when the first nightly job started at 8:30pm:

--[snip]--
20-May 20:30 bacula-dir JobId 20772: Start Backup JobId 20772,
Job=SpeedyFull.2013-05-20_20.30.01_00
0-May 20:30 bacula-dir JobId 20772: Recycled volume "c0_0008_0015"
20-May 20:30 bacula-dir JobId 20772: Using Volume "c0_0008_0015" from
'Scratch' pool.
20-May 20:30 bacula-dir JobId 20772: Using Device "eSATA-1" to write.
20-May 20:29 speedy-fd JobId 20772: DIR and FD clocks differ by -37 seconds,
FD automatically compensating.
20-May 20:30 bacula-sd JobId 20772: 3304 Issuing autochanger "load slot 15,
drive 1" command.
20-May 20:30 bacula-sd JobId 20772: 3305 Autochanger "load slot 15, drive 1",
status is OK.
20-May 20:30 bacula-sd JobId 20772: Recycled volume "c0_0008_0015" on device
"eSATA-1" (/var/lib/bacula/vchanger/c0/1/drive1), all previous data lost.
--[snip]--


So, it appears that the first job uses drive1 in my changer "c0", and all
other jobs that start at 8:30 are happy to also use that volume in drive1 of
changer c0, but for some reason, a few things happen each night when the
Zimbra job runs by itself several hours later:

1. The Zimbra job wants to append to the volume that was just recently used by
all the other jobs - this is correct behavior since the job writes to the same
pool as all the other jobs and that volume is still appendable.

2. The Zimbra job always wants to use drive0 even though all the others used
drive1. There are no other jobs running that are still writing to drive1

3. Instead of just moving the volume to the drive it wants, the Zimbra job
complains each night and asks for intervention.

4. After about 8 minutes, the Zimbra job unloads the volume from drive 1 and
loads it into drive0 and continues without issue.

Is there anything I may have mis configured, or is there any more information
I can provide to see if I can get to the bottom of this?

This is with Bacula 5.2.13 currently.

Thanks!

--
Bill Arlofski
Reverse Polarity, LLC
http://www.revpol.com/

------------------------------------------------------------------------------
Try New Relic Now & We'll Send You this Cool Shirt
New Relic is the only SaaS-based application performance monitoring service 
that delivers powerful full stack analytics. Optimize and monitor your
browser, app, & servers with just a few lines of code. Try New Relic
and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_may
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users