Re: [Bacula-users] SD Losing Track of Pool

From: Steve Ellis <ellis AT brouhaha DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 20 Jan 2011 09:38:35 -0800
On 1/20/2011 7:18 AM, Peter Zenge wrote:
>>
>>> Second, in the Device Status section at the bottom, the pool of LF-F-0239 is
>>> listed as "*unknown*"; similarly, under "Jobs waiting to reserve a drive",
>>> each job wants the correct pool, but the current pool is listed as "".
>>
> Admittedly I confused the issue by posting an example with two Pools 
> involved.  Even in that example, though, there were jobs using the same pool 
> as the mounted volume, and they wouldn't run until the 2 current jobs were 
> done (which presumably allowed the SD to re-mount the same volume, set the 
> currently mounted pool correctly, and then 4 jobs were able to write to that 
> volume concurrently, as designed).
>
> I saw this issue two other times that day; each time the SD changed the 
> mounted pool from "LF-Inc" to "*unknown*" and that brought concurrency to a 
> screeching halt.
>
> Certainly I could bypass this issue by having a dedicated volume and device 
> for each backup client, but I have over 50 clients right now and it seems 
> like that should be unnecessary.  Is that what other people who write to disk 
> volumes do?
I've been seeing this issue myself--it only seems to show up for me when a 
volume change happens during a running backup.  Once that happens, 
parallelism using that device is lost.  For me this doesn't happen too 
often: I don't have that many parallel jobs, and most of my backups 
are to LTO3, so volume changes don't happen all that often either.  
However, it is annoying.

I thought I had seen something suggesting that this issue might be fixed 
in 5.0.3.  I've recently switched to 5.0.3, but haven't seen any results 
either way yet.

On a somewhat related note, it seems to me that during despooling, all 
other spooling jobs stop spooling.  This might be intentional, I suppose, 
but I think my disk subsystem would be fast enough to keep up one 
despool to LTO3 while other jobs continue to spool.  I could certainly 
understand if no other job using the same device were allowed to start 
despooling during a despool, but that isn't what I observe.
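For context, spooling is enabled per Job and the spool area is configured per Device; a minimal sketch of the relevant directives (resource names, sizes, and paths here are hypothetical, and required directives unrelated to spooling are omitted):

```
# bacula-dir.conf -- Job resource: spool data to disk before writing to tape
Job {
  Name = "nightly-client1"            # hypothetical job name
  Spool Data = yes                    # spool this job's data before despooling
  Spool Size = 20G                    # despool once this much has been spooled
  # ... Client, FileSet, Schedule, Pool, Storage directives omitted ...
}

# bacula-sd.conf -- Device resource: spool area shared by jobs on this drive
Device {
  Name = "LTO3-Drive"                 # hypothetical device name
  Spool Directory = /var/bacula/spool
  Maximum Spool Size = 200G           # cap total spool usage for this device
  # ... Archive Device, Media Type, etc. omitted ...
}
```

Smaller per-job Spool Size values make despool passes shorter, which limits how long other jobs sit stalled if despooling does block them.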

If my observations are correct, it would be nice if this were a 
configurable choice (with faster tape drives, few disk subsystems would 
be able to handle a despool and spooling at the same time).  Some of my 
jobs stall long enough when this happens that some of my desktop 
backup clients go to standby--which means those jobs will fail (my 
backup strategy uses Wake-on-LAN to wake them up in the first place).  I 
certainly could spread my jobs out more in time, if necessary, to 
prevent this, but I like the backups to happen at night, when no one 
is likely to be using the systems for anything else.  I guess another 
option would be to launch a keepalive WoL script when a job starts, and 
arrange for that keepalive program to be killed when the job completes.
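The magic-packet format for such a keepalive is standard (6 bytes of 0xFF followed by the target MAC repeated 16 times, sent via broadcast UDP), so a sketch of the script is short.  This is only an illustration--the MAC address, port, and resend interval are placeholders, and hooking it to a job (e.g. via RunBeforeJob/RunAfterJob) is left to the reader:

```python
import socket
import time

def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send one magic packet to the broadcast address (UDP port 9 is common)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))

def keepalive(mac: str, interval: float = 300.0, count: int = 10) -> None:
    """Resend the magic packet periodically so the client never sleeps.
    In practice this would loop until killed by the job's end hook;
    a finite count is used here for illustration."""
    for _ in range(count):
        send_wol(mac)
        time.sleep(interval)
```

A RunBeforeJob script could start `keepalive(...)` in the background and record its PID, and a RunAfterJob script could kill that PID once the job finishes.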

-se

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users