Bacula-users

Subject: Re: [Bacula-users] SD Losing Track of Pool
From: Peter Zenge <pzenge AT ilinc DOT com>
To: "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Mon, 24 Jan 2011 10:38:57 -0700
> -----Original Message-----
> From: Peter Zenge [mailto:pzenge AT ilinc DOT com]
> Sent: Thursday, January 20, 2011 10:59 AM
> To: bacula-users AT lists.sourceforge DOT net
> Subject: Re: [Bacula-users] SD Losing Track of Pool
> 
> > -----Original Message-----
> > From: Steve Ellis [mailto:ellis AT brouhaha DOT com]
> > Sent: Thursday, January 20, 2011 10:39 AM
> > To: bacula-users AT lists.sourceforge DOT net
> > Subject: Re: [Bacula-users] SD Losing Track of Pool
> >
> > On 1/20/2011 7:18 AM, Peter Zenge wrote:
> > >>
> > >>> Second, in the Device Status section at the bottom, the pool of LF-F-0239
> > >>> is listed as "*unknown*"; similarly, under "Jobs waiting to reserve a
> > >>> drive", each job wants the correct pool, but the current pool is listed
> > >>> as "".
> > >>
> > > Admittedly I confused the issue by posting an example with two Pools
> > > involved.  Even in that example, though, there were jobs using the same
> > > pool as the mounted volume, and they wouldn't run until the 2 current jobs
> > > were done (which presumably allowed the SD to re-mount the same volume,
> > > set the currently mounted pool correctly, and then 4 jobs were able to
> > > write to that volume concurrently, as designed).
> > >
> > > I saw this issue two other times that day; each time the SD changed the
> > > mounted pool from "LF-Inc" to "*unknown*", and that brought concurrency to
> > > a screeching halt.
> > >
> > > Certainly I could bypass this issue by having a dedicated volume and
> > > device for each backup client, but I have over 50 clients right now, and
> > > it seems like that should be unnecessary.  Is that what other people who
> > > write to disk volumes do?
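
For readers following the thread: the shared setup being discussed here (many
clients writing concurrently to a single File device and pool, rather than a
dedicated device per client) is, broadly, a matter of raising the concurrency
limits in both daemons.  The sketch below is illustrative only: the resource
names, address, password, path, and limits are assumptions; only the pool name
LF-Inc comes from the thread.

    # bacula-dir.conf (illustrative)
    Storage {
      Name = FileStorage
      Address = sd.example.com
      SDPort = 9103
      Password = "secret"
      Device = FileDev
      Media Type = File
      Maximum Concurrent Jobs = 10   # raise the Director {} limit as well
    }

    Pool {
      Name = LF-Inc
      Pool Type = Backup
      Label Format = "LF-Inc-"       # auto-label disk volumes; format is made up
      Maximum Volume Bytes = 50G
      Volume Retention = 30 days
      Recycle = yes
      AutoPrune = yes
    }

    # bacula-sd.conf (illustrative)
    Storage {
      Name = example-sd
      SDPort = 9103
      WorkingDirectory = /var/lib/bacula
      Pid Directory = /var/run
      Maximum Concurrent Jobs = 10
    }

    Device {
      Name = FileDev
      Media Type = File
      Archive Device = /backup
      LabelMedia = yes
      Random Access = yes
      AutomaticMount = yes
      RemovableMedia = no
      AlwaysOpen = no
    }

With limits like these, jobs from the same pool interleave onto the mounted
volume, which is the concurrency that, per the rest of the thread, breaks
after a volume change until 5.0.3.
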
> > I've been seeing this issue myself--it only seems to show up for me if a
> > volume change happens during a running backup.  Once that happens,
> > parallelism using that device is lost.  For me this doesn't happen too
> > often, as I don't have that many parallel jobs, and most of my backups are
> > to LTO3, so volume changes don't happen all that often either.  However, it
> > is annoying.
> >
> > I thought I had seen something suggesting that this issue might be fixed in
> > 5.0.3.  I've recently switched to 5.0.3, but haven't seen any pro or con
> > results yet.
> >
> > On a somewhat related note, it seems to me that during despooling, all
> > other spooling jobs stop spooling--this might be intentional, I suppose,
> > but I think my disk subsystem would be fast enough to keep up with one
> > despool to LTO3 while other jobs continue to spool--I could certainly
> > understand if no other job using the same device were allowed to start
> > despooling during a despool, but that isn't what I observe.
> >
> > If my observations are correct, it would be nice if this were a
> > configurable choice (with faster tape drives, few disk subsystems would be
> > able to handle a despool and spooling at the same time)--some of my jobs
> > stall long enough when this happens that some of my desktop backup clients
> > go to standby, which means those jobs will fail (my backup strategy uses
> > Wake-on-LAN to wake them up in the first place).  I certainly could spread
> > my jobs out more in time, if necessary, to prevent this, but I like the
> > backups to happen at night when no one is likely to be using the systems
> > for anything else.  I guess another option would be to launch a keepalive
> > WoL script when a job starts, and arrange for the keepalive program to be
> > killed when the job completes.
> >
> > -se
> >
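
For context on the spooling discussion above, data spooling is driven by a
handful of directives, set per job in the Director and per device in the SD.
The values and paths below are illustrative assumptions; note that whether
other jobs keep spooling while one job despools is scheduling behaviour inside
the SD, not something these directives control.

    # bacula-dir.conf, inside a Job or JobDefs resource (illustrative values)
    Spool Data = yes               # spool to disk first, despool to tape in one stream
    Spool Attributes = yes
    Spool Size = 20G               # maximum spool size for this job

    # bacula-sd.conf, inside the tape Device resource (illustrative values)
    Spool Directory = /var/spool/bacula
    Maximum Spool Size = 200G      # total spool space the device may use
    Maximum Job Spool Size = 20G   # per-job cap on that device
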
> 
> 
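
The keepalive idea mentioned above can be arranged with a pair of RunScript
blocks in the Job resource; %c expands to the client name.  The script path is
hypothetical, and the script itself (something that backgrounds itself and
keeps sending wake-on-LAN packets until told to stop) is assumed, not taken
from the thread.

    # bacula-dir.conf, inside the Job resource (script path is hypothetical)
    RunScript {
      RunsWhen = Before
      RunsOnClient = no
      Command = "/usr/local/bin/wol-keepalive start %c"  # script backgrounds itself
    }
    RunScript {
      RunsWhen = After
      RunsOnClient = no
      RunsOnFailure = yes          # stop the keepalive even if the job fails
      Command = "/usr/local/bin/wol-keepalive stop %c"
    }
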
> Agree about the volume change.  In fact I'm running a backup right now
> that should force a volume change in a couple of hours, and I'm
> watching the SD status to see if the mounted pool becomes unknown
> around that time.  I have certainly noticed that long-running jobs seem
> to cause this issue, and it occurred to me that long-running jobs also
> have a higher chance of spanning volumes.
> 
> If that's what I see, then I will upgrade to 5.0.3.  I can do that
> pretty quickly, and will report back...
> 
> 
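
The check described above (watching whether the mounted pool flips to
*unknown*) can be done from bconsole; the storage resource name is
illustrative:

    *status storage=FileStorage

The output includes the Running Jobs, Jobs waiting to reserve a drive, and
Device status sections referred to earlier in the thread.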

Steve, I can confirm that it is the volume change that causes this issue.  
Luckily I can also confirm that it is fixed in 5.0.3.  Shaved 18 hours off my 
backup window this past weekend!

I should have upgraded to 5.0.3 before bothering the list.  Thanks to everyone 
who responded.


_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users