Re: [Bacula-users] how to debug a job

2015-01-22 08:35:08
Subject: Re: [Bacula-users] how to debug a job
From: "Clark, Patricia A." <clarkpa AT ornl DOT gov>
To: Dimitri Maziuk <dmaziuk AT bmrb.wisc DOT edu>, "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Thu, 22 Jan 2015 13:32:12 +0000
I back up over 300 TB each month.  For that reason I also stay as current
as is reasonably possible on versions of Bacula.  Since upgrading to
version 7, I've found that the hard-coded 6-day limit on job run time
appears to have been removed, which has allowed several of my long-running
jobs to complete successfully.  Each release brings not only new features
but also important fixes.  The client version is less critical.
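
For reference, the run time quoted further down in this thread (518401
seconds) can be checked against a 6-day limit with a few lines of Python.
This is only a sketch of the arithmetic; the function names are made up
for illustration and say nothing about how Bacula's watchdog is actually
implemented.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def run_time_days(seconds: int) -> float:
    """Convert a job run time in seconds to days."""
    return seconds / SECONDS_PER_DAY


def exceeds_limit(seconds: int, limit_days: int = 6) -> bool:
    """True if the run time is strictly longer than the limit (hypothetical check)."""
    return seconds > limit_days * SECONDS_PER_DAY


print(round(run_time_days(518401), 8))  # 6.00001157
print(exceeds_limit(518401))            # True: one second past 6 days
```

In other words, the job was killed one second after the 6-day mark, which
is consistent with a fixed limit rather than a stall.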

On that note, I also break large jobs into smaller jobs whenever possible.
If a FULL backup takes 5 days, it will take at least 10 days to recover.
That said, I don't consider 316 GB to be overly large, and it should only
take a few hours, but it depends.  You never mentioned how many files
were in the 33 GB.  I have some backups in the 300-400 GB range that take
3-4 days only because they contain 7-9 million little files.
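
To put those small-file numbers in perspective, here is a rough
back-of-the-envelope calculation.  The midpoints used (8 million files,
350 GB, 3.5 days) are assumptions picked from the ranges above, not
measured values, and decimal units (1 GB = 1000 MB) are assumed.

```python
SECONDS_PER_DAY = 86400


def files_per_second(file_count: int, days: float) -> float:
    """Average number of files processed per second over the whole job."""
    return file_count / (days * SECONDS_PER_DAY)


def avg_file_size_kb(total_gb: float, file_count: int) -> float:
    """Average file size in KB, using decimal units (1 GB = 1,000,000 KB)."""
    return total_gb * 1_000_000 / file_count


def throughput_mb_s(total_gb: float, days: float) -> float:
    """Average data rate in MB/s, using decimal units (1 GB = 1000 MB)."""
    return total_gb * 1000 / (days * SECONDS_PER_DAY)


# Assumed midpoints of the ranges mentioned in the message.
files, size_gb, days = 8_000_000, 350, 3.5

print(f"{files_per_second(files, days):.1f} files/s")
print(f"{avg_file_size_kb(size_gb, files):.2f} KB average file size")
print(f"{throughput_mb_s(size_gb, days):.2f} MB/s average throughput")
```

With those assumptions the job averages roughly 26 files per second and
a little over 1 MB/s, which suggests per-file overhead rather than raw
bandwidth is what stretches such a job out to days.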

Spool disks are shared, and the more jobs in progress, the slower all of
the jobs will be.  My spool disk setup is rather simple and I cannot speak
to your tiered storage, but I am betting it's also the final destination.
Run some I/O performance tests both while your backups are running and
while they are not.  The number of concurrent jobs may be causing problems
there.
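
One crude way to compare the two situations is to time a sequential
write to the spool directory.  The sketch below is a minimal example and
not a substitute for a proper benchmark tool such as fio or dd; the
function name, the default sizes, and the use of the system temp
directory are all placeholders, so point it at your actual spool
directory.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, total_mb: int = 256,
                          block_kb: int = 1024) -> float:
    """Write total_mb of random data to a temp file in `directory`,
    force it to disk with fsync, and return the rate in MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # include the time to reach the disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the test file
    return (blocks * block_kb / 1024) / elapsed


# Placeholder target: replace with the spool directory under test.
rate = write_throughput_mb_s(tempfile.gettempdir(), total_mb=16)
print(f"{rate:.1f} MB/s sequential write")
```

Run it once with no backups active and again while several jobs are
spooling; a large drop between the two numbers points at contention on
the spool disk.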

Spooling is for the benefit of tape drives and databases.  What is the
benefit of spooling data for virtual tapes that are really disks?

Patti Clark
Linux System Administrator
R&D Systems Support Oak Ridge National Laboratory


On 1/21/15, 9:42 PM, "Dimitri Maziuk" <dmaziuk AT bmrb.wisc DOT edu> wrote:

>On 01/21/2015 06:41 PM, Bill Arlofski wrote:
>
>> Bacula has a hard-coded 6 day limit on a job's run time.   518401 seconds =
>> 6.00001157 days, so it appears that is the cause for the watchdog killing
>> the job.
>
>Hard-coded, huh? Nobody's tried backing up that big data I keep hearing
>about?
>
>> Does it ask you for a new volume?
>
>No. Good guess, but the storage is a vchanger and it's working just fine.
>
>I killed the lot and replaced the spool disk in the unlikely case that
>was slowing things down -- unlikely because it's a tler disk and
>shouldn't have this particular failure mode. I restarted the offending
>job, we'll see what happens...
>
>-- 
>Dimitri Maziuk
>Programmer/sysadmin
>BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>

