Bacula-users

Re: [Bacula-users] Job is waiting on Storage

2010-08-31 14:35:09
Subject: Re: [Bacula-users] Job is waiting on Storage
From: Marco Lertora <marco.lertora AT infoporto DOT it>
To: Bacula-users AT lists.sourceforge DOT net
Date: Tue, 31 Aug 2010 20:31:06 +0200
On 31/08/2010 17:27, Bill Arlofski wrote:
> On 08/31/10 08:44, Marco Lertora wrote:
>>    Hi!
>>
>> I have the same problem! Has anyone found a solution?
>>
>> I have 3 concurrent jobs, which back up from different FDs to the same
>> device on the SD.
>> All jobs use the same pool, and the pool uses "Maximum Volume Bytes" as
>> its volume splitting policy, as suggested in the docs.
>> All jobs have the same priority.
>>
>> Everything starts well, but after some volume changes (because they
>> reach the maximum volume size) the storage daemon loses the pool
>> information of the mounted volume.
>> So the jobs started after that wait on the SD for a mounted volume with
>> the same pool as the one wanted by the job.
>>
>> Regards
>> Marco Lertora
>
> Sorry for a "me too" post... But:
>
>
> I have been noticing the same thing here.  I just have not been able to
> monitor it and accurately document it.
>
> Basically it appears to be exactly what you have stated above. I am also using
> only disk storage with my "file tapes" configured to be a maximum of 10GB 
> each.
>
> I have seen a "status dir"  show me  "job xxx waiting on storage" and have
> noted that the job(s) waiting are of the same priority as the job(s) currently
> running and are configured to use the same device and pool.
>
> I have also noticed exactly what Lukas Kolbe described here where the job
> wants one pool, but thinks it has a "null named pool":
>
>> 3608 JobId=308 wants Pool="dp" but have Pool=""
> and here where the device is mounted, the volume name is known but the pool is
> unknown:
>
>> Device "dp1" (/var/bacula/diskpool/fs1) is mounted with:
>>       Volume:      Vol0349
>>       Pool:        *unknown*
>>       Media type:  File
>>       Total Bytes=11,726,668,867 Blocks=181,775 Bytes/block=64,512
>>       Positioned at File=2 Block=3,136,734,274
>
>
> So by all indications the job(s) that are "waiting on storage" should be
> running but are instead needlessly waiting.
>
>
> Initially, my thought was that I had the Pool in the jobs defined like:
>
> Pool = Default
>
> and the Default pool had no tapes in it - Bacula requires a Pool to be defined
> in a Job definition - Which is why I used "Default", but I was overriding the
> Pool in the Schedule like so:
>
> Schedule {
>    Name = WeeklyToOffsiteDisk
>          Run = Full              pool=Offsite-eSATA      sun     at 20:30
>          Run = Incremental       pool=Offsite-eSATA-Inc  mon-fri at 20:30
>          Run = Differential      pool=Offsite-eSATA-Diff sat     at 20:30
> }
>
>
> I have recently reconfigured my system to use one pool "Offsite-eSATA" and
> have set:
>
> Pool = Offsite-eSATA
>
> directly in all of the Job definitions instead of using the Schedule
> override, but I am still seeing what you both have described.

Hi,
I tried increasing the SD log level with the setdebug option, but no luck.
I also tried looking through the source, but it is quite complex, so no
luck there either.

This is the code where the match fails:

> static int is_pool_ok(DCR *dcr)
> {
>    DEVICE *dev = dcr->dev;
>    JCR *jcr = dcr->jcr;
>
>    /* Now check if we want the same Pool and pool type */
>    if (strcmp(dev->pool_name, dcr->pool_name) == 0 &&
>        strcmp(dev->pool_type, dcr->pool_type) == 0) {
>       /* OK, compatible device */
>       Dmsg1(dbglvl, "OK dev: %s num_writers=0, reserved, pool matches\n", dev->print_name());
>       return 1;
>    } else {
>       /* Drive Pool not suitable for us */
>       Mmsg(jcr->errmsg, _(
> "3608 JobId=%u wants Pool=\"%s\" but have Pool=\"%s\" nreserve=%d on drive %s.\n"),
>             (uint32_t)jcr->JobId, dcr->pool_name, dev->pool_name,
>             dev->num_reserved(), dev->print_name());
>       queue_reserve_message(jcr);
>       Dmsg2(dbglvl, "failed: busy num_writers=0, reserved, pool=%s wanted=%s\n",
>          dev->pool_name, dcr->pool_name);
>    }
>    return 0;
> }
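
To make the failure easier to see in isolation, here is a minimal sketch of
the same comparison. The struct and function names (dev_sketch, dcr_sketch,
pool_matches, format_reject) are hypothetical stand-ins invented for this
example, not Bacula's real types; only the two strcmp() tests and the shape
of the 3608 message are taken from the code above.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 128

/* Hypothetical stand-ins for the pool fields of Bacula's DEVICE and DCR. */
struct dev_sketch { char pool_name[NAME_LEN]; char pool_type[NAME_LEN]; };
struct dcr_sketch { char pool_name[NAME_LEN]; char pool_type[NAME_LEN]; };

/* Same test as in is_pool_ok(): pool name and pool type must both match. */
static int pool_matches(const struct dev_sketch *dev,
                        const struct dcr_sketch *dcr)
{
   return strcmp(dev->pool_name, dcr->pool_name) == 0 &&
          strcmp(dev->pool_type, dcr->pool_type) == 0;
}

/* Build a shortened version of the 3608 message (nreserve and drive
 * name are left out of this sketch). Returns snprintf()'s result. */
static int format_reject(char *buf, size_t len, unsigned jobid,
                         const struct dev_sketch *dev,
                         const struct dcr_sketch *dcr)
{
   return snprintf(buf, len,
                   "3608 JobId=%u wants Pool=\"%s\" but have Pool=\"%s\"",
                   jobid, dcr->pool_name, dev->pool_name);
}
```

With the device's pool_name set to "" and the job's set to "dp",
pool_matches() returns 0, and format_reject() with JobId 308 produces the
same text as the 3608 line quoted earlier in this thread. An empty device
pool name can never match a job that names a pool.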

I suppose dev->pool_name was empty. This is consistent with the code where
the status message is built, which prints "*unknown*" when pool_name is empty:

>          if (dev->is_labeled()) {
>             len = Mmsg(msg, _("Device %s is mounted with:\n"
>                               "    Volume:      %s\n"
>                               "    Pool:        %s\n"
>                               "    Media type:  %s\n"),
>                dev->print_name(),
>                dev->VolHdr.VolumeName,
>                dev->pool_name[0]?dev->pool_name:"*unknown*",
>                dev->device->media_type);
>             sendit(msg, len, sp);
>          } else {

But I can't find where this field is set.
It happens on some, but not all, volume changes, and I think it occurs when
the storage daemon, or more probably a device, finishes all of its running jobs.
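
The link between the two symptoms can be sketched as follows. This is only
an illustration under my own assumptions: pool_label and format_status are
hypothetical helper names, and the only part taken from the Bacula source is
the expression that falls back to "*unknown*" when the first byte of the
pool name is zero.

```c
#include <stdio.h>

/* Hypothetical helper wrapping the expression from the status code:
 * an empty pool name buffer is displayed as "*unknown*". */
static const char *pool_label(const char *pool_name)
{
   return pool_name[0] ? pool_name : "*unknown*";
}

/* Format two lines of a status report in the style shown above.
 * Returns snprintf()'s result. */
static int format_status(char *buf, size_t len,
                         const char *volume, const char *pool_name)
{
   return snprintf(buf, len,
                   "    Volume:      %s\n"
                   "    Pool:        %s\n",
                   volume, pool_label(pool_name));
}
```

So a status output showing "Pool: *unknown*" and a 3608 message showing
have Pool="" would both come from the same empty pool_name on the device.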

Can any Bacula guru or developer help us?

Marco

>
> --
> Bill Arlofski
> Reverse Polarity, LLC
>
> ------------------------------------------------------------------------------
> This SF.net Dev2Dev email is sponsored by:
>
> Show off your parallel programming skills.
> Enter the Intel(R) Threading Challenge 2010.
> http://p.sf.net/sfu/intel-thread-sfd
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users

