Bacula-users

Re: [Bacula-users] Max Wait Time setting

2015-07-09 12:19:02
Subject: Re: [Bacula-users] Max Wait Time setting
From: Brendan Martin <brendan AT physics.tamu DOT edu>
To: "Bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Thu, 09 Jul 2015 11:16:17 -0500
These backups are disk-based, so this is not a timing issue with a tape library.

From the documentation:
Max Wait Time = <time> The time specifies the maximum allowed time that a job may block waiting for a resource (such as waiting for a tape to be mounted, or waiting for the storage or file daemons to perform their duties), counted from the when the job starts, (not necessarily the same as when the job was scheduled). This directive works as expected since bacula 2.3.18.
Given that, my expectation is that if there are no available volumes when the job starts, the reported End time should be (in this case) 600 seconds after the reported Start time.  As seen in the log entries, that is not the case.  The Maximum Volumes setting was more than adequate for the job to run successfully, but instead it was cancelled immediately upon starting when Max Wait Time was set.  No other "max time" parameters are defined.

The wait timer appears to be starting at the Scheduled time, contrary to the documentation.  If I wanted it to work that way, I would use the Max Start Delay or Max Run Sched Time parameters instead.  I, too, suspect a bug, but notes in bug report 1586 indicate that the investigator was unable to reproduce the problem.

On 7/8/2015 9:06 PM, Ana Emília M. Arruda wrote:
Hello Brendan,

If you're working with tape libraries, I would recomend you to slightly increase this value. There are three device directives (maximum open/rewind/changer wait) that defaults to 600 seconds. So maybe you're not giving enough time for your volumes being mounted by Bacula.

About you log, for JobId 542, it exceeded the maximum wait time at 22:15, 10 minutes after the scheduled time, 22:05. So I suppose it is working as expected, since after the start time, a few minutes later, bacula tries the mount operation and then starts counting the Max Wait Time.

I don't know why the log shows start time equal to end time for these jobs, maybe this is a bug in this version.

Best regards,
Ana

On Wed, Jul 8, 2015 at 4:56 PM, Brendan Martin <brendan AT physics.tamu DOT edu> wrote:
Here are the logs for the jobs involved.
07-Jul 22:15 backup-dir JobId 542: Fatal error: Max wait time exceeded. Job canceled.
07-Jul 22:19 backup-dir JobId 542: Bacula backup-dir 5.2.6 (21Feb12):
  Build OS:               x86_64-pc-linux-gnu debian jessie/sid
  JobId:                  542
  Job:                    Pollux_OS_Backup.2015-07-07_22.05.05_02
  Backup Level:           Incremental, since=2015-07-06 22:16:45
  Client:                 "pollux-fd" 5.0.2 (28Apr10) x86_64-pc-linux-gnu,debian,6.0.6
  FileSet:                "Pollux OS Full Set" 2015-06-08 17:00:43
  Pool:                   "Pollux Daily" (From Job IncPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "File" (From Job resource)
  Scheduled time:         07-Jul-2015 22:05:05
  Start time:             07-Jul-2015 22:19:14
  End time:               07-Jul-2015 22:19:14
  Elapsed time:           0 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         
  Volume Session Id:      0
  Volume Session Time:    0
  Last Volume Bytes:      0 (0 B)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  
  SD termination status:  
  Termination:            Backup Canceled


07-Jul 22:21 backup-dir JobId 544: Fatal error: Max wait time exceeded. Job canceled.
07-Jul 22:21 backup-dir JobId 544: Bacula backup-dir 5.2.6 (21Feb12):
  Build OS:               x86_64-pc-linux-gnu debian jessie/sid
  JobId:                  544
  Job:                    Frodo_Backup.2015-07-07_22.10.00_04
  Backup Level:           Incremental, since=2015-07-06 22:24:32
  Client:                 "frodo-fd" 5.2.6 (21Feb12) Microsoft Windows Server 2008 Standard Edition Service Pack 2 (build 6002), 32-bit,Cross-compile,Win32
  FileSet:                "Frodo Full Set" 2015-06-13 19:20:11
  Pool:                   "Frodo Daily" (From Job IncPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "File" (From Job resource)
  Scheduled time:         07-Jul-2015 22:10:00
  Start time:             07-Jul-2015 22:21:55
  End time:               07-Jul-2015 22:21:55
  Elapsed time:           0 secs
  Priority:               11
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         
  Volume Session Id:      0
  Volume Session Time:    0
  Last Volume Bytes:      0 (0 B)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  
  SD termination status:  
  Termination:            Backup Canceled



On 7/8/2015 2:16 PM, Ana Emília M. Arruda wrote:
Hello Brendan,

On Wed, Jul 8, 2015 at 12:27 PM, Brendan Martin <brendan AT physics.tamu DOT edu> wrote:
I have a Bacula Director 5.2.6 running on Debian 8.1.

To avoid an excessive delay in the backup schedule in the event of
insufficient volumes, I added a Max Wait Time value of 600 (seconds) in
the JobDefs for a couple of clients.  Jobs for those clients
subsequently were cancelled immediately upon starting, even though the
documentation indicates that the countdown for this setting begins when
a job actually starts, rather than when it is scheduled.

​The documentation​ says that the countdown for this setting begins when the job asks for a volume mount. And this time is a slightly different time than the time that the jobs starts. But it should be used to avoid that a job stucks waiting for a volume as you want.  Also, it is working as expected in version 5.2.6 (i have been using this version for a long time). Are you having issues with this? Could you send to the list your log files for this jobs/issues?
 

This appears to be exactly the same issue described in bug report 1586,
which was never satisfactorily resolved.

Has anyone else encountered this and been able to get the setting to
work as documented?

Thanks,
Brendan

​ Best regards,
Ana ​
 



------------------------------------------------------------------------------
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that
you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users



--

Brendan E. Martin | Senior Systems Administrator

Department of Physics and Astronomy | Texas A&M University
4242 TAMU | College Station, Texas  77843-4242

979.845.7717 main | 979.845.2590 fax | brendan AT physics.tamu DOT edu

http://physics.tamu.edu

------------------------------------------------------------------------------
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that
you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
<Prev in Thread] Current Thread [Next in Thread>