Bacula-users

[Bacula-users] My experience with Bacula error reports around a summer time change 'incident'

2009-04-07 21:54:15
Subject: [Bacula-users] My experience with Bacula error reports around a summer time change 'incident'
From: Mark V <mvyver AT gmail DOT com>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Wed, 8 Apr 2009 12:50:42 +1100
Hi group,

Following up on my run in with backing up around a daylight saving time change,
the following sets out what the error reports were and why I found it
difficult to
work out exactly what was causing the problem, and this a discovered by
accident via bat.....

A Bacula (v 2.4.4) job blocks all others due to a summer time change.  The time
change causes the already completed job to be re-run, it doesn't, and blocks.
This prevents the error messages indicating that has gone wrong from appearing
in the Bacula log files or notification emails - only bat shows this error and
only when opened _once_ after the error occurred.

The following error reporting behavior obscured what was the problem.

1) Initial run of a full backup job runs and completes successfully.  The next
   job (catalog backup) also runs and completes with out error.

2) Received two emails indicating each job completed.

3) Expected a futher 3 emails indicating the three services, bacula-dir, fd
   and sd were running.  However no such emails were received.

4) Bacula log file showed the same content as the email received at 2) i.e. the
   last few lines of the log file suggest nothing is amiss:
05-Apr 02:51 bacula-dir JobId 432: Begin pruning Jobs.
05-Apr 02:51 bacula-dir JobId 432: No Jobs found to prune.
05-Apr 02:51 bacula-dir JobId 432: Begin pruning Files.
05-Apr 02:51 bacula-dir JobId 432: No Files found to prune.
05-Apr 02:51 bacula-dir JobId 432: End auto prune.

05-Apr 02:51 bacula-dir JobId 432: AfterJob: run command
"/home/mv/bin/bacula/delete_catalog_backup"

5) Received 'intervention needed' emails with the contents below.  I may be
   dense, but I didn't interpret 'Cannot find any appendable volumes' to mean
   'Volume "Full-MVLaptop-2009-04-05" already exists' and this is quite
   possibly my fault for how I have the email alerts configured (I assume I can
   change their contents?) :
05-Apr 02:05 bacula-sd JobId 433: Job MVLaptop.2009-04-05_02.05.00.10
waiting. Cannot find any appendable volumes.
Please use the "label"  command to create a new Volume for:
   Storage:      "FileDevice" (/media/disk)
   Pool:         Full-Pool
   Media type:   File

6) Bacula-tray-monitor showed an ambiguous message indicating the same as the
   intervention email I received:

7) Opened bat and saw the following line that was my clue about a job trying to
   run twice:
05-Apr 02:05 bacula-dir JobId 433: Error: sql_create.c:400 Volume
"Full-MVLaptop-2009-04-05" already exists.

8) Closed and opened bat and did not see this error message again.....

9) The Bacula log file still showed only that the last job had completed
   successfully, the time stamp was 02:51.

10) Deleted the stalled job using bat console

11) NOW the log file shows the much more informative errors at 2:05, which
    comes after 2:51.... :)
05-Apr 02:51 bacula-dir JobId 432: Begin pruning Jobs.
05-Apr 02:51 bacula-dir JobId 432: No Jobs found to prune.
05-Apr 02:51 bacula-dir JobId 432: Begin pruning Files.
05-Apr 02:51 bacula-dir JobId 432: No Files found to prune.
05-Apr 02:51 bacula-dir JobId 432: End auto prune.

05-Apr 02:51 bacula-dir JobId 432: AfterJob: run command
"/home/mv/bin/bacula/delete_catalog_backup"
05-Apr 02:05 bacula-dir JobId 433: Start Backup JobId 433,
Job=MVLaptop.2009-04-05_02.05.00.10
05-Apr 02:05 bacula-dir JobId 433: Error: sql_create.c:400 Volume
"Full-MVLaptop-2009-04-05" already exists.
05-Apr 02:05 bacula-dir JobId 433: Using Device "FileDevice"
05-Apr 02:05 bacula-dir JobId 433: Error: sql_create.c:400 Volume
"Full-MVLaptop-2009-04-05" already exists.
05-Apr 02:05 bacula-dir JobId 433: Error: sql_create.c:400 Volume
"Full-MVLaptop-2009-04-05" already exists.
05-Apr 02:05 bacula-sd JobId 433: Job MVLaptop.2009-04-05_02.05.00.10
waiting. Cannot find any appendable volumes.
Please use the "label"  command to create a new Volume for:
    Storage:      "FileDevice" (/media/disk)
    Pool:         Full-Pool
    Media type:   File

12) I now get one email about the cancelled job and this contains what I wish I
    had got in the 'intervention needed' emails, as well as the log file:
05-Apr 02:05 bacula-dir JobId 433: Start Backup JobId 433,
Job=MVLaptop.2009-04-05_02.05.00.10
05-Apr 02:05 bacula-dir JobId 433: Error: sql_create.c:400 Volume
"Full-MVLaptop-2009-04-05" already exists.
05-Apr 02:05 bacula-dir JobId 433: Using Device "FileDevice"
05-Apr 02:05 bacula-dir JobId 433: Error: sql_create.c:400 Volume
"Full-MVLaptop-2009-04-05" already exists.
05-Apr 02:05 bacula-dir JobId 433: Error: sql_create.c:400 Volume
"Full-MVLaptop-2009-04-05" already exists.
05-Apr 02:05 bacula-sd JobId 433: Job MVLaptop.2009-04-05_02.05.00.10
waiting. Cannot find any appendable volumes.
Please use the "label"  command to create a new Volume for:
   Storage:      "FileDevice" (/media/disk)
   Pool:         Full-Pool
   Media type:   File
05-Apr 03:05 bacula-dir JobId 433: Error: sql_create.c:400 Volume
"Full-MVLaptop-2009-04-05" already exists.
05-Apr 03:05 bacula-sd JobId 433: Job MVLaptop.2009-04-05_02.05.00.10
waiting. Cannot find any appendable volumes.
Please use the "label"  command to create a new Volume for:
   Storage:      "FileDevice" (/media/disk)
   Pool:         Full-Pool
   Media type:   File
05-Apr 05:05 bacula-dir JobId 433: Error: sql_create.c:400 Volume
"Full-MVLaptop-2009-04-05" already exists.
05-Apr 05:05 bacula-sd JobId 433: Job MVLaptop.2009-04-05_02.05.00.10
waiting. Cannot find any appendable volumes.
Please use the "label"  command to create a new Volume for:
   Storage:      "FileDevice" (/media/disk)
   Pool:         Full-Pool
   Media type:   File
05-Apr 09:05 bacula-dir JobId 433: Error: sql_create.c:400 Volume
"Full-MVLaptop-2009-04-05" already exists.
05-Apr 09:05 bacula-sd JobId 433: Job MVLaptop.2009-04-05_02.05.00.10
waiting. Cannot find any appendable volumes.
Please use the "label"  command to create a new Volume for:
   Storage:      "FileDevice" (/media/disk)
   Pool:         Full-Pool
   Media type:   File
05-Apr 17:05 bacula-dir JobId 433: Error: sql_create.c:400 Volume
"Full-MVLaptop-2009-04-05" already exists.
05-Apr 17:05 bacula-sd JobId 433: Job MVLaptop.2009-04-05_02.05.00.10
waiting. Cannot find any appendable volumes.
Please use the "label"  command to create a new Volume for:
   Storage:      "FileDevice" (/media/disk)
   Pool:         Full-Pool
   Media type:   File
06-Apr 07:38 bacula-fd JobId 433: Fatal error: job.c:1814 Comm error
with SD. bad response to Append Data. ERR=Interrupted system call
06-Apr 07:38 bacula-sd JobId 433: Job MVLaptop.2009-04-05_02.05.00.10
marked to be canceled.
06-Apr 07:38 bacula-sd JobId 433: Job MVLaptop.2009-04-05_02.05.00.10
canceled while waiting for mount on Storage Device ""FileDevice"
(/media/disk)".
06-Apr 07:38 bacula-dir JobId 433: Bacula bacula-dir 2.4.4 (28Dec08):
06-Apr-2009 07:38:57
 Build OS:               i686-pc-linux-gnu suse 11.0
 JobId:                  433
 Job:                    MVLaptop.2009-04-05_02.05.00.10
 Backup Level:           Full
 Client:                 "client-fd" 2.4.4 (28Dec08) i686-pc-linux-gnu,suse,11.0
 FileSet:                "Full Set" 2009-03-28 02:05:00
 Pool:                   "Full-Pool" (From Job FullPool override)
 Storage:                "File" (From Job resource)
 Scheduled time:         05-Apr-2009 02:05:00
 Start time:             05-Apr-2009 02:05:02
 End time:               06-Apr-2009 07:38:57
 Elapsed time:           1 day 5 hours 33 mins 55 secs
 Priority:               10
 FD Files Written:       0
 SD Files Written:       0
 FD Bytes Written:       0 (0 B)
 SD Bytes Written:       0 (0 B)
 Rate:                   0.0 KB/s
 Software Compression:   None
 VSS:                    no
 Storage Encryption:     no
 Volume name(s):
 Volume Session Id:      7
 Volume Session Time:    1238640563
 Last Volume Bytes:      0 (0 B)
 Non-fatal FD errors:    0
 SD Errors:              0
 FD termination status:  Error
 SD termination status:  Canceled
 Termination:            Backup Canceled

13) NOW the Bacula log file shows the same contents as the above email.

Apologies if that seems a bit of a whinge, it wasn't intended that way... it
just bring back memories of a very unsettling and then very frustrating
experience :)

HTH
Mark

------------------------------------------------------------------------------
This SF.net email is sponsored by:
High Quality Requirements in a Collaborative Environment.
Download a free trial of Rational Requirements Composer Now!
http://p.sf.net/sfu/www-ibm-com
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>
  • [Bacula-users] My experience with Bacula error reports around a summer time change 'incident', Mark V <=