[Bacula-users] SD crashes

2012-02-13 02:24:15
Subject: [Bacula-users] SD crashes
From: Joe Nyland <joenyland AT me DOT com>
To: Bacula Users <Bacula-users AT lists.sourceforge DOT net>
Date: Mon, 13 Feb 2012 07:21:03 +0000
Hello everyone,

I hope someone can suggest why I am seeing the following behaviour in my current Bacula setup:

Since the tail end of last week, my MySQL backups in Bacula have been appearing to 'crash' at random, normally while a backup is being copied to another pool, although I'm not yet sure whether the copy is the trigger.

Running 'status dir' after one of these 'crashes' gives the following output for the running jobs:

Running Jobs:
Console connected at 12-Feb-12 15:53
Console connected at 13-Feb-12 06:58
 JobId Level   Name                       Status
======================================================================
  2107 Full    WebServer1_MySQL_Copy.2012-02-13_04.30.00_28 is running <Crashed Job>
  2108 Full    WebServer1_MySQL.2012-02-13_04.30.00_29 is running <Crashed Job>
  2111 Full    MythTVServer1_MySQL.2012-02-13_05.00.00_32 is waiting for higher priority jobs to finish
  2113 Full    TestServer_MySQL.2012-02-13_05.00.00_34 is waiting execution
  2114 Full    MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35 is waiting execution
  2115 Full    WebServer1_MySQL_Copy.2012-02-13_05.30.00_36 is waiting execution
  2116 Full    WebServer1_MySQL.2012-02-13_05.30.00_37 has a fatal error
  2117 Full    TestServer_MySQL_Copy.2012-02-13_05.30.00_38 is waiting execution
  2121 Full    MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42 is waiting execution
  2122 Full    WebServer1_MySQL_Copy.2012-02-13_06.30.00_43 is waiting execution
  2123 Full    WebServer1_MySQL.2012-02-13_06.30.00_44 has a fatal error
  2124 Full    TestServer_MySQL_Copy.2012-02-13_06.30.00_45 is waiting execution
  2125 Full    MythTVServer1_MySQL.2012-02-13_07.00.00_47 has a fatal error
  2126 Full    WebServer1_MySQL.2012-02-13_07.00.00_48 has a fatal error
====

Once the above appears, I am unable to view the status of any storage resource on my SD:

*status storage=FileServer1_Full
Connecting to Storage daemon FileServer1_Full at FileServer1:9103

FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu 10.04
Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
 Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577 max_bufs=994
Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8

Running Jobs:
Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107 Volume="WebServer1_MySQL_1325"
    pool="WebServer1_MySQL" device="WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1)
    Files=4 Bytes=164,924 Bytes/sec=17
    FDSocket closed
====

Jobs waiting to reserve a drive:
====

Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name 
===================================================================
  2091  Full          2    92.45 K  OK       13-Feb-12 03:30 TestServer_MySQL_Copy
  2096  Full          5    2.258 M  OK       13-Feb-12 03:30 MythTVServer1_MySQL_Copy
  2098  Full          4    164.9 K  OK       13-Feb-12 03:30 WebServer1_MySQL_Copy
  2100  Full          2    92.45 K  OK       13-Feb-12 03:30 TestServer_MySQL_Copy
  2078  Full      1,145    2.942 G  OK       13-Feb-12 03:31 SVN_Copy
  2102  Full          5    2.259 M  OK       13-Feb-12 04:01 MythTVServer1_MySQL
  2103  Full          4    164.9 K  OK       13-Feb-12 04:01 WebServer1_MySQL
  2104  Full          2    92.37 K  OK       13-Feb-12 04:01 TestServer_MySQL
  2105  Full          5    2.259 M  OK       13-Feb-12 04:30 MythTVServer1_MySQL_Copy
  2109  Full          2    92.37 K  OK       13-Feb-12 04:30 TestServer_MySQL_Copy
====

Device status:
Device "Default" (/mnt/backup/Bacula) is not open.
<snip>
Device "WebServer1_Inc" (/mnt/backup/Bacula/WebServer1/Incremental) is not open.
Device "WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1) is mounted with:
    Volume:      WebServer1_MySQL_1325
    Pool:        WebServer1_MySQL
    Media type:  File
    Total Bytes Read=0 Blocks Read=0 Bytes/block=0
    Positioned at File=0 Block=0
Device "WebServer1_MySQL_Copy" (/mnt/mac_backup/Bacula/Databases/WebServer1) is not open.
Device "WebServer1_Full_Copy" (/mnt/mac_backup/Bacula/WebServer1/Full) is not open.
Device "WebServer1_Inc_Copy" (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
<snip>
Device "SharedData_Diff" (/mnt/backup/Bacula/Shared/Differential) is not open.
====

Used Volume status:

NOTE: bconsole appears to crash here: no further output is produced, and bconsole does not respond to any key presses. I have to press Ctrl+C to exit bconsole. Furthermore, the only way I can clear out the failed jobs from the 'Running Jobs' queue is to exit bconsole, issue 'sudo service bacula-sd stop' twice, then restart the SD and restart bacula-director.

For 4 of my clients, I run a MySQL backup hourly, on the hour (00:00, 01:00, etc.). I then copy each MySQL backup to another storage resource on my SD at half past the hour (00:30, 01:30, etc.). The MySQL databases I am backing up are relatively small; the biggest is my Bacula catalog at ~160 MB, although that backup is currently disabled and the database is backed up outside of Bacula until I can resolve this issue.
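
To make the timing above concrete, here is a minimal sketch of what the two Schedule resources could look like. The schedule names are the ones referenced by the Job resources in this message; the Run lines are an assumption based on the times described, not a copy of the actual configuration.

```
# Sketch only: Run lines are assumed from the timing described above.
Schedule {
  Name = "Hourly MySQL Database Schedule"
  # Backup jobs start on the hour, every hour.
  Run = Level=Full hourly at 0:00
}

Schedule {
  Name = "Hourly MySQL Database Copy Schedule"
  # Copy jobs start at half past the hour, every hour.
  Run = Level=Full hourly at 0:30
}
```

With this layout, a Copy job that is still running (or stuck) at the top of the hour overlaps with the next Backup job for the same client.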

Here's the config for one of the clients' MySQL backups:

JobDefs {
  Name = DefaultBackup
  Type = Backup
  Accurate = yes
  Level = Full
  Client = FileServer1-fd
  Messages = Standard
  Pool = Default
  Storage = Default
  Priority = 10
  Allow Duplicate Jobs = No
  Cancel Lower Level Duplicates = yes
}

JobDefs {
  Name = DefaultCopy
  Type = Copy
  Level = Full
  Client = FileServer1-fd
  Messages = Standard
  Selection Type = PoolUncopiedJobs
  Priority = 12
}

Job {
  Name = TestServer_MySQL
  Type = Backup
  JobDefs = DefaultBackup
  Client = TestServer-fd
  FileSet = "MySQL Databases"
  ClientRunBeforeJob = "/etc/bacula/scripts/client-scripts/mysql-backup.sh bacula_backup Gromit123"
  ClientRunAfterJob = "/etc/bacula/scripts/client-scripts/mysql-backup.sh cleanup"
  Schedule = "Hourly MySQL Database Schedule"
  Messages = Standard
  Pool = TestServer_MySQL
  Storage = TestServer_MySQL
  Enabled = No
}

Job {
  Name = "TestServer_MySQL_Copy"
  JobDefs = DefaultCopy
  Type = Copy
  Client = TestServer-fd
  FileSet = "MySQL Databases"
  Pool = TestServer_MySQL
  Messages = Standard
  Schedule = "Hourly MySQL Database Copy Schedule"
  Storage = TestServer_MySQL
  Enabled = No
}
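
For context, a Copy job takes its destination from the Next Pool directive of the Pool it reads from. The Pool resources are not shown in this message, so the following is only a sketch of how that pairing might look; the names "TestServer_MySQL_Copy" (both the pool and the storage) are hypothetical and used purely for illustration.

```
# Sketch only: the copy pool and copy storage names are hypothetical.
Pool {
  Name = TestServer_MySQL
  Pool Type = Backup
  Storage = TestServer_MySQL
  # Copy jobs reading from this pool write to the pool named here.
  Next Pool = TestServer_MySQL_Copy
}

Pool {
  Name = TestServer_MySQL_Copy
  Pool Type = Backup
  # Destination storage for the copied jobs (hypothetical name).
  Storage = TestServer_MySQL_Copy
}
```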

Reading back through the console messages leading up to the crash, I can't see anything that suggests why the jobs crashed. There are only messages about duplicate jobs not being allowed, for the jobs queued behind the crashed jobs at the top of the queue.

If I can provide any further information to help diagnose this issue, please let me know and I will provide it.

I hope someone can help, please.

Joe