Bacula-users

Re: [Bacula-users] SD crashes

2012-02-13 09:35:03
Subject: Re: [Bacula-users] SD crashes
From: Joe Nyland <joenyland AT me DOT com>
To: Bacula Users <Bacula-users AT lists.sourceforge DOT net>
Date: Mon, 13 Feb 2012 14:30:49 +0000 (GMT)
On 13 Feb 2012, at 02:11 PM, John Drescher <drescherjm AT gmail DOT com> wrote:

2012/2/13 Joe Nyland <joenyland AT me DOT com>:
> Hello everyone,
>
> I hope someone can offer some suggestions as to why I am seeing the
> following behaviour in my current Bacula setup:
>
> Since the end of last week, my MySQL backups in Bacula have been
> randomly appearing to 'crash', usually while a backup is being copied to
> another pool, although I'm not yet sure whether that is the trigger.
>
> Running 'status dir' after one of these 'crashes' gives the following output
> for the running jobs:
>
> Running Jobs:
> Console connected at 12-Feb-12 15:53
> Console connected at 13-Feb-12 06:58
>  JobId Level   Name                       Status
> ======================================================================
>   2107 Full    WebServer1_MySQL_Copy.2012-02-13_04.30.00_28 is running <Crashed Job>
>   2108 Full    WebServer1_MySQL.2012-02-13_04.30.00_29 is running <Crashed Job>
>   2111 Full    MythTVServer1_MySQL.2012-02-13_05.00.00_32 is waiting for higher priority jobs to finish
>   2113 Full    TestServer_MySQL.2012-02-13_05.00.00_34 is waiting execution
>   2114 Full    MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35 is waiting execution
>   2115 Full    WebServer1_MySQL_Copy.2012-02-13_05.30.00_36 is waiting execution
>   2116 Full    WebServer1_MySQL.2012-02-13_05.30.00_37 has a fatal error
>   2117 Full    TestServer_MySQL_Copy.2012-02-13_05.30.00_38 is waiting execution
>   2121 Full    MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42 is waiting execution
>   2122 Full    WebServer1_MySQL_Copy.2012-02-13_06.30.00_43 is waiting execution
>   2123 Full    WebServer1_MySQL.2012-02-13_06.30.00_44 has a fatal error
>   2124 Full    TestServer_MySQL_Copy.2012-02-13_06.30.00_45 is waiting execution
>   2125 Full    MythTVServer1_MySQL.2012-02-13_07.00.00_47 has a fatal error
>   2126 Full    WebServer1_MySQL.2012-02-13_07.00.00_48 has a fatal error
> ====
>
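A minimal sketch, assuming the console output is unwrapped so that each job sits on one line with its JobId as the first field: the stuck jobs can be pulled out of 'status dir' with a hypothetical shell function, list_stuck_jobs. The function name and the two match patterns are illustrative choices based only on the output above; piping commands into bconsole on stdin is standard usage.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the JobIds that 'status dir' reports as
# crashed or failed. Reads the console output on stdin.
list_stuck_jobs() {
  # awk's default field splitting skips leading blanks, so $1 is the JobId.
  # The patterns match the two failure states seen in the report.
  awk '/<Crashed Job>|has a fatal error/ { print $1 }'
}

# Usage (assumes bconsole is on PATH and can reach the Director):
#   echo "status dir" | bconsole | list_stuck_jobs
```

Jobs that are merely "waiting execution" or "waiting for higher priority jobs" are deliberately not matched, since they are queued behind the stuck ones rather than broken themselves.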
> Once the above appears, I am unable to view the status of any storage
> resource on my SD:
>
> *status storage=FileServer1_Full
> Connecting to Storage daemon FileServer1_Full at FileServer1:9103
>
> FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu 10.04
> Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
>  Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577 max_bufs=994
> Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8
>
> Running Jobs:
> Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107 Volume="WebServer1_MySQL_1325"
>     pool="WebServer1_MySQL" device="WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1)
>     Files=4 Bytes=164,924 Bytes/sec=17
>     FDSocket closed
> ====
>
> Jobs waiting to reserve a drive:
> ====
>
> Terminated Jobs:
>  JobId  Level    Files      Bytes   Status   Finished        Name
> ===================================================================
>   2091  Full          2    92.45 K  OK       13-Feb-12 03:30 TestServer_MySQL_Copy
>   2096  Full          5    2.258 M  OK       13-Feb-12 03:30 MythTVServer1_MySQL_Copy
>   2098  Full          4    164.9 K  OK       13-Feb-12 03:30 WebServer1_MySQL_Copy
>   2100  Full          2    92.45 K  OK       13-Feb-12 03:30 TestServer_MySQL_Copy
>   2078  Full      1,145    2.942 G  OK       13-Feb-12 03:31 SVN_Copy
>   2102  Full          5    2.259 M  OK       13-Feb-12 04:01 MythTVServer1_MySQL
>   2103  Full          4    164.9 K  OK       13-Feb-12 04:01 WebServer1_MySQL
>   2104  Full          2    92.37 K  OK       13-Feb-12 04:01 TestServer_MySQL
>   2105  Full          5    2.259 M  OK       13-Feb-12 04:30 MythTVServer1_MySQL_Copy
>   2109  Full          2    92.37 K  OK       13-Feb-12 04:30 TestServer_MySQL_Copy
> ====
>
> Device status:
> Device "Default" (/mnt/backup/Bacula) is not open.
> <snip>
> Device "WebServer1_Inc" (/mnt/backup/Bacula/WebServer1/Incremental) is not open.
> Device "WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1) is mounted with:
>     Volume:      WebServer1_MySQL_1325
>     Pool:        WebServer1_MySQL
>     Media type:  File
>     Total Bytes Read=0 Blocks Read=0 Bytes/block=0
>     Positioned at File=0 Block=0
> Device "WebServer1_MySQL_Copy" (/mnt/mac_backup/Bacula/Databases/WebServer1) is not open.
> Device "WebServer1_Full_Copy" (/mnt/mac_backup/Bacula/WebServer1/Full) is not open.
> Device "WebServer1_Inc_Copy" (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
> <snip>
> Device "SharedData_Diff" (/mnt/backup/Bacula/Shared/Differential) is not open.
> ====
>
> Used Volume status:
>
> NOTE: bconsole appears to crash here: no further output is produced, and
> bconsole does not respond to any key presses, so I have to press Ctrl+C to
> exit it. Furthermore, the only way I can clear out the failed jobs from the
> 'Running Jobs' list is to exit bconsole, issue 'sudo service bacula-sd stop'
> twice, then restart the SD and restart bacula-director.
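The manual recovery sequence described above can be sketched as a small shell script. This is only a sketch: recover_bacula and run are hypothetical helpers, the service names are the ones quoted above (bacula-sd, bacula-director), and setting DRY_RUN=1 prints the commands instead of executing them.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the recovery steps reported above.
recover_bacula() {
  # The report says a single stop is not enough for the hung SD, so stop twice.
  run sudo service bacula-sd stop
  run sudo service bacula-sd stop
  run sudo service bacula-sd start
  # Restarting the Director is what finally clears the stuck "running" jobs.
  run sudo service bacula-director restart
}
```

Running `DRY_RUN=1 recover_bacula` first is a cheap way to check the sequence before touching the live daemons.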
>
>
> My setup: for 4 of my clients I run an hourly MySQL backup at 00:00,
> 01:00, etc., and then copy the MySQL backups to another storage resource on
> my SD at 00:30, 01:30, etc. The MySQL databases I am backing up are
> relatively small; the biggest is my Bacula catalog (~160 MB), although
> that backup is currently disabled and the catalog is backed up outside of
> Bacula until I can resolve this issue.
>
> Here's the config for one of the clients' MySQL backups:
>
> JobDefs {
>   Name = DefaultBackup
>   Type = Backup
>   Accurate = yes
>   Level = Full
>   Client = FileServer1-fd
>   Messages = Standard
>   Pool = Default
>   Storage = Default
>   Priority = 10
>   Allow Duplicate Jobs = No
>   Cancel Lower Level Duplicates = yes
> }
>
> JobDefs {
>   Name = DefaultCopy
>   Type = Copy
>   Level = Full
>   Client = FileServer1-fd
>   Messages = Standard
>   Selection Type = PoolUncopiedJobs
>   Priority = 12
> }
>
> Job {
>   Name = TestServer_MySQL
>   Type = Backup
>   JobDefs = DefaultBackup
>   Client = TestServer-fd
>   FileSet = "MySQL Databases"
>   ClientRunBeforeJob = "/etc/bacula/scripts/client-scripts/mysql-backup.sh
> bacula_backup Gromit123"
>   ClientRunAfterJob = "/etc/bacula/scripts/client-scripts/mysql-backup.sh
> cleanup"
>   Schedule = "Hourly MySQL Database Schedule"
>   Messages = Standard
>   Pool = TestServer_MySQL
>   Storage = TestServer_MySQL
>   Enabled = No
> }
>
> Job {
>   Name = "TestServer_MySQL_Copy"
>   JobDefs = DefaultCopy
>   Type = Copy
>   Client = TestServer-fd
>   FileSet = "MySQL Databases"
>   Pool = TestServer_MySQL
>   Messages = Standard
>   Schedule = "Hourly MySQL Database Copy Schedule"
>   Storage = TestServer_MySQL
>   Enabled = No
> }
>
> Reading back through the console messages leading up to the crash, nothing
> suggests why the jobs crashed. The only messages are about duplicate jobs
> not being allowed, for the jobs queued behind the crashed jobs at the top
> of the queue.
>
>
> If any further information would help diagnose this issue, please let me
> know and I will provide it.
>

I would look at the log for the SD. One way to get it is to run
bacula-sd in the foreground in a console, with debugging enabled (-d 100),
instead of running it as a daemon. You can also search for "bacula kaboom"
for more debugging tips.
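A sketch of that suggestion, under stated assumptions: the configuration path /etc/bacula/bacula-sd.conf, the bacula user and tape group, and the log location /tmp/bacula-sd-debug.log are assumed Ubuntu-style defaults, not taken from this thread, and debug_sd and run are hypothetical helpers. The packaged daemon is stopped first because the foreground copy needs the same port (9103 in the report above). Setting DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# ASSUMED defaults (typical Ubuntu packaging); override via the environment.
SD_CONF="${SD_CONF:-/etc/bacula/bacula-sd.conf}"
SD_LOG="${SD_LOG:-/tmp/bacula-sd-debug.log}"
SD_USER="${SD_USER:-bacula}"
SD_GROUP="${SD_GROUP:-tape}"

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

debug_sd() {
  # Stop the init-script instance so its port is free.
  run sudo service bacula-sd stop
  # -f: stay in the foreground, -d 100: debug level,
  # -u/-g: run as this user and group, -c: configuration file.
  # tee keeps a copy of everything the daemon prints.
  if [ -n "${DRY_RUN:-}" ]; then
    echo "sudo bacula-sd -f -d 100 -u $SD_USER -g $SD_GROUP -c $SD_CONF 2>&1 | tee $SD_LOG"
  else
    sudo bacula-sd -f -d 100 -u "$SD_USER" -g "$SD_GROUP" -c "$SD_CONF" 2>&1 | tee "$SD_LOG"
  fi
}
```

If stopping the daemon is undesirable, bconsole's `setdebug level=100 trace=1 storage=FileServer1_Full` raises the debug level of the running SD instead, with the output going to a trace file in the daemon's working directory rather than to a console.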


John
 
Hi John,

Thank you for your reply too; I only received it after replying to Adrian Reyer.

That sounds like a logical step to me too. I'll set this up later on, so that it's in place for when it happens again.

Thank you for your input.

Joe

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users