Bacula-users

Re: [Bacula-users] bacula-fd crashes on FreeBSD 9.2

2013-10-30 16:52:13
Subject: Re: [Bacula-users] bacula-fd crashes on FreeBSD 9.2
From: dweimer <dweimer AT dweimer DOT net>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 30 Oct 2013 15:48:30 -0500
On 10/16/2013 5:43 pm, David Newman wrote:
> On 10/16/13 12:44 PM, dweimer wrote:
>> On 10/16/2013 2:13 pm, David Newman wrote:
>>> On 10/14/13 2:44 AM, Martin Simmons wrote:
>>>>>>>>> On Sun, 13 Oct 2013 18:25:07 -0700, David Newman said:
>>>>> 
>>>>> On 10/9/13 4:41 PM, David Newman wrote:
>>>>>> FreeBSD 9.2-RELEASE, bacula-client-5.2.12_3 installed from ports
>>>>>> 
>>>>>> Ever since upgrading this host to FreeBSD 9.2, bacula-fd crashes as
>>>>>> soon as bacula-dir starts a backup job. The entry in
>>>>>> /var/log/messages is:
>>>>>> 
>>>>>> Oct  9 16:25:50 o bacula-fd: Bacula interrupted by signal 0: UNKNOWN SIGNAL
>>>>>> 
>>>>>> Backups worked fine on this host running FreeBSD 9.1 and other
>>>>>> hosts upgraded to FreeBSD 9.2 run backups OK.
>>>>>> 
>>>>>> I've done the uninstall/reinstall thing with the bacula-client
>>>>>> port, but that made no difference.
>>>>>> 
>>>>>> Thanks in advance for troubleshooting clues.
>>>>>> 
>>>>>> dn
>>>>> 
>>>>> Is there a Wireshark decode for Bacula?
>>>>> 
>>>>> I'm still stuck on this problem, and need more info on what's
>>>>> causing that UNKNOWN SIGNAL error. Wireshark 1.8.6 just shows
>>>>> strings of bytes for the Bacula stuff.
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> dn
>>>> 
>>>> A wireshark decode won't help much here because problems like this
>>>> must be in the fd itself.
>>>> 
>>>> Try attaching gdb to the bacula-fd process and see if it catches the
>>>> mysterious signal (see
>>>> http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION00640000000000000000).
>>> 
>>> No luck with this. Per that URL, I've put the btraceback.gdb file in
>>> the same directory as the bacula-fd executable on the client (in this
>>> case, /usr/local/sbin) and made the .gdb file executable.
>>> 
>>> At run time it produces this error:
>>> 
>>> /usr/local/sbin/btraceback.gdb:1: Error in sourced command file:
>>> No symbol table is loaded.  Use the "file" command.
>>> 
>>> That's problem 1. Problem 2 is that the syntax given for capturing
>>> STDERR and STDOUT -- 2>\&1 -- doesn't work on either csh (root's
>>> default on FreeBSD) or bash.
>>> 
>>> Any ideas on remedying either issue?
>>> 
>>> Thanks.
>>> 
>>> dn
>>> 
>> 
>> I use 2>&1 (no backslash before the ampersand) with /bin/sh in
>> several cron scripts on FreeBSD, and it seems to do the job.
> 
> Thanks, that works for capturing STDERR and STDOUT.
> 
> But that .gdb file still produces the same error:
> 
> /usr/local/sbin/btraceback.gdb:1: Error in sourced command file:
> No symbol table is loaded.  Use the "file" command.
> 
> So, I'm still blocked on debugging this issue.
> 
> dn
> 
> 

Well, one of my FreeBSD 9.2 systems decided to take a new route to this
problem.  My backups started failing this morning without the
bacula-fd process stopping: it starts the ClientRunBeforeJob script,
then fails after two hours with no response from the client.

2013-10-30 07:52:34   bacula-dir JobId 291: Start Backup JobId 291, Job=Webmail-Backup.2013-10-30_07.52.32_46
2013-10-30 07:52:34   bacula-dir JobId 291: Using Device "FileStorage"
2013-10-30 07:52:35   webmail-fd JobId 291: shell command: run ClientRunBeforeJob "/root/bacula/before.sh"
2013-10-30 07:52:35   webmail-fd JobId 291: ClientRunBeforeJob:
2013-10-30 07:52:35   webmail-fd JobId 291: ClientRunBeforeJob: Create PostgreSQL Backup...
2013-10-30 07:52:35   webmail-fd JobId 291: ClientRunBeforeJob:
2013-10-30 07:52:35   webmail-fd JobId 291: ClientRunBeforeJob: Getting Database List
2013-10-30 07:52:35   webmail-fd JobId 291: ClientRunBeforeJob:
2013-10-30 09:58:46 bacula-dir JobId 291: Fatal error: Socket error on ClientRunBeforeJob command: ERR=Connection reset by peer

2013-10-30 09:58:46   bacula-dir JobId 291: Fatal error: Client "webmail-fd" RunScript failed.
2013-10-30 09:58:46 bacula-dir JobId 291: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer

2013-10-30 09:58:47   bacula-dir JobId 291: Fatal error: No Job status returned from FD.
2013-10-30 09:58:47   bacula-dir JobId 291: Error: Bacula bacula-dir 5.2.12 (12Sep12):
   Build OS:               amd64-portbld-freebsd9.2 freebsd 9.2-RELEASE
   JobId:                  291
   Job:                    Webmail-Backup.2013-10-30_07.52.32_46
   Backup Level:           Incremental, since=2013-10-29 00:07:02
   Client:                 "webmail-fd" 5.2.12 (12Sep12) amd64-portbld-freebsd9.2,freebsd,9.2-RELEASE
   FileSet:                "WebmailZFS-FileSet" 2013-09-27 13:12:07
   Pool:                   "File" (From Job resource)
   Catalog:                "MyCatalog" (From Client resource)
   Storage:                "File" (From Pool resource)
   Scheduled time:         30-Oct-2013 07:52:30
   Start time:             30-Oct-2013 07:52:34
   End time:               30-Oct-2013 09:58:47
   Elapsed time:           2 hours 6 mins 13 secs
   Priority:               10
   FD Files Written:       0
   SD Files Written:       0
   FD Bytes Written:       0 (0 B)
   SD Bytes Written:       0 (0 B)
   Rate:                   0.0 KB/s
   Software Compression:   None
   VSS:                    no
   Encryption:             no
   Accurate:               no
   Volume name(s):
   Volume Session Id:      6
   Volume Session Time:    1383098903
   Last Volume Bytes:      27,632,643,492 (27.63 GB)
   Non-fatal FD errors:    1
   SD Errors:              0
   FD termination status:  Error
   SD termination status:  OK
   Termination:            *** Backup Error ***


When I check this server, the ClientRunBeforeJob script has completed:
all the database dumps were successful, and the ZFS snapshots that
follow the database dumps completed as well.  However, Bacula never
returns the script's status.
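
One thing that may be worth ruling out when a before-job script finishes
but the File Daemon never reports back is an output descriptor inherited
by something the script started.  That is only a guess for this case,
not something confirmed in this thread.  A minimal sketch of a wrapper
that sends all script output to a log file and gives the script no
stdin; run_quiet and the log path are hypothetical names:

```sh
#!/bin/sh
# Hypothetical wrapper (name and paths are illustrative assumptions).
# Runs a command with stdin taken from /dev/null and stdout/stderr
# appended to a log file, so the command and anything it starts do not
# inherit the caller's stdout/stderr. Returns the command's exit status.
run_quiet() {
    log="$1"
    shift
    "$@" < /dev/null >> "$log" 2>&1
}

# Possible use as the ClientRunBeforeJob target (paths are assumptions):
# run_quiet /var/log/bacula-before.log /root/bacula/before.sh
```

The trade-off is that the script's output no longer shows up in the job
log, so the log file has to be checked separately.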

This server ran fine up through the full backup done Monday morning,
but now hits this problem on every backup attempt today.  A reboot
didn't help, and trying a full backup instead of an incremental made
no difference.

I canceled one of the attempts and restarted it after removing the
ClientRunBeforeJob script; it's now backing up files just fine.  So I
have temporarily set up a cron job that runs 30 minutes before the
backup to execute my database backups and ZFS snapshots, and removed
the ClientRunBeforeJob from the job.
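
A rough sketch of what such a cron-driven replacement could look like.
The function names, the script name, and the schedule are assumptions;
the two step functions are empty placeholders standing in for the real
dump and snapshot commands, which are not shown in this thread:

```sh
#!/bin/sh
# Hypothetical crontab entry, assuming the backup starts at 00:05:
#   35 23 * * * /root/bacula/cron-before.sh
# (cron-before.sh is an illustrative name for this script.)

# Placeholders: replace the bodies with the real commands.
dump_databases() { :; }   # e.g. one pg_dump per database (assumption)
take_snapshots() { :; }   # e.g. zfs snapshot of each dataset (assumption)

# Run the steps in order; take snapshots only if the dumps succeeded.
prepare_backup() {
    dump_databases || { echo "database dump failed" >&2; return 1; }
    take_snapshots || { echo "snapshot failed" >&2; return 2; }
    echo "backup preparation finished"
}

# In the real script, the last line would be:  prepare_backup
```

Unlike ClientRunBeforeJob, a failure here does not stop the backup job,
so cron's mail output or a log is the only place a failed dump shows up.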

I can find no errors logged on the server running bacula-fd or on the
Bacula server, with the exception of the timeout error message.  I
tried adding a heartbeat interval of 1 minute on the client; that
didn't help either.
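
For reference, the heartbeat setting is the "Heartbeat Interval"
directive, which on the client goes in the FileDaemon resource of
bacula-fd.conf (for example "Heartbeat Interval = 60"; a bare number is
seconds).  A small sketch for checking whether a config file actually
carries it; show_heartbeat is a hypothetical helper, and the path in
the usage comment is the usual FreeBSD ports location, assumed here:

```sh
#!/bin/sh
# Hypothetical helper: list Heartbeat Interval settings in a Bacula
# config file, with line numbers. Bacula ignores case and spaces in
# directive names, so the pattern also matches "HeartbeatInterval".
# Exits non-zero (grep's status) when no such line is present.
show_heartbeat() {
    grep -n -i 'heartbeat[[:space:]]*interval' "$1"
}

# Usage (path is an assumption):
# show_heartbeat /usr/local/etc/bacula/bacula-fd.conf
```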

-- 
Thanks,
    Dean E. Weimer
    http://www.dweimer.net/
