Re: [Bacula-users] Max Wait Time sometimes crash Storage Daemon

2008-05-19 03:42:46
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 19 May 2008 09:42:15 +0200
Hi,

19.05.2008 08:52, Adam Cécile wrote:
> Hi,
> 
> Could you please tell me more about how to get a useful traceback?

You need gdb installed, and the binaries you run must not be 
stripped. The former is usually handled by your package manager; the 
latter typically means compiling from source. I believe you need to 
pass the -g switch to gcc and skip the strip step during install.
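
A quick way to check whether a binary is stripped is to look at what the 
"file" utility reports for it. The following is only a sketch in Python, 
assuming "file" is installed. The path /usr/sbin/bacula-sd is an 
assumption, so adjust it to wherever your SD binary actually lives.

```python
import subprocess


def is_stripped(file_output: str) -> bool:
    """Return True unless `file` explicitly reports "not stripped"."""
    # `file` prints "not stripped" for ELF binaries that still carry
    # their symbol table; anything else is treated as stripped here.
    return "not stripped" not in file_output


def describe(path: str) -> str:
    """Run `file` on path and return its one-line description."""
    result = subprocess.run(
        ["file", "--brief", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical install location; adjust to your system.
    path = "/usr/sbin/bacula-sd"
    state = "stripped" if is_stripped(describe(path)) else "not stripped"
    print(path, "is", state)
```

Note that "not stripped" only tells you the symbol table is present. It 
does not prove the binary was built with -g, so the real test is still 
whether the resulting backtrace shows function names and line numbers.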

Bacula itself has a script that is automatically called when a program 
crashes, which will create a backtrace and mail it to the configured 
operator.

You had best verify that all this works, for example by sending a 
signal to the debug-version SD.
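
One way to run that test, sketched in Python: read the SD's PID from its 
pid file and send it a signal that normally indicates a crash. Both the 
pid file path and the choice of SIGSEGV are assumptions, so check the 
Pid Directory setting in your bacula-sd.conf first. Only try this on an 
idle SD, since any running job will fail.

```python
import os
import signal
from pathlib import Path


def read_pid(pid_file: str) -> int:
    """Read a daemon's PID from a pid file containing a single integer."""
    return int(Path(pid_file).read_text().strip())


def crash_daemon(pid_file: str, sig: int = signal.SIGSEGV) -> int:
    """Send sig to the process named in pid_file and return its PID."""
    pid = read_pid(pid_file)
    os.kill(pid, sig)  # needs root or the daemon's own user
    return pid


if __name__ == "__main__":
    # Assumed location; it depends on "Pid Directory" in bacula-sd.conf.
    print("signalled PID", crash_daemon("/var/run/bacula-sd.9103.pid"))
```

If everything is set up correctly, the operator should receive a mail 
with the backtrace shortly afterwards. If no mail arrives, fix that 
before waiting for the real crash.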

Also, have a look at the mail I just sent regarding Maria's problem... 
the information might be helpful for you, too. I suspect the 
problems might be related.

Arno

> Thanks in advance,
> 
> Regards, Adam.
> 
> Kern Sibbald wrote:
>> On Friday 16 May 2008 03:09:26 Adam Cécile wrote:
>>   
>>> Reported as #1087:
>>> http://bugs.bacula.org/view.php?id=1087
>>>     
>> OK, thanks.  If you haven't already done so, please attach a traceback when
>> it crashes, as well as your bacula-dir.conf and bacula-sd.conf files.
>>
>> Thanks,
>>
>> Kern
>>
>>   
>>> Best regards, Adam.
>>>
>>> Kern Sibbald wrote:
>>>     
>>>> Hello Adam,
>>>>
>>>> If the SD is crashing, then there is definitely a bug and you should open
>>>> a bug report.  It would be preferable if you move up to version 2.2.8 as
>>>> it simplifies things for me in debugging and finding the problems.
>>>>
>>>> Best regards,
>>>>
>>>> Kern
>>>>
>>>> On Thursday 15 May 2008 09:05:02 Adam Cécile wrote:
>>>>       
>>>>> Hello,
>>>>>
>>>>> I use Max Wait Time to cancel jobs that are left in queue because no
>>>>> tapes are available.
>>>>> This is useful when our customers forget to load a new set of tapes into
>>>>> the changer.
>>>>>
>>>>> The problem is that the SD crashes in this case. Here is a sample of
>>>>> the logs:
>>>>> 03-May 12:01 pdc1.it-lyon-sd JobId 1580: Please mount Volume "Daily-005"
>>>>> or label a new one for:
>>>>> Job: pdc1.it-lyon.2008-05-02_21.00.26
>>>>> Storage: "Dell-LTO2" (/dev/nst0)
>>>>> Pool: Friday
>>>>> Media type: LTO2
>>>>>
>>>>> Then:
>>>>> 02-May 21:00 pdc1.it-lyon-dir JobId 1582: Start Backup JobId 1582,
>>>>> Job=intox1.it-lyon.2008-05-02_21.00.28
>>>>> 02-May 21:01 pdc1.it-lyon-dir JobId 1582: Using Device "Dell-LTO2"
>>>>> 06-May 11:10 intox1.it-lyon-fd: intox1.it-lyon.2008-05-02_21.00.28 Fatal
>>>>> error: job.c:1808 Comm error with SD. bad response to Append Data.
>>>>> ERR=Aucune donnée disponible
>>>>> 06-May 11:11 pdc1.it-lyon-dir JobId 1582: Error: Bacula pdc1.it-lyon-dir
>>>>> 2.2.5 (09Oct07): 06-May-2008 11:11:01
>>>>>
>>>>> The bacula-sd process sometimes dies; sometimes it keeps running but
>>>>> stops working until we restart it.
>>>>>
>>>>> Another log example:
>>>>>
>>>>> 06-mai 12:23 localhost-sd JobId 235: Please mount Volume "000027L3" or
>>>>> label a new one for:
>>>>> Job: atp-data.2008-05-02_22.00.44
>>>>> Storage: "Drive-1" (/dev/nst0)
>>>>> Pool: Weekly
>>>>> Media type: LTO3
>>>>> 07-mai 12:23 localhost-sd JobId 235: Please mount Volume "000027L3" or
>>>>> label a new one for:
>>>>> Job: atp-data.2008-05-02_22.00.44
>>>>> Storage: "Drive-1" (/dev/nst0)
>>>>> Pool: Weekly
>>>>> Media type: LTO3
>>>>> 08-mai 12:23 localhost-sd JobId 235: Fatal error: Max time exceeded
>>>>> waiting to mount Storage Device "Drive-1" (/dev/nst0) for Job
>>>>> atp-data.2008-05-02_22.00.44
>>>>> 08-mai 12:23 localhost-sd JobId 235: Job write elapsed time = 134:15:41,
>>>>> Transfer rate = 3.350 M bytes/second
>>>>> 08-mai 12:23 localhost-fd JobId 235: Fatal error: backup.c:892 Network
>>>>> send error to SD. ERR=Broken pipe
>>>>> 08-mai 12:23 localhost-dir JobId 235: Error: Bacula localhost-dir 2.2.8
>>>>> (26Jan08): 08-mai-2008 12:23:41
>>>>>
>>>>> This is a serious issue, as Max Wait Time can't be used (it always
>>>>> crashes).
>>>>>
>>>>> Could you please tell me whether this is a known issue? If not, one
>>>>> customer is willing to "forget to change the tape", so I can provide
>>>>> some debugging backtraces if needed.
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Best regards, Adam.
>>>>>         
>>
>>   

-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users