Bacula-users

Re: [Bacula-users] Max Wait Time sometimes crash Storage Daemon

2008-05-19 12:56:51
Subject: Re: [Bacula-users] Max Wait Time sometimes crash Storage Daemon
From: Adam Cécile <adam.cecile AT linbox DOT com>
To: Arno Lehmann <al AT its-lehmann DOT de>
Date: Mon, 19 May 2008 18:19:41 +0200
Okay,

bacula-sd and the Director are currently running with -d 100, and we plan 
NOT to change the tape tonight.
I hope we'll get a useful traceback. If not, I'll rebuild the Debian 
package with NOSTRIP and see if the SD sends us a backtrace.

Regards, Adam.
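
For reference, the two prerequisites Arno lists below (gdb installed, 
binaries not stripped) can be checked before the next crash. This is only a 
minimal sketch in Python: the helper names are hypothetical, and it assumes 
the standard `file` utility is available, whose output for ELF binaries 
ends in either "stripped" or "not stripped".

```python
import shutil
import subprocess


def is_stripped(file_output: str) -> bool:
    """Interpret the output of `file` for one ELF binary."""
    text = file_output.strip()
    # Check the negative form first, since "not stripped" contains "stripped".
    if "not stripped" in text:
        return False
    return text.endswith("stripped")


def check_traceback_prereqs(binary_path: str) -> dict:
    """Report whether gdb is on PATH and whether the binary kept its symbols."""
    gdb_path = shutil.which("gdb")
    # Assumption: the `file` utility is installed on this host.
    result = subprocess.run(
        ["file", binary_path], capture_output=True, text=True, check=True
    )
    return {
        "gdb_installed": gdb_path is not None,
        "stripped": is_stripped(result.stdout),
    }
```

Calling check_traceback_prereqs() with the path to the installed bacula-sd 
binary (the path depends on the package layout) should report 
"stripped": False after a NOSTRIP rebuild. Note that "not stripped" only 
means the symbol table is present; full debug information still depends on 
building with -g.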

Arno Lehmann wrote:
> Hi,
>
> 19.05.2008 08:52, Adam Cécile wrote:
>   
>> Hi,
>>
>> Could you please tell me more about how to get a useful traceback?
>>     
>
> You need gdb installed and the binaries you run should not be 
> stripped. The former is usually ensured by your package manager, the 
> latter is typically done by compiling from source. I believe you need 
> the -g switch to gcc, and during install, you skip the strip process.
>
> Bacula itself has a script that is automatically called when a program 
> crashes, which will create a backtrace and mail it to the configured 
> operator.
>
> You had best verify that all this works, for example by sending a signal 
> to the debug-version SD.
>
> Also, have a look at the mail I just sent regarding Maria's problem... 
> the information might be helpful for you, too. And I suspect the 
> problems might be related.
>
> Arno
>
>   
>> Thanks in advance,
>>
>> Regards, Adam.
>>
>> Kern Sibbald wrote:
>>     
>>> On Friday 16 May 2008 03:09:26 Adam Cécile wrote:
>>>   
>>>       
>>>> Reported as #1087:
>>>> http://bugs.bacula.org/view.php?id=1087
>>>>     
>>>>         
>>> OK, thanks.  If you haven't already done so, please attach a traceback when
>>> it crashes, as well as your bacula-dir.conf and bacula-sd.conf files.
>>>
>>> Thanks,
>>>
>>> Kern
>>>
>>>   
>>>       
>>>> Best regards, Adam.
>>>>
>>>> Kern Sibbald wrote:
>>>>     
>>>>         
>>>>> Hello Adam,
>>>>>
>>>>> If the SD is crashing, then there is definitely a bug and you should open
>>>>> a bug report.  It would be preferable if you move up to version 2.2.8 as
>>>>> it simplifies things for me in debugging and finding the problems.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Kern
>>>>>
>>>>> On Thursday 15 May 2008 09:05:02 Adam Cécile wrote:
>>>>>       
>>>>>           
>>>>>> Hello,
>>>>>>
>>>>>> I use Max Wait Time to cancel jobs that are left in queue because no
>>>>>> tapes are available.
>>>>>> This is useful when our customers forget to load a new set of tapes into
>>>>>> the changer.
>>>>>>
>>>>>> The problem is that the SD crashes in this case. Here is a sample of the logs:
>>>>>> 03-May 12:01 pdc1.it-lyon-sd JobId 1580: Please mount Volume "Daily-005"
>>>>>> or label a new one for:
>>>>>> Job: pdc1.it-lyon.2008-05-02_21.00.26
>>>>>> Storage: "Dell-LTO2" (/dev/nst0)
>>>>>> Pool: Friday
>>>>>> Media type: LTO2
>>>>>>
>>>>>> Then:
>>>>>> 02-May 21:00 pdc1.it-lyon-dir JobId 1582: Start Backup JobId 1582,
>>>>>> Job=intox1.it-lyon.2008-05-02_21.00.28
>>>>>> 02-May 21:01 pdc1.it-lyon-dir JobId 1582: Using Device "Dell-LTO2"
>>>>>> 06-May 11:10 intox1.it-lyon-fd: intox1.it-lyon.2008-05-02_21.00.28 Fatal
>>>>>> error: job.c:1808 Comm error with SD. bad response to Append Data.
>>>>>> ERR=Aucune donnée disponible
>>>>>> 06-May 11:11 pdc1.it-lyon-dir JobId 1582: Error: Bacula pdc1.it-lyon-dir
>>>>>> 2.2.5 (09Oct07): 06-May-2008 11:11:01
>>>>>>
>>>>>> The bacula-sd process sometimes dies; at other times it keeps running but
>>>>>> no longer works until we restart it.
>>>>>>
>>>>>> Another log example:
>>>>>>
>>>>>> 06-mai 12:23 localhost-sd JobId 235: Please mount Volume "000027L3" or
>>>>>> label a new one for:
>>>>>> Job: atp-data.2008-05-02_22.00.44
>>>>>> Storage: "Drive-1" (/dev/nst0)
>>>>>> Pool: Weekly
>>>>>> Media type: LTO3
>>>>>> 07-mai 12:23 localhost-sd JobId 235: Please mount Volume "000027L3" or
>>>>>> label a new one for:
>>>>>> Job: atp-data.2008-05-02_22.00.44
>>>>>> Storage: "Drive-1" (/dev/nst0)
>>>>>> Pool: Weekly
>>>>>> Media type: LTO3
>>>>>> 08-mai 12:23 localhost-sd JobId 235: Fatal error: Max time exceeded
>>>>>> waiting to mount Storage Device "Drive-1" (/dev/nst0) for Job
>>>>>> atp-data.2008-05-02_22.00.44
>>>>>> 08-mai 12:23 localhost-sd JobId 235: Job write elapsed time = 134:15:41,
>>>>>> Transfer rate = 3.350 M bytes/second
>>>>>> 08-mai 12:23 localhost-fd JobId 235: Fatal error: backup.c:892 Network
>>>>>> send error to SD. ERR=Broken pipe
>>>>>> 08-mai 12:23 localhost-dir JobId 235: Error: Bacula localhost-dir 2.2.8
>>>>>> (26Jan08): 08-mai-2008 12:23:41
>>>>>>
>>>>>> This is a serious issue, as Max Wait Time can't be used (it always crashes).
>>>>>>
>>>>>> Could you please tell me whether this is a known issue? If not, a
>>>>>> customer has agreed to "forget to change the tape" so that I can provide
>>>>>> you with some debugging backtraces if needed.
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Best regards, Adam.
>>>>>>         
>>>>>>             
>>>   
>>>       
>>
>>     
>
>   


_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users