2008-05-16 08:49:41
Subject: Re: [Bacula-users] Max Wait Time sometimes crash Storage Daemon
From: Kern Sibbald <kern AT sibbald DOT com>
To: Adam Cécile <adam.cecile AT linbox DOT com>
Date: Fri, 16 May 2008 08:31:13 -0400
On Friday 16 May 2008 03:09:26 Adam Cécile wrote:
> Reported as #1087:
> http://bugs.bacula.org/view.php?id=1087

OK, thanks.  If you haven't already done so, please attach a traceback when it
crashes, as well as your bacula-dir.conf and bacula-sd.conf files.

Thanks,

Kern

>
> Best regards, Adam.
>
> Kern Sibbald wrote:
> > Hello Adam,
> >
> > If the SD is crashing, then there is definitely a bug and you should open
> > a bug report.  It would be preferable if you move up to version 2.2.8 as
> > it simplifies things for me in debugging and finding the problems.
> >
> > Best regards,
> >
> > Kern
> >
> > On Thursday 15 May 2008 09:05:02 Adam Cécile wrote:
> >> Hello,
> >>
> >> I use Max Wait Time to cancel jobs that are left in the queue because no
> >> tapes are available.
> >> This is useful when our customers forget to load a new set of tapes into
> >> the changer.
> >>
> >> The problem is that the SD crashes in this case. Here is a sample of the
> >> logs:
> >> 03-May 12:01 pdc1.it-lyon-sd JobId 1580: Please mount Volume "Daily-005"
> >> or label a new one for:
> >> Job: pdc1.it-lyon.2008-05-02_21.00.26
> >> Storage: "Dell-LTO2" (/dev/nst0)
> >> Pool: Friday
> >> Media type: LTO2
> >>
> >> Then:
> >> 02-May 21:00 pdc1.it-lyon-dir JobId 1582: Start Backup JobId 1582,
> >> Job=intox1.it-lyon.2008-05-02_21.00.28
> >> 02-May 21:01 pdc1.it-lyon-dir JobId 1582: Using Device "Dell-LTO2"
> >> 06-May 11:10 intox1.it-lyon-fd: intox1.it-lyon.2008-05-02_21.00.28 Fatal
> >> error: job.c:1808 Comm error with SD. bad response to Append Data.
> >> ERR=Aucune donnée disponible
> >> 06-May 11:11 pdc1.it-lyon-dir JobId 1582: Error: Bacula pdc1.it-lyon-dir
> >> 2.2.5 (09Oct07): 06-May-2008 11:11:01
> >>
> >> The bacula-sd process sometimes dies; at other times it keeps running
> >> but stops working until we restart it.
> >>
> >> Another log example:
> >>
> >> 06-mai 12:23 localhost-sd JobId 235: Please mount Volume "000027L3" or
> >> label a new one for:
> >> Job: atp-data.2008-05-02_22.00.44
> >> Storage: "Drive-1" (/dev/nst0)
> >> Pool: Weekly
> >> Media type: LTO3
> >> 07-mai 12:23 localhost-sd JobId 235: Please mount Volume "000027L3" or
> >> label a new one for:
> >> Job: atp-data.2008-05-02_22.00.44
> >> Storage: "Drive-1" (/dev/nst0)
> >> Pool: Weekly
> >> Media type: LTO3
> >> 08-mai 12:23 localhost-sd JobId 235: Fatal error: Max time exceeded
> >> waiting to mount Storage Device "Drive-1" (/dev/nst0) for Job
> >> atp-data.2008-05-02_22.00.44
> >> 08-mai 12:23 localhost-sd JobId 235: Job write elapsed time = 134:15:41,
> >> Transfer rate = 3.350 M bytes/second
> >> 08-mai 12:23 localhost-fd JobId 235: Fatal error: backup.c:892 Network
> >> send error to SD. ERR=Broken pipe
> >> 08-mai 12:23 localhost-dir JobId 235: Error: Bacula localhost-dir 2.2.8
> >> (26Jan08): 08-mai-2008 12:23:41
> >>
> >> This is a serious issue, as Max Wait Time can't be used (it always
> >> crashes).
> >>
> >> Could you please tell me whether this is a known issue? If not, a
> >> customer is willing to "forget to change the tape" so that I can provide
> >> you with some debugging backtraces if needed.
> >>
> >> Thanks in advance,
> >>
> >> Best regards, Adam.
