Bacula-users

[Bacula-users] Max Wait Time sometimes crash Storage Daemon

2008-05-15 09:05:17
Subject: [Bacula-users] Max Wait Time sometimes crash Storage Daemon
From: Adam Cécile <adam.cecile AT linbox DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 15 May 2008 15:05:02 +0200
Hello,

I use Max Wait Time to cancel jobs that are left in the queue because no 
tapes are available.
This is useful when our customers forget to load a new set of tapes into 
the changer.
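
For illustration, a Job resource using this directive might look like the 
sketch below. The resource names are borrowed from the logs that follow or 
made up, and the two-day value is an assumption, not the setting actually 
in use:

```
# Sketch of a Job resource in bacula-dir.conf (names and values are assumptions)
Job {
  Name = "pdc1.it-lyon"        # job name as it appears in the logs below
  Type = Backup
  Client = pdc1.it-lyon-fd     # hypothetical client resource name
  FileSet = "Full Set"         # hypothetical
  Schedule = "WeeklyCycle"     # hypothetical
  Storage = Dell-LTO2          # storage name as it appears in the logs below
  Pool = Friday
  Messages = Standard
  Max Wait Time = 2 days       # assumed value: cancel the job if it blocks this long waiting for a tape
}
```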

The problem is that the SD crashes in this case. Here is a sample of the logs:
03-May 12:01 pdc1.it-lyon-sd JobId 1580: Please mount Volume "Daily-005" 
or label a new one for:
Job: pdc1.it-lyon.2008-05-02_21.00.26
Storage: "Dell-LTO2" (/dev/nst0)
Pool: Friday
Media type: LTO2

Then:
02-May 21:00 pdc1.it-lyon-dir JobId 1582: Start Backup JobId 1582, 
Job=intox1.it-lyon.2008-05-02_21.00.28
02-May 21:01 pdc1.it-lyon-dir JobId 1582: Using Device "Dell-LTO2"
06-May 11:10 intox1.it-lyon-fd: intox1.it-lyon.2008-05-02_21.00.28 Fatal 
error: job.c:1808 Comm error with SD. bad response to Append Data. 
ERR=Aucune donnée disponible [French: "No data available"]
06-May 11:11 pdc1.it-lyon-dir JobId 1582: Error: Bacula pdc1.it-lyon-dir 
2.2.5 (09Oct07): 06-May-2008 11:11:01

The bacula-sd process sometimes dies; other times it keeps running but 
no longer works until we restart it.

Another log example:

06-mai 12:23 localhost-sd JobId 235: Please mount Volume "000027L3" or 
label a new one for:
Job: atp-data.2008-05-02_22.00.44
Storage: "Drive-1" (/dev/nst0)
Pool: Weekly
Media type: LTO3
07-mai 12:23 localhost-sd JobId 235: Please mount Volume "000027L3" or 
label a new one for:
Job: atp-data.2008-05-02_22.00.44
Storage: "Drive-1" (/dev/nst0)
Pool: Weekly
Media type: LTO3
08-mai 12:23 localhost-sd JobId 235: Fatal error: Max time exceeded 
waiting to mount Storage Device "Drive-1" (/dev/nst0) for Job 
atp-data.2008-05-02_22.00.44
08-mai 12:23 localhost-sd JobId 235: Job write elapsed time = 134:15:41, 
Transfer rate = 3.350 M bytes/second
08-mai 12:23 localhost-fd JobId 235: Fatal error: backup.c:892 Network 
send error to SD. ERR=Broken pipe
08-mai 12:23 localhost-dir JobId 235: Error: Bacula localhost-dir 2.2.8 
(26Jan08): 08-mai-2008 12:23:41

This is a serious issue because it makes Max Wait Time unusable (the SD 
always crashes).

Could you please tell me whether this is a known issue? If not, one of 
our customers has agreed to "forget to change the tape" so that I can 
provide some debugging backtraces if needed.

Thanks in advance,

Best regards, Adam.



