Bacula-users

[Bacula-users] Bacula-sd crashing after tape mount request - can anyone help pleae?

2008-06-01 15:27:54
Subject: [Bacula-users] Bacula-sd crashing after tape mount request - can anyone help pleae?
From: Stephen Giles <stephen.giles AT ed.ac DOT uk>
To: bacula-users AT lists.sourceforge DOT net
Date: Sun, 01 Jun 2008 20:25:16 +0100
Hello,

I'm having some difficulty with my bacula-sd since i upgraded to 2.2.8.
I was previously using 2.0.3 and this didn't happen.

If i run out of volumes in my current tape pool and bacula asks for a
new tape in the pool, the SD will crash as soon as i mount the new
tape volume. Even if i quickly restart the SD again, it has the effect
of killing the currently paused backup job (very annoying!). The error
doesn't happen if there are usable tapes in the pool, only when i get
a mount request notice and the SD has to wait for a mount.

A bit of background :

OS : Redhat EL v4

Director :
Host : gamma
Dell 2900, 2 x 2GHz
2GB RAM

SD :
Host : delta
Dell 2950, 2 x 2GHz quad core
8GB RAM
Neo2000 with single LTO3, direct SCSI connected.

both client, storage, and director are all 2.2.8


i have quite alot of data to backup. some 6 TB a week and many small
files (20 million or so). I have all my data split into chunks so as
to make it manageable. I perform fulls on different chunks every night
of the week. I backup all my fulls onto tape, an my incrementals onto
file volumes. This problem only happens when i run out of usable tapes
in the current pool and bacula generates a mount request. It then
waits for me to mount a volume in the morning (either by putting a
recyclable volume in the juke box, labeling up a new volume, or
purging an old (but not yet recyclealble) volume. When i run the mount
command on the correct slot, i get something like below.

*mount
The defined Storage resources are:
1: gamma-sd-file
2: delta-sd-file-mig1
3: delta-sd-file-mig2
4: delta-sd-file-incrementals
5: delta-sd-neo2000
Select Storage resource (1-5): 5
Enter autochanger slot: 6
3001 OK mount. Device="NeoLTO3-0" (/dev/nst0)
You have messages.
*
*m
30-May 10:04 delta-sd JobId 8224: 3301 Issuing autochanger "loaded?
drive 0" command.
30-May 10:04 delta-sd JobId 8224: 3302 Autochanger "loaded? drive 0",
result is Slot 27.
*
*
30-May 10:04 delta-sd JobId 8224: 3307 Issuing autochanger "unload
slot 27, drive 0" command.
*
30-May 10:05 delta-fd JobId 8224: Fatal error: backup.c:892 Network
send error to SD. ERR=Connection reset by peer
30-May 10:05 delta-fd JobId 8227: Fatal error: backup.c:892 Network
send error to SD. ERR=Broken pipe
*
30-May 10:05 gamma-dir JobId 0: Warning: bsock.c:123 Could not connect
to Storage daemon on delta.pst.ed.ac.uk:9103. ERR=Connection refused
Retrying ...
30-May 10:05 gamma-dir JobId 0: Warning: bsock.c:123 Could not connect
to Storage daemon on delta.pst.ed.ac.uk:9103. ERR=Connection refused
Retrying ...

######################################################################
If we look at the failed job :


29-May 21:05 gamma-dir JobId 8224: Start Backup JobId 8224,
Job=delta-data-1G.2008-05-29_21.05.54
29-May 21:05 gamma-dir JobId 8224: Using Device "NeoLTO3-0"
29-May 21:09 delta-sd JobId 8224: 3301 Issuing autochanger "loaded?
drive 0" command.
29-May 21:09 delta-sd JobId 8224: 3302 Autochanger "loaded? drive 0",
result is Slot 27.
29-May 21:09 delta-sd JobId 8224: 3301 Issuing autochanger "loaded?
drive 0" command.
29-May 21:09 delta-sd JobId 8224: 3302 Autochanger "loaded? drive 0",
result is Slot 27.
29-May 21:09 delta-sd JobId 8224: Volume "PY4958L3" previously
written, moving to end of data.
29-May 21:09 delta-sd JobId 8224: Ready to append to end of Volume
"PY4958L3" at file=334.
29-May 23:49 delta-sd JobId 8224: End of Volume "PY4958L3" at 609:5156
on device "NeoLTO3-0" (/dev/nst0). Write of 64512 bytes got -1.
29-May 23:50 delta-sd JobId 8224: Re-read of last block succeeded.
29-May 23:50 delta-sd JobId 8224: End of medium on Volume "PY4958L3"
Bytes=608,741,554,176 Blocks=9,436,097 at 29-May-2008 23:50.
29-May 23:50 delta-sd JobId 8224: Invalid slot=0 defined in catalog
for Volume "CLN195L1" on "NeoLTO3-0" (/dev/nst0). Manual load my be
required.
29-May 23:50 delta-sd JobId 8224: Please mount Volume "CLN195L1" or
label a new one for:
Job: delta-data-1G.2008-05-29_21.05.54
Storage: "NeoLTO3-0" (/dev/nst0)
Pool: Full-Srv-Tape
Media type: LTO3
30-May 00:50 delta-sd JobId 8224: Please mount Volume "CLN195L1" or
label a new one for:
Job: delta-data-1G.2008-05-29_21.05.54
Storage: "NeoLTO3-0" (/dev/nst0)
Pool: Full-Srv-Tape
Media type: LTO3
30-May 02:50 delta-sd JobId 8224: Please mount Volume "CLN195L1" or
label a new one for:
Job: delta-data-1G.2008-05-29_21.05.54
Storage: "NeoLTO3-0" (/dev/nst0)
Pool: Full-Srv-Tape
Media type: LTO3
30-May 06:50 delta-sd JobId 8224: Please mount Volume "CLN195L1" or
label a new one for:
Job: delta-data-1G.2008-05-29_21.05.54
Storage: "NeoLTO3-0" (/dev/nst0)
Pool: Full-Srv-Tape
Media type: LTO3
30-May 10:04 delta-sd JobId 8224: 3301 Issuing autochanger "loaded?
drive 0" command.
30-May 10:04 delta-sd JobId 8224: 3302 Autochanger "loaded? drive 0",
result is Slot 27.
30-May 10:04 delta-sd JobId 8224: 3307 Issuing autochanger "unload
slot 27, drive 0" command.
30-May 10:05 delta-fd JobId 8224: Fatal error: backup.c:892 Network
send error to SD. ERR=Connection reset by peer
30-May 10:06 gamma-dir JobId 8224: Error: Bacula gamma-dir 2.2.8
(26Jan08): 30-May-2008 10:06:53
Build OS: i686-redhat-linux-gnu redhat
JobId: 8224
Job: delta-data-1G.2008-05-29_21.05.54
Backup Level: Full
Client: "delta-fd" 2.2.8 (26Jan08)
x86_64-redhat-linux-gnu,redhat,
FileSet: "linux-data-1G" 2008-04-03 21:05:00
Pool: "Full-Srv-Tape" (From Job resource)
Storage: "delta-sd-neo2000" (From Pool resource)
Scheduled time: 29-May-2008 21:05:00
Start time: 29-May-2008 21:09:18
End time: 30-May-2008 10:06:53
Elapsed time: 12 hours 57 mins 35 secs
Priority: 10
FD Files Written: 5,127,593
SD Files Written: 0
FD Bytes Written: 273,831,871,318 (273.8 GB)
SD Bytes Written: 0 (0 B)
Rate: 5869.3 KB/s
Software Compression: None
VSS: no
Storage Encryption: no
Volume name(s): PY4958L3
Volume Session Id: 82
Volume Session Time: 1211466643
Last Volume Bytes: 1 (1 B)
Non-fatal FD errors: 0
SD Errors: 0
FD termination status: Error
SD termination status: Error
Termination: *** Backup Error ***


Is there anything i can run to further track what's going on?
Is this a known bug?


Many Thanks,

Stephen.


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>
  • [Bacula-users] Bacula-sd crashing after tape mount request - can anyone help pleae?, Stephen Giles <=