Bacula-users

[Bacula-users] Bacula-sd crashing after tape mount request

2008-05-30 06:28:34
Subject: [Bacula-users] Bacula-sd crashing after tape mount request
From: Stephen Giles <stephen.giles AT ed.ac DOT uk>
To: bacula-users AT lists.sourceforge DOT net
Date: Fri, 30 May 2008 11:27:52 +0100
Hello,

I'm having some difficulty with my bacula-sd since i upgraded to 2.2.8.
I was previously using 2.0.3 and this didn't happen.

If i run out of volumes in my current tape pool and bacula asks for a  
new tape in the pool, the SD will crash as soon as i mount the new  
tape volume. Even if i quickly restart the SD again, it has the effect  
of killing the currently paused backup job (very annoying!). The error  
doesn't happen if there are usable tapes in the pool, only when i get  
a mount request notice and the SD has to wait for a mount.

A bit of background :

OS : Redhat EL v4

Director :
Host : gamma
Dell 2900, 2 x 2GHz
2GB RAM

SD :
Host : delta
Dell 2950, 2 x 2GHz quad core
8GB RAM
Neo2000 with single LTO3, direct SCSI connected.

both client, storage, and director are all 2.2.8


i have quite alot of data to backup. some 6 TB a week and many small  
files (20 million or so). I have all my data split into chunks so as  
to make it manageable. I perform fulls on different chunks every night  
of the week. I backup all my fulls onto tape, an my incrementals onto  
file volumes. This problem only happens when i run out of usable tapes  
in the current pool and bacula generates a mount request. It then  
waits for me to mount a volume in the morning (either by putting a  
recyclable volume in the juke box, labeling up a new volume, or  
purging an old (but not yet recyclealble) volume. When i run the mount  
command on the correct slot, i get something like below.

*mount
The defined Storage resources are:
      1: gamma-sd-file
      2: delta-sd-file-mig1
      3: delta-sd-file-mig2
      4: delta-sd-file-incrementals
      5: delta-sd-neo2000
Select Storage resource (1-5): 5
Enter autochanger slot: 6
3001 OK mount. Device="NeoLTO3-0" (/dev/nst0)
You have messages.
*
*m
30-May 10:04 delta-sd JobId 8224: 3301 Issuing autochanger "loaded?  
drive 0" command.
30-May 10:04 delta-sd JobId 8224: 3302 Autochanger "loaded? drive 0",  
result is Slot 27.
*
*
30-May 10:04 delta-sd JobId 8224: 3307 Issuing autochanger "unload  
slot 27, drive 0" command.
*
30-May 10:05 delta-fd JobId 8224: Fatal error: backup.c:892 Network  
send error to SD. ERR=Connection reset by peer
30-May 10:05 delta-fd JobId 8227: Fatal error: backup.c:892 Network  
send error to SD. ERR=Broken pipe
*
30-May 10:05 gamma-dir JobId 0: Warning: bsock.c:123 Could not connect  
to Storage daemon on delta.pst.ed.ac.uk:9103. ERR=Connection refused
Retrying ...
30-May 10:05 gamma-dir JobId 0: Warning: bsock.c:123 Could not connect  
to Storage daemon on delta.pst.ed.ac.uk:9103. ERR=Connection refused
Retrying ...

######################################################################
If we look at the failed job :


29-May 21:05 gamma-dir JobId 8224: Start Backup JobId 8224,  
Job=delta-data-1G.2008-05-29_21.05.54
29-May 21:05 gamma-dir JobId 8224: Using Device "NeoLTO3-0"
29-May 21:09 delta-sd JobId 8224: 3301 Issuing autochanger "loaded?  
drive 0" command.
29-May 21:09 delta-sd JobId 8224: 3302 Autochanger "loaded? drive 0",  
result is Slot 27.
29-May 21:09 delta-sd JobId 8224: 3301 Issuing autochanger "loaded?  
drive 0" command.
29-May 21:09 delta-sd JobId 8224: 3302 Autochanger "loaded? drive 0",  
result is Slot 27.
29-May 21:09 delta-sd JobId 8224: Volume "PY4958L3" previously  
written, moving to end of data.
29-May 21:09 delta-sd JobId 8224: Ready to append to end of Volume  
"PY4958L3" at file=334.
29-May 23:49 delta-sd JobId 8224: End of Volume "PY4958L3" at 609:5156  
on device "NeoLTO3-0" (/dev/nst0). Write of 64512 bytes got -1.
29-May 23:50 delta-sd JobId 8224: Re-read of last block succeeded.
29-May 23:50 delta-sd JobId 8224: End of medium on Volume "PY4958L3"  
Bytes=608,741,554,176 Blocks=9,436,097 at 29-May-2008 23:50.
29-May 23:50 delta-sd JobId 8224: Invalid slot=0 defined in catalog  
for Volume "CLN195L1" on "NeoLTO3-0" (/dev/nst0). Manual load my be  
required.
29-May 23:50 delta-sd JobId 8224: Please mount Volume "CLN195L1" or  
label a new one for:
     Job:          delta-data-1G.2008-05-29_21.05.54
     Storage:      "NeoLTO3-0" (/dev/nst0)
     Pool:         Full-Srv-Tape
     Media type:   LTO3
30-May 00:50 delta-sd JobId 8224: Please mount Volume "CLN195L1" or  
label a new one for:
     Job:          delta-data-1G.2008-05-29_21.05.54
     Storage:      "NeoLTO3-0" (/dev/nst0)
     Pool:         Full-Srv-Tape
     Media type:   LTO3
30-May 02:50 delta-sd JobId 8224: Please mount Volume "CLN195L1" or  
label a new one for:
     Job:          delta-data-1G.2008-05-29_21.05.54
     Storage:      "NeoLTO3-0" (/dev/nst0)
     Pool:         Full-Srv-Tape
     Media type:   LTO3
30-May 06:50 delta-sd JobId 8224: Please mount Volume "CLN195L1" or  
label a new one for:
     Job:          delta-data-1G.2008-05-29_21.05.54
     Storage:      "NeoLTO3-0" (/dev/nst0)
     Pool:         Full-Srv-Tape
     Media type:   LTO3
30-May 10:04 delta-sd JobId 8224: 3301 Issuing autochanger "loaded?  
drive 0" command.
30-May 10:04 delta-sd JobId 8224: 3302 Autochanger "loaded? drive 0",  
result is Slot 27.
30-May 10:04 delta-sd JobId 8224: 3307 Issuing autochanger "unload  
slot 27, drive 0" command.
30-May 10:05 delta-fd JobId 8224: Fatal error: backup.c:892 Network  
send error to SD. ERR=Connection reset by peer
30-May 10:06 gamma-dir JobId 8224: Error: Bacula gamma-dir 2.2.8  
(26Jan08): 30-May-2008 10:06:53
   Build OS:               i686-redhat-linux-gnu redhat
   JobId:                  8224
   Job:                    delta-data-1G.2008-05-29_21.05.54
   Backup Level:           Full
   Client:                 "delta-fd" 2.2.8 (26Jan08)  
x86_64-redhat-linux-gnu,redhat,
   FileSet:                "linux-data-1G" 2008-04-03 21:05:00
   Pool:                   "Full-Srv-Tape" (From Job resource)
   Storage:                "delta-sd-neo2000" (From Pool resource)
   Scheduled time:         29-May-2008 21:05:00
   Start time:             29-May-2008 21:09:18
   End time:               30-May-2008 10:06:53
   Elapsed time:           12 hours 57 mins 35 secs
   Priority:               10
   FD Files Written:       5,127,593
   SD Files Written:       0
   FD Bytes Written:       273,831,871,318 (273.8 GB)
   SD Bytes Written:       0 (0 B)
   Rate:                   5869.3 KB/s
   Software Compression:   None
   VSS:                    no
   Storage Encryption:     no
   Volume name(s):         PY4958L3
   Volume Session Id:      82
   Volume Session Time:    1211466643
   Last Volume Bytes:      1 (1 B)
   Non-fatal FD errors:    0
   SD Errors:              0
   FD termination status:  Error
   SD termination status:  Error
   Termination:            *** Backup Error ***


Is there anything i can run to further track what's going on?


Thanks,

Stephen.

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>
  • [Bacula-users] Bacula-sd crashing after tape mount request, Stephen Giles <=