Re: [Bacula-users] large file system with lustre, spooling and crash questions

2008-04-08 14:45:20
Subject: Re: [Bacula-users] large file system with lustre, spooling and crash questions
From: Gauthier DELERCE <gauthier AT delerce DOT fr>
To: Tore Anderson <tore AT linpro DOT no>
Date: Tue, 08 Apr 2008 20:21:25 +0200
Thanks for your reply, Tore. I haven't made any progress on this problem,
and it has happened twice more:

07-avr 20:36 azurite-sd JobId 129: Spooling data again ...
08-avr 00:53 azurite JobId 129: Fatal error: backup.c:892 Network send error to 
SD. ERR=Connection reset by peer
08-avr 00:53 azurite JobId 129: Error: bsock.c:306 Write error sending 8856 
bytes to Storage daemon:azurite.andra.fr:9103: ERR=Connection reset by peer

and

08-avr 00:09 azurite-sd JobId 128: Spooling data again ...
08-avr 00:53 azurite JobId 128: Fatal error: backup.c:892 Network send error to 
SD. ERR=Connection reset by peer
08-avr 00:55 bacula-dir JobId 128: Error: Bacula bacula-dir 2.2.8 (26Jan08): 
08-avr-2008 00:55:00

Both the FD and the SD are running on the same physical server.

My next steps are:
    checking whether this also happens when I don't use a spool (the SD
resets the connection during a spooling stage)
    checking whether this also happens when only one job is running
    having a look at backup.c to see whether it's possible to add debug
output and handle this error (connection reset)
    monitoring the overall activity of the physical server
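
To compare runs across these checks, it may help to pull the relevant
messages out of the job log automatically. The following is only a minimal
sketch in Python, assuming the message layout seen in the excerpts above
(date, time, daemon name, "JobId N:", message). The function name
summarize and the SAMPLE text are made up for illustration and are not part
of Bacula. Wrapped continuation lines such as "SD. ERR=..." do not match
the pattern and are simply skipped.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpts above:
#   "08-avr 00:53 azurite JobId 129: <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-\w+) (?P<time>\d{2}:\d{2}) (?P<daemon>\S+) "
    r"JobId (?P<jobid>\d+): (?P<msg>.*)$"
)


def summarize(log_text):
    """Return, per failed JobId, the last spool message and first fatal error."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # continuation lines do not match and are ignored
            events[int(m["jobid"])].append(
                (m["date"], m["time"], m["daemon"], m["msg"])
            )

    report = {}
    for jobid, msgs in events.items():
        fatals = [e for e in msgs if e[3].startswith("Fatal error:")]
        if not fatals:
            continue  # job did not fail, nothing to report
        spools = [e for e in msgs if e[3].startswith("Spooling data")]
        report[jobid] = {
            "last_spool": spools[-1][:2] if spools else None,
            "first_fatal": fatals[0][:2],
        }
    return report


# Hypothetical input: the lines quoted earlier in this message, as wrapped.
SAMPLE = """\
07-avr 20:36 azurite-sd JobId 129: Spooling data again ...
08-avr 00:53 azurite JobId 129: Fatal error: backup.c:892 Network send error to
SD. ERR=Connection reset by peer
08-avr 00:09 azurite-sd JobId 128: Spooling data again ...
08-avr 00:53 azurite JobId 128: Fatal error: backup.c:892 Network send error to
SD. ERR=Connection reset by peer
"""

if __name__ == "__main__":
    for jobid, info in sorted(summarize(SAMPLE).items()):
        print(jobid, info["last_spool"], info["first_fatal"])
```

Running the same summary on logs from a run without spooling and from a
run with a single job would show whether the reset still follows a
"Spooling data" message.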

Kind regards

Gauthier


Tore Anderson wrote:
> * Gauthier DELERCE
>
>   
>> I also had a job crash and I'm still wondering why; here are a few
>> lines from the log:
>>     
>
> Hi,
>
> unfortunately I cannot help, but would like to chime in with a "me too".
> Some full backups have a tendency to fail with error messages similar to
> yours. Common to them all is that the backup appears to run fine, and
> that the FD is apparently able to send all the data to the SD: the "FD
> (Files|Bytes) Written" lines look OK.  However, the job fails with this
> error:  "Network send error to SD. ERR=Connection reset by peer", and
> "SD (Files|Bytes) Written" is always 0.  This only happens with full
> backups; I haven't been able to establish whether that's simply because
> of size or because that's the only level that uses the tape library.
>
> I get this error with several clients, some running Ubuntu, some running
> Red Hat.  The Bacula version is 2.2.8.  I've not had any luck
> tracking down what's going on (I believe I've ruled out network problems,
> though), and would appreciate any suggestions...
>
> The email report I receive looks like this:
>
> 02-Apr 04:05 dump-dir JobId 31766: Start Backup JobId 31766, 
> Job=foo.linpro.no-job4.2008-04-02_04.05.04
> 02-Apr 04:05 dump-dir JobId 31766: Using Device "LTO3-0"
> 02-Apr 04:05 dump-sd JobId 31766: Spooling data ...
> 02-Apr 04:45 foo.linpro.no-fd JobId 31766: Fatal error: backup.c:892 Network 
> send error to SD. ERR=Connection reset by peer
> 02-Apr 04:45 dump-dir JobId 31766: Error: Bacula dump-dir 2.2.8 (26Jan08): 
> 02-Apr-2008 04:45:47
>   Build OS:               x86_64-unknown-linux-gnu debian testing/unstable
>   JobId:                  31766
>   Job:                    foo.linpro.no-job4.2008-04-02_04.05.04
>   Backup Level:           Full
>   Client:                 "foo.linpro.no-fd" 2.2.8 (26Jan08) 
> i686-redhat-linux-gnu,redhat,
>   FileSet:                "foo.linpro.no-fileset4" 2007-07-10 10:05:00
>   Pool:                   "FullTape" (From Job FullPool override)
>   Storage:                "TapeLibrary" (From Pool resource)
>   Scheduled time:         02-Apr-2008 04:05:00
>   Start time:             02-Apr-2008 04:05:02
>   End time:               02-Apr-2008 04:45:47
>   Elapsed time:           40 mins 45 secs
>   Priority:               10
>   FD Files Written:       64,342
>   SD Files Written:       0
>   FD Bytes Written:       10,135,120,993 (10.13 GB)
>   SD Bytes Written:       0 (0 B)
>   Rate:                   4145.2 KB/s
>   Software Compression:   None
>   VSS:                    no
>   Storage Encryption:     no
>   Volume name(s):         
>   Volume Session Id:      240
>   Volume Session Time:    1206967810
>   Last Volume Bytes:      386,131,350,528 (386.1 GB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  Error
>   SD termination status:  Error
>   Termination:            *** Backup Error ***
>
>   

