[Bacula-users] Sending spooled attrs to the Director Fatal error: Network error with FD during Backup: ERR=Connection reset by peer ?

2011-12-05 20:11:03
From: "Ethier, Michael" <methier AT CGR.Harvard DOT edu>
To: "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Mon, 5 Dec 2011 19:55:49 -0500

Hello,

 

We are running Bacula 5.0.3 on RHEL and CentOS. A 16.5 TB backup recently failed at the very end, when the system tried to send the spooled attribute data to the Director; the messages are below. The backend database is MySQL:

 

[root@hulsbackup lib]#  mysql -V

mysql  Ver 14.12 Distrib 5.0.77, for redhat-linux-gnu (x86_64) using readline 5.1

 

The database lives on the same machine and partition as the data spool directory. All backup data appears to have been spooled and written to tape successfully.

 

I successfully backed up a 5 TB data set before this. However, between that backup and this failed one, we moved the Bacula server to a different network and changed to an LACP-bonded interface. There is a local iptables firewall running on the Bacula server.

 

In addition, we kept hitting the 6-day limit after which backups are automatically killed, so I changed the following lines and recompiled with a 60-day limit on both the Bacula server and the client.

 

bnet.c:   bsock->timeout = 60 * 60 * 60 * 24;   /* 60 days timeout */

bsock.c:   timeout = 60 * 60 * 60 * 24;   /* 60 days timeout */

 

Other than that, the code is unmodified. Has anyone hit this problem and found a solution? I can’t easily re-run the job to reproduce it, since it runs for over 9 days.

 

Thanks,

Mike

 

05-Dec 02:48 hulsbackup-sd JobId 109: Alert: Home page is http://smartmontools.sourceforge.net/
05-Dec 02:48 hulsbackup-sd JobId 109: Alert:
05-Dec 02:48 hulsbackup-sd JobId 109: Alert: TapeAlert: OK
05-Dec 02:48 hulsbackup-sd JobId 109: Alert:
05-Dec 02:48 hulsbackup-sd JobId 109: Alert: Error Counter logging not supported
05-Dec 02:48 hulsbackup-sd JobId 109: Sending spooled attrs to the Director. Despooling 196,979,273 bytes ...
05-Dec 03:12 hulsbackup-dir JobId 109: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
05-Dec 03:12 hulsbackup-dir JobId 109: Fatal error: No Job status returned from FD.
05-Dec 03:12 hulsbackup-dir JobId 109: Error: Bacula hulsbackup-dir 5.0.3 (04Aug10): 05-Dec-2011 03:12:15
  Build OS:               x86_64-unknown-linux-gnu redhat Enterprise release
  JobId:                  109
  Job:                    ceserve1.2011-11-25_21.11.56_11
  Backup Level:           Full
  Client:                 "ceserve1-fd" 5.0.3 (04Aug10) x86_64-unknown-linux-gnu,redhat,
  FileSet:                "ceserve1-data" 2011-11-02 11:03:12
  Pool:                   "Default" (From Job resource)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "Autochanger" (From command line)
  Scheduled time:         25-Nov-2011 21:11:47
  Start time:             25-Nov-2011 21:11:58
  End time:               05-Dec-2011 03:12:15
  Elapsed time:           9 days 6 hours 17 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       571,253
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       16,495,138,769,029 (16.49 TB)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         000093L3|000094L3|000095L3|000096L3|000097L3|000098L3|000099L3|000100L3|000101L3|000102L3|000103L3|000104L3|000105L3|000106L3|000107L3|000108L3|000109L3|000110L3|000111L3|000112L3|000113L3|000114L3|000115L3|000127L3|000117L3|000118L3|000119L3|000013L3|000121L3|000122L3|000123L3|000124L3|000125L3|000126L3|000166L3|000128L3|000129L3|000130L3|000131L3|000132L3
  Volume Session Id:      2
  Volume Session Time:    1322270042
  Last Volume Bytes:      246,238,949,376 (246.2 GB)
  Non-fatal FD errors:    0
  SD Errors:              39
  FD termination status:  Error
  SD termination status:  OK
  Termination:            *** Backup Error ***

 
