Bacula-users

Re: [Bacula-users] Sending spooled attrs to the Director Fatal error: Network error with FD during Backup: ERR=Connection reset by peer ?

Subject: Re: [Bacula-users] Sending spooled attrs to the Director Fatal error: Network error with FD during Backup: ERR=Connection reset by peer ?
From: "Ethier, Michael" <methier AT CGR.Harvard DOT edu>
To: Bob Hetzel <beh AT case DOT edu>, "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Tue, 6 Dec 2011 14:43:49 -0500
Hi Bob,

Thanks for your reply. Right now I just tell Bacula that the client has a
single 15 TB directory to back up. How do you personally break a large backup
job into pieces (multiple jobs)? Do you list each directory or file in each
job, or do you use a smarter method?

Thanks,
Mike

-----Original Message-----
From: Bob Hetzel [mailto:beh AT case DOT edu] 
Sent: Tuesday, December 06, 2011 11:36 AM
To: bacula-users AT lists.sourceforge DOT net
Subject: Re: [Bacula-users] Sending spooled attrs to the Director Fatal error: 
Network error with FD during Backup: ERR=Connection reset by peer ?


I've been doing backups for a long time now, and one thing I've learned is that
if you have a backup that takes more than 24 hours, you're asking for trouble.
In theory your setup should work, but since your fulls take so long, any file
that changes after the job has already passed it won't be picked up again until
the full completes.

Here's what I mean in more detail. Say the data is laid out in nine equal-sized
directories, a1 through a9. In theory that 9-day full backup will get all of a1
in the first 24 hours and work through the directories in order, but if anything
changes in a1 before the job completes, the job won't go back for it. So the
obvious answer is to split the work into at least nine separate jobs (see the
sketch below). To be sure, this will mean some work by you, and it would also be
wise to do periodic auditing to make sure no directory is being skipped.
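
For example (an untested sketch; the FileSet names, paths, and schedule here
are made up, so adjust them to your site), in bacula-dir.conf you would give
each directory its own FileSet and Job:

   # One FileSet per top-level directory (hypothetical paths)
   FileSet {
     Name = "ceserve1-a1"
     Include {
       Options { signature = MD5 }
       File = /data/a1
     }
   }

   # ...and a matching Job for each FileSet
   Job {
     Name = "ceserve1-a1"
     Type = Backup
     Client = ceserve1-fd
     FileSet = "ceserve1-a1"
     Schedule = "WeeklyCycle"      # whatever schedule you already use
     Storage = Autochanger
     Pool = Default
     Messages = Standard
   }

   # repeat (or script the generation of) a2 through a9

The audit then amounts to checking that the union of the File = lines still
covers everything on the client.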

The more you can split it up, the happier your life will be. On a system that
big you may even have enough I/O throughput to run two or more jobs in
parallel, cutting the backup window down substantially.
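
Concurrency is mostly a matter of raising Maximum Concurrent Jobs, which
defaults to 1. A sketch, reusing the resource names from your job report
(untested, and the counts are arbitrary):

   # bacula-dir.conf
   Director {
     Name = hulsbackup-dir
     # ...existing settings unchanged...
     Maximum Concurrent Jobs = 4
   }
   Storage {
     Name = Autochanger
     # ...existing settings unchanged...
     Maximum Concurrent Jobs = 4   # jobs allowed to use this storage at once
   }

   # The Storage resource in bacula-sd.conf has the same directive.

Since you already spool data, concurrent jobs are despooled to tape one at a
time, which should keep them from interleaving blocks on the volumes.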

Ideally, you'd break it down into units small enough that your fulls don't
interfere with your incrementals. Bacula, like most backup packages, can't
resume a failed full backup from where it died, so splitting big jobs into
smaller ones means that when you hit a system problem you don't have to
repeat everything.
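
As a partial mitigation, Bacula can automatically reschedule a job that fails;
it still restarts the full from the beginning rather than resuming, but at
least nobody has to notice the failure and re-run it by hand. A sketch (the
interval and retry count are arbitrary):

   Job {
     # ...existing job definition...
     Reschedule On Error = yes
     Reschedule Interval = 1 hour   # wait this long before retrying
     Reschedule Times = 3           # give up after three attempts
   }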

In addition, if your full backup takes more than 9 days, your disaster
recovery will take even longer, so keep that in mind as well. If you separate
the jobs by how critical the data is, you can restore the most important
information first just to get things running.
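
Once the jobs are split, the Priority directive in the Job resource (lower
numbers run first; your report shows the default of 10) is one way to encode
that ordering, e.g.:

   Job {
     Name = "ceserve1-critical"   # hypothetical job holding the critical data
     Priority = 5                 # runs ahead of the default Priority = 10 jobs
     # ...
   }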


> Date: Mon, 5 Dec 2011 19:55:49 -0500
> From: "Ethier, Michael" <methier AT CGR.Harvard DOT edu>
> Subject: [Bacula-users] Sending spooled attrs to the Director Fatal
>
> Hello,
>
> We are running Bacula 5.0.3 on RHEL and CentOS. I recently had a
> 16.5 TB backup fail at the very end, when the system sent the spooled
> attribute data to the Director; the messages are below. The backend database is MySQL:
>
> [root@hulsbackup lib]#  mysql -V
> mysql  Ver 14.12 Distrib 5.0.77, for redhat-linux-gnu (x86_64) using 
> readline 5.1
>
> and it lives on the same partition as the data spool directory. All
> the backup data appears to have been spooled and written to tape successfully.
>
> I have successfully backed up a 5 TB data set before this. However,
> between that backup and this failed one, we moved the Bacula server to a
> different network and changed to an LACP-bonded interface.
> There is a local iptables firewall running on the Bacula server.
>
> In addition, we kept hitting a 6-day limit after which backups were
> automatically killed, so I changed the following lines and recompiled
> with a 60-day limit on both the Bacula server and the client:
>
> bnet.c:   bsock->timeout = 60 * 60 * 60 * 24;   /* 60 days timeout */
> bsock.c:   timeout = 60 * 60 * 60 * 24;   /* 60 days timeout */
>
> Other than that, everything is the stock code. Has anyone hit this
> problem and found a solution? I can't easily re-run it to reproduce the
> failure, since the job takes over 9 days.
>
> Thanks,
> Mike
>
> ...
> ...
>
> 05-Dec 02:48 hulsbackup-sd JobId 109: Alert: Home page is http://smartmontools.sourceforge.net/
> 05-Dec 02:48 hulsbackup-sd JobId 109: Alert:
> 05-Dec 02:48 hulsbackup-sd JobId 109: Alert: TapeAlert: OK
> 05-Dec 02:48 hulsbackup-sd JobId 109: Alert:
> 05-Dec 02:48 hulsbackup-sd JobId 109: Alert: Error Counter logging not supported
> 05-Dec 02:48 hulsbackup-sd JobId 109: Sending spooled attrs to the Director. Despooling 196,979,273 bytes ...
> 05-Dec 03:12 hulsbackup-dir JobId 109: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
> 05-Dec 03:12 hulsbackup-dir JobId 109: Fatal error: No Job status returned from FD.
> 05-Dec 03:12 hulsbackup-dir JobId 109: Error: Bacula hulsbackup-dir 5.0.3 (04Aug10): 05-Dec-2011 03:12:15
>
>   Build OS:               x86_64-unknown-linux-gnu redhat Enterprise release
>   JobId:                  109
>   Job:                    ceserve1.2011-11-25_21.11.56_11
>   Backup Level:           Full
>   Client:                 "ceserve1-fd" 5.0.3 (04Aug10) x86_64-unknown-linux-gnu,redhat,
>   FileSet:                "ceserve1-data" 2011-11-02 11:03:12
>   Pool:                   "Default" (From Job resource)
>   Catalog:                "MyCatalog" (From Client resource)
>   Storage:                "Autochanger" (From command line)
>   Scheduled time:         25-Nov-2011 21:11:47
>   Start time:             25-Nov-2011 21:11:58
>   End time:               05-Dec-2011 03:12:15
>   Elapsed time:           9 days 6 hours 17 secs
>   Priority:               10
>   FD Files Written:       0
>   SD Files Written:       571,253
>   FD Bytes Written:       0 (0 B)
>   SD Bytes Written:       16,495,138,769,029 (16.49 TB)
>   Rate:                   0.0 KB/s
>   Software Compression:   None
>   VSS:                    no
>   Encryption:             no
>   Accurate:               no
>   Volume name(s):         000093L3|000094L3|000095L3|000096L3|000097L3|000098L3|000099L3|000100L3|000101L3|000102L3|000103L3|000104L3|000105L3|000106L3|000107L3|000108L3|000109L3|000110L3|000111L3|000112L3|000113L3|000114L3|000115L3|000127L3|000117L3|000118L3|000119L3|000013L3|000121L3|000122L3|000123L3|000124L3|000125L3|000126L3|000166L3|000128L3|000129L3|000130L3|000131L3|000132L3
>   Volume Session Id:      2
>   Volume Session Time:    1322270042
>   Last Volume Bytes:      246,238,949,376 (246.2 GB)
>   Non-fatal FD errors:    0
>   SD Errors:              39
>   FD termination status:  Error
>   SD termination status:  OK
>   Termination:            *** Backup Error ***
>


_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
