Bacula-users

[Bacula-users] Network timeouts while running enormous backup

2011-05-26 17:02:29
Subject: [Bacula-users] Network timeouts while running enormous backup
From: S H <shdashbeta AT gmail DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 26 May 2011 16:58:36 -0400

Hello,

I've got a very large backup set (55M files, 1TB of data) that's
simply not backing up. This is always the failure message:

26-May 16:35 buny1-dir JobId 5223: Fatal error: Network error with FD
during Backup: ERR=Connection timed out
26-May 16:35 buny1-dir JobId 5223: Fatal error: No Job status returned from FD.

I've set "Heartbeat Interval = 1 minute" on the FD, the SD, and the
Director to no avail. This was happening at random times during the
backup job until I enabled spooling -- then the job ran for 27 hours,
spooled up all of its data to the SD and failed with the above message
when it started to despool. It was heartbreaking to watch a terabyte
of data and 20+GB of attribute spool just disappear.
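
For reference, a rough sketch of where that directive sits in each
daemon's configuration. The name buny1-dir comes from the log above;
the other resource names are placeholders, and all other required
directives are omitted, so treat this as an illustration rather than
a working configuration:

```
# bacula-dir.conf (Director)
Director {
  Name = buny1-dir
  Heartbeat Interval = 1 minute
  # ... other directives omitted ...
}

# bacula-sd.conf (Storage daemon); "buny1-sd" is a placeholder name
Storage {
  Name = buny1-sd
  Heartbeat Interval = 1 minute
  # ... other directives omitted ...
}

# bacula-fd.conf (File daemon on the client); "client-fd" is a placeholder name
FileDaemon {
  Name = client-fd
  Heartbeat Interval = 1 minute
  # ... other directives omitted ...
}
```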

The Director and SD both run on the same server: OpenBSD 4.6 (32-bit)
with Bacula 2.4.4. The client is FreeBSD 8.1 (64-bit) with Bacula
2.4.4. I know these are outdated versions; I just inherited the
environment and they're working for everything else so I haven't
wanted to go through the pain of upgrading until I have some spare
cycles to dedicate to it.

Compression and FD encryption are off but network traffic runs over
TLS everywhere.

Is there anything I can possibly check that might help?

-SH
