[Bacula-users] SD freezing under FreeBSD 10?

2014-03-07 11:34:58
Subject: [Bacula-users] SD freezing under FreeBSD 10?
From: Robert Cousins <rec AT Rcousins DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Fri, 07 Mar 2014 08:09:56 -0800
I'm upgrading my older backup server to a new FreeBSD 10 box. (The new 
machine has lots of resources and I back up to disk.)

After installing the director and storage daemons on the new server, I 
pointed them at the existing file daemons (which I also reconfigured) on 
the clients spread around the network. I could read status and 
communicate with the daemons, but when I tried to run a backup nothing 
worked. Instead, I'd get an error such as:

07-Mar 07:26 Colo8 JobId 94: Fatal error: backup.c:1190 Network send error to SD. ERR=Broken pipe

At that point, the SD's CPU usage and disk throughput would both drop to 
zero and all progress would stop. (That is, if 5 machines were backing up 
and one threw a fatal error, progress would suddenly stop on all 5 
machines.) Webmin's display would show one client with a 'fatal error' 
and report the others' status as 'is running'. The various log files 
simply keep reporting network errors. If I cancel the job with the fatal 
error, progress resumes for a short while until another job gets a fatal 
error and the system freezes again. These fatal errors can occur after 
just seconds or after an hour or two, but they occur so often that 
essentially no backup job can hope to complete.

The network is known to be solid: it is neither overloaded nor undergoing 
change. Other network services run without problems.

I have been fighting with this for several days and have succeeded in 
getting exactly one backup job to run to completion, on a newly built 
but different FreeBSD 10 server. (The problem occurs even on the backup 
server itself.) I've upgraded the FDs to the latest revision. This fails 
with FDs on FreeBSD, Linux and Windows.

I can use telnet to communicate from each machine to each required port. 
For now, the machines are running with firewalls down.
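
For anyone who wants to repeat the telnet check across many hosts, it can be scripted. The following is only a minimal sketch in Python: the function name and the host:port command-line format are made up for illustration, and the ports in the comment are the Bacula defaults, which a given configuration may override. A successful connect only shows that the port is reachable; it says nothing about whether a long-running transfer stays up.

```python
import socket
import sys

# Default Bacula ports (an assumption; check your own configuration):
# Director 9101, File Daemon 9102, Storage Daemon 9103.


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Each argument is host:port, e.g. a storage daemon on port 9103.
    for target in sys.argv[1:]:
        host, _, port = target.rpartition(":")
        state = "open" if port_is_open(host, int(port)) else "unreachable"
        print(f"{target}: {state}")
```

Run from each client against the SD port on the backup server, and from the backup server against the FD port on each client, this gives the same information as the manual telnet test.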

The odds are 99% that I've done something stupid. Could someone please 
point me to the proper FAQ or suggest a debugging strategy?
