I finally managed to grab a truss during one of these failures. It's fairly large so I've uploaded it to our mirror. Hopefully this will help reveal something. http://mirrors.omniti.com/bacula/bacula
[ Creating a "fresh" thread... ] We moved our Bacula Director off Linux to Solaris (not my choice) recently. Since then, we've encountered frequent failures of the catalog backup job which reads from
Author: Allan Black <Allan.Black AT btconnect DOT com>
Date: Tue, 27 Jan 2009 10:36:05 +0000
Hi Jason, just found this on SunSolve: Sun Alert Solution 201321, "Solaris 10 Systems With Certain Patches Installed May Experience Data Integrity Issues Over TCP Loopback". 1. Impact So
I doubt this affects us. We're currently running OpenSolaris (Nevada) build 104. That alert is over a year old. We encounter the failures connecting to either localhost or the interface on bge0. Than
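The Sun Alert describes silent corruption over TCP loopback. A quick way to test a given box, independently of Bacula, is to push random data through a local TCP connection and compare SHA-256 digests on both ends. The sketch below is a generic diagnostic written for this thread, not a Bacula or Sun tool; the payload size and trial count are arbitrary assumptions, and it only speaks IPv4. Because the failures are intermittent, a clean run does not rule the bug out, but any mismatch would be conclusive.

```python
import hashlib
import os
import socket
import sys
import threading

PAYLOAD_SIZE = 8 * 1024 * 1024  # 8 MiB of random data per trial (arbitrary)


def _drain(server: socket.socket, result: dict) -> None:
    """Accept one connection, read it to EOF, and store the SHA-256 digest."""
    conn, _ = server.accept()
    digest = hashlib.sha256()
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            digest.update(chunk)
    result["digest"] = digest.hexdigest()


def roundtrip_ok(host: str = "127.0.0.1", size: int = PAYLOAD_SIZE) -> bool:
    """Send `size` random bytes to ourselves over TCP; True if they arrive intact."""
    payload = os.urandom(size)
    expected = hashlib.sha256(payload).hexdigest()
    result: dict = {}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))  # port 0: let the OS choose a free port
        server.listen(1)
        port = server.getsockname()[1]
        receiver = threading.Thread(target=_drain, args=(server, result))
        receiver.start()
        with socket.create_connection((host, port)) as client:
            client.sendall(payload)
        # Leaving the `with` block closed the client, so the receiver sees EOF.
        receiver.join()
    return result.get("digest") == expected


if __name__ == "__main__":
    # Pass the bge0 IPv4 address as an argument to compare with loopback.
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    trials = 20
    bad = sum(not roundtrip_ok(host) for _ in range(trials))
    print(f"{host}: {bad} of {trials} transfers arrived corrupted")
```

Running it once against 127.0.0.1 and once against the bge0 address mirrors the localhost-versus-bge0 comparison described above.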
Slight correction to the previous comment (I was on the wrong server). This is a 10u3 SPARC system but it has been patched. The comment about testing on bge0 still applies. Thanks, -- Jason Dixon Omn
... I've enabled the truss to run on bacula-sd each night. I'll report back my findings. Thanks, -- Jason Dixon OmniTI Computer Consulting, Inc. jdixon AT omniti DOT com 443.325.1357 x.241 -- This SF
Good idea, I've put this into testing. I'll report back our results in a few days. Thanks, -- Jason Dixon
Author: Allan Black <Allan.Black AT btconnect DOT com>
Date: Wed, 03 Dec 2008 18:35:51 +0000
I don't think this is the problem: 'Packet size too big from XYZ' is a Bacula error, caused by the SD rejecting a message from the FD. Data from the FD to the SD is encoded in two packets: a 'header', whi
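To see why a rejected packet points at a garbled or out-of-step header, consider generic length-prefixed framing. The sketch assumes a 4-byte big-endian unsigned length followed by the payload, plus a hypothetical MAX_PACKET limit; it illustrates the class of check behind an error like "Packet size too big", not Bacula's actual wire format or limit. Flipping a single bit in the length's top byte turns a 100-byte packet into a claimed 16 MB one, which the reader has to refuse because it can no longer find the next packet boundary.

```python
import io
import struct
from typing import BinaryIO

MAX_PACKET = 1_000_000  # hypothetical per-packet limit for this sketch


class PacketTooBig(Exception):
    """Raised when a length prefix exceeds MAX_PACKET."""


def write_packet(stream: BinaryIO, payload: bytes) -> None:
    """Frame `payload` as a 4-byte big-endian length followed by the bytes."""
    stream.write(struct.pack(">I", len(payload)) + payload)


def read_packet(stream: BinaryIO) -> bytes:
    """Read one framed packet, rejecting implausible lengths up front."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("connection closed mid-header")
    (length,) = struct.unpack(">I", header)
    if length > MAX_PACKET:
        # A corrupted or out-of-step header decodes to a huge length. The
        # reader cannot locate the next packet boundary, so it gives up.
        raise PacketTooBig(f"packet size {length} too big")
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("connection closed mid-payload")
    return payload


if __name__ == "__main__":
    buf = io.BytesIO()
    write_packet(buf, b"x" * 100)
    wire = bytearray(buf.getvalue())

    print(len(read_packet(io.BytesIO(bytes(wire)))))  # 100

    wire[0] ^= 0x01  # flip one bit in the most significant length byte
    try:
        read_packet(io.BytesIO(bytes(wire)))
    except PacketTooBig as exc:
        print(exc)  # packet size 16777316 too big
```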
Wouldn't it fail consistently in that case? It hasn't failed the last two nights. -- Jason Dixon
Author: Allan Black <Allan.Black AT btconnect DOT com>
Date: Wed, 03 Dec 2008 19:02:16 +0000
Good :-) Has it started working since you changed from "localhost" to "real host name"? Was it not just the catalog backup which was failing in this way? Anyway, if it starts happening again, it migh
It has stopped failing since that change, but it's only been two nights. We've had successful runs of that length before, so we need to give it a few more nights to be certain. That's the only job tha
One final report. Everything has been working fine since switching the local FD to use the physical address (bge0) rather than loopback. Sounds like a bug. If anyone needs further details, please let
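Switching the FD's address from "localhost" to the host name only moves traffic off loopback if that name does not itself resolve to a loopback address; on some systems the host name is mapped to a 127.x address in /etc/hosts. The sketch below checks what a name resolves to. It is a generic helper, not part of Bacula; the names to test (for example, whatever is in the Client resource's Address) are supplied by the user.

```python
import ipaddress
import socket
import sys


def resolves_to_loopback(name: str) -> bool:
    """True if every address `name` resolves to is a loopback address."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    addrs = {info[4][0] for info in infos}
    # Strip an IPv6 zone suffix such as "%lo0" before parsing.
    return all(ipaddress.ip_address(a.split("%")[0]).is_loopback for a in addrs)


if __name__ == "__main__":
    # With no arguments, compare "localhost" against this machine's host name.
    for name in sys.argv[1:] or ["localhost", socket.gethostname()]:
        kind = "loopback" if resolves_to_loopback(name) else "non-loopback"
        print(f"{name}: {kind}")
```

If the host name comes back as loopback, the change would not actually have exercised the bge0 path.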
Alas, I spoke too soon. The CatalogBackup job failed again last night, usual symptoms. 08-Dec 23:10 vlad-dir JobId 485: BeforeJob: run command "/opt/bacula/libexec/make_catalog_backup.omniti bacula b
Author: Allan Black <Allan.Black AT btconnect DOT com>
Date: Tue, 09 Dec 2008 21:26:38 +0000
OK. We need to find out what the FD is doing. I would recommend:

truss -o filename -f -a -e -v all -w 2 -p <FD pid>

Is it possible to run the catalog backup during the day, by hand? That way you could a
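Since the failure only shows up on some nights, it helps to script the truss attachment so it can run unattended. The sketch builds exactly the truss command line recommended above and writes to a timestamped file. The process name "bacula-fd", the use of pgrep to find the PID, and TRACE_DIR are assumptions for illustration; truss also needs enough privilege to attach to the daemon.

```python
import subprocess
import time
from typing import List

TRACE_DIR = "/var/tmp"  # hypothetical output directory


def build_truss_argv(pid: int, outfile: str) -> List[str]:
    """Return the truss command line suggested in this thread."""
    return [
        "truss",
        "-o", outfile,   # write the trace to this file
        "-f",            # follow child processes created by fork()
        "-a",            # show argument strings for exec() calls
        "-e",            # show environment strings for exec() calls
        "-v", "all",     # verbose display of structures for all syscalls
        "-w", "2",       # dump full write() buffers for file descriptor 2
        "-p", str(pid),  # attach to this existing process
    ]


def find_pid(name: str = "bacula-fd") -> int:
    """Return the oldest PID whose process name matches `name` exactly."""
    out = subprocess.run(["pgrep", "-o", "-x", name],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


if __name__ == "__main__":
    pid = find_pid()
    outfile = f"{TRACE_DIR}/truss-fd-{time.strftime('%Y%m%d-%H%M%S')}.out"
    # truss runs until the traced process exits or truss is interrupted.
    subprocess.run(build_truss_argv(pid, outfile), check=False)
```

Keeping the argument list in one function makes it easy to confirm that the nightly job runs the same options that were recommended.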
I've run it six times today with no failures yet. Frustrating. Here are the results showing all the previous failures, then today's successes:

-bash-3.2$ echo 'list jobs' | sudo /opt/bacula/sbin/i386/bco
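With many runs piling up, tallying the job history by status is easier than reading the table by eye. The sketch assumes bconsole's "list jobs" output is a pipe-delimited table whose first "|" row holds column names including Name and JobStatus; the job name "CatalogBackup" is taken from this thread. It prints raw status codes ('T' is a normal termination) rather than interpreting them.

```python
import sys
from collections import Counter
from typing import Dict, Iterable, List

JOB_NAME = "CatalogBackup"  # job name as given in this thread


def parse_job_table(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Turn a pipe-delimited table into a list of {column: value} dicts."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, prompts, and other console chatter
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if not header:
            header = cells  # the first pipe row is the column header
        elif cells != header:  # ignore repeated header rows
            rows.append(dict(zip(header, cells)))
    return rows


def status_counts(rows: Iterable[Dict[str, str]], job_name: str) -> Counter:
    """Count JobStatus codes for a single job name."""
    return Counter(r.get("JobStatus", "?") for r in rows if r.get("Name") == job_name)


if __name__ == "__main__":
    counts = status_counts(parse_job_table(sys.stdin), JOB_NAME)
    for status, n in sorted(counts.items()):
        print(f"{status}: {n}")
```

Piping the same 'list jobs' output into this script gives a per-status count for the catalog job.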
I'm seeing problems similar to the one Jason described. However, in addition to the 'Packet Size too big' failure, I see a variety of other error messages. FD, SD, and dir are all running on
Do these thoughts help? "Well, long ago, there was a problem with packet sizes, but that was a very old version. If he is running a recent Bacula version, then he is running a Bacula with the standar
I saw some mention of these past problems during my research, so I attempted to rule out network issues. In my case, the FD, SD, and dir are all on the same machine, so I assume the problem isn't swi
We moved our Bacula Director off Linux to Solaris (not my choice) recently. Since then, we've encountered frequent failures of the catalog backup job which reads from the local FD. They always fail w