Re: [Bacula-users] Solaris "Packet size too big" failures

2008-12-23 19:38:16
Subject: Re: [Bacula-users] Solaris "Packet size too big" failures
From: Conor Edberg <conor AT lsit.ucsb DOT edu>
To: Dan Langille <dan AT langille DOT org>
Date: Tue, 23 Dec 2008 16:36:00 -0800
On Dec 23, 2008, at 3:38 PM, Dan Langille wrote:

>
> On Dec 23, 2008, at 11:53 AM, Conor Edberg wrote:
>
>>
>> On Dec 9, 2008, at 2:36 PM, Jason Dixon wrote:
>>
>>> On Tue, Dec 09, 2008 at 09:26:38PM +0000, Allan Black wrote:
>>>> Jason Dixon wrote:
>>>>> Alas, I spoke too soon.  The CatalogBackup job failed again last night,
>>>>> usual symptoms.
>>>>
>>>> OK. need to find out what the FD is doing. I would recommend:
>>>>
>>>> truss -o filename -f -a -e -v all -w 2 -p <FD pid>
>>>>
>>>> Is it possible to run the catalog backup during the day, by hand?
>>>> That way you could avoid leaving the truss running (and collecting
>>>> data) all night.
>>>
>>> I've run it 6 times today, no failures yet.  Frustrating.  Here are the
>>> results showing all the previous failures, then the successes today.
>>>
>>> -bash-3.2$ echo 'list jobs' | sudo /opt/bacula/sbin/i386/bconsole | grep BackupCatalog | grep '| f '
>>> |    60 | BackupCatalog            | 2008-11-08 00:36:12 | B    | F |          1 |     965,869,568 | f         |
>>> |   116 | BackupCatalog            | 2008-11-11 23:14:05 | B    | F |          1 |     614,727,680 | f         |
>>> |   286 | BackupCatalog            | 2008-11-22 23:14:28 | B    | F |          1 |   5,278,400,512 | f         |
>>> |   298 | BackupCatalog            | 2008-11-23 23:14:32 | B    | F |          1 |   4,723,965,952 | f         |
>>> |   336 | BackupCatalog            | 2008-11-26 23:14:30 | B    | F |          1 |   3,979,280,384 | f         |
>>> |   361 | BackupCatalog            | 2008-11-28 23:16:11 | B    | F |          1 |   2,101,936,128 | f         |
>>> |   385 | BackupCatalog            | 2008-11-30 23:13:21 | B    | F |          1 |   3,863,216,128 | f         |
>>>
>>> -bash-3.2$ echo 'list jobs' | sudo /opt/bacula/sbin/i386/bconsole | grep BackupCatalog | grep '2008-12-09'
>>> |   486 | BackupCatalog            | 2008-12-09 16:46:51 | B    | F |          1 |   3,183,911,677 | T         |
>>> |   487 | BackupCatalog            | 2008-12-09 16:55:42 | B    | F |          1 |   3,183,912,220 | T         |
>>> |   488 | BackupCatalog            | 2008-12-09 17:03:22 | B    | F |          1 |   3,183,912,724 | T         |
>>> |   489 | BackupCatalog            | 2008-12-09 17:18:03 | B    | F |          1 |   3,183,913,233 | T         |
>>> |   490 | BackupCatalog            | 2008-12-09 17:23:26 | B    | F |          1 |   3,183,913,740 | T         |
>>> |   491 | BackupCatalog            | 2008-12-09 17:29:53 | B    | F |          1 |   3,183,914,246 | T         |
>>>
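A minimal sketch of the same filter as a reusable shell function, in case the grep pipeline in the listing has to be rerun often. It assumes each job occupies a single line with eight pipe-separated columns in the order the listing shows (job ID, name, start time, type, level, file count, byte count, status). `failed_jobs` is a hypothetical helper name, not part of Bacula, and on Solaris `nawk` may be needed in place of `awk`.

```shell
# Read `list jobs`-style table rows on stdin and print
# "JobId StartTime JobBytes" for every row whose job name equals $1 and
# whose status column is the lowercase letter "f".
# Assumption: one row per line, eight columns delimited by "|".
failed_jobs() {
    awk -F'|' -v name="$1" '
        NF >= 9 {
            # Strip the padding that the table layout adds around each cell.
            for (i = 2; i <= 9; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            # $3 = name, $9 = status; the comparison is case-sensitive, so
            # the "F" in the level column is never mistaken for a failure.
            if ($3 == name && $9 == "f") print $2, $4, $8
        }'
}
```

Usage would look like `echo 'list jobs' | sudo /opt/bacula/sbin/i386/bconsole | failed_jobs BackupCatalog`. Matching on the parsed status field avoids depending on the exact spacing that `grep '| f '` relies on.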
>>>
>>> -- Jason Dixon
>>> OmniTI Computer Consulting, Inc.
>>> jdixon AT omniti DOT com
>>> 443.325.1357 x.241
>>>
>>
>>
>>
>> I'm seeing similar problems to the one Jason described.  However, in
>> addition to the 'Packet Size too big' failure, I see a variety of
>> other error messages as well.  FD, SD, and dir are all running on the
>> same machine, disk to disk backup. Some examples of the failures:
>
> Do these thoughts help?
>
> "Well, long ago, there was a problem with packet sizes, but that was a
> very old version.  If he is running a recent Bacula version, then he is
> running a Bacula with the standard networking parameters, it is most
> likely, he has a bad network (bad card, network wiring, switch, OS
> driver, ...)."
>
> I know some do not apply to your situation.
>
> -- 
> Dan Langille
> http://langille.org/
>
>
>

I saw some mention of these past problems during my research, so I  
attempted to rule out network issues.  In my case, the FD, SD, and dir  
are all on the same machine, so I assume the problem isn't switches or  
cabling.

The machine is a Sun x4100m2 with 2 Intel Pro 1000 (e1000g driver) and
2 Nvidia (nge driver) interfaces, and I've tried each model as the
active interface.  I've also tried localhost/127.0.0.1 versus the FQDN
in all the config files.  I can't totally rule out a driver problem,
but I can at least say it happens with 2 different drivers.  I've seen
the problem on both Bacula 2.4.2 and 2.5.16 with almost default
configs, and definitely no networking parameter changes.
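
One further check that might help narrow down the driver question is to look at the per-interface error counters. The sketch below filters `netstat -i` output for interfaces with nonzero input or output errors. It assumes the header line contains columns named `Ierrs` and `Oerrs` and that the interface name is the first field; `nic_errors` is a hypothetical helper name. Traffic sent to localhost usually never reaches the NIC, so clean counters would not rule much out in that configuration.

```shell
# Read `netstat -i`-style output on stdin and print each interface whose
# input or output error counter is nonzero.
# Assumption: a header row starting with "Name" labels the columns, and two
# of those labels are "Ierrs" and "Oerrs". Column positions are looked up
# from the header instead of being hard-coded.
nic_errors() {
    awk '
        $1 == "Name" {
            for (i = 1; i <= NF; i++) {
                if ($i == "Ierrs") ic = i
                if ($i == "Oerrs") oc = i
            }
            next
        }
        # Skip everything until a header has been seen; blank lines yield
        # empty fields, which compare as zero and are ignored.
        ic && oc && ($ic + 0 > 0 || $oc + 0 > 0) {
            print $1, "Ierrs=" $ic, "Oerrs=" $oc
        }'
}
```

Usage would be `netstat -i | nic_errors`, run before and after a failing job so the counters can be compared.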

-Conor


