Re: [Veritas-bu] Bpjobd and other failures.

2010-01-20 04:11:48
Subject: Re: [Veritas-bu] Bpjobd and other failures.
From: Justin Piszcz <jpiszcz AT lucidpixels DOT com>
To: Jeff Cleverley <jeff.cleverley AT avagotech DOT com>
Date: Wed, 20 Jan 2010 04:11:37 -0500 (EST)
Hi,

Taking a shot in the dark here: for the TCP issues, try adding
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1

to your /etc/sysctl.conf, then reboot.
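
A rough sketch of the step above, in case it helps: the two settings can be
added idempotently and then loaded with `sysctl -p`, which applies the file
without waiting for a reboot. The helper name `ensure_sysctl_setting` is made
up for illustration and is not part of any tool. Be aware that
tcp_tw_recycle is known to cause trouble for clients behind NAT and was
removed in Linux 4.12, so treat this as something to try on an older kernel
such as RHEL4 rather than a general recommendation.

```shell
#!/bin/sh
# Hypothetical helper: make sure FILE contains exactly one "KEY = VALUE" line.
ensure_sysctl_setting() {
    file=$1
    key=$2
    value=$3
    touch "$file"
    # Escape the dots in the key so they match literally in the regex.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    # Keep every line except an existing assignment of this key ...
    grep -v "^[[:space:]]*${pattern}[[:space:]]*=" "$file" > "$file.tmp" || true
    # ... then append the desired value and write the result back.
    echo "$key = $value" >> "$file.tmp"
    cat "$file.tmp" > "$file"
    rm -f "$file.tmp"
}

# Example (as root, against the real file):
#   ensure_sysctl_setting /etc/sysctl.conf net.ipv4.tcp_tw_reuse 1
#   ensure_sysctl_setting /etc/sysctl.conf net.ipv4.tcp_tw_recycle 1
#   sysctl -p /etc/sysctl.conf
```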

For vnetd, check your /etc/xinetd.d/vnetd* files.
Also check the logs to make sure xinetd is not throttling connections; that
can happen if too many servers try to back up too fast.
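
One way to do both checks, as a sketch only: xinetd's rate limits are set
with the `cps`, `instances` and `per_source` attributes, so grepping the
service files for those shows what limits are in effect. The function names
below are invented for this example, and the log message text ("Deactivating
service ...") is quoted from memory, so adjust the pattern if your xinetd
words it differently.

```shell
#!/bin/sh
# Print the rate-limiting attributes (cps, instances, per_source) found in
# the given xinetd config files, each prefixed with its file name.
show_xinetd_limits() {
    grep -H -E '^[[:space:]]*(cps|instances|per_source)[[:space:]]*=' "$@" || true
}

# Count log lines in which xinetd reports that it deactivated a service.
count_xinetd_deactivations() {
    grep -c 'Deactivating service' "$1" || true
}

# Example:
#   show_xinetd_limits /etc/xinetd.conf /etc/xinetd.d/vnetd*
#   count_xinetd_deactivations /var/log/messages
```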

Justin.

On Tue, 19 Jan 2010, Jeff Cleverley wrote:

> Greetings,
>
> While continuing to work on this it seems there may be issues with vnetd.
> The netstat -a |grep vnet shows this:
>
> tcp        0      0 *:vnetd                     *:*                         LISTEN
> tcp        0      0 sgpbkp04.sgp.avagotec:35781 agt604.sgp.avagotech.:vnetd ESTABLISHED
> tcp        0      0 sgpbkp04.sgp.avagotec:35720 sgpbkp04.sgp.avagotec:vnetd ESTABLISHED
> tcp        0      0 sgpbkp04.sgp.avagotec:vnetd sgpbkp04.sgp.avagotec:35720 ESTABLISHED
> tcp        0      0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35846 TIME_WAIT
> tcp        0      0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35853 TIME_WAIT
> tcp        0      0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35839 TIME_WAIT
> unix  2      [ ACC ]     STREAM     LISTENING     146403 /usr/openv/var/vnetd/vmd.uds
> unix  2      [ ACC ]     STREAM     LISTENING     145874 /usr/openv/var/vnetd/bpcompatd.uds
> unix  2      [ ACC ]     STREAM     LISTENING     146786 /usr/openv/var/vnetd/tldcd.uds
> unix  3      [ ]         STREAM     CONNECTED     152574 /usr/openv/var/vnetd/bpcompatd.uds
>
> The time_wait entries seem to stick around a lot.  I've restarted xinetd on
> the system and we have rebooted but things are still wedged.
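
To see whether the TIME_WAIT entries are actually piling up rather than just
cycling, it may help to tally the states over time. A small sketch, assuming
netstat prints each socket on one line with the state as the last field; the
function name `count_tcp_states` is made up for this example.

```shell
#!/bin/sh
# Tally TCP connection states for lines mentioning a service (default: vnetd).
# Reads netstat-style output on stdin and prints "STATE COUNT" per state.
count_tcp_states() {
    svc=${1:-vnetd}
    awk -v svc="$svc" '
        $1 ~ /^tcp/ && index($0, svc) { n[$NF]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}

# Example:
#   netstat -a | count_tcp_states vnetd
```

Running it every minute or so and comparing the TIME_WAIT count gives a
better picture than a single snapshot.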
>
> Thanks,
>
> Jeff
>
> On Tue, Jan 19, 2010 at 6:00 PM, Jeff Cleverley <
> jeff.cleverley AT avagotech DOT com> wrote:
>
>> Greetings,
>>
>> Our environment is NB6.5.1 on a RHEL4 server.  It has a hpux SAN media
>> server also.  All other clients are backed up over the network.  Most are
>> RHEL4x.
>>
>> The tape library in our Singapore office failed over the weekend and caused
>> a lot of things to fail and continue to be wedged up.  Some jobs seemed to
>> have run but some failed with errors 13, 63, and 233.  This varied across
>> policies.  I decided to try and restart all processes and get things cleaned
>> up.  This hasn't worked well.
>>
>> When I started everything using service netbackup start or
>> /etc/init.d/netbackup start, everything looks OK.  When I look at things
>> like bpps -a I notice that the bpjobd isn't running anymore.  When I try to
>> start it manually it fails saying File size limit exceeded.  The bpdbjobs
>> returns no output.  I haven't been able to figure out which file it is
>> complaining about.
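
"File size limit exceeded" is what the shell prints when a process is killed
by SIGXFSZ, which happens when a write would exceed the file-size ulimit or,
for a program built without large-file support, the 2 GB mark. A sketch for
hunting the offending file follows. The function name `find_large_files` is
invented here, and the directory in the example is only a guess at where the
job database lives on a typical install.

```shell
#!/bin/sh
# List regular files under DIR larger than MIN_KIB kibibytes. The default
# threshold sits just under 2 GiB, the limit for non-large-file programs.
find_large_files() {
    dir=$1
    min_kib=${2:-2097151}
    find "$dir" -type f -size +"${min_kib}k" -print
}

# Example (path is an assumption, adjust to your install):
#   ulimit -f                                     # "unlimited" rules out the ulimit
#   find_large_files /usr/openv/netbackup/db/jobs
```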
>>
>> I'm sure I have a lot of things that need to be cleaned up.  There are a
>> lot of files in the restart and trylogs.  I was thinking it was safe to move
>> those out of the way but wanted to make sure.
>>
>> Any help on tracking the bpjobd error along with advice on cleaning up all
>> the restart and trylogs would be appreciated.  Naturally I'm leaving on
>> vacation Thursday so I need to help clean this up before I go.  I won't be
>> doing any replies to this after Wednesday night because of that.
>>
>> Thanks,
>>
>> Jeff
>>
>> --
>> Jeff Cleverley
>> Unix Systems Administrator
>> 4380 Ziegler Road
>> Fort Collins, Colorado 80525
>> 970-288-4611
>>
>>
>
>
> -- 
> Jeff Cleverley
> Unix Systems Administrator
> 4380 Ziegler Road
> Fort Collins, Colorado 80525
> 970-288-4611
>
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
