Re: [Veritas-bu] Bpjobd and other failures.

2010-01-21 06:47:42
Subject: Re: [Veritas-bu] Bpjobd and other failures.
From: "WEAVER, Simon (external)" <simon.weaver AT astrium.eads DOT net>
To: "Jeff Cleverley" <jeff.cleverley AT avagotech DOT com>, "Justin Piszcz" <jpiszcz AT lucidpixels DOT com>
Date: Thu, 21 Jan 2010 11:47:01 -0000
Jeff
Good idea about not touching anything again...
 
A few years back, a colleague did a firmware upgrade on a tape library on a Friday and then went on vacation for a few weeks.
 
On the Monday, a new person, totally unaware of the backups, found that every single one was failing. It took 2 weeks to resolve.
 
"Never make big changes on the last day of the week, or before you go on vacation" springs to mind :-)
Enjoy the break!
 
Simon


From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Jeff Cleverley
Sent: Wednesday, January 20, 2010 6:03 PM
To: Justin Piszcz
Cc: VERITAS-BU AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Bpjobd and other failures.

Justin,

Thanks for the reply.  For whatever reason things seem to have magically started working again.  All I did was shut down Veritas (again), turn up the verbosity in bp.conf, and restart it.  When it first started I still didn't have bpdbm, bpjobd, etc. running.  The vnetd log had a lot of errors.  When I ran bpdbjobs from the command line, nothing came back.

While looking through the bpdbm log I found no errors, but a lot of entries that looked like it was doing backups.  About 10 minutes later I ran bpdbjobs again and everything showed up and some jobs were running.  I think these are restarts of some failed jobs, so we'll see how they do.  So far 4 of them have completed successfully.

Since I leave the country on vacation tomorrow morning I don't plan on touching anything else on it today :-)

Thanks again for the help.

Jeff

On Wed, Jan 20, 2010 at 2:11 AM, Justin Piszcz <jpiszcz AT lucidpixels DOT com> wrote:
Hi,

Taking a shot in the dark here, for the tcp issues, try adding:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1

to your /etc/sysctl.conf, then reboot.
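
As a rough sketch (not part of the original suggestion), the following Python checks whether sysctl.conf-style text already contains the two settings with the suggested values. The function names `read_sysctl_settings` and `missing_settings` are hypothetical. Note that `net.ipv4.tcp_tw_recycle` only exists on older kernels such as the RHEL4 one discussed here; it was removed in Linux 4.12.

```python
# Settings suggested in the message above, with the suggested values.
WANTED = {
    "net.ipv4.tcp_tw_reuse": "1",
    "net.ipv4.tcp_tw_recycle": "1",
}


def read_sysctl_settings(text):
    """Parse sysctl.conf-style text into a dict of key -> value strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # sysctl.conf treats lines starting with '#' or ';' as comments.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_settings(text, wanted=WANTED):
    """Return the wanted keys that are absent or set to a different value."""
    current = read_sysctl_settings(text)
    return sorted(k for k, v in wanted.items() if current.get(k) != v)
```

Passing the contents of /etc/sysctl.conf to `missing_settings` returns an empty list once both lines are in place.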

For vnetd, check your /etc/xinetd.d/vnetd*
Also check the logs to make sure xinetd is not throttling connections; that can happen if too many servers are trying to back up too fast.
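
One way to see which limits xinetd might be enforcing is to pull the throttling attributes out of the service file. The sketch below assumes the standard xinetd attributes `instances`, `per_source`, `cps` and `max_load`; the function name `throttle_settings` and the sample service entry are hypothetical, not taken from the poster's system.

```python
# xinetd attributes that cap or rate-limit incoming connections.
THROTTLE_ATTRS = ("instances", "per_source", "cps", "max_load")


def throttle_settings(config_text):
    """Return the throttling attributes set in xinetd service-file text."""
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # xinetd also accepts '+=' and '-=' as assignment operators.
        name = name.strip().rstrip("+-").strip()
        if name in THROTTLE_ATTRS:
            found[name] = value.strip()
    return found


# Hypothetical service entry, for illustration only.
SAMPLE = """\
service vnetd
{
    socket_type = stream
    wait        = no
    cps         = 50 10
    instances   = 60
}
"""

print(throttle_settings(SAMPLE))
```

In xinetd, `cps = 50 10` means the service is disabled for 10 seconds once more than 50 connections per second arrive, so a low value there is one plausible source of refused connections during a backup burst.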

Justin.


On Tue, 19 Jan 2010, Jeff Cleverley wrote:

Greetings,

While continuing to work on this, it seems there may be issues with vnetd.
Running netstat -a | grep vnet shows this:

tcp        0      0 *:vnetd                     *:*                         LISTEN
tcp        0      0 sgpbkp04.sgp.avagotec:35781 agt604.sgp.avagotech.:vnetd ESTABLISHED
tcp        0      0 sgpbkp04.sgp.avagotec:35720 sgpbkp04.sgp.avagotec:vnetd ESTABLISHED
tcp        0      0 sgpbkp04.sgp.avagotec:vnetd sgpbkp04.sgp.avagotec:35720 ESTABLISHED
tcp        0      0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35846 TIME_WAIT
tcp        0      0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35853 TIME_WAIT
tcp        0      0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35839 TIME_WAIT
unix  2      [ ACC ]     STREAM     LISTENING     146403 /usr/openv/var/vnetd/vmd.uds
unix  2      [ ACC ]     STREAM     LISTENING     145874 /usr/openv/var/vnetd/bpcompatd.uds
unix  2      [ ACC ]     STREAM     LISTENING     146786 /usr/openv/var/vnetd/tldcd.uds
unix  3      [ ]         STREAM     CONNECTED     152574 /usr/openv/var/vnetd/bpcompatd.uds

The TIME_WAIT entries seem to stick around for a long time.  I've restarted
xinetd on the system and we have rebooted, but things are still wedged.
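
To track whether the TIME_WAIT entries are piling up, the netstat output can be tallied by TCP state. This is a minimal sketch; the function name `count_tcp_states` is hypothetical, and the sample reuses a few lines from the output above. It accepts the state either at the end of the tcp line or wrapped onto the following line.

```python
from collections import Counter

# TCP state names as printed by Linux netstat.
TCP_STATES = {
    "LISTEN", "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT", "FIN_WAIT1",
    "FIN_WAIT2", "SYN_SENT", "SYN_RECV", "LAST_ACK", "CLOSING", "CLOSED",
}


def count_tcp_states(netstat_text):
    """Count TCP sockets per state in `netstat -a` style output."""
    counts = Counter()
    pending = False  # True when a tcp line had no state on the same line
    for line in netstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("tcp"):
            if fields[-1] in TCP_STATES:
                counts[fields[-1]] += 1
                pending = False
            else:
                pending = True
        elif pending and len(fields) == 1 and fields[0] in TCP_STATES:
            # State wrapped onto its own line by the mail client.
            counts[fields[0]] += 1
            pending = False
        else:
            pending = False
    return counts


SAMPLE = """\
tcp        0      0 *:vnetd                     *:*                         LISTEN
tcp        0      0 sgpbkp04.sgp.avagotec:35781 agt604.sgp.avagotech.:vnetd ESTABLISHED
tcp        0      0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35846
TIME_WAIT
unix  2      [ ACC ]     STREAM     LISTENING     146403 /usr/openv/var/vnetd/vmd.uds
"""

print(dict(count_tcp_states(SAMPLE)))
```

Running it against successive netstat captures shows whether the TIME_WAIT count is growing or just turning over.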

Thanks,

Jeff

On Tue, Jan 19, 2010 at 6:00 PM, Jeff Cleverley <jeff.cleverley AT avagotech DOT com> wrote:

Greetings,

Our environment is NB6.5.1 on a RHEL4 server.  It also has an HP-UX SAN
media server.  All other clients are backed up over the network.  Most are
RHEL4x.

The tape library in our Singapore office failed over the weekend and caused
a lot of things to fail and continue to be wedged up.  Some jobs seemed to
have run but some failed with errors 13, 63, and 233.  This varied across
policies.  I decided to try and restart all processes and get things cleaned
up.  This hasn't worked well.

When I start everything using service netbackup start or
/etc/init.d/netbackup start, everything looks OK.  But when I check with
bpps -a, I notice that bpjobd isn't running anymore.  When I try to
start it manually, it fails with "File size limit exceeded".  bpdbjobs
returns no output.  I haven't been able to figure out which file it is
complaining about.
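
"File size limit exceeded" is the message a shell prints when a process is killed by SIGXFSZ, which is raised when a write would exceed the process's file size limit (see `ulimit -f`). A program built without large-file support can possibly hit the same signal at 2 GiB. One way to narrow down the offending file is to list unusually large files under the NetBackup directories. The sketch below is an assumption-laden starting point: the function name `find_large_files` is hypothetical, and the default threshold of just under 2 GiB is a guess at the relevant limit.

```python
import os


def find_large_files(root, min_bytes=2**31 - 1):
    """Return (size, path) pairs for files of at least min_bytes under root.

    The default threshold (just under 2 GiB) is an assumption about the
    limit being hit; pass a different min_bytes to match `ulimit -f`.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if size >= min_bytes:
                hits.append((size, path))
    # Largest files first.
    return sorted(hits, reverse=True)
```

For example, `find_large_files("/usr/openv/netbackup/db/jobs")` would scan the job database directory, assuming that is where bpjobd keeps its files on this installation.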

I'm sure I have a lot of things that need to be cleaned up.  There are a
lot of files in the restart and trylogs.  I was thinking it was safe to move
those out of the way but wanted to make sure.

Any help on tracking the bpjobd error along with advice on cleaning up all
the restart and trylogs would be appreciated.  Naturally I'm leaving on
vacation Thursday so I need to help clean this up before I go.  I won't be
doing any replies to this after Wednesday night because of that.

Thanks,

Jeff

--
Jeff Cleverley
Unix Systems Administrator
4380 Ziegler Road
Fort Collins, Colorado 80525
970-288-4611





_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu