Re: [Veritas-bu] Bpjobd and other failures.
2010-01-21 06:47:42
Jeff
Good idea about not touching anything again...
A colleague a few years back did a firmware upgrade on a tape library
on a Friday and then went on vacation for a few weeks. On the Monday, a
new person, totally unaware of the backups, found that every single one
was failing. It took 2 weeks to resolve.
"Never make big changes on the last day of the week, or before you go
on vacation" springs to mind :-)
Enjoy the break!
Simon
Justin,
Thanks for the reply. For whatever reason things seem to have magically
started working again. All I did was shut down Veritas (again), turn up
the verbosity in bp.conf, and restart it. When it first started I still
didn't have bpdbm, bpjobd, etc. running. The vnetd log had a lot of
errors. When I ran bpdbjobs from the command line, nothing came back.
While looking through the bpdbm log I found no errors, but a lot of
entries that looked like it was doing backups. About 10 minutes later I
ran bpdbjobs again and everything showed up and some jobs were running.
I think these are restarts of some failed jobs, so we'll see how they
do. So far 4 of them have completed successfully.
Since I leave the country on vacation tomorrow morning I don't plan on
touching anything else on it today :-)
Thanks again for the help.
Jeff
On Wed, Jan 20, 2010 at 2:11 AM, Justin Piszcz <jpiszcz AT lucidpixels DOT com>
wrote:
Hi,
Taking a shot in the dark here: for the TCP issues, try adding
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
to your /etc/sysctl.conf, then reboot.
For vnetd, check your /etc/xinetd.d/vnetd*. Also check the logs to make
sure xinetd is not throttling connections; that can happen if too many
servers are trying to back up too fast.
Justin.
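Before rebooting, it may help to confirm what the kernel currently reports for the two settings suggested above. The following is a minimal sketch, not part of the original advice: it assumes the usual Linux mapping of a sysctl name to a file under /proc/sys, and the helper name read_sysctl is purely illustrative. A key can be absent on some kernels (tcp_tw_recycle, for instance, was removed in Linux 4.12), so the helper returns None rather than failing.

```python
from pathlib import Path

# The two settings named in the message above.
SETTINGS = ("net.ipv4.tcp_tw_reuse", "net.ipv4.tcp_tw_recycle")


def read_sysctl(name, proc_sys="/proc/sys"):
    """Return the current value of a sysctl as a string, or None if absent.

    Assumes the standard Linux layout where "a.b.c" lives at
    <proc_sys>/a/b/c. `proc_sys` is a parameter only so the function
    can be pointed at a test directory.
    """
    path = Path(proc_sys, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        # Missing key (or unreadable file): report "unknown", don't crash.
        return None


for name in SETTINGS:
    value = read_sysctl(name)
    shown = value if value is not None else "(not present on this kernel)"
    print(f"{name} = {shown}")
```

Running it before and after editing /etc/sysctl.conf shows whether the change actually took effect.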
On Tue, 19 Jan 2010, Jeff Cleverley wrote:
Greetings,
While continuing to work on this, it seems there may be issues with
vnetd. The output of netstat -a | grep vnet shows this:
tcp   0  0  *:vnetd                      *:*                          LISTEN
tcp   0  0  sgpbkp04.sgp.avagotec:35781  agt604.sgp.avagotech.:vnetd  ESTABLISHED
tcp   0  0  sgpbkp04.sgp.avagotec:35720  sgpbkp04.sgp.avagotec:vnetd  ESTABLISHED
tcp   0  0  sgpbkp04.sgp.avagotec:vnetd  sgpbkp04.sgp.avagotec:35720  ESTABLISHED
tcp   0  0  localhost.localdomain:vnetd  sgpbkp04.sgp.avagotec:35846  TIME_WAIT
tcp   0  0  localhost.localdomain:vnetd  sgpbkp04.sgp.avagotec:35853  TIME_WAIT
tcp   0  0  localhost.localdomain:vnetd  sgpbkp04.sgp.avagotec:35839  TIME_WAIT
unix  2  [ ACC ]  STREAM  LISTENING  146403  /usr/openv/var/vnetd/vmd.uds
unix  2  [ ACC ]  STREAM  LISTENING  145874  /usr/openv/var/vnetd/bpcompatd.uds
unix  2  [ ACC ]  STREAM  LISTENING  146786  /usr/openv/var/vnetd/tldcd.uds
unix  3  [ ]      STREAM  CONNECTED  152574  /usr/openv/var/vnetd/bpcompatd.uds
The TIME_WAIT entries seem to stick around for a long time. I've
restarted xinetd on the system and we have rebooted, but things are
still wedged.
Thanks,
Jeff
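To judge whether TIME_WAIT sockets are really piling up, it can be easier to tally states than to eyeball the listing. This is a rough sketch only: it assumes the six-column TCP layout of netstat -a shown in the message above (proto, recv-q, send-q, local address, foreign address, state), and the function name count_tcp_states is hypothetical. The sample text is copied from that listing.

```python
from collections import Counter


def count_tcp_states(netstat_text, service="vnetd"):
    """Count TCP connection states on lines that mention `service`.

    Assumes each TCP line has at least six whitespace-separated fields:
    proto, recv-q, send-q, local address, foreign address, state.
    Unix-domain socket lines are ignored.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if service in local or service in foreign:
            counts[state] += 1
    return counts


# Sample lines taken from the netstat output quoted in the thread.
SAMPLE = """\
tcp 0 0 *:vnetd *:* LISTEN
tcp 0 0 sgpbkp04.sgp.avagotec:35781 agt604.sgp.avagotech.:vnetd ESTABLISHED
tcp 0 0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35846 TIME_WAIT
tcp 0 0 localhost.localdomain:vnetd sgpbkp04.sgp.avagotec:35853 TIME_WAIT
unix 2 [ ACC ] STREAM LISTENING 146403 /usr/openv/var/vnetd/vmd.uds
"""

for state, n in sorted(count_tcp_states(SAMPLE).items()):
    print(state, n)
```

Feeding it fresh netstat -a output every minute or so would show whether the TIME_WAIT count keeps growing or simply turns over.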
On Tue, Jan 19, 2010 at 6:00 PM, Jeff Cleverley
<jeff.cleverley AT avagotech DOT com> wrote:
Greetings,
Our environment is NB 6.5.1 on a RHEL4 server. It also has an HP-UX SAN
media server. All other clients are backed up over the network. Most
are RHEL4x.
The tape library in our Singapore office failed over the weekend and
caused a lot of things to fail and remain wedged. Some jobs seem to
have run, but some failed with errors 13, 63, and 233. This varied
across policies. I decided to try to restart all processes and get
things cleaned up. This hasn't worked well.
When I start everything using service netbackup start or
/etc/init.d/netbackup start, everything looks OK. When I look at things
like bpps -a, I notice that bpjobd isn't running anymore. When I try to
start it manually, it fails saying "File size limit exceeded". bpdbjobs
returns no output. I haven't been able to figure out which file it is
complaining about.
I'm sure I have a lot of things that need to be cleaned up. There are a
lot of files in the restart and trylogs directories. I was thinking it
was safe to move those out of the way but wanted to make sure.
Any help on tracking down the bpjobd error, along with advice on
cleaning up all the restart and trylogs files, would be appreciated.
Naturally I'm leaving on vacation Thursday, so I need to help clean
this up before I go. I won't be replying to this after Wednesday night
because of that.
Thanks,
Jeff
--
Jeff Cleverley
Unix Systems Administrator
4380 Ziegler Road
Fort Collins, Colorado 80525
970-288-4611
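On the "File size limit exceeded" message in the original post: that is the text a shell typically prints when a process is killed by SIGXFSZ, which the kernel sends when a write would exceed the process's file size limit (RLIMIT_FSIZE, what ulimit -f reports). One way to narrow down the offending file might be to list the largest files under the job database and compare them with that limit. The sketch below is only an illustration: the function names are made up, the limit seen by a daemon started from an init script may differ from the one in an interactive shell, and any path passed in (for example a jobs directory under /usr/openv) is an assumption about the local install.

```python
import os
import resource


def fsize_limit():
    """Return the soft RLIMIT_FSIZE in bytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return None if soft == resource.RLIM_INFINITY else soft


def largest_files(root, top=10):
    """Return up to `top` (size_in_bytes, path) pairs under `root`,
    largest first. Files that vanish or cannot be read are skipped."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                found.append((os.lstat(path).st_size, path))
            except OSError:
                continue
    found.sort(reverse=True)
    return found[:top]


limit = fsize_limit()
print("file size limit:", "unlimited" if limit is None else f"{limit} bytes")
```

Calling largest_files() on the directory that holds the restart and trylogs files, and comparing the top entries against the reported limit, should at least show whether any single file is close to it.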
_______________________________________________
Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu