[Veritas-bu] running out of memory on Solaris 8 master servers

2005-12-21 14:32:32
Subject: [Veritas-bu] running out of memory on Solaris 8 master servers
From: ed AT gurski DOT com (Ed Gurski)
Date: Wed, 21 Dec 2005 14:32:32 -0500
> On Wed, Dec 21, 2005 at 08:51:45AM -0700, King, Cheryl wrote:
> > I have two Solaris 8 master servers running NetBackup 5.1 MP3S2.  They
> > keep running out of memory after a few months.  I'm wondering if anyone
> > else has this problem and if they fixed it.  It started some time after
> > going to v5.1 I think.  Maintenance hasn't fixed it.  
> 
> We've been seeing this as well on Solaris 9 with NBU 5.0.  We're
> currently at MP6.
> 
> > Any ideas on what's causing it or how I could find out what's causing
> > it?  When it runs out of memory it's always during regular backups.
> 
> Backups take memory so they're victims.  We see a nice jump in swap
> utilization every time our main windows open at 6pm.
> 
> > Since I haven't seen any discussion about this on this list, I'm assuming
> > regular scheduled backups aren't the cause.
> > 
> > The only way to get back swap space is to boot the server.
> 
> Add swap :-).  That's our workaround until we can upgrade the server to
> a new box (unfortunately our 420 is maxed out for memory already).
> 
> We run the rman Oracle instance on our master server and it appears to
> be part of the culprit.  We had it off for a while and we were much
> better.  We don't know if it's the straw that breaks the camel's back or
> if it's the cause.
> 
> NetBackup really doesn't like to run out of memory (not that I blame it
> much) - all the jobs terminate with a 150.
> 
> Are other Solaris admins experiencing the same issues?  Does NetBackup
> just take that much memory to run?
> 
> I have heard that NBU 6 takes a lot more memory than NBU 5 but haven't
> been able to validate that yet.
> 
>         .../Ed
> 
I ran into this problem on Solaris 9, and it kept getting worse.
It is now partly resolved.

What was causing the problem? There are two possible culprits:

1) bpschedule takes up a huge share of the CPU and no jobs get
started. According to Symantec/Veritas, this is fixed in 5.1 MP4.

2) NIC trunking. This was our problem, and there is no immediate fix.
I spoke with Sun engineers at length and got two solutions. One was
to update to the latest patch, which doesn't work and has since been
pulled. The other was to downgrade to the previous patch level.

My NBU servers were brand-new V440s, so I trunked them. The problem
is that the downgraded trunking software does not work, so I am running
my master server on a single NIC. The two media servers are not affected.
It's a volume-related issue, which would explain why you see it during
your regular backups.
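For the first culprit (bpschedule eating CPU while nothing starts), one quick check is to total CPU usage per command from `ps`. This is a minimal, hypothetical sketch, not a NetBackup tool: it assumes POSIX-style `ps -e -o pcpu,comm` output (exact formatting varies by platform), and the function names and the 50% threshold are illustrative choices.

```python
import subprocess
from typing import Dict


def cpu_by_command(ps_output: str) -> Dict[str, float]:
    """Sum %CPU per command name from `ps -e -o pcpu,comm` style text.

    The first line is assumed to be the header and is skipped.
    """
    totals: Dict[str, float] = {}
    for line in ps_output.splitlines()[1:]:
        parts = line.split(None, 1)  # "%CPU" column, then the command
        if len(parts) != 2:
            continue
        pcpu, comm = parts
        try:
            value = float(pcpu)
        except ValueError:
            continue
        name = comm.strip().rsplit("/", 1)[-1]  # comm may be a full path
        totals[name] = totals.get(name, 0.0) + value
    return totals


def read_ps() -> str:
    """Run ps and return its text output (assumes a POSIX-style ps)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def bpschedule_hog(ps_output: str, limit: float = 50.0) -> bool:
    """Hypothetical check: True if bpschedule's summed %CPU exceeds `limit`."""
    return cpu_by_command(ps_output).get("bpschedule", 0.0) > limit
```

Calling `bpschedule_hog(read_ps())` from cron during the backup window would flag the symptom; keeping the parsing separate from the `ps` call makes it easy to adjust for a platform whose `ps` output differs.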

I was able to acquire a spare V440 and will be testing Solaris 10, which
Sun assures me has better trunking software and a TCP/IP stack that is
about 30% faster.

I do have a script that I run to determine whether the kernel is taking
too much memory. The kernel should normally use less than 15% of it, but
once that share starts growing, you know you have a problem.
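The script itself isn't included in the post, so here is a minimal Python sketch of the same idea. It assumes you already collect kernel and total page counts periodically (for example from `echo ::memstat | mdb -k` on Solaris releases where that dcmd is available). The 15% limit comes from the post; the function names and the "three consecutive rises" growth rule are hypothetical.

```python
from typing import Sequence

# Limit from the post: the kernel should normally take less than 15%.
KERNEL_PCT_LIMIT = 15.0


def kernel_pct(kernel_pages: int, total_pages: int) -> float:
    """Kernel memory as a percentage of total physical pages."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * kernel_pages / total_pages


def is_growing(samples: Sequence[float], min_rises: int = 3) -> bool:
    """Hypothetical rule: True if the last `min_rises` steps all increased."""
    if len(samples) < min_rises + 1:
        return False
    tail = samples[-(min_rises + 1):]
    return all(later > earlier for earlier, later in zip(tail, tail[1:]))


def check(samples: Sequence[float]) -> str:
    """Classify a history of kernel-percentage samples (oldest first)."""
    if not samples:
        raise ValueError("need at least one sample")
    latest = samples[-1]
    if latest >= KERNEL_PCT_LIMIT:
        return f"ALERT: kernel at {latest:.1f}% (limit {KERNEL_PCT_LIMIT:.0f}%)"
    if is_growing(samples):
        return f"WARN: kernel share rising, now {latest:.1f}%"
    return f"OK: kernel at {latest:.1f}%"
```

For example, `check([kernel_pct(k, 1048576) for k in (90000, 95000, 101000, 108000)])` reports a warning even though every sample is below 15%, which matches the advice that steady growth is the real warning sign.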

I empathize with you, since I worked on this problem for two months and
could not find anything. I finally called our solution provider, and we
were able to get both Sun and Symantec involved.

I hope this helps.
-- 
Ed Gurski <ed AT gurski DOT com>