Veritas-bu

[Veritas-bu] Serious master issue...

2007-02-14 14:51:38
Subject: [Veritas-bu] Serious master issue...
From: hampus.lind at rps.police.se (Hampus Lind)
Date: Wed, 14 Feb 2007 20:51:38 +0100
Yes, as well as I can.

No errors in the array or anything like that. I have patched the OS with the I/O patches, etc.

Hampus Lind
Rikspolisstyrelsen
National Police Board
Tel dir: +46 (0)8 - 401 99 43
Tel mob: +46 (0)70 - 217 92 66
E-mail: hampus.lind at rps.police.se


-----Original Message-----
From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
Sent: 14 February 2007 20:41
To: Hampus Lind
Cc: 'Bahnmiller, Bryan'; Veritas-bu at mailman.eng.auburn.edu
Subject: Re: SV: [Veritas-bu] Serious master issue...

Have you checked the underlying hardware for any I/O problems? Degraded
array, etc.?

On Wed, 14 Feb 2007, Hampus Lind wrote:

> Hi,
>
> I can't do anything...
>
> bpdbm -consistency 2 has been running for over 12 hours and hasn't checked
> more than 4-5 clients.
>
> It was the first thing support told me: "Your db is corrupted." So I tried
> to run a bpdbm -consistency 2 check. The check found some issues, like
> expired images which were not removed, etc. But when I was about to remove
> them manually, the NetBackup db clean process had already taken care of them.
>
> So, as I understand it, you can have some level of corruption in your db,
> which NBU cleans out when the clean job runs.
>
> I am not compressing my catalogs.
>
> Thanks,
>
> Hampus Lind
> Rikspolisstyrelsen
> National Police Board
> Tel dir: +46 (0)8 - 401 99 43
> Tel mob: +46 (0)70 - 217 92 66
> E-mail: hampus.lind at rps.police.se
>
>
> -----Original Message-----
> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
> Sent: 14 February 2007 20:31
> To: Hampus Lind
> Cc: 'Bahnmiller, Bryan'; Veritas-bu at mailman.eng.auburn.edu
> Subject: Re: [Veritas-bu] Serious master issue...
>
> Have you run the check_db_consistency? There is a command that checks to
> make sure your images are not corrupted!
>
> I would recommend checking that.
>
> Also, are you running compression on your catalogs?
>
>
> On Wed, 14 Feb 2007, Hampus Lind wrote:
>
>> Thanks Bryan,
>>
>>
>>
>> It happens directly after reboot..
>>
>>
>>
>> The thing is:
>>
>> -          I have deactivated all policies
>>
>> -          Stopped our media server
>>
>> -          And then restarted NetBackup on the master.
>>
>>
>>
>> So there is absolutely no activity going on (no backup, no user backup, no
>> restore, no staging), only internal NetBackup work.
>>
>> As soon as NetBackup on the master becomes active, it starts bpdbm process
>> after bpdbm process. They consume 100% of both my CPUs and read/write
>> heavily to the /usr/openv/netbackup/db filesystem.
>>
>> With no activity at all after a clean start, we have about 42 bpdbm
>> processes and nearly as many bprd processes.
>>
>>
>>
>> I can't figure this one out, and support points to the disk config or
>> something else that sounds good to their ears.
>>
>>
>>
>> Thanks for all help,
>>
>>
>>
>> Hampus Lind
>> Rikspolisstyrelsen
>> National Police Board
>> Tel dir: +46 (0)8 - 401 99 43
>> Tel mob: +46 (0)70 - 217 92 66
>> E-mail: hampus.lind at rps.police.se
>>
>> -----Original Message-----
>> From: Bahnmiller, Bryan [mailto:BBahnmiller at pier1.com]
>> Sent: 14 February 2007 20:04
>> To: Hampus Lind
>> Subject: RE: [Veritas-bu] Serious master issue...
>>
>>
>>
>> Hampus,
>>
>>
>>
>>  How quickly does this behaviour start happening after a recycle/reboot? I
>> worked with an N4000 master running 11i. We did have 8 CPUs and 8 GB RAM. We
>> were running over 15,000 backup jobs daily though. Our catalog was over
>> 400 GB. (Catalog was on EMC DMX disk.) Running good old 3.4, we would have to
>> reboot the system almost every week. If you can cleanly recycle NetBackup -
>> shut it down, kill all NBU processes, and then restart it - that should be
>> almost as good.
>>
>>
>>
>>  Here we are running NBU 5.1mp4 on a Win2K3 master - 2 CPUs, 4 GB RAM. (I
>> inherited the system - not my choice.) We run about 5000 jobs per day, and we
>> have a 280 GB catalog on EMC Clariion. The system will stay stable for 2
>> weeks pretty easily. 4 weeks starts pushing things. So we usually reboot our
>> Windows master and media servers every 2 weeks.
>>
>>
>>
>>  It seems like you will have cumulative problems with NetBackup that can
>> build up over time. It is way more pronounced on busy systems. We have
>> another NetBackup system that has 1 Master and 1 Media server. It runs about
>> 40 jobs per day max. I hardly ever have to reboot those servers.
>>
>>
>>
>>       Bryan
>>
>>
>>
>> Bryan Bahnmiller
>>
>> ISD Business Continuity
>>
>> Pier 1 Imports, Inc
>>
>> 817-252-8570
>>
>>
>>
>>
>>
>>
>>  _____
>>
>>
>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
>> Sent: Wednesday, February 14, 2007 12:17 PM
>> To: Veritas-bu at mailman.eng.auburn.edu
>> Subject: Re: [Veritas-bu] Serious master issue...
>> Importance: High
>>
>> All,
>>
>>
>>
>> Now I have been transferred to USA support. God bless America!
>>
>>
>>
>> They have told me that they haven't seen such a big installation in over a
>> year. Strange, I have about 200 clients and back up a couple of TB per day.
>> I was under the impression that this was a kinda small installation..??
>>
>>
>>
>> However, they have told me that this is perfectly normal behaviour for
>> NetBackup: that it produces heavy disk I/O and eats all CPU power. And I was
>> really stupid and told them that I also had a case with HP earlier about this
>> disk I/O problem, so now Symantec support is pointing all their fingers at
>> HP and our disk setup.
>>
>>
>>
>> Our DB is about 60-65 GB and resides on a StorageTek Flexline 380 disk array
>> (SAN). We run RAID 5 on 146 GB FC drives. I don't really see the
>> bottleneck there, but I will create a RAID 5 on 73 GB 15K FC drives just to
>> shut NetBackup support up.
>>
>>
>>
>> We run a two-CPU HP rp2470 with HP-UX 11.11 as a master server. Shouldn't
>> this be enough for this installation?
>>
>>
>>
>> Oh well...
>>
>>
>>
>> If support can't help me, what should I do?? I am desperate!!!
>>
>>
>>
>>
>>
>> Hampus Lind
>> Rikspolisstyrelsen
>> National Police Board
>> Tel dir: +46 (0)8 - 401 99 43
>> Tel mob: +46 (0)70 - 217 92 66
>> E-mail: hampus.lind at rps.police.se
>>
>> -----Original Message-----
>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
>> Sent: 14 February 2007 12:48
>> To: Veritas-bu at mailman.eng.auburn.edu
>> Subject: [Veritas-bu] Serious master issue...
>> Importance: High
>>
>>
>>
>> Hi,
>>
>>
>>
>> We have a serious issue here with our master server. The problem occurred a
>> couple of weeks ago, or at least I found out about it then.
>>
>>
>>
>> I was looking at I/Os and SCSI queue depth on my master (HP-UX 11.11) when I
>> saw that we had 4000-6000 SCSI commands in the queue, and a disk utilisation of
>> 100% for the /usr/openv/netbackup/db disk.
>>
>>
>>
>> I have patched hpux to the latest patch bundle and we run NBU 5.1 MP4.
>>
>>
>>
>> HP support said that bpdbm was leaking memory.
>>
>>
>>
>> Veritas support is still investigating. But we have about 30 bpdbm and bprd
>> processes active on our master, which eat both my CPUs and produce tons of
>> I/O against our db disk.
>>
>>
>>
>> I activated verbose = 5 on the master, and after 15 minutes the bpdbm log had
>> reached the file size limit on our filesystem, 2 GB.
>>
>>
>>
>> Has anyone had similar problems?
>>
>>
>>
>>
>>
>> Thanks and regards,
>>
>>
>>
>> Hampus Lind
>> Rikspolisstyrelsen
>> National Police Board
>> Tel dir: +46 (0)8 - 401 99 43
>> Tel mob: +46 (0)70 - 217 92 66
>> E-mail: hampus.lind at rps.police.se
>>
>>
>>
>>
>