[Veritas-bu] Serious master issue...

2007-02-14 17:04:18
Subject: [Veritas-bu] Serious master issue...
From: hampus.lind at rps.police.se (Hampus Lind)
Date: Wed, 14 Feb 2007 23:04:18 +0100
Because of the heavy I/O produced by all my bpdbm processes, there is no way
that I can find anything in those logs...

But support has got them all and says everything seems normal... So what can I
do? I am helpless...

Hampus Lind
Rikspolisstyrelsen
National Police Board
Tel dir: +46 (0)8 - 401 99 43
Tel mob: +46 (0)70 - 217 92 66
E-mail: hampus.lind at rps.police.se


-----Original Message-----
From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
Sent: 14 February 2007 23:01
To: Hampus Lind
Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
Veritas-bu at mailman.eng.auburn.edu
Subject: Re: RE: RE: RE: [Veritas-bu] Serious master issue...

With VERBOSE = 5

cd /usr/openv/netbackup/logs
tail -f */*date_of_today*

Do you see anything weird relating to memory or corruption?
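A minimal sketch of the same idea that filters instead of scrolling, for when the logs grow too fast to read with `tail -f`. It assumes the usual UNIX legacy debug-log layout `<logdir>/<process>/log.MMDDYY` (check this on your own master), and the keyword list is only illustrative; `scan_todays_logs` is a hypothetical helper name.

```shell
#!/bin/sh
# scan_todays_logs: search today's legacy debug logs for memory or
# corruption hints instead of trying to read them with `tail -f`.
# ASSUMPTIONS: logs follow <logdir>/<process>/log.MMDDYY (verify on your
# master); the keyword list below is illustrative, not exhaustive.
scan_todays_logs() {
    logdir=${1:-/usr/openv/netbackup/logs}
    today=$(date +%m%d%y)
    for f in "$logdir"/*/log."$today"; do
        [ -f "$f" ] || continue    # glob matched nothing
        # /dev/null as a second file forces grep to print the file name
        grep -i -n -E 'memory|malloc|corrupt' "$f" /dev/null
    done
    return 0
}
```

For example, `scan_todays_logs > /tmp/hits.txt` and then read the much smaller hits file. grep still reads every log once, so on a master this busy it adds some I/O of its own.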


On Wed, 14 Feb 2007, Hampus Lind wrote:

> I can't tell... I think it has been there for a while and has got worse over
> time...
>
>
>
>
>
> -----Original Message-----
> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
> Sent: 14 February 2007 22:58
> To: Hampus Lind
> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
> Veritas-bu at mailman.eng.auburn.edu
> Subject: Re: RE: RE: [Veritas-bu] Serious master issue...
>
> When did this problem happen? Out of the blue or after a patch?
>
> On Wed, 14 Feb 2007, Hampus Lind wrote:
>
>> I have run a couple of tests... And it seems that if I want any info at all
>> from bpdbm -consistency 2, I have to shut down NetBackup and then run the
>> check when everything is down.
>>
>> Even then it takes forever... Sometimes it gets further than other times...
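Since the check only seems to make progress with NetBackup down, a wrapper can enforce that and keep the output together with start and end times. A sketch, assuming `bpdbm` lives in `/usr/openv/netbackup/bin`; refusing to run while `bprd` or `bpdbm` is up is this sketch's own rule, and the function name and default output path are hypothetical.

```shell
#!/bin/sh
# run_consistency_offline: run `bpdbm -consistency 2` only if no bprd or
# bpdbm daemon is running, and save its output with start/end timestamps.
# ASSUMPTIONS: standard UNIX bpdbm path; the "refuse while daemons are up"
# rule and the default output path are hypothetical choices.
run_consistency_offline() (
    out=${1:-/tmp/bpdbm_consistency.$(date +%Y%m%d%H%M).txt}
    PATH=$PATH:/usr/openv/netbackup/bin
    # `ps -e` puts the command name in the last column; skip the header line.
    running=$(ps -e | awk 'NR > 1 && ($NF == "bprd" || $NF == "bpdbm")')
    if [ -n "$running" ]; then
        echo "bprd/bpdbm still running; stop NetBackup first." >&2
        exit 1    # leaves only this subshell, i.e. the function
    fi
    {
        echo "start: $(date)"
        bpdbm -consistency 2
        echo "end:   $(date)"
    } > "$out" 2>&1
    echo "$out"
)
```

The timestamps make it easy to compare how far successive runs get before they are stopped.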
>>
>>
>>
>>
>> -----Original Message-----
>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>> Sent: 14 February 2007 22:47
>> To: Hampus Lind
>> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
>> Veritas-bu at mailman.eng.auburn.edu
>> Subject: Re: RE: [Veritas-bu] Serious master issue...
>>
>> Another option is to turn off backups, move the old images out of the way one
>> by one, and find what is causing the consistency check to choke. Does it stop
>> on one set of images, or does it run through them all but just very slowly?
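Before moving anything, it may help to see how the image catalog is distributed across clients. A small report-only sketch, assuming the image catalog sits under `/usr/openv/netbackup/db/images` with one directory per client; the function name is hypothetical and it moves nothing.

```shell
#!/bin/sh
# image_dir_sizes: print "KB files client" for each client directory in the
# image catalog, largest first. Report only: nothing is moved or deleted.
# ASSUMPTION: one directory per client under the default images path.
image_dir_sizes() {
    imgdir=${1:-/usr/openv/netbackup/db/images}
    for d in "$imgdir"/*/; do
        [ -d "$d" ] || continue    # glob matched nothing
        kb=$(du -sk "$d" | awk '{ print $1 }')
        files=$(find "$d" -type f | wc -l | tr -d ' ')
        echo "$kb $files $(basename "$d")"
    done | sort -rn
}
```

The clients at the top of the list are natural first candidates for the one-set-at-a-time test; take a catalog backup before moving any of them.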
>>
>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>
>>> NBCC doesn't look at the image db, and they keep saying we have a problem
>>> there... But I don't know how we can fix it, or even collect the info from
>>> the db, when bpdbm -consistency 2 won't run...
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Steven L. Sesar [mailto:ssesar at mitre.org]
>>> Sent: 14 February 2007 20:53
>>> To: Hampus Lind
>>> Cc: 'Justin Piszcz'; 'Bahnmiller, Bryan';
>>> Veritas-bu at mailman.eng.auburn.edu
>>> Subject: Re: [Veritas-bu] Serious master issue...
>>>
>>>
>>>
>>> bpdbm -consistency 2 is useless to you, based on the amount of data that you
>>> back up nightly and my own presumption of how long backups run in your
>>> environment. It will take longer to run than your backup domain will remain
>>> idle. If I recall, they have a process which does a better job at finding
>>> catalog/db corruption/inconsistency. I think that it's called NBCC.
>>>
>>> The problem with NBCC is similar, though. You send them the output of three
>>> commands:
>>>
>>> vmquery -a, bpmedialist -ls, and bpimmedia
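Since the reports go stale once backups resume, it helps to capture all three outputs back to back in one place. A sketch, assuming the standard UNIX paths `/usr/openv/volmgr/bin` and `/usr/openv/netbackup/bin/admincmd`; the commands are run exactly as named above (ask support which options they actually want), and the helper and file names are hypothetical.

```shell
#!/bin/sh
# collect_nbcc_input: capture the three outputs support asks for into one
# timestamped directory, taken back to back.
# ASSUMPTIONS: standard UNIX install paths; hypothetical file names.
collect_nbcc_input() (
    outdir=${1:-/tmp/nbcc.$(date +%Y%m%d%H%M)}
    mkdir -p "$outdir" || exit 1
    # Append (not prepend) the NetBackup dirs, so anything already on PATH wins.
    PATH=$PATH:/usr/openv/volmgr/bin:/usr/openv/netbackup/bin/admincmd
    vmquery -a      > "$outdir/vmquery_a.txt"      2>&1
    bpmedialist -ls > "$outdir/bpmedialist_ls.txt" 2>&1
    bpimmedia       > "$outdir/bpimmedia.txt"      2>&1
    echo "$outdir"
)
```

The function prints the directory it wrote, so `tar cf nbcc.tar "$(collect_nbcc_input)"` would bundle one complete set for support.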
>>>
>>> Then, they munge the output of the above commands through a reporting tool
>>> that Symantec will NOT share with end users. At some point later in the day
>>> (hopefully, sooner rather than later), they will send you a report. You must
>>> then take certain actions to correct any discrepancies found. The backup
>>> system must be completely idle during this time. Restores are ok, but no
>>> backup activity can be taking place.
>>>
>>> Afterwards, you'll run those commands again, they'll generate the report
>>> again, and you'll see how you're doing. It may take you several passes to
>>> get things squared away.
>>>
>>> The problem is that most of us don't have a completely idle backup
>>> infrastructure, at least not for long enough for this process to complete. I
>>> didn't when I was an NBU customer. Once you take backups, the reports become
>>> obsolete, as do the results of bpdbm -consistency 2.
>>>
>>> It would not surprise me if bpdbm was leaking memory on your platform.
>>>
>>> --Steve
>>>
>>>
>>> Hampus Lind wrote:
>>>
>>> Hi,
>>>
>>> I can't do anything....
>>>
>>> bpdbm -consistency 2 has been running for over 12 hours and hasn't checked
>>> more than 4-5 clients.
>>>
>>> It was the first thing support told me: your db is corrupted... So I tried
>>> to run the bpdbm -consistency 2 check. The check found some issues, like
>>> expired images which were not removed, etc. But when I was about to remove
>>> them manually, the NetBackup db cleanup process had already taken care of them..
>>>
>>> So as I understand it, you can have some level of corruption in your db
>>> which NBU cleans out when the cleanup job runs.
>>>
>>> I am not compressing my catalogs.
>>>
>>> Thanks,
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>>> Sent: 14 February 2007 20:31
>>> To: Hampus Lind
>>> Cc: 'Bahnmiller, Bryan'; Veritas-bu at mailman.eng.auburn.edu
>>> Subject: Re: [Veritas-bu] Serious master issue...
>>>
>>> Have you run the check_db_consistency? There is a command that checks to
>>> make sure your images are not corrupted!
>>>
>>> I would recommend checking that.
>>>
>>> Also, are you running compression on your catalogs?
>>>
>>>
>>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>>
>>> Thanks Bryan,
>>>
>>> It happens directly after a reboot...
>>>
>>> The thing is:
>>>
>>> - I have deactivated all policies
>>> - Stopped our media server
>>> - And then restarted NetBackup on the master.
>>>
>>> So there is absolutely no activity going on (no backup, no user backup, no
>>> restore, no staging), only internal NetBackup work.
>>>
>>> As soon as NetBackup on the master becomes active, it starts bpdbm process
>>> after bpdbm process. They consume 100% of both my CPUs and read/write heavily
>>> to the /usr/openv/netbackup/db filesystem.
>>>
>>> Even with no activity at all after a clean start, we have about 42 bpdbm
>>> processes and nearly as many bprd processes.
>>>
>>>
>>>
>>> I can't figure this one out, and support points to the disk config or
>>> something else that sounds good in their ears.
>>>
>>>
>>>
>>> Thanks for all the help,
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Bahnmiller, Bryan [mailto:BBahnmiller at pier1.com]
>>> Sent: 14 February 2007 20:04
>>> To: Hampus Lind
>>> Subject: RE: [Veritas-bu] Serious master issue...
>>>
>>>
>>>
>>> Hampus,
>>>
>>>
>>>
>>> How quickly does this behaviour start happening after a recycle/reboot? I
>>> worked with an N4000 master running 11i. We did have 8 CPUs and 8 GB RAM.
>>> We were running over 15,000 backup jobs daily, though. Our catalog was over
>>> 400GB. (Catalog was on EMC DMX disk.) Running good old 3.4, we would have to
>>> reboot the system almost every week. If you can cleanly recycle NetBackup
>>> (shut it down, kill all NBU processes, and then restart it), that should be
>>> almost as good.
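The "kill all NBU processes" step of such a recycle can be checked with a quick scan for leftover daemons once the stop script has finished. A sketch that reads `ps -e` output on stdin; the name match (a `bp` prefix, plus `vmd` and `ltid`) is a heuristic assumption rather than an official list, so compare its output with `bpps -a` before killing anything. The function name is hypothetical.

```shell
#!/bin/sh
# list_nbu_procs: after NetBackup has been stopped, print "PID NAME" for any
# process that still looks like a NetBackup daemon (candidates to kill
# before restarting). Reads `ps -e` output on stdin, header line first.
# ASSUMPTION: "bp*" plus vmd/ltid is a rough heuristic, not an official list.
list_nbu_procs() {
    awk 'NR > 1 && ($NF ~ /^bp/ || $NF == "vmd" || $NF == "ltid") { print $1, $NF }'
}
```

For example, `ps -e | list_nbu_procs`; anything it still prints after the stop is a candidate for `kill` before NetBackup is started again.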
>>>
>>>
>>>
>>> Here we are running NBU 5.1 MP4 on a Win2K3 master - 2 CPUs, 4 GB RAM. (I
>>> inherited the system - not my choice.) We run about 5000 jobs per day, and we
>>> have a 280 GB catalog on EMC Clariion. The system will stay stable for 2
>>> weeks pretty easily; 4 weeks starts pushing things. So we usually reboot our
>>> Windows master and media servers every 2 weeks.
>>>
>>>
>>>
>>> It seems like you will have cumulative problems with NetBackup that can
>>> build up over time. It is way more pronounced on busy systems. We have
>>> another NetBackup system that has 1 master and 1 media server. It runs
>>> about 40 jobs per day max. I hardly ever have to reboot those servers.
>>>
>>>
>>>
>>>      Bryan
>>>
>>>
>>>
>>> Bryan Bahnmiller
>>>
>>> ISD Business Continuity
>>>
>>> Pier 1 Imports, Inc
>>>
>>> 817-252-8570
>>>
>>>
>>>
>>>
>>>
>>>
>>> _____
>>>
>>>
>>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
>>>
>>>
>>> Sent: Wednesday, February 14, 2007 12:17 PM
>>> To: Veritas-bu at mailman.eng.auburn.edu
>>> Subject: Re: [Veritas-bu] Serious master issue...
>>> Importance: High
>>>
>>> All,
>>>
>>>
>>>
>>> Now I have been transferred to USA support. God bless America!
>>>
>>>
>>>
>>> They have told me that they haven't seen such a big installation in over a
>>> year. Strange; I have about 200 clients and back up a couple of TB per
>>> day...
>>>
>>> I was under the impression that this was kind of a small installation...??
>>>
>>>
>>>
>>> However, they have told me that this is perfectly normal behaviour for
>>> NetBackup: that it produces heavy disk I/O and eats all the CPU power. And I
>>> was really stupid and told them that I had also had a case open with HP
>>> earlier on this disk I/O problem, so now Symantec support are pointing all
>>> their fingers at HP and our disk setup.
>>>
>>>
>>>
>>> Our DB is about 60-65 GB and resides on a StorageTek Flexline 380 disk array
>>> (SAN). We run RAID 5 on 146GB FC drives... I don't really see the
>>> bottleneck there, but I will create a RAID 5 on 73GB 15K FC drives just to
>>> shut NetBackup support up.
>>>
>>>
>>>
>>> We run a two-CPU HP rp2470 with HP-UX 11.11 as the master server. Shouldn't
>>> this be enough for this installation?
>>>
>>>
>>>
>>> Oh well...
>>>
>>>
>>>
>>> If support can't help me, what should I do?? I am desperate!!!
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
>>> Sent: 14 February 2007 12:48
>>> To: Veritas-bu at mailman.eng.auburn.edu
>>> Subject: [Veritas-bu] Serious master issue...
>>> Importance: High
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> We have a serious issue here with our master server. The problem occurred a
>>> couple of weeks ago, or at least I found out about it then..
>>>
>>>
>>>
>>> I was looking at I/Os and SCSI queue depth on my master (HP-UX 11.11) when I
>>> saw that we had 4000-6000 SCSI commands in the queue, and a disk utilisation
>>> of 100% for the /usr/openv/netbackup/db disk.
>>>
>>>
>>>
>>> I have patched HP-UX to the latest patch bundle, and we run NBU 5.1 MP4.
>>>
>>>
>>>
>>> HP support said that bpdbm was leaking memory.
>>>
>>>
>>>
>>> Veritas support is still investigating... But we have about 30 bpdbm and bprd
>>> processes active on our master, which eat both my CPUs and produce tons of
>>> I/O against our db disk.
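To see whether the daemon count really keeps climbing after a clean start, a snapshot like the following can be run every few minutes. A sketch, assuming `ps -e` prints a header line followed by one process per line with the command name in the last column (as on HP-UX); both function names are hypothetical.

```shell
#!/bin/sh
# count_procs: count processes whose command name is exactly $1, reading
# `ps -e` output on stdin so the parsing can be checked on canned text.
count_procs() {
    awk -v n="$1" 'NR > 1 && $NF == n { c++ } END { print c + 0 }'
}

# report_nbu_procs: one snapshot of the two daemons discussed in this thread,
# taken from a single `ps -e` call so both counts refer to the same moment.
report_nbu_procs() {
    snap=$(ps -e)
    echo "bpdbm $(printf '%s\n' "$snap" | count_procs bpdbm)"
    echo "bprd  $(printf '%s\n' "$snap" | count_procs bprd)"
}
```

For example, `while :; do date; report_nbu_procs; sleep 300; done` shows whether the numbers grow steadily or level off.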
>>>
>>>
>>>
>>> I activated VERBOSE = 5 on the master, and after 15 minutes the bpdbm log had
>>> reached the file size limit on our filesystem, 2 GB.
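To catch debug logs before they hit that per-file size limit, a check like the following could run from cron. A sketch; the default threshold of 1900000 KB is an assumed safety margin just under 2 GB, and the function name is hypothetical.

```shell
#!/bin/sh
# big_debug_logs: list debug log files larger than LIMIT_KB kilobytes.
# ASSUMPTION: default 1900000 KB is an arbitrary margin below 2 GB.
big_debug_logs() {
    logdir=${1:-/usr/openv/netbackup/logs}
    limit_kb=${2:-1900000}
    # POSIX find -size counts 512-byte blocks, so KB * 2 converts the limit;
    # "+N" means strictly more than N blocks.
    find "$logdir" -type f -size +"$((limit_kb * 2))" -print
}
```

For example, `big_debug_logs /usr/openv/netbackup/logs 500000` lists every log already past roughly 500 MB.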

>>>
>>>
>>>
>>> Has anyone had similar problems?
>>>
>>>
>>>
>>>
>>>
>>> Thanks and regards,
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Veritas-bu maillist  -  Veritas-bu at mailman.eng.auburn.edu
>>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> ===================================
>>>
>>>   Steven L. Sesar
>>>   Lead Operating Systems Programmer/Analyst
>>>   UNIX Application Services R101
>>>   The MITRE Corporation
>>>   202 Burlington Road - MS K101
>>>   Bedford, MA 01730
>>>   tel: (781) 271-7702
>>>   fax: (781) 271-2600
>>>   mobile: (617) 519-8933
>>>   email: ssesar at mitre.org
>>>
>>> ===================================
>>>
>>
>