Veritas-bu

[Veritas-bu] Serious master issue...

2007-02-14 17:13:32
Subject: [Veritas-bu] Serious master issue...
From: hampus.lind at rps.police.se (Hampus Lind)
Date: Wed, 14 Feb 2007 23:13:32 +0100
5.1 MP4



Hampus Lind
Rikspolisstyrelsen
National Police Board
Tel dir: +46 (0)8 - 401 99 43
Tel mob: +46 (0)70 - 217 92 66
E-mail: hampus.lind at rps.police.se


-----Original Message-----
From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
Sent: 14 February 2007 23:11
To: Hampus Lind
Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
Veritas-bu at mailman.eng.auburn.edu
Subject: Re: RE: RE: RE: RE: RE: [Veritas-bu] Serious master issue...

Also, are you using 5.x or 6.0?

On Wed, 14 Feb 2007, Hampus Lind wrote:

> I will try that tomorrow, but I don't think the problem resides there.
>
> iostat and sar don't show any strange values; sar -d 1 10 reports under
> 50% average usage.
>
> But I will still try with the fastest FC array/disk we have...
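A minimal sketch for checking saved `sar -d` output for busy disks, assuming the HP-UX layout in which the `Average` section begins with a line of the form `Average <device> <%busy> <avque> ...` and continues with lines that start with the device name. The helper name `busy_disks` and the 50% default threshold are hypothetical; compare the column order with the header your own `sar -d` prints before trusting the numbers.

```shell
#!/bin/sh
# busy_disks [THRESHOLD] < sar_output
# Hypothetical helper: prints "<device> <%busy> <avque>" for every device in
# the Average section of `sar -d` output whose %busy is at or above THRESHOLD.
# Assumed column order (check your own header): device, %busy, avque, ...
busy_disks() {
    threshold=${1:-50}
    awk -v t="$threshold" '
        # The Average section starts with a line whose first field is "Average".
        $1 == "Average" { in_avg = 1; dev = $2; busy = $3; q = $4 }
        # Continuation lines in the Average section start with the device name.
        in_avg && $1 != "Average" {
            if (NF < 3) next
            dev = $1; busy = $2; q = $3
        }
        in_avg && (busy + 0) >= (t + 0) { print dev, busy, q }
    '
}
```

For example, save `sar -d 1 10` to a file and run `busy_disks 80 < file`. Looking at `avque` next to `%busy` matters here, because the original report was a deep SCSI queue (4000-6000 commands) and not only high utilisation.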
>
>
>
>
> -----Original Message-----
> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
> Sent: 14 February 2007 23:05
> To: Hampus Lind
> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
> Veritas-bu at mailman.eng.auburn.edu
> Subject: Re: RE: RE: RE: RE: [Veritas-bu] Serious master issue...
>
> Is it possible for you to move the db/images volume to another set of
> disks/RAID array?
>
> Then: ln -s /other/location/db/images /usr/openv/netbackup/db/images
>
> That would rule out your array/FC.
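The move-and-symlink step can be sketched as a small shell function. It is a sketch under assumptions: NetBackup is completely stopped on the master, the destination does not exist yet, and `relocate_dir` is a hypothetical name; `/other/location` is the placeholder used in the message above.

```shell
#!/bin/sh
# relocate_dir SRC DEST
# Hypothetical helper: moves directory SRC to DEST (typically on another
# array) and leaves a symbolic link at SRC pointing to DEST, as in
#   ln -s /other/location/db/images /usr/openv/netbackup/db/images
# Run it only while NetBackup is completely shut down on the master.
relocate_dir() {
    src=$1
    dest=$2
    # Refuse to act on a missing directory or on one that is already a link.
    if [ -L "$src" ] || [ ! -d "$src" ]; then
        echo "relocate_dir: $src is not a plain directory" >&2
        return 1
    fi
    # Never overwrite, or move into, an existing destination.
    if [ -e "$dest" ]; then
        echo "relocate_dir: $dest already exists" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dest")" || return 1
    # mv copies and then deletes when SRC and DEST are on different filesystems.
    mv "$src" "$dest" || return 1
    ln -s "$dest" "$src"
}
```

Called as `relocate_dir /usr/openv/netbackup/db/images /other/location/db/images`. On a catalog of this size a cross-filesystem move copies every file, so plan for the downtime; copying first and swapping in the link only after verifying the copy is a more cautious variant.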
>
> On Wed, 14 Feb 2007, Hampus Lind wrote:
>
>> Because of the heavy I/O produced by all my bpdbm processes, there is no
>> way I can find anything in those logs...
>>
>> But support has received them all and says everything seems normal. So what
>> can I do? I am helpless...
>>
>>
>>
>> -----Original Message-----
>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>> Sent: 14 February 2007 23:01
>> To: Hampus Lind
>> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
>> Veritas-bu at mailman.eng.auburn.edu
>> Subject: Re: RE: RE: RE: [Veritas-bu] Serious master issue...
>>
>> With VERBOSE = 5
>>
>> cd /usr/openv/netbackup/logs
>> tail -f */*date_of_today*
>>
>> Do you see anything weird relating to memory or corruption?
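A hedged sketch of the same idea, assuming the legacy UNIX debug-log layout `<logs>/<process>/log.MMDDYY` (so today's bpdbm log would be something like `bpdbm/log.021407`). Because the bpdbm log can grow very quickly at `VERBOSE = 5`, it greps for the two keywords from the question instead of tailing everything; the helper names and the keyword pattern are hypothetical.

```shell
#!/bin/sh
# todays_logs [LOGDIR] [STAMP]
# Hypothetical helper: lists today's legacy debug logs, assuming files are
# named <LOGDIR>/<process>/log.MMDDYY. STAMP defaults to today's date.
todays_logs() {
    logdir=${1:-/usr/openv/netbackup/logs}
    stamp=${2:-$(date +%m%d%y)}
    for f in "$logdir"/*/log."$stamp"; do
        # An unmatched glob stays literal, so check the file really exists.
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# scan_todays_logs [LOGDIR] [STAMP]
# Prints "<file>:<line>" for lines mentioning memory or corruption,
# case-insensitively. /dev/null forces grep to prefix the file name.
scan_todays_logs() {
    todays_logs "$@" | while IFS= read -r f; do
        grep -i -E 'memory|corrupt' "$f" /dev/null
    done
}
```

Running `scan_todays_logs` with no arguments looks under `/usr/openv/netbackup/logs` for today's date.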
>>
>>
>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>
>>> I can't tell... I think it has been there for a while and got worse with
>>> time.
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>>> Sent: 14 February 2007 22:58
>>> To: Hampus Lind
>>> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
>>> Veritas-bu at mailman.eng.auburn.edu
>>> Subject: Re: RE: RE: [Veritas-bu] Serious master issue...
>>>
>>> When did this problem happen? Out of the blue or after a patch?
>>>
>>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>>
>>>> I have run a couple of tests, and it seems that if I want any info at all
>>>> from bpdbm -consistency 2, I have to shut down NetBackup and then run the
>>>> check when everything is down.
>>>>
>>>> Even then it takes forever. Sometimes it gets further than other times...
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>>>> Sent: 14 February 2007 22:47
>>>> To: Hampus Lind
>>>> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
>>>> Veritas-bu at mailman.eng.auburn.edu
>>>> Subject: Re: RE: [Veritas-bu] Serious master issue...
>>>>
>>>> Another option is to turn off backups and move the old images out of the
>>>> way one by one to find what is causing the consistency check to choke. Does
>>>> it stop on one set of images, or does it run through them all, just very
>>>> slowly?
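A possible first triage step, before moving client image directories out one at a time, is to see which ones hold an unusual number of files. This is a swapped-in aid, not the move-out procedure itself: a sketch that ranks the subdirectories of an images directory by file count, with `rank_image_dirs` as a hypothetical name and the default path taken from the thread.

```shell
#!/bin/sh
# rank_image_dirs [IMAGES_DIR] [TOP]
# Hypothetical helper: prints "<file count> <client dir>" for each
# subdirectory of IMAGES_DIR, largest first, limited to TOP entries.
rank_image_dirs() {
    images=${1:-/usr/openv/netbackup/db/images}
    top=${2:-10}
    for d in "$images"/*/; do
        [ -d "$d" ] || continue
        # Count regular files anywhere below this client's directory.
        n=$(find "$d" -type f | wc -l | tr -d ' ')
        printf '%s %s\n' "$n" "$(basename "$d")"
    done | sort -rn | head -n "$top"
}
```

Walking a large catalog is itself heavy read I/O, so this is best run while NetBackup is down, which is also when the consistency check has to run.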
>>>>
>>>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>>>
>>>>> NBCC doesn't look at the image DB, and they keep saying we have a
>>>>> problem there. But I don't know how we can fix it, or even collect the
>>>>> info from the DB, when bpdbm -consistency 2 won't run...
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Steven L. Sesar [mailto:ssesar at mitre.org]
>>>>> Sent: 14 February 2007 20:53
>>>>> To: Hampus Lind
>>>>> Cc: 'Justin Piszcz'; 'Bahnmiller, Bryan';
>>>>> Veritas-bu at mailman.eng.auburn.edu
>>>>> Subject: Re: [Veritas-bu] Serious master issue...
>>>>>
>>>>>
>>>>>
>>>>> bpdbm -consistency 2 is useless to you, based on the amount of data that
>>>>> you back up nightly and my own presumption of how long backups run in your
>>>>> environment. It will take longer to run than your backup domain will
>>>>> remain idle. If I recall correctly, they have a process that does a better
>>>>> job of finding catalog/db corruption/inconsistency. I think it's called
>>>>> NBCC.
>>>>>
>>>>> The problem with NBCC is similar, though. You send them the output of
>>>>> three commands:
>>>>>
>>>>> vmquery -a, bpmedialist -ls, and bpimmedia
>>>>>
>>>>> Then they munge the output of the above commands through a reporting
>>>>> tool that Symantec will NOT share with end users. At some point later in
>>>>> the day (hopefully sooner rather than later), they will send you a report.
>>>>> You must then take certain actions to correct any discrepancies found. The
>>>>> backup system must be completely idle during this time. Restores are OK,
>>>>> but no backup activity can be taking place.
>>>>>
>>>>> Afterwards, you'll run those commands again, they'll generate the report
>>>>> again, and you'll see how you're doing. It may take you several passes to
>>>>> get things squared away.
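Because each pass means re-running the same three commands listed above, a small wrapper that saves their output into timestamped files makes it easier to compare one pass with the next. The commands are run exactly as named in the thread; support may ask for other options. On UNIX masters `vmquery` usually lives under `/usr/openv/volmgr/bin` and the `bp*` commands under `/usr/openv/netbackup/bin/admincmd`, so `PATH` may need adjusting. The function and file names are hypothetical.

```shell
#!/bin/sh
# collect_nbcc_inputs OUTDIR
# Hypothetical helper: saves the output of the three commands named in the
# thread into OUTDIR, one timestamped file per command. Returns non-zero if
# any command fails, but still attempts all three.
collect_nbcc_inputs() {
    outdir=$1
    stamp=$(date +%Y%m%d%H%M%S)
    mkdir -p "$outdir" || return 1
    rc=0
    vmquery -a      > "$outdir/vmquery-a.$stamp.txt" 2>&1      || rc=1
    bpmedialist -ls > "$outdir/bpmedialist-ls.$stamp.txt" 2>&1 || rc=1
    bpimmedia       > "$outdir/bpimmedia.$stamp.txt" 2>&1      || rc=1
    return $rc
}
```

Keeping every pass in the same directory, for example `collect_nbcc_inputs /var/tmp/nbcc`, leaves a history that can be diffed once the backup domain has been idle long enough to finish a pass.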
>>>>>
>>>>> The problem is that most of us don't have a completely idle backup
>>>>> infrastructure, at least not for long enough for this process to
>>>>> complete. I didn't when I was an NBU customer. Once you take backups, the
>>>>> reports become obsolete, as do the results of bpdbm -consistency 2.
>>>>>
>>>>> It would not surprise me if bpdbm was leaking memory on your platform.
>>>>>
>>>>> --Steve
>>>>>
>>>>>
>>>>> Hampus Lind wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I can't do anything...
>>>>>
>>>>> bpdbm -consistency 2 has been running for over 12 hours and hasn't
>>>>> checked more than 4-5 clients.
>>>>>
>>>>> It was the first thing support told me: your DB is corrupted. So I tried
>>>>> to run the bpdbm -consistency 2 check. The check found some issues, like
>>>>> expired images that were not removed, etc. But when I was about to remove
>>>>> them manually, the NetBackup DB cleanup process had already taken care of
>>>>> them.
>>>>>
>>>>> So, as I understand it, you can have some level of corruption in your DB
>>>>> that NBU cleans out when the cleanup job runs.
>>>>>
>>>>> I am not compressing my catalogs.
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>>>>> Sent: 14 February 2007 20:31
>>>>> To: Hampus Lind
>>>>> Cc: 'Bahnmiller, Bryan'; Veritas-bu at mailman.eng.auburn.edu
>>>>> Subject: Re: [Veritas-bu] Serious master issue...
>>>>>
>>>>> Have you run check_db_consistency? There is a command that checks to
>>>>> make sure your images are not corrupted!
>>>>>
>>>>> I would recommend checking that.
>>>>>
>>>>> Also, are you running compression on your catalogs?
>>>>>
>>>>>
>>>>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>>>>
>>>>> Thanks Bryan,
>>>>>
>>>>> It happens directly after a reboot.
>>>>>
>>>>> The thing is:
>>>>>
>>>>> -          I have deactivated all policies
>>>>>
>>>>> -          Stopped our media server
>>>>>
>>>>> -          And then restarted NetBackup on the master.
>>>>>
>>>>> So there is absolutely no activity going on (no backup, no user backup, no
>>>>> restore, no staging), only internal NetBackup work.
>>>>>
>>>>> As soon as NetBackup on the master becomes active, it starts bpdbm process
>>>>> after bpdbm process. They consume 100% of both my CPUs and read/write
>>>>> heavily to the /usr/openv/netbackup/db filesystem.
>>>>>
>>>>> With no activity at all after a clean start, we have about 42 bpdbm
>>>>> processes and nearly as many bprd processes.
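A sketch for tracking how many bpdbm and bprd processes exist, and how much virtual memory they hold, across repeated `ps` snapshots; that would also show whether the bpdbm memory leak suspected elsewhere in the thread is real. It reads `PID VSZ COMM` columns from standard input so it can be run against saved snapshots. On HP-UX 11.11, `ps -o` generally requires the `UNIX95` environment variable (e.g. `UNIX95= ps -e -o pid,vsz,comm`); treat that invocation, the VSZ units your `ps` reports, and the helper name `proc_summary` as assumptions to verify.

```shell
#!/bin/sh
# proc_summary NAME... < ps_output
# Hypothetical helper: reads "PID VSZ COMM" lines (a header is allowed) and
# prints "<name> <process count> <total VSZ>" for each NAME given.
proc_summary() {
    awk -v names="$*" '
        BEGIN {
            n = split(names, want, " ")
            for (i = 1; i <= n; i++) { count[want[i]] = 0; vsz[want[i]] = 0 }
        }
        # Skip the header and anything whose VSZ column is not numeric.
        $2 !~ /^[0-9]+$/ { next }
        ($3 in count) { count[$3]++; vsz[$3] += $2 }
        END {
            for (i = 1; i <= n; i++) print want[i], count[want[i]], vsz[want[i]]
        }
    '
}
```

For example, `UNIX95= ps -e -o pid,vsz,comm | proc_summary bpdbm bprd` run from cron every few minutes. A total that keeps rising while the count stays flat points toward a leak; a count that keeps rising points toward processes that never exit.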

>>>>>
>>>>>
>>>>>
>>>>> I can't figure this one out, and support points to the disk config or
>>>>> something else that sounds good in their ears.
>>>>>
>>>>>
>>>>>
>>>>> Thanks for all help,
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Bahnmiller, Bryan [mailto:BBahnmiller at pier1.com]
>>>>> Sent: 14 February 2007 20:04
>>>>> To: Hampus Lind
>>>>> Subject: RE: [Veritas-bu] Serious master issue...
>>>>>
>>>>>
>>>>>
>>>>> Hampus,
>>>>>
>>>>>
>>>>>
>>>>> How quickly does this behaviour start happening after a recycle/reboot? I
>>>>> worked with an N4000 master running 11i. We did have 8 CPUs and 8 GB RAM.
>>>>> We were running over 15,000 backup jobs daily, though. Our catalog was
>>>>> over 400 GB (the catalog was on EMC DMX disk). Running good old 3.4, we
>>>>> would have to reboot the system almost every week. If you can cleanly
>>>>> recycle NetBackup (shut it down, kill all NBU processes, and then restart
>>>>> it), that should be almost as good.
>>>>>
>>>>>
>>>>>
>>>>> Here we are running NBU 5.1 MP4 on a Win2K3 master with 2 CPUs and 4 GB
>>>>> RAM. (I inherited the system; it was not my choice.) We run about 5,000
>>>>> jobs per day and have a 280 GB catalog on EMC Clariion. The system will
>>>>> stay stable for 2 weeks pretty easily; 4 weeks starts pushing things. So
>>>>> we usually reboot our Windows master and media servers every 2 weeks.
>>>>>
>>>>>
>>>>>
>>>>> It seems like you will have cumulative problems with NetBackup that build
>>>>> up over time. It is far more pronounced on busy systems. We have another
>>>>> NetBackup system that has 1 master and 1 media server. It runs about 40
>>>>> jobs per day max. I hardly ever have to reboot those servers.
>>>>>
>>>>>
>>>>>
>>>>>      Bryan
>>>>>
>>>>>
>>>>>
>>>>> Bryan Bahnmiller
>>>>>
>>>>> ISD Business Continuity
>>>>>
>>>>> Pier 1 Imports, Inc
>>>>>
>>>>> 817-252-8570
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _____
>>>>>
>>>>>
>>>>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>>>>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
>>>>>
>>>>>
>>>>> Sent: Wednesday, February 14, 2007 12:17 PM
>>>>> To: Veritas-bu at mailman.eng.auburn.edu
>>>>> Subject: Re: [Veritas-bu] Serious master issue...
>>>>> Importance: High
>>>>>
>>>>> All,
>>>>>
>>>>>
>>>>>
>>>>> Now I have been transferred to USA support. God bless America!
>>>>>
>>>>>
>>>>>
>>>>> They have told me that they haven't seen such a big installation in over
>>>>> a year. Strange; I have about 200 clients and back up a couple of TB per
>>>>> day.
>>>>>
>>>>> I was under the impression that this was a fairly small installation...??
>>>>>
>>>>>
>>>>>
>>>>> However, they have told me that this is perfectly normal behaviour for
>>>>> NetBackup: that it produces heavy disk I/O and eats all CPU power. And I
>>>>> was really stupid and told them that I had also had a case with HP earlier
>>>>> about this disk I/O problem, so now Symantec support is pointing all their
>>>>> fingers at HP and our disk setup.
>>>>>
>>>>>
>>>>>
>>>>> Our DB is about 60-65 GB and resides on a StorageTek Flexline 380 disk
>>>>> array (SAN). We run RAID 5 on 146 GB FC drives. I don't really see the
>>>>> bottleneck there, but I will create a RAID 5 on 73 GB 15K FC drives just
>>>>> to shut NetBackup support up.
>>>>>
>>>>>
>>>>>
>>>>> We run a two-CPU HP rp2470 with HP-UX 11.11 as the master server.
>>>>> Shouldn't this be enough for this installation?
>>>>>
>>>>>
>>>>>
>>>>> Oh well...
>>>>>
>>>>>
>>>>>
>>>>> If support can't help me, what should I do?? I am desperate!!!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>>>>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
>>>>> Sent: 14 February 2007 12:48
>>>>> To: Veritas-bu at mailman.eng.auburn.edu
>>>>> Subject: [Veritas-bu] Serious master issue...
>>>>> Priority: High
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> We have a serious issue here with our master server. The problem occurred
>>>>> a couple of weeks ago, or at least that is when I found out about it.
>>>>>
>>>>>
>>>>>
>>>>> I was looking at I/Os and SCSI queue depth on my master (HP-UX 11.11)
>>>>> when I saw that we had 4000-6000 SCSI commands in queue and a disk
>>>>> utilisation of 100% for the /usr/openv/netbackup/db disk.
>>>>>
>>>>>
>>>>>
>>>>> I have patched HP-UX to the latest patch bundle, and we run NBU 5.1 MP4.
>>>>>
>>>>>
>>>>>
>>>>> HP support said that bpdbm was leaking memory.
>>>>>
>>>>>
>>>>>
>>>>> Veritas support is still investigating. But we have about 30 bpdbm and
>>>>> bprd processes active on our master, which eat both my CPUs and produce
>>>>> tons of I/O against our db disk.
>>>>>
>>>>>
>>>>>
>>>>> I activated VERBOSE = 5 on the master, and after 15 minutes the bpdbm log
>>>>> had reached the file size limit on our filesystem, 2 GB.
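A small guard against hitting that limit again: a sketch that lists debug logs above a byte threshold, using the POSIX `find -size +Nc` test. The helper name `big_logs` and the 1.5 GB default are hypothetical choices, set below the 2 GB limit reported here.

```shell
#!/bin/sh
# big_logs [LOGDIR] [BYTES]
# Hypothetical helper: lists debug log files larger than BYTES
# (default 1610612736 bytes = 1.5 GB, below a 2 GB file size limit).
big_logs() {
    logdir=${1:-/usr/openv/netbackup/logs}
    limit=${2:-1610612736}
    # POSIX find: -size +Nc means "more than N bytes".
    find "$logdir" -type f -size +"$limit"c
}
```

Run periodically while verbose logging is on, it gives a chance to lower `VERBOSE` or move a log aside before the filesystem limit stops it.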

>>>>>
>>>>>
>>>>>
>>>>> Has anyone had similar problems?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks and regards,
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Veritas-bu maillist  -  Veritas-bu at mailman.eng.auburn.edu
>>>>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ===================================
>>>>>
>>>>>   Steven L. Sesar
>>>>>   Lead Operating Systems Programmer/Analyst
>>>>>   UNIX Application Services R101
>>>>>   The MITRE Corporation
>>>>>   202 Burlington Road - MS K101
>>>>>   Bedford, MA 01730
>>>>>   tel: (781) 271-7702
>>>>>   fax: (781) 271-2600
>>>>>   mobile: (617) 519-8933
>>>>>   email: ssesar at mitre.org
>>>>>
>>>>> ===================================
>>>>>
>>>>
>>>
>>
>