Veritas-bu

[Veritas-bu] Serious master issue...

2007-02-14 17:17:54
Subject: [Veritas-bu] Serious master issue...
From: jpiszcz at lucidpixels.com (Justin Piszcz)
Date: Wed, 14 Feb 2007 17:17:54 -0500 (EST)
Even stranger: with 6.0 I might expect something like this, but 5.1 MP4
has been rock solid for us. Hmm...

On Wed, 14 Feb 2007, Hampus Lind wrote:

> 5.1 MP4
>
>
>
> Hampus Lind
> Rikspolisstyrelsen
> National Police Board
> Tel dir: +46 (0)8 - 401 99 43
> Tel mob: +46 (0)70 - 217 92 66
> E-mail: hampus.lind at rps.police.se
>
>
> -----Original Message-----
> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
> Sent: 14 February 2007 23:11
> To: Hampus Lind
> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
> Veritas-bu at mailman.eng.auburn.edu
> Subject: Re: SV: SV: SV: SV: SV: [Veritas-bu] Serious master issue...
>
> Also are you using 5.x or 6.0?
>
> On Wed, 14 Feb 2007, Hampus Lind wrote:
>
>> I will try that tomorrow, but I don't think the problem resides there.
>>
>> iostat and sar don't show any strange values; sar -d 1 10 reports under
>> 50% average usage.
>>
>> Still, I will try with the fastest FC array/disk we have...
>>
>>
>> Hampus Lind
>> Rikspolisstyrelsen
>> National Police Board
>> Tel dir: +46 (0)8 - 401 99 43
>> Tel mob: +46 (0)70 - 217 92 66
>> E-mail: hampus.lind at rps.police.se
>>
>>
>> -----Original Message-----
>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>> Sent: 14 February 2007 23:05
>> To: Hampus Lind
>> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
>> Veritas-bu at mailman.eng.auburn.edu
>> Subject: Re: SV: SV: SV: SV: [Veritas-bu] Serious master issue...
>>
>> Is it possible for you to move the db/images volume to another set of
>> disks/raid array?
>>
>> then ln -s /other/location/db/images /usr/openv/netbackup/db/images
>>
>> That would rule out your array/FC.
>>
>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>
>>> Because of the heavy IO produced by all my bpdbm processes, there is no
>>> way that I can find anything in those logs...
>>>
>>> But support has got them all and says everything seems normal. So what can
>>> I do? I am helpless...
>>>
>>> Hampus Lind
>>> Rikspolisstyrelsen
>>> National Police Board
>>> Tel dir: +46 (0)8 - 401 99 43
>>> Tel mob: +46 (0)70 - 217 92 66
>>> E-mail: hampus.lind at rps.police.se
>>>
>>>
>>> -----Original Message-----
>>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>>> Sent: 14 February 2007 23:01
>>> To: Hampus Lind
>>> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
>>> Veritas-bu at mailman.eng.auburn.edu
>>> Subject: Re: SV: SV: SV: [Veritas-bu] Serious master issue...
>>>
>>> With VERBOSE = 5
>>>
>>> cd /usr/openv/netbackup/logs
>>> tail -f */*date_of_today*
>>>
>>> Do you see anything weird relating to memory or corruption?
>>>
>>>
>>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>>
>>>> I can't tell... I think it has been there for a while and got worse with
>>>> time.
>>>>
>>>>
>>>>
>>>> Hampus Lind
>>>> Rikspolisstyrelsen
>>>> National Police Board
>>>> Tel dir: +46 (0)8 - 401 99 43
>>>> Tel mob: +46 (0)70 - 217 92 66
>>>> E-mail: hampus.lind at rps.police.se
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>>>> Sent: 14 February 2007 22:58
>>>> To: Hampus Lind
>>>> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
>>>> Veritas-bu at mailman.eng.auburn.edu
>>>> Subject: Re: SV: SV: [Veritas-bu] Serious master issue...
>>>>
>>>> When did this problem happen? Out of the blue or after a patch?
>>>>
>>>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>>>
>>>>> I have run a couple of tests... And it seems that if I want any info at
>>>>> all from bpdbm -consistency 2 I have to shut down netbackup and then run
>>>>> the check when everything is down.
>>>>>
>>>>> Even then it takes forever. Sometimes it gets further than other times...
>>>>>
>>>>>
>>>>> Hampus Lind
>>>>> Rikspolisstyrelsen
>>>>> National Police Board
>>>>> Tel dir: +46 (0)8 - 401 99 43
>>>>> Tel mob: +46 (0)70 - 217 92 66
>>>>> E-mail: hampus.lind at rps.police.se
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>>>>> Sent: 14 February 2007 22:47
>>>>> To: Hampus Lind
>>>>> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
>>>>> Veritas-bu at mailman.eng.auburn.edu
>>>>> Subject: Re: SV: [Veritas-bu] Serious master issue...
>>>>>
>>>>> Another option is to turn off backups, move the old images out of the way
>>>>> one by one, and find what is causing the consistency check to choke. Does
>>>>> it stop on one set of images, or does it run through them all but just
>>>>> very slowly?
>>>>>
>>>>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>>>>
>>>>>> The NBCC doesn't look at the image db, and they keep saying we have a
>>>>>> problem there. But I don't know how we can fix it or even collect the
>>>>>> info from the db when bpdbm -consistency 2 won't run.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hampus Lind
>>>>>> Rikspolisstyrelsen
>>>>>> National Police Board
>>>>>> Tel dir: +46 (0)8 - 401 99 43
>>>>>> Tel mob: +46 (0)70 - 217 92 66
>>>>>> E-mail: hampus.lind at rps.police.se
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Steven L. Sesar [mailto:ssesar at mitre.org]
>>>>>> Sent: 14 February 2007 20:53
>>>>>> To: Hampus Lind
>>>>>> Cc: 'Justin Piszcz'; 'Bahnmiller, Bryan';
>>>>>> Veritas-bu at mailman.eng.auburn.edu
>>>>>> Subject: Re: [Veritas-bu] Serious master issue...
>>>>>>
>>>>>>
>>>>>>
>>>>>> bpdbm -consistency 2 is useless to you, based on the amount of data that
>>>>>> you back up nightly and my own presumption of how long backups run in your
>>>>>> environment. It will take longer to run than your backup domain will remain
>>>>>> idle. If I recall, they have a process which does a better job at finding
>>>>>> catalog/db corruption/inconsistency. I think that it's called NBCC.
>>>>>>
>>>>>> The problem with NBCC is similar, though. You send them the output of three
>>>>>> commands:
>>>>>>
>>>>>> vmquery -a, bpmedialist -ls, and bpimmedia
>>>>>>
>>>>>> Then, they munge the output of the above commands through a reporting tool
>>>>>> that Symantec will NOT share with end users. At some point later in the day
>>>>>> (hopefully, sooner rather than later), they will send you a report. You must
>>>>>> then take certain actions to correct any discrepancies found. The backup
>>>>>> system must be completely idle during this time. Restores are ok, but no
>>>>>> backup activity can be taking place.
>>>>>>
>>>>>> Afterwards, you'll run those commands again, they'll generate the report
>>>>>> again, and you'll see how you're doing. It may take you several passes to
>>>>>> get things squared away.
>>>>>>
>>>>>> The problem is that most of us don't have a completely idle backup
>>>>>> infrastructure - at least for long enough for this process to complete. I
>>>>>> didn't when I was an NBU customer. Once you take backups, the reports become
>>>>>> obsolete, as do the results of bpdbm -consistency 2.
>>>>>>
>>>>>> It would not surprise me if bpdbm was leaking memory on your platform.
>>>>>>
>>>>>> --Steve
>>>>>>
>>>>>>
>>>>>> Hampus Lind wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I can't do anything....
>>>>>>
>>>>>> bpdbm -consistency 2 has been running for over 12 hours and hasn't checked
>>>>>> more than 4-5 clients.
>>>>>>
>>>>>> It was the first thing support told me: your db is corrupted. So I tried
>>>>>> to run the bpdbm -consistency 2 check. The check found some issues, like
>>>>>> expired images which were not removed etc. But when I was about to remove
>>>>>> them manually, the netbackup db clean process had already taken care of
>>>>>> them.
>>>>>>
>>>>>> So as I understand it, you can have some level of corruption in your db
>>>>>> which nbu cleans out when the clean job runs.
>>>>>>
>>>>>> I am not compressing my catalogs.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Hampus Lind
>>>>>> Rikspolisstyrelsen
>>>>>> National Police Board
>>>>>> Tel dir: +46 (0)8 - 401 99 43
>>>>>> Tel mob: +46 (0)70 - 217 92 66
>>>>>> E-mail: hampus.lind at rps.police.se
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>>>>>> Sent: 14 February 2007 20:31
>>>>>> To: Hampus Lind
>>>>>> Cc: 'Bahnmiller, Bryan'; Veritas-bu at mailman.eng.auburn.edu
>>>>>> Subject: Re: [Veritas-bu] Serious master issue...
>>>>>>
>>>>>> Have you run the check_db_consistency? There is a command that checks to
>>>>>> make sure your images are not corrupted!
>>>>>>
>>>>>> I would recommend checking that.
>>>>>>
>>>>>> Also, are you running compression on your catalogs?
>>>>>>
>>>>>>
>>>>>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks Bryan,
>>>>>>
>>>>>>
>>>>>>
>>>>>> It happens directly after reboot..
>>>>>>
>>>>>>
>>>>>>
>>>>>> The thing is:
>>>>>>
>>>>>> -          I have deactivated all policies
>>>>>>
>>>>>> -          Stopped our media server
>>>>>>
>>>>>> -          And then restarted netbackup on the master.
>>>>>>
>>>>>>
>>>>>>
>>>>>> So there is absolutely no activity going on (no backup, no user backup, no
>>>>>> restore, no staging), only internal netbackup work.
>>>>>>
>>>>>> As soon as netbackup on the master becomes active, it starts bpdbm process
>>>>>> after bpdbm process. They consume 100% of both my CPUs and write/read
>>>>>> heavily to the /usr/openv/netbackup/db filesystem.
>>>>>>
>>>>>> When I have no activity at all after a clean start, we have about 42 bpdbm
>>>>>> processes and nearly as many bprd processes...
>>>>>>
>>>>>>
>>>>>>
>>>>>> I can't figure this one out, and support points to disk config or something
>>>>>> else that sounds good to their ears...
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks for all help,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hampus Lind
>>>>>> Rikspolisstyrelsen
>>>>>> National Police Board
>>>>>> Tel dir: +46 (0)8 - 401 99 43
>>>>>> Tel mob: +46 (0)70 - 217 92 66
>>>>>> E-mail: hampus.lind at rps.police.se
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Bahnmiller, Bryan [mailto:BBahnmiller at pier1.com]
>>>>>> Sent: 14 February 2007 20:04
>>>>>> To: Hampus Lind
>>>>>> Subject: RE: [Veritas-bu] Serious master issue...
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hampus,
>>>>>>
>>>>>>
>>>>>>
>>>>>> How quickly does this behaviour start happening after a recycle/reboot? I
>>>>>> worked with an N4000 master running 11i. We did have 8 cpus and 8 GB RAM.
>>>>>> We were running over 15,000 backup jobs daily though. Our catalog was over
>>>>>> 400GB. (Catalog was on EMC DMX disk.) Running good old 3.4 we would have
>>>>>> to reboot the system almost every week. If you can cleanly re-cycle
>>>>>> NetBackup - shut it down, kill all NBU processes, and then restart it, that
>>>>>> should be almost as good.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Here we are running NBU 5.1mp4 on a Win2K3 master - 2 cpus, 4 GB RAM. (I
>>>>>> inherited the system - not my choice.) We run about 5000 jobs per day, we
>>>>>> have a 280 GB catalog on EMC Clariion. The system will stay stable for 2
>>>>>> weeks pretty easily. 4 weeks starts pushing things. So we usually reboot
>>>>>> our Windows master and media servers every 2 weeks.
>>>>>>
>>>>>>
>>>>>>
>>>>>> It seems like you will have cumulative problems with NetBackup that can
>>>>>> build up over time. It is way more pronounced on busy systems. We have
>>>>>> another NetBackup system that has 1 Master and 1 Media server. It runs
>>>>>> about 40 jobs per day max. I hardly ever have to reboot those servers.
>>>>>>
>>>>>>
>>>>>>
>>>>>>      Bryan
>>>>>>
>>>>>>
>>>>>>
>>>>>> Bryan Bahnmiller
>>>>>>
>>>>>> ISD Business Continuity
>>>>>>
>>>>>> Pier 1 Imports, Inc
>>>>>>
>>>>>> 817-252-8570
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _____
>>>>>>
>>>>>>
>>>>>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>>>>>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
>>>>>> Sent: Wednesday, February 14, 2007 12:17 PM
>>>>>> To: Veritas-bu at mailman.eng.auburn.edu
>>>>>> Subject: Re: [Veritas-bu] Serious master issue...
>>>>>> Importance: High
>>>>>>
>>>>>> All,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Now I have been transferred to USA support... God bless America!
>>>>>>
>>>>>>
>>>>>>
>>>>>> They have told me that they haven't seen such a big installation in over a
>>>>>> year... Strange, I have about 200 clients and back up a couple of TB per
>>>>>> day. I was under the impression that this was kind of a small
>>>>>> installation..??
>>>>>>
>>>>>>
>>>>>>
>>>>>> However, they have told me that this is perfectly normal behaviour with
>>>>>> netbackup: that it produces heavy disk IO and eats all CPU power. And I was
>>>>>> really stupid and told them that I also had a case with HP earlier on this
>>>>>> disk IO problem, so now Symantec support are pointing all their fingers at
>>>>>> HP and our disk setup.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Our DB is about 60-65 GB and resides on a StorageTek Flexline 380 disk
>>>>>> array (SAN). We run a RAID 5 on 146GB FC drives. I don't really see the
>>>>>> bottleneck there, but I will create a RAID 5 on 73GB 15K FC drives just to
>>>>>> shut netbackup support up...
>>>>>>
>>>>>>
>>>>>>
>>>>>> We run a two CPU HP rp2470 with HP-UX 11.11 as a master server. Shouldn't
>>>>>> this be enough for this installation?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Ooh well...
>>>>>>
>>>>>>
>>>>>>
>>>>>> If support can't help me, what should I do?? I am desperate!!!
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hampus Lind
>>>>>> Rikspolisstyrelsen
>>>>>> National Police Board
>>>>>> Tel dir: +46 (0)8 - 401 99 43
>>>>>> Tel mob: +46 (0)70 - 217 92 66
>>>>>> E-mail: hampus.lind at rps.police.se
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>>>>>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
>>>>>> Sent: 14 February 2007 12:48
>>>>>> To: Veritas-bu at mailman.eng.auburn.edu
>>>>>> Subject: [Veritas-bu] Serious master issue...
>>>>>> Importance: High
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>>
>>>>>> We have a serious issue here with our master server. The problem occurred a
>>>>>> couple of weeks ago, or at least I found out about it then.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I was looking at IOs and SCSI queue depth on my master (hp-ux 11.11) when I
>>>>>> saw that we had 4000-6000 SCSI commands in queue, and a disk utilisation of
>>>>>> 100% for the /usr/openv/netbackup/db disk.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I have patched hpux to the latest patch bundle and we run NBU 5.1 MP4.
>>>>>>
>>>>>>
>>>>>>
>>>>>> HP support said that bpdbm was leaking memory.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Veritas support is still investigating. But we have about 30 bpdbm and bprd
>>>>>> processes active on our master, which eat both my CPUs and produce tons of
>>>>>> IO against our db disk.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I activated verbose = 5 on the master, and after 15 minutes the bpdbm log
>>>>>> had reached the file size limit on our filesystem, 2 GB...
>>>>>>
>>>>>>
>>>>>>
>>>>>> Has anyone had similar problems?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks and regards,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hampus Lind
>>>>>> Rikspolisstyrelsen
>>>>>> National Police Board
>>>>>> Tel dir: +46 (0)8 - 401 99 43
>>>>>> Tel mob: +46 (0)70 - 217 92 66
>>>>>> E-mail: hampus.lind at rps.police.se
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Veritas-bu maillist  -  Veritas-bu at mailman.eng.auburn.edu
>>>>>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ===================================
>>>>>>
>>>>>>   Steven L. Sesar
>>>>>>   Lead Operating Systems Programmer/Analyst
>>>>>>   UNIX Application Services R101
>>>>>>   The MITRE Corporation
>>>>>>   202 Burlington Road - MS K101
>>>>>>   Bedford, MA 01730
>>>>>>   tel: (781) 271-7702
>>>>>>   fax: (781) 271-2600
>>>>>>   mobile: (617) 519-8933
>>>>>>   email: ssesar at mitre.org
>>>>>>
>>>>>> ===================================
>>>>>>
>>>>>
>>>>
>>>
>>
>
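
For anyone who wants to track the process counts mentioned in this thread
(about 30, and later about 42, bpdbm processes plus nearly as many bprd),
here is a minimal Python sketch that tallies daemons from a list of process
command lines. It is only an illustration under assumptions: the helper
count_daemons, the WATCHED tuple and the sample rows are hypothetical names
and data, not part of NetBackup, and how you obtain one command line per
process on your platform is left open.

```python
import os
from collections import Counter

# Daemon names taken from the thread; adjust to whatever you want to watch.
WATCHED = ("bpdbm", "bprd")


def count_daemons(command_lines, watched=WATCHED):
    """Count watched daemons in an iterable of process command lines.

    Each item is assumed to be the command line of one process, with the
    executable as the first whitespace-separated field.
    """
    counts = Counter({name: 0 for name in watched})
    for line in command_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank rows
        # Compare only the executable's basename, so a row such as
        # "grep bpdbm" is not mistaken for a bpdbm process.
        name = os.path.basename(fields[0])
        if name in counts:
            counts[name] += 1
    return dict(counts)


# Hypothetical sample rows, not captured from a real system.
sample = [
    "/usr/openv/netbackup/bin/bpdbm",
    "/usr/openv/netbackup/bin/bpdbm",
    "/usr/openv/netbackup/bin/bprd",
    "grep bpdbm",
    "",
]
print(count_daemons(sample))
```

Logging these counts at intervals after a clean restart would show whether
the number of bpdbm and bprd processes keeps growing or levels off, which is
the kind of evidence both vendors' support teams were asking for.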