[Veritas-bu] Serious master issue...

Subject: [Veritas-bu] Serious master issue...
From: hampus.lind at rps.police.se (Hampus Lind)
Date: Wed, 14 Feb 2007 23:08:19 +0100
I will try that tomorrow, but I don't think the problem resides there.

iostat and sar don't show any strange values. sar -d 1 10 reports under 50%
average usage.

Still, I will try with the fastest FC array/disk we have.


Hampus Lind
Rikspolisstyrelsen
National Police Board
Tel dir: +46 (0)8 - 401 99 43
Tel mob: +46 (0)70 - 217 92 66
E-mail: hampus.lind at rps.police.se


-----Original Message-----
From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
Sent: 14 February 2007 23:05
To: Hampus Lind
Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
Veritas-bu at mailman.eng.auburn.edu
Subject: Re: SV: SV: SV: SV: [Veritas-bu] Serious master issue...

Is it possible for you to move the db/images volume to another set of 
disks/raid array?

then ln -s /other/location/db/images /usr/openv/netbackup/db/images

That would rule out your array/FC.
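
The relocation suggested above could be scripted roughly as follows. This is a
minimal sketch, not a tested NetBackup procedure: the function name
relocate_with_symlink is made up for illustration, it assumes NetBackup is
completely stopped before it runs, and it copies rather than moves so that the
original directory survives under a .orig name as a fallback.

```sh
#!/bin/sh
# Sketch only: copy a directory to a new location and leave a symlink at the
# old path. Assumes nothing is writing to the source while this runs.
relocate_with_symlink() {
    src=$1     # existing directory, e.g. /usr/openv/netbackup/db/images
    dest=$2    # new location on the other array; must not exist yet

    [ -d "$src" ] && [ ! -L "$src" ] || {
        echo "source is not a real directory: $src" >&2; return 1; }
    [ ! -e "$dest" ] || {
        echo "destination already exists: $dest" >&2; return 1; }

    mkdir -p "$(dirname "$dest")" || return 1
    # Copy first, preserving permissions and timestamps, so the original
    # stays intact until the copy has completed.
    cp -Rp "$src" "$dest" || return 1
    # Keep the original under a new name instead of deleting it.
    mv "$src" "$src.orig" || return 1
    ln -s "$dest" "$src"
}
```

With NetBackup shut down, a call would look like
relocate_with_symlink /usr/openv/netbackup/db/images /other/location/db/images
where the second path is the placeholder from the message above. The copy needs
enough free space on the target for the whole images directory.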

On Wed, 14 Feb 2007, Hampus Lind wrote:

> Because of the heavy IO produced by all my bpdbm processes, there is no way
> that I can find anything in those logs...
>
> But support has got them all and says everything seems normal. So what can I
> do? I am helpless...
>
> Hampus Lind
> Rikspolisstyrelsen
> National Police Board
> Tel dir: +46 (0)8 - 401 99 43
> Tel mob: +46 (0)70 - 217 92 66
> E-mail: hampus.lind at rps.police.se
>
>
> -----Original Message-----
> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
> Sent: 14 February 2007 23:01
> To: Hampus Lind
> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
> Veritas-bu at mailman.eng.auburn.edu
> Subject: Re: SV: SV: SV: [Veritas-bu] Serious master issue...
>
> With VERBOSE = 5
>
> cd /usr/openv/netbackup/logs
> tail -f */*date_of_today*
>
> Do you see anything weird relating to memory or corruption?
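
When the logs grow too quickly to follow with tail -f, one alternative is to
count suspicious lines per log file. The sketch below is only an illustration:
scan_logs is a hypothetical helper, the default keyword list is a guess rather
than a list of known NetBackup messages, and it assumes the log file names
contain a date fragment, as the */*date_of_today* glob above implies.

```sh
#!/bin/sh
# Sketch only: print "count file" for every log under logroot whose name
# contains the given fragment and which has lines matching the pattern.
scan_logs() {
    logroot=$1                                  # e.g. /usr/openv/netbackup/logs
    stamp=$2                                    # fragment of today's log name
    pattern="${3:-memory|malloc|corrupt}"       # assumed keywords, adjust freely

    for f in "$logroot"/*/*"$stamp"*; do
        [ -f "$f" ] || continue
        # grep -c prints 0 and exits non-zero when nothing matches.
        n=$(grep -c -i -E "$pattern" "$f") || n=0
        if [ "$n" -gt 0 ]; then
            echo "$n $f"
        fi
    done | sort -rn                             # busiest files first
}
```

For example, scan_logs /usr/openv/netbackup/logs "$stamp" lists the files
worth opening first, where $stamp holds whatever date fragment the log names
use on the system in question.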
>
>
> On Wed, 14 Feb 2007, Hampus Lind wrote:
>
>> I can't tell.... I think it has been there for a while and got worse with
>> time..
>>
>>
>>
>> Hampus Lind
>> Rikspolisstyrelsen
>> National Police Board
>> Tel dir: +46 (0)8 - 401 99 43
>> Tel mob: +46 (0)70 - 217 92 66
>> E-mail: hampus.lind at rps.police.se
>>
>>
>> -----Original Message-----
>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>> Sent: 14 February 2007 22:58
>> To: Hampus Lind
>> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
>> Veritas-bu at mailman.eng.auburn.edu
>> Subject: Re: SV: SV: [Veritas-bu] Serious master issue...
>>
>> When did this problem happen? Out of the blue or after a patch?
>>
>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>
>>> I have run a couple of tests, and it seems that if I want any info at all
>>> from bpdbm -consistency 2, I have to shut down NetBackup and then run the
>>> check when everything is down.
>>>
>>> Even then it takes forever. Sometimes it gets further than other times.
>>>
>>>
>>> Hampus Lind
>>> Rikspolisstyrelsen
>>> National Police Board
>>> Tel dir: +46 (0)8 - 401 99 43
>>> Tel mob: +46 (0)70 - 217 92 66
>>> E-mail: hampus.lind at rps.police.se
>>>
>>>
>>> -----Original Message-----
>>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>>> Sent: 14 February 2007 22:47
>>> To: Hampus Lind
>>> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
>>> Veritas-bu at mailman.eng.auburn.edu
>>> Subject: Re: SV: [Veritas-bu] Serious master issue...
>>>
>>> Another option is to turn off backups, move the old images out of the way
>>> one by one, and find what is causing the consistency check to choke. Does
>>> it stop on one set of images, or does it run through them all but just
>>> very slowly?
>>>
>>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>>
>>>> NBCC doesn't look at the image db, and they keep saying we have a
>>>> problem there. But I don't know how we can fix it, or even collect the
>>>> info from the db, when bpdbm -consistency 2 won't run.
>>>>
>>>>
>>>>
>>>> Hampus Lind
>>>> Rikspolisstyrelsen
>>>> National Police Board
>>>> Tel dir: +46 (0)8 - 401 99 43
>>>> Tel mob: +46 (0)70 - 217 92 66
>>>> E-mail: hampus.lind at rps.police.se
>>>>
>>>> -----Original Message-----
>>>> From: Steven L. Sesar [mailto:ssesar at mitre.org]
>>>> Sent: 14 February 2007 20:53
>>>> To: Hampus Lind
>>>> Cc: 'Justin Piszcz'; 'Bahnmiller, Bryan';
>>>> Veritas-bu at mailman.eng.auburn.edu
>>>> Subject: Re: [Veritas-bu] Serious master issue...
>>>>
>>>>
>>>>
>>>> bpdbm -consistency 2 is useless to you, based on the amount of data that
>>>> you back up nightly and my own presumption of how long backups run in
>>>> your environment. It will take longer to run than your backup domain will
>>>> remain idle. If I recall, they have a process which does a better job at
>>>> finding catalog/db corruption/inconsistency. I think that it's called
>>>> NBCC.
>>>>
>>>> The problem with NBCC is similar, though. You send them the output of
>>>> three commands:
>>>>
>>>> vmquery -a, bpmedialist -ls, and bpimmedia
>>>>
>>>> Then, they munge the output of the above commands through a reporting
>>>> tool that Symantec will NOT share with end users. At some point later in
>>>> the day (hopefully sooner rather than later), they will send you a
>>>> report. You must then take certain actions to correct any discrepancies
>>>> found. The backup system must be completely idle during this time.
>>>> Restores are OK, but no backup activity can be taking place.
>>>>
>>>> Afterwards, you'll run those commands again, they'll generate the report
>>>> again, and you'll see how you're doing. It may take you several passes
>>>> to get things squared away.
>>>>
>>>> The problem is that most of us don't have a completely idle backup
>>>> infrastructure, at least not for long enough for this process to
>>>> complete. I didn't when I was an NBU customer. Once you take backups, the
>>>> reports become obsolete, as do the results of bpdbm -consistency 2.
>>>>
>>>> It would not surprise me if bpdbm was leaking memory on your platform.
>>>>
>>>> --Steve
>>>>
>>>>
>>>> Hampus Lind wrote:
>>>>
>>>> Hi,
>>>>
>>>> I can't do anything....
>>>>
>>>> bpdbm -consistency 2 has been running for over 12 hours and hasn't
>>>> checked more than 4-5 clients.
>>>>
>>>> It was the first thing support told me: "Your db is corrupted." So I
>>>> tried to run the bpdbm -consistency 2 check. The check found some issues,
>>>> like expired images which were not removed, etc. But when I was about to
>>>> remove them manually, the NetBackup db clean process had already taken
>>>> care of them.
>>>>
>>>> So as I understand it, you can have some level of corruption in your db
>>>> which NBU cleans out when the clean job runs.
>>>>
>>>> I am not compressing my catalogs.
>>>>
>>>> Thanks,
>>>>
>>>> Hampus Lind
>>>> Rikspolisstyrelsen
>>>> National Police Board
>>>> Tel dir: +46 (0)8 - 401 99 43
>>>> Tel mob: +46 (0)70 - 217 92 66
>>>> E-mail: hampus.lind at rps.police.se
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>>>> Sent: 14 February 2007 20:31
>>>> To: Hampus Lind
>>>> Cc: 'Bahnmiller, Bryan'; Veritas-bu at mailman.eng.auburn.edu
>>>> Subject: Re: [Veritas-bu] Serious master issue...
>>>>
>>>> Have you run the check_db_consistency? There is a command that checks
>>>> to make sure your images are not corrupted!
>>>>
>>>> I would recommend checking that.
>>>>
>>>> Also, are you running compression on your catalogs?
>>>>
>>>>
>>>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>>>
>>>>
>>>>
>>>> Thanks Bryan,
>>>>
>>>>
>>>>
>>>> It happens directly after reboot..
>>>>
>>>>
>>>>
>>>> The thing is:
>>>>
>>>> -          I have deactivated all policies
>>>>
>>>> -          Stopped our media server
>>>>
>>>> -          And then restarted NetBackup on the master.
>>>>
>>>>
>>>>
>>>> So there is absolutely no activity going on (no backup, no user backup,
>>>> no restore, no staging), only internal NetBackup work.
>>>>
>>>> As soon as NetBackup on the master becomes active, it starts bpdbm
>>>> process after bpdbm process. They consume 100% of both my CPUs and
>>>> write/read heavily to the /usr/openv/netbackup/db filesystem.
>>>>
>>>> When there is no activity at all after a clean start, we have about 42
>>>> bpdbm processes and nearly as many bprd processes.
>>>>
>>>>
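
To see whether the process counts described here keep climbing after a restart,
a process listing can be tallied by name. This is a minimal sketch:
count_named is a hypothetical helper that reads any process listing on
standard input, and it does plain substring matching, so a name that is part of
another command line would be counted too.

```sh
#!/bin/sh
# Sketch only: read a process listing on stdin and print "name count" for
# each name given as an argument (fixed-string, substring match).
count_named() {
    listing=$(cat)
    for name in "$@"; do
        # grep -c prints 0 and exits non-zero when nothing matches.
        n=$(printf '%s\n' "$listing" | grep -c -F "$name") || n=0
        echo "$name $n"
    done
}
```

A call such as ps -ef | count_named bpdbm bprd, repeated every few minutes
after the restart, would show whether the counts level off or keep growing.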
>>>>
>>>> I can't figure this one out, and support points to disk config or
>>>> something else that sounds good in their ears.
>>>>
>>>>
>>>>
>>>> Thanks for all help,
>>>>
>>>>
>>>>
>>>> Hampus Lind
>>>> Rikspolisstyrelsen
>>>> National Police Board
>>>> Tel dir: +46 (0)8 - 401 99 43
>>>> Tel mob: +46 (0)70 - 217 92 66
>>>> E-mail: hampus.lind at rps.police.se
>>>>
>>>> -----Original Message-----
>>>> From: Bahnmiller, Bryan [mailto:BBahnmiller at pier1.com]
>>>> Sent: 14 February 2007 20:04
>>>> To: Hampus Lind
>>>> Subject: RE: [Veritas-bu] Serious master issue...
>>>>
>>>>
>>>>
>>>> Hampus,
>>>>
>>>>
>>>>
>>>> How quickly does this behaviour start happening after a recycle/reboot?
>>>> I worked with an N4000 master running 11i. We did have 8 CPUs and 8 GB
>>>> RAM. We were running over 15,000 backup jobs daily though. Our catalog
>>>> was over 400 GB. (Catalog was on EMC DMX disk.) Running good old 3.4, we
>>>> would have to reboot the system almost every week. If you can cleanly
>>>> recycle NetBackup (shut it down, kill all NBU processes, and then restart
>>>> it), that should be almost as good.
>>>>
>>>>
>>>>
>>>> Here we are running NBU 5.1mp4 on a Win2K3 master: 2 CPUs, 4 GB RAM. (I
>>>> inherited the system; not my choice.) We run about 5000 jobs per day, and
>>>> we have a 280 GB catalog on EMC Clariion. The system will stay stable for
>>>> 2 weeks pretty easily. 4 weeks starts pushing things. So we usually
>>>> reboot our Windows master and media servers every 2 weeks.
>>>>
>>>>
>>>>
>>>> It seems like you will have cumulative problems with NetBackup that can
>>>> build up over time. It is way more pronounced on busy systems. We have
>>>> another NetBackup system that has 1 master and 1 media server. It runs
>>>> about 40 jobs per day max. I hardly ever have to reboot those servers.
>>>>
>>>>
>>>>
>>>>      Bryan
>>>>
>>>>
>>>>
>>>> Bryan Bahnmiller
>>>>
>>>> ISD Business Continuity
>>>>
>>>> Pier 1 Imports, Inc
>>>>
>>>> 817-252-8570
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _____
>>>>
>>>>
>>>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>>>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus
>>>> Lind
>>>> Sent: Wednesday, February 14, 2007 12:17 PM
>>>> To: Veritas-bu at mailman.eng.auburn.edu
>>>> Subject: Re: [Veritas-bu] Serious master issue...
>>>> Importance: High
>>>>
>>>> All,
>>>>
>>>>
>>>>
>>>> Now I have been transferred to USA support... God bless America!
>>>>
>>>>
>>>>
>>>> They have told me that they haven't seen such a big installation in over
>>>> a year. Strange: I have about 200 clients and back up a couple of TB per
>>>> day. I was under the impression that this was a fairly small
>>>> installation?
>>>>
>>>>
>>>>
>>>> However, they have told me that this is perfectly normal behaviour for
>>>> NetBackup: that it produces heavy disk IO and eats all CPU power. And I
>>>> was really stupid and told them that I also had a case open with HP
>>>> earlier about this disk IO problem, so now Symantec support is pointing
>>>> all their fingers at HP and our disk setup.
>>>>
>>>>
>>>>
>>>> Our DB is about 60-65 GB and resides on a StorageTek Flexline 380 disk
>>>> array (SAN). We run RAID 5 on 146GB FC drives. I don't really see the
>>>> bottleneck there, but I will create a RAID 5 set on 73GB 15K FC drives
>>>> just to shut NetBackup support up.
>>>>
>>>>
>>>>
>>>> We run a two-CPU HP rp2470 with HP-UX 11.11 as the master server.
>>>> Shouldn't this be enough for this installation?
>>>>
>>>>
>>>>
>>>> Oh well...
>>>>
>>>>
>>>>
>>>> If support can't help me, what should I do? I am desperate!
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Hampus Lind
>>>> Rikspolisstyrelsen
>>>> National Police Board
>>>> Tel dir: +46 (0)8 - 401 99 43
>>>> Tel mob: +46 (0)70 - 217 92 66
>>>> E-mail: hampus.lind at rps.police.se
>>>>
>>>> -----Original Message-----
>>>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>>>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
>>>> Sent: 14 February 2007 12:48
>>>> To: Veritas-bu at mailman.eng.auburn.edu
>>>> Subject: [Veritas-bu] Serious master issue...
>>>> Priority: High
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> We have a serious issue here with our master server. The problem occurred
>>>> a couple of weeks ago, or at least I found out about it then.
>>>>
>>>>
>>>>
>>>> I was looking at IOs and SCSI queue depth on my master (HP-UX 11.11)
>>>> when I saw that we had 4000-6000 SCSI commands in the queue, and a disk
>>>> utilisation of 100% for the /usr/openv/netbackup/db disk.
>>>>
>>>>
>>>>
>>>> I have patched hpux to the latest patch bundle and we run NBU 5.1 MP4.
>>>>
>>>>
>>>>
>>>> HP support said that bpdbm was leaking memory.
>>>>
>>>>
>>>>
>>>> Veritas support is still investigating. But we have about 30 bpdbm and
>>>> bprd processes active on our master, which eat both my CPUs and produce
>>>> tons of IO against our db disk.
>>>>
>>>>
>>>>
>>>> I activated VERBOSE = 5 on the master, and after 15 minutes the bpdbm log
>>>> had reached the file size limit on our filesystem, 2 GB.
>>>>
>>>>
>>>>
>>>> Has anyone had similar problems?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks and regards,
>>>>
>>>>
>>>>
>>>> Hampus Lind
>>>> Rikspolisstyrelsen
>>>> National Police Board
>>>> Tel dir: +46 (0)8 - 401 99 43
>>>> Tel mob: +46 (0)70 - 217 92 66
>>>> E-mail: hampus.lind at rps.police.se
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Veritas-bu maillist  -  Veritas-bu at mailman.eng.auburn.edu
>>>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ===================================
>>>>
>>>>   Steven L. Sesar
>>>>   Lead Operating Systems Programmer/Analyst
>>>>   UNIX Application Services R101
>>>>   The MITRE Corporation
>>>>   202 Burlington Road - MS K101
>>>>   Bedford, MA 01730
>>>>   tel: (781) 271-7702
>>>>   fax: (781) 271-2600
>>>>   mobile: (617) 519-8933
>>>>   email: ssesar at mitre.org
>>>>
>>>> ===================================
>>>>
>>>
>>
>