Veritas-bu

[Veritas-bu] Serious master issue...

2007-02-14 16:58:09
Subject: [Veritas-bu] Serious master issue...
From: jpiszcz at lucidpixels.com (Justin Piszcz)
Date: Wed, 14 Feb 2007 16:58:09 -0500 (EST)
When did this problem happen? Out of the blue or after a patch?

On Wed, 14 Feb 2007, Hampus Lind wrote:

> I have run a couple of tests... And it seems that if I want any info at all
> from bpdbm -consistency 2 I have to shut down NetBackup and then run the
> check when everything is down.
>
> Even then it takes forever.. Sometimes it gets further than other times...
>
>
> Hampus Lind
> Rikspolisstyrelsen
> National Police Board
> Tel dir: +46 (0)8 - 401 99 43
> Tel mob: +46 (0)70 - 217 92 66
> E-mail: hampus.lind at rps.police.se
>
>
> -----Original Message-----
> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
> Sent: 14 February 2007 22:47
> To: Hampus Lind
> Cc: 'Steven L. Sesar'; 'Bahnmiller, Bryan';
> Veritas-bu at mailman.eng.auburn.edu
> Subject: Re: SV: [Veritas-bu] Serious master issue...
>
> Another option is to turn off backups, move the old images out of the way one
> by one, and find what is causing the consistency check to choke. Does it stop
> on one set of images, or does it run through them all but just very slowly?
>
> On Wed, 14 Feb 2007, Hampus Lind wrote:
>
>> The NBCC doesn't look at the image db, and they keep saying we have a
>> problem there.. But I don't know how we can fix it or even collect the info
>> from the db when bpdbm -consistency 2 won't run..
>>
>>
>>
>> Hampus Lind
>> Rikspolisstyrelsen
>> National Police Board
>> Tel dir: +46 (0)8 - 401 99 43
>> Tel mob: +46 (0)70 - 217 92 66
>> E-mail: hampus.lind at rps.police.se
>>
>> -----Original Message-----
>> From: Steven L. Sesar [mailto:ssesar at mitre.org]
>> Sent: 14 February 2007 20:53
>> To: Hampus Lind
>> Cc: 'Justin Piszcz'; 'Bahnmiller, Bryan';
>> Veritas-bu at mailman.eng.auburn.edu
>> Subject: Re: [Veritas-bu] Serious master issue...
>>
>>
>>
>> bpdbm -consistency 2 is useless to you, based on the amount of data that you
>> back up nightly and my own presumption of how long backups run in your
>> environment. It will take longer to run than your backup domain will remain
>> idle. If I recall, they have a process which does a better job at finding
>> catalog/db corruption/inconsistency. I think that it's called NBCC.
>>
>> The problem with NBCC is similar, though. You send them the output of three
>> commands:
>>
>> vmquery -a, bpmedialist -ls, and bpimmedia
>>
>> Then, they munge the output of the above commands through a reporting tool
>> that Symantec will NOT share with end users. At some point later in the day
>> (hopefully, sooner rather than later), they will send you a report. You must
>> then take certain actions to correct any discrepancies found. The backup
>> system must be completely idle during this time. Restores are ok, but no
>> backup activity can be taking place.
>>
>> Afterwards, you'll run those commands again, they'll generate the report
>> again, and you'll see how you're doing. It may take you several passes to
>> get things squared away.
>>
>> The problem is that most of us don't have a completely idle backup
>> infrastructure - at least for long enough for this process to complete. I
>> didn't when I was an NBU customer. Once you take backups, the reports become
>> obsolete, as do the results of bpdbm -consistency 2.
>>
>> It would not surprise me if bpdbm was leaking memory on your platform.
>>
>> --Steve
>>
>>
>> Hampus Lind wrote:
>>
>> Hi,
>>
>> I can't do anything....
>>
>> bpdbm -consistency 2 has been running for over 12 hours and hasn't checked
>> more than 4-5 clients.
>>
>> It was the first thing support told me: your db is corrupted... So I tried
>> to run the bpdbm -consistency 2 check. The check found some issues, like
>> expired images which were not removed etc. But when I was about to remove
>> them manually, the NetBackup db clean process had already taken care of them..
>>
>> So as I understand it, you can have some level of corruption in your db which
>> NBU cleans out when the clean job runs.
>>
>> I am not compressing my catalogs.
>>
>> Thanks,
>>
>> Hampus Lind
>> Rikspolisstyrelsen
>> National Police Board
>> Tel dir: +46 (0)8 - 401 99 43
>> Tel mob: +46 (0)70 - 217 92 66
>> E-mail: hampus.lind at rps.police.se
>>
>>
>> -----Original Message-----
>> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
>> Sent: 14 February 2007 20:31
>> To: Hampus Lind
>> Cc: 'Bahnmiller, Bryan'; Veritas-bu at mailman.eng.auburn.edu
>> Subject: Re: [Veritas-bu] Serious master issue...
>>
>> Have you run the check_db_consistency? There is a command that checks to
>> make sure your images are not corrupted!
>>
>> I would recommend checking that.
>>
>> Also, are you running compression on your catalogs?
>>
>>
>> On Wed, 14 Feb 2007, Hampus Lind wrote:
>>
>>
>>
>> Thanks Bryan,
>>
>>
>>
>> It happens directly after reboot..
>>
>>
>>
>> The thing is:
>>
>> -          I have deactivated all policies
>>
>> -          Stopped our media server
>>
>> -          And then restarted NetBackup on the master.
>>
>>
>>
>> So there is absolutely no activity going on (no backup, no user backup, no
>> restore, no staging), only internal NetBackup work.
>>
>> As soon as NetBackup on the master becomes active, it starts bpdbm process
>> after bpdbm process. They consume 100% of both my CPUs and write/read heavily
>> to the /usr/openv/netbackup/db filesystem.
>>
>> When I have no activity at all after a clean start, we have about 42 bpdbm
>> processes and nearly as many bprd processes.
>>
>>
>>
>> I can't figure this one out, and support points to disk config or something
>> else that sounds good to their ears.
>>
>>
>>
>> Thanks for all help,
>>
>>
>>
>> Hampus Lind
>> Rikspolisstyrelsen
>> National Police Board
>> Tel dir: +46 (0)8 - 401 99 43
>> Tel mob: +46 (0)70 - 217 92 66
>> E-mail: hampus.lind at rps.police.se
>>
>> -----Original Message-----
>> From: Bahnmiller, Bryan [mailto:BBahnmiller at pier1.com]
>> Sent: 14 February 2007 20:04
>> To: Hampus Lind
>> Subject: RE: [Veritas-bu] Serious master issue...
>>
>>
>>
>> Hampus,
>>
>>
>>
>> How quickly does this behaviour start happening after a recycle/reboot? I
>> worked with an N4000 master running 11i. We did have 8 CPUs and 8 GB RAM. We
>> were running over 15,000 backup jobs daily though. Our catalog was over
>> 400GB. (Catalog was on EMC DMX disk.) Running good old 3.4 we would have to
>> reboot the system almost every week. If you can cleanly re-cycle NetBackup -
>> shut it down, kill all NBU processes, and then restart it, that should be
>> almost as good.
>>
>>
>>
>> Here we are running NBU 5.1mp4 on a Win2K3 master - 2 CPUs, 4 GB RAM. (I
>> inherited the system - not my choice.) We run about 5000 jobs per day, we
>> have a 280 GB catalog on EMC Clariion. The system will stay stable for 2
>> weeks pretty easily. 4 weeks starts pushing things. So we usually reboot our
>> Windows master and media servers every 2 weeks.
>>
>>
>>
>> It seems like you will have cumulative problems with NetBackup that can
>> build up over time. It is way more pronounced on busy systems. We have
>> another NetBackup system that has 1 Master and 1 Media server. It runs about
>> 40 jobs per day max. I hardly ever have to reboot those servers.
>>
>>
>>
>>      Bryan
>>
>>
>>
>> Bryan Bahnmiller
>>
>> ISD Business Continuity
>>
>> Pier 1 Imports, Inc
>>
>> 817-252-8570
>>
>>
>>
>>
>>
>>
>> _____
>>
>>
>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
>> Sent: Wednesday, February 14, 2007 12:17 PM
>> To: Veritas-bu at mailman.eng.auburn.edu
>> Subject: Re: [Veritas-bu] Serious master issue...
>> Importance: High
>>
>> All,
>>
>>
>>
>> Now I have been transferred to USA support... God bless America!
>>
>>
>>
>> They have told me that they haven't seen such a big installation in over a
>> year... Strange, I have about 200 clients and back up a couple of TB per day..
>> I was under the impression that this was a kinda small installation..??
>>
>>
>>
>> However, they have told me that this is perfectly normal behaviour with
>> NetBackup: that it produces heavy disk IO and eats all CPU power. And I was
>> really stupid and told them that I also had a case with HP earlier on this
>> disk IO problem, so now Symantec support are pointing all their fingers at
>> HP and our disk setup.
>>
>>
>>
>> Our DB is about 60-65 GB and resides on a StorageTek Flexline 380 disk array
>> (SAN). We run a RAID 5 on 146GB FC drives.. I don't really see the
>> bottleneck there, but I will create a RAID 5 on 73GB 15K FC drives just to
>> shut NetBackup support up...
>>
>>
>>
>> We run a two-CPU HP rp2470 with HP-UX 11.11 as a master server. Shouldn't
>> this be enough for this installation?
>>
>>
>>
>> Ooh well...
>>
>>
>>
>> If support can't help me, what should I do?? I am desperate!!!
>>
>>
>>
>>
>>
>> Hampus Lind
>> Rikspolisstyrelsen
>> National Police Board
>> Tel dir: +46 (0)8 - 401 99 43
>> Tel mob: +46 (0)70 - 217 92 66
>> E-mail: hampus.lind at rps.police.se
>>
>> -----Original Message-----
>> From: veritas-bu-bounces at mailman.eng.auburn.edu
>> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
>> Sent: 14 February 2007 12:48
>> To: Veritas-bu at mailman.eng.auburn.edu
>> Subject: [Veritas-bu] Serious master issue...
>> Priority: High
>>
>>
>>
>> Hi,
>>
>>
>>
>> We have a serious issue here with our master server. The problem occurred a
>> couple of weeks ago, or at least I found out about it then..
>>
>>
>>
>> I was looking at IOs and SCSI queue depth on my master (HP-UX 11.11) when I
>> saw that we had 4000-6000 SCSI commands in queue, and a disk utilisation of
>> 100% for the /usr/openv/netbackup/db disk.
>>
>>
>>
>> I have patched HP-UX to the latest patch bundle and we run NBU 5.1 MP4.
>>
>>
>>
>> HP support said that bpdbm was leaking memory.
>>
>>
>>
>> Veritas support is still investigating.. But we have about 30 bpdbm and bprd
>> processes active on our master, which eat both my CPUs and produce tons of
>> IO against our db disk.
>>
>>
>>
>> I activated verbose = 5 on the master, and after 15 minutes the bpdbm log had
>> reached the file size limit on our filesystem, 2 GB...
>>
>>
>>
>> Has anyone had similar problems?
>>
>>
>>
>>
>>
>> Thanks and regards,
>>
>>
>>
>> Hampus Lind
>> Rikspolisstyrelsen
>> National Police Board
>> Tel dir: +46 (0)8 - 401 99 43
>> Tel mob: +46 (0)70 - 217 92 66
>> E-mail: hampus.lind at rps.police.se
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Veritas-bu maillist  -  Veritas-bu at mailman.eng.auburn.edu
>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>>
>>
>>
>>
>>
>>
>>
>> --
>> ===================================
>>
>>   Steven L. Sesar
>>   Lead Operating Systems Programmer/Analyst
>>   UNIX Application Services R101
>>   The MITRE Corporation
>>   202 Burlington Road - MS K101
>>   Bedford, MA 01730
>>   tel: (781) 271-7702
>>   fax: (781) 271-2600
>>   mobile: (617) 519-8933
>>   email: ssesar at mitre.org
>>
>> ===================================
>>
>
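
The bpdbm and bprd process counts mentioned in the thread (about 30 at first, about 42 after a clean start) could be tracked over time by counting matching lines in ps -ef output. What follows is only a rough Python sketch: the function name count_processes is made up for illustration, and because it matches on the basename of any token in a line, it will also count unrelated lines that happen to mention the names (a running "grep bpdbm", for example).

```python
import os


def count_processes(ps_text, names=("bpdbm", "bprd")):
    """Count lines of `ps -ef` style output that mention each daemon name.

    Hypothetical helper, not part of NetBackup. A line is counted once per
    name if any whitespace-separated token has that name as its basename.
    """
    counts = {name: 0 for name in names}
    for line in ps_text.splitlines():
        # Reduce every token to its basename so that both
        # /usr/openv/netbackup/bin/bpdbm and a bare "bpdbm" match.
        basenames = {os.path.basename(token) for token in line.split()}
        for name in names:
            if name in basenames:
                counts[name] += 1
    return counts
```

Feeding it the real output of ps -ef every minute or so after a restart, and logging the result, would show whether the counts keep climbing while the system is supposedly idle.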

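Since the three command outputs for NBCC have to be regenerated on every pass, capturing them in one step may save some time. Below is a minimal Python sketch, assuming the default UNIX install paths under /usr/openv (adjust if yours differ). The helper name capture_outputs and the output file naming are hypothetical, and the command arguments are simply the ones quoted in the thread.

```python
import subprocess
import sys
import time
from pathlib import Path

# Assumption: default UNIX install locations. The arguments are taken
# verbatim from the thread (vmquery -a, bpmedialist -ls, bpimmedia).
NBCC_INPUT_COMMANDS = {
    "vmquery": ["/usr/openv/volmgr/bin/vmquery", "-a"],
    "bpmedialist": ["/usr/openv/netbackup/bin/admincmd/bpmedialist", "-ls"],
    "bpimmedia": ["/usr/openv/netbackup/bin/admincmd/bpimmedia"],
}


def capture_outputs(commands, out_dir):
    """Run each command and save its stdout to <out_dir>/<label>-<stamp>.txt.

    Returns {label: (path, exit_status)}. A missing binary raises
    FileNotFoundError from subprocess.run, which is left to the caller.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    results = {}
    for label, argv in commands.items():
        proc = subprocess.run(argv, capture_output=True, text=True)
        path = out_dir / f"{label}-{stamp}.txt"
        path.write_text(proc.stdout)
        if proc.returncode != 0:
            # Keep going, but make a failed command visible.
            print(f"{label}: exit status {proc.returncode}", file=sys.stderr)
        results[label] = (path, proc.returncode)
    return results
```

Something like capture_outputs(NBCC_INPUT_COMMANDS, "/tmp/nbcc"), run on the master while no backups are active, should leave one timestamped file per command, so the outputs from successive passes can be kept apart.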