Veritas-bu

[Veritas-bu] Serious master issue...

2007-02-14 16:46:31
Subject: [Veritas-bu] Serious master issue...
From: jpiszcz at lucidpixels.com (Justin Piszcz)
Date: Wed, 14 Feb 2007 16:46:31 -0500 (EST)
Another option is to turn off backups, move the old images out of the way one
by one, and find what is causing the consistency check to choke. Does it stop
on one set of images, or does it run through them all, just very slowly?
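
A minimal sketch of that "move aside one at a time" idea, in Python. Everything here is an assumption for illustration: the helper name `find_suspect_dirs`, the idea that each client has its own directory under the image db root, and the `check` callable, which on a real master would have to wrap whatever test you trust (for example a timed consistency run). The holding directory must sit outside the image root, and backups should be off while this runs.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def find_suspect_dirs(images_root: Path, holding_dir: Path,
                      check: Callable[[], bool]) -> List[str]:
    """Park each client directory in turn and re-run a caller-supplied check.

    Returns the names of directories whose absence makes check() return True.
    `check` is hypothetical: it should return True when the consistency
    check completes acceptably with the remaining images.
    """
    suspects = []
    holding_dir.mkdir(parents=True, exist_ok=True)
    # Snapshot the directory list first so moving entries does not
    # disturb the iteration.
    client_dirs = sorted(p for p in images_root.iterdir() if p.is_dir())
    for client_dir in client_dirs:
        parked = holding_dir / client_dir.name
        shutil.move(str(client_dir), str(parked))
        try:
            if check():
                suspects.append(client_dir.name)
        finally:
            # Always put the directory back, even if the check raised.
            shutil.move(str(parked), str(client_dir))
    return suspects
```

Each directory is restored before the next one is parked, so at most one client's images are out of place at any time.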

On Wed, 14 Feb 2007, Hampus Lind wrote:

> The NBCC doesn't look at the image db, and they keep saying we have a
> problem there. But I don't know how we can fix it or even collect the info
> from the db when bpdbm -consistency 2 won't run.
>
>
>
> Hampus Lind
> Rikspolisstyrelsen
> National Police Board
> Tel dir: +46 (0)8 - 401 99 43
> Tel mob: +46 (0)70 - 217 92 66
> E-mail: hampus.lind at rps.police.se
>
> -----Original Message-----
> From: Steven L. Sesar [mailto:ssesar at mitre.org]
> Sent: 14 February 2007 20:53
> To: Hampus Lind
> Cc: 'Justin Piszcz'; 'Bahnmiller, Bryan';
> Veritas-bu at mailman.eng.auburn.edu
> Subject: Re: [Veritas-bu] Serious master issue...
>
>
>
> bpdbm -consistency 2 is useless to you, based on the amount of data that you
> back up nightly and my own presumption of how long backups run in your
> environment. It will take longer to run than your backup domain will remain
> idle. If I recall, they have a process which does a better job at finding
> catalog/db corruption/inconsistency. I think that it's called NBCC.
>
> The problem with NBCC is similar, though. You send them the output of three
> commands:
>
> vmquery -a, bpmedialist -ls, and bpimmedia
>
> Then, they munge the output of the above commands through a reporting tool
> that Symantec will NOT share with end users. At some point later in the day
> (hopefully, sooner rather than later), they will send you a report. You must
> then take certain actions to correct any discrepancies found. The backup
> system must be completely idle during this time. Restores are ok, but no
> backup activity can be taking place.
>
> Afterwards, you'll run those commands again, they'll generate the report
> again, and you'll see how you're doing. It may take you several passes to
> get things squared away.
>
> The problem is that most of us don't have a completely idle backup
> infrastructure - at least for long enough for this process to complete. I
> didn't when I was an NBU customer. Once you take backups, the reports become
> obsolete, as do the results of bpdbm -consistency 2.
>
> It would not surprise me if bpdbm was leaking memory on your platform.
>
> --Steve
>
>
> Hampus Lind wrote:
>
> Hi,
>
> I can't do anything....
>
> bpdbm -consistency 2 has been running for over 12 hours and hasn't checked
> more than 4-5 clients.
>
> It was the first thing support told me: your db is corrupted... So I tried
> to run the bpdbm -consistency 2 check. The check found some issues, like
> expired images which were not removed, etc. But when I was about to remove
> them manually, the netbackup db clean process had already taken care of them.
>
> So as I understand it, you can have some level of corruption in your db,
> which NBU cleans out when the clean job runs.
>
> I am not compressing my catalogs.
>
> Thanks,
>
> Hampus Lind
> Rikspolisstyrelsen
> National Police Board
> Tel dir: +46 (0)8 - 401 99 43
> Tel mob: +46 (0)70 - 217 92 66
> E-mail: hampus.lind at rps.police.se
>
>
> -----Original Message-----
> From: Justin Piszcz [mailto:jpiszcz at lucidpixels.com]
> Sent: 14 February 2007 20:31
> To: Hampus Lind
> Cc: 'Bahnmiller, Bryan'; Veritas-bu at mailman.eng.auburn.edu
> Subject: Re: [Veritas-bu] Serious master issue...
>
> Have you run the check_db_consistency? There is a command that checks to
> make sure your images are not corrupted!
>
> I would recommend checking that.
>
> Also, are you running compression on your catalogs?
>
>
> On Wed, 14 Feb 2007, Hampus Lind wrote:
>
>
>
> Thanks Bryan,
>
>
>
> It happens directly after reboot..
>
>
>
> The thing is:
>
> -          I have deactivated all policies
>
> -          Stopped our media server
>
> -          And then restarted netbackup on the master.
>
>
>
> So there is absolutely no activity going on (no backup, no user backup, no
> restore, no staging), only internal netbackup work.
>
> As soon as netbackup on the master becomes active, it starts bpdbm process
> after bpdbm process. They consume 100% of both my CPUs and write/read heavily
> to the /usr/openv/netbackup/db filesystem.
>
> With no activity at all after a clean start, we have about 42 bpdbm
> processes and nearly as many bprd processes.
>
>
>
> I can't figure this one out, and support points to disk config or something
> else that sounds good in their ears.
>
>
>
> Thanks for all help,
>
>
>
> Hampus Lind
> Rikspolisstyrelsen
> National Police Board
> Tel dir: +46 (0)8 - 401 99 43
> Tel mob: +46 (0)70 - 217 92 66
> E-mail: hampus.lind at rps.police.se
>
> -----Original Message-----
> From: Bahnmiller, Bryan [mailto:BBahnmiller at pier1.com]
> Sent: 14 February 2007 20:04
> To: Hampus Lind
> Subject: RE: [Veritas-bu] Serious master issue...
>
>
>
> Hampus,
>
>
>
> How quickly does this behaviour start happening after a recycle/reboot? I
> worked with an N4000 master running 11i. We did have 8 cpus and 8 GB RAM. We
> were running over 15,000 backup jobs daily though. Our catalog was over
> 400GB. (Catalog was on EMC DMX disk.) Running good old 3.4 we would have to
> reboot the system almost every week. If you can cleanly re-cycle NetBackup -
> shut it down, kill all NBU processes, and then restart it, that should be
> almost as good.
>
>
>
> Here we are running NBU 5.1mp4 on a Win2K3 master - 2 cpus, 4 GB RAM. (I
> inherited the system - not my choice.) We run about 5000 jobs per day, we
> have a 280 GB catalog on EMC Clariion. The system will stay stable for 2
> weeks pretty easily. 4 weeks starts pushing things. So we usually reboot our
> Windows master and media servers every 2 weeks.
>
>
>
> It seems like you will have cumulative problems with NetBackup that can
> build up over time. It is way more pronounced on busy systems. We have
> another NetBackup system that has 1 Master and 1 Media server. It runs about
> 40 jobs per day max. I hardly ever have to reboot those servers.
>
>
>
>      Bryan
>
>
>
> Bryan Bahnmiller
>
> ISD Business Continuity
>
> Pier 1 Imports, Inc
>
> 817-252-8570
>
>
>
>
>
>
> _____
>
>
> From: veritas-bu-bounces at mailman.eng.auburn.edu
> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
> Sent: Wednesday, February 14, 2007 12:17 PM
> To: Veritas-bu at mailman.eng.auburn.edu
> Subject: Re: [Veritas-bu] Serious master issue...
> Importance: High
>
> All,
>
>
>
> Now I have been transferred to USA support. God bless America!
>
>
>
> They have told me that they haven't seen such a big installation in over a
> year. Strange, I have about 200 clients and back up a couple of TB per day.
> I was under the impression that this was kind of a small installation..??
>
>
>
> However, they have told me that this is perfectly normal behaviour with
> netbackup: that it produces heavy disk IO and eats all CPU power. And I was
> really stupid and told them that I also had a case with HP earlier on this
> disk IO problem, so now Symantec support are pointing all their fingers at
> HP and our disk setup.
>
>
>
> Our DB is about 60-65 GB and resides on a StorageTek Flexline 380 disk array
> (SAN). We run a RAID 5 on 146GB FC drives. I don't really see the
> bottleneck there, but I will create a RAID 5 on 73GB 15K FC drives just to
> shut netbackup support up.
>
>
>
> We run a two CPU HP rp2470 with HP-UX 11.11 as a master server. Shouldn't
> this be enough for this installation?
>
>
>
> Ooh well...
>
>
>
> If support can't help me, what should I do?? I am desperate!!!
>
>
>
>
>
> Hampus Lind
> Rikspolisstyrelsen
> National Police Board
> Tel dir: +46 (0)8 - 401 99 43
> Tel mob: +46 (0)70 - 217 92 66
> E-mail: hampus.lind at rps.police.se
>
> -----Original Message-----
> From: veritas-bu-bounces at mailman.eng.auburn.edu
> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Hampus Lind
> Sent: 14 February 2007 12:48
> To: Veritas-bu at mailman.eng.auburn.edu
> Subject: [Veritas-bu] Serious master issue...
> Priority: High
>
>
>
> Hi,
>
>
>
> We have a serious issue here with our master server. The problem occurred a
> couple of weeks ago, or at least I found out about it then.
>
>
>
> I was looking at IOs and SCSI queue depth on my master (HP-UX 11.11) when I
> saw that we had 4000-6000 SCSI commands in the queue, and a disk utilisation
> of 100% for the /usr/openv/netbackup/db disk.
>
>
>
> I have patched hpux to the latest patch bundle and we run NBU 5.1 MP4.
>
>
>
> HP support said that bpdbm was leaking memory.
>
>
>
> Veritas support is still investigating. But we have about 30 bpdbm and bprd
> processes active on our master, which eat both my CPUs and produce tons of
> IO against our db disk.
>
>
>
> I activated verbose = 5 on the master, and after 15 minutes the bpdbm log had
> reached the file size limit on our filesystem, 2 GB.
>
>
>
> Has anyone had similar problems?
>
>
>
>
>
> Thanks and regards,
>
>
>
> Hampus Lind
> Rikspolisstyrelsen
> National Police Board
> Tel dir: +46 (0)8 - 401 99 43
> Tel mob: +46 (0)70 - 217 92 66
> E-mail: hampus.lind at rps.police.se
>
>
>
>
>
>
>
> _______________________________________________
> Veritas-bu maillist  -  Veritas-bu at mailman.eng.auburn.edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>
>
>
>
>
>
>
> -- 
> ===================================
>
>   Steven L. Sesar
>   Lead Operating Systems Programmer/Analyst
>   UNIX Application Services R101
>   The MITRE Corporation
>   202 Burlington Road - MS K101
>   Bedford, MA 01730
>   tel: (781) 271-7702
>   fax: (781) 271-2600
>   mobile: (617) 519-8933
>   email: ssesar at mitre.org
>
> ===================================
>
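
The thread reports roughly 30 to 42 bpdbm processes and nearly as many bprd processes on the master. A minimal sketch for tallying them from captured process-list text follows. It assumes `ps -e` style output (a header line, then PID, TTY, TIME and the command name in the fourth column); the function name `count_daemons` is made up for this example, and the column layout should be checked against your own platform before relying on it.

```python
import os
from collections import Counter
from typing import Iterable


def count_daemons(ps_output: str,
                  names: Iterable[str] = ("bpdbm", "bprd")) -> Counter:
    """Count processes whose command name matches one of `names`.

    Assumes `ps -e` style text: one header line, then four
    whitespace-separated columns with the command name last.
    """
    wanted = set(names)
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # blank or malformed line
        # Keep only the program name, dropping any path or arguments.
        command = os.path.basename(fields[3].split()[0])
        if command in wanted:
            counts[command] += 1
    return counts
```

Feeding it the captured output of `ps -e` at intervals after a restart would show whether the number of bpdbm and bprd processes keeps climbing or levels off.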