ADSM-L

Re: [ADSM-L] Database audit performance

2010-11-18 13:46:12
Subject: Re: [ADSM-L] Database audit performance
From: Paul Zarnowski <psz1 AT CORNELL DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 18 Nov 2010 13:45:06 -0500
Wanda,

We're in a similar situation to Eric, and we were planning to do pretty much 
what you suggested.  We have some (apparently) benign corruption that we would 
like to clean up before migrating the db to v6.  Our DB is much larger than 
Eric's, I think, because the audit is measured in days, not hours.  I had 
intended to use a technique similar to what you describe, but when we ran a 
test of the server-to-server export/import process, it took too long to be 
viable.  Our plan had more steps than yours.

At the time the db is cloned (to the test server), all storage pools would be 
marked readonly.  New temporary storage pools would be used to handle newly 
ingested data (during your step 2).  This way, when step 3 happens and the db 
on test is used as production, all of the db entries are still valid for the 
readonly storage pools.  And, all of the newly ingested data would be isolated 
and could (in theory anyway) be exported more quickly (since the temp storage 
pools would be on non-collocated tape and as much spare disk as we could 
find).  Furthermore, I was planning NOT to switch test into production use 
until the export/imports had caught up and transferred all of the newly 
ingested data to test.  I thought that would be cleaner for end users 
trying to do restores during the transition period (no worries about what data 
had or had not yet been exported).  But the problem we ran into was that our 
ingest rate was faster than our export rate, so we could never catch up.
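The catch-up problem described above comes down to simple arithmetic. While production keeps ingesting, the export backlog shrinks only by the difference between the export rate and the ingest rate, so if exports are not faster than ingest the backlog never drains. Here is a minimal sketch of that reasoning. The function name and all figures are hypothetical illustrations, not measurements from this thread.

```python
import math

def catch_up_days(backlog_gb: float, ingest_gb_per_day: float,
                  export_gb_per_day: float) -> float:
    """Days for server-to-server exports to drain a backlog while the old
    production server keeps ingesting new data (hypothetical helper).
    Returns math.inf when the export rate does not exceed the ingest rate."""
    net_drain = export_gb_per_day - ingest_gb_per_day
    if net_drain <= 0:
        # Backlog stays level or grows every day: the export never catches up.
        return math.inf
    return backlog_gb / net_drain

# Hypothetical figures only, chosen to show both outcomes.
print(catch_up_days(backlog_gb=6000, ingest_gb_per_day=3000, export_gb_per_day=2500))
print(catch_up_days(backlog_gb=6000, ingest_gb_per_day=3000, export_gb_per_day=4000))
```

The first call never finishes (infinite days). The second drains 1000 GB/day net and finishes in 6 days. A smaller db or a lower daily ingest rate pushes a site from the first case toward the second.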

The long and short of it is that I don't think there's a way to avoid an extended 
outage, at least in our situation.  It might work with a smaller db and a lower 
daily ingest rate.

..Paul

At 10:03 AM 11/18/2010, Prather, Wanda wrote:
>Audit DB is notoriously slow.  Even if you improve performance by 20%, you'll 
>still have a very long down time.
>Here's a different idea to think about:
>
>1) Perform the AUDITDB on the test server; take as long as you need.
>2) Let your clients continue backing up as usual to production.
>3) When the db on test is ready, bring up that TSM server and swap IP addresses
>between prod and test, so your clients are now backing up to test.
>4) Set up server-to-server communications.
>5) Export node server-to-server from old prod to test, using FROMDATE/FROMTIME,
>TODATE/TOTIME and MERGEFILESPACES=YES, to pick up anything that you missed in
>that 17+ hour window.
>
>Test is now prod with a clean db.
>Can anybody think of a reason that won't work?
>
>W
>
>
>
>
>-----Original Message-----
>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf 
>Of Loon, EJ van - SPLXO
>Sent: Thursday, November 18, 2010 5:44 AM
>To: ADSM-L AT VM.MARIST DOT EDU
>Subject: [ADSM-L] Database audit performance
>
>Hi TSM-ers!
>We have orphaned database entries, caused by a very old bug that was fixed
>some server releases ago but only recently discovered. I'm currently
>trying to find a way to speed up AUDITDB.
>What I'm planning to do is this:
>1) backup the database on our production server
>2) stop the production server
>3) restore the production database on our test server, which already uses
>new disks allocated on our new Vmax
>4) perform an auditdb fix=yes on this database
>5) backup the fixed database and restore it on the production server
>I have already tested the scenario above and it works, but the audit takes
>too long to finish (17 hours). Since we back up a lot of Oracle
>databases, the TSM downtime would be too long: the Oracle recovery logs would
>fill up and the databases would stop.
>We are running an AIX TSM server with plenty of memory and multiple HBA
>to the SAN.
>Restoring the database runs OK; topas shows around 25 Mb/sec disk
>write speed. I have seen better performance on Vmax disks, but I can
>live with this.
>When I start the audit, topas shows average disk read and write speeds of
>less than 1 Mb/sec. CPU averages around 50% and vmstat shows no paging
>in or out.
>I have tried everything: mounting the filesystem with cio or dio, using raw
>logical volumes, tuning read-ahead through ioo. Nothing makes any
>difference, and some changes even make it worse (raw, for instance).
>I'm really out of options here. Something is holding back the audit, but
>I can't find what!
>Does anybody have some tips for me?
>Thank you VERY much in advance!
>Kind regards,
>Eric van Loon
>KLM Royal Dutch Airlines
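Wanda's step 5 amounts to a single EXPORT NODE command whose date/time window covers the gap between the production db backup and the IP swap. The sketch below builds that command string from two timestamps. Several things here are assumptions: the node names, the target server name and the window times are hypothetical; it assumes the server's date format is the default MM/DD/YYYY; and it assumes the server level supports the TODATE/TOTIME parameters Wanda mentions. Check the EXPORT NODE reference for your server level before using anything like this.

```python
from datetime import datetime

def export_delta_command(nodes, target_server, window_start, window_end):
    """Build an EXPORT NODE command that sends only the data stored in the
    cut-over window to the new server (hypothetical helper).

    Assumes MM/DD/YYYY server date format and TODATE/TOTIME support."""
    if window_end <= window_start:
        raise ValueError("window_end must be later than window_start")
    date_fmt, time_fmt = "%m/%d/%Y", "%H:%M:%S"
    return (
        # Node names are comma-separated with no spaces.
        f"export node {','.join(nodes)} filedata=all toserver={target_server} "
        # Merge into the file spaces that already exist on the new server.
        "mergefilespaces=yes "
        f"fromdate={window_start.strftime(date_fmt)} "
        f"fromtime={window_start.strftime(time_fmt)} "
        f"todate={window_end.strftime(date_fmt)} "
        f"totime={window_end.strftime(time_fmt)}"
    )

# Hypothetical 17-hour window and names, for illustration only.
start = datetime(2010, 11, 18, 5, 44)
end = datetime(2010, 11, 18, 22, 44)
print(export_delta_command(["ORA01", "ORA02"], "NEWPROD", start, end))
```

Keeping the window explicit (rather than only FROMDATE) makes it easy to rerun the export for a precise slice if the first pass is interrupted.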


--
Paul Zarnowski                            Ph: 607-255-4757
Manager, Storage Services                 Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801    Em: psz1 AT cornell DOT edu  
