Warning: Audit DB can take weeks!

yd52

ADSM.ORG Member, joined Jun 28, 2005
We have run into database corruption. Some errors occurred in a cache disk storage pool, resulting in server crashes. We definitely need to run an audit db with fix=yes. Our database volume has a capacity of 50GB. If the entire volume contained database errors, IBM estimated that the audit db process would take about 150(!) days in our case (based on timestamps; on a newer machine you can reach about 4 times the speed, which would still be more than one month). TSM cannot be used while the database audit is running. The process has been running since June 15th. IBM is not able to provide any other way to repair the database; dsmserv audit db is considered the only way. The cause of the database corruption cannot be determined. Again: we have not been able to use our backup service for nearly two weeks! NEVER USE LARGE VOLUMES!
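The "4 times faster, still more than a month" remark is simple inverse scaling. A minimal sketch, assuming runtime scales as 1/speedup (the helper name is hypothetical, not part of TSM). Because the audit runs single-threaded, the speedup factor should reflect per-core speed rather than the number of CPUs.

```python
def scaled_runtime_days(baseline_days, speedup):
    """Estimate runtime on a faster machine, assuming runtime scales as 1/speedup.

    Hypothetical helper. The audit is single-threaded, so `speedup` should
    describe per-core speed, not CPU count.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_days / speedup


# IBM's 150-day estimate, on a machine roughly 4 times faster
estimate = scaled_runtime_days(150, 4)
print(f"Estimated audit time on the faster machine: {estimate:.1f} days")
```

With the figures from this post that gives 37.5 days, which is indeed still more than a month of downtime.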
 
How far back does the corruption go? You are correct: IBM does not offer sound DB maintenance tools at the present time. Although it's in the development team's hands, we are also waiting, since our DBs are as large as 102GB, with an average of 72GB across our 18 TSM servers. We tend to keep them to a 2-hour recovery time, which is approximately 60GB.
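The "2 hours for about 60GB" rule of thumb implies a restore rate of roughly 30GB per hour. A rough sketch, assuming restore time grows linearly with DB size (real restores also depend on tape mounts and hardware), that turns the rule into per-size estimates; the function name is illustrative only:

```python
def recovery_hours(db_size_gb, ref_size_gb=60.0, ref_hours=2.0):
    """Linear estimate of DB restore time from a reference point.

    Assumption: restore time is proportional to DB size, calibrated on
    the '60GB in about 2 hours' rule of thumb (about 30GB per hour).
    """
    rate_gb_per_hour = ref_size_gb / ref_hours
    return db_size_gb / rate_gb_per_hour


for size in (60, 72, 102):
    print(f"{size} GB -> {recovery_hours(size):.1f} h")
```

Under that assumption the 72GB average DB comes out at about 2.4 hours and the 102GB DB at about 3.4 hours.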



Since your backups from at least the last two weeks are corrupt, why don't you recall your DB backup from three weeks or so back and perform a DB recovery on a brand-new server?

I would highly recommend that you find your last known good backup. From it, rebuild a brand-new TSM server and put your company back into business ASAP. In the meantime, your corrupted server should be taken off the network, with all its administrative schedules, cron jobs and unused applications stopped. I would also give the server process the highest scheduling priority (a lower NICE value) to get the best performance out of your server.

Then I would make copies of the vital 5, especially the volhist, dsmserv.dsk and devcfg files. Look through them with vi for irregularities.
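Copying those files before touching anything else can be scripted so each copy is kept with a timestamp. A minimal sketch; the function and folder names are hypothetical, and the actual file names and locations depend on your server's options file (the VOLUMEHISTORY and DEVCONFIG settings), so pass in whatever paths your installation really uses.

```python
import shutil
import time
from pathlib import Path


def snapshot_config_files(paths, dest_root):
    """Copy each existing file in `paths` into a timestamped folder under `dest_root`.

    Hypothetical helper. Returns (snapshot_dir, missing), where `missing`
    lists the paths that were not found and therefore not copied.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    snapshot_dir = Path(dest_root) / f"tsm-files-{stamp}"
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for p in map(Path, paths):
        if p.is_file():
            # copy2 also preserves the modification time, handy when comparing copies later
            shutil.copy2(p, snapshot_dir / p.name)
        else:
            missing.append(str(p))
    return snapshot_dir, missing
```

Call it with the full paths of volhist, dsmserv.dsk, devcfg and the other files you consider vital, pointing `dest_root` at a disk that is not part of the damaged server; then inspect the copies rather than the originals.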



I would be very curious to hear whether you proceed with this audit, and about your results, difficulties encountered, etc. If the audit has already been running for two weeks, then I am sorry to say it will be easier and quicker, as I mentioned, to rebuild (at least 2 hours of rebuild time). That is, if your DRM plan is adequate. Is it?



Best of luck - let me know how it turns out.

Steven
 
> How far back does the corruption go??



dsmserv.err has been reporting a problem in the database since 2003...

Now I know about this file and will examine it regularly, especially after crashes.
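Checking dsmserv.err regularly is easier if a script only reports what was appended since the last check. A minimal sketch that makes no assumptions about the file's message format; the function name is hypothetical, and you would store the returned offset (for example in a small state file) between runs of a cron job.

```python
from pathlib import Path


def read_new_lines(log_path, offset=0):
    """Return (new_lines, new_offset) for text appended to log_path after `offset` bytes.

    Hypothetical helper: reads raw bytes so the offset stays exact.
    """
    path = Path(log_path)
    if path.stat().st_size < offset:
        # The file shrank (rotated or recreated), so start over from the top.
        offset = 0
    with path.open("rb") as f:
        f.seek(offset)
        data = f.read()
    lines = data.decode("utf-8", errors="replace").splitlines()
    return lines, offset + len(data)
```

Any non-empty result after a crash is a cue to read the new entries before restarting the server.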



> Since your good backups are at least two weeks corrupt, Why don't you recall your DB from three weeks or so back, and perform a DB recovery on a brand new server?



We did not know that auditing disk storage could be done in the absence of those volumes.

Auditing the DB on a p655+ took about 10 days; our TSM machine is 3 times slower.

It would have taken about a month.
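The "about a month" figure follows from turning the p655+ run into a throughput and scaling it to the slower machine. A rough sketch, assuming the 10-day p655+ audit covered the 50GB database volume from the first post and that audit time grows linearly with DB size; both are assumptions, and the helper names are hypothetical.

```python
def audit_rate_gb_per_day(db_size_gb, days):
    """Average audit throughput, assuming work is spread evenly over the DB."""
    return db_size_gb / days


def audit_days(db_size_gb, rate_gb_per_day):
    """Estimated audit duration at a given throughput."""
    return db_size_gb / rate_gb_per_day


# Assumption: the 10-day p655+ audit covered the 50GB database volume.
p655_rate = audit_rate_gb_per_day(50, 10)   # 5 GB/day
slow_rate = p655_rate / 3                   # the TSM machine is 3 times slower
print(f"50GB on the slow machine: {audit_days(50, slow_rate):.0f} days")
print(f"102GB on the slow machine: {audit_days(102, slow_rate):.0f} days")
```

The second line applies the same rate to a 102GB DB like the largest one mentioned earlier in the thread, which is why rebuilding from a good backup looks so much more attractive than auditing at that size.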



> should be taken off the network, all it's administration schedules stopped, all cron jobs stopped, all unused applications stopped. I would raise your NICE value very high end to gain the best performance of your server.



Well, first of all, it's a dedicated TSM server, so of course there are no other applications running on it. Second, at 99.9% CPU time there is no need to renice (and 3 of 4 CPUs were idle, because the audit runs single-threaded).



> I would be highly curious if you proceed with this audit and your results, difficulties encountered, etc.



IBM finally provided a way to discard the corrupted bitfield (thereby losing the data on the disk volumes, but these are only caches). A completely new audit was started and ran for only 2 days on the slow machine.



Now everything is OK.



Joe
 
What was the corrupted bit field? Do you mind sharing the steps IBM provided you?

Granted, we have never experienced a crash in over 4 years on any of our TSM servers, which, by the way, have DBs larger than yours in some instances. Still, this information would be a good thing for us to keep on hand.



Steve
 
> What was the corrupted bit field? Do you mind sharing the steps IBM provided you?



I know nothing about the internal database structures, so I can't say when applying this method would be recommended. An undocumented option was used, so you should definitely do this only when IBM support advises you to. But you can be sure that this will be integrated into future releases of TSM.



Joe
 
I wouldn't hold your breath on this. Maybe with TSM 6, but I do not expect anything this year or next.

According to our IBM developer POC, it boils down to priorities.

Keep your trick safe. :)
 