Utilized capacity of tape drives

jimlane

ADSM.ORG Member, joined May 9, 2008
Hi all,

I need to come up with a methodology for measuring how close my TSM tape infrastructure is to its effective capacity. My TSM support people say "the drives are busy all the time", which I find hard to credit and not at all helpful. Surely there must be a way to measure how much more work a fixed number of drives can absorb as more and more work is asked of them? I realize this question is more than a little open-ended, but I'm starting from scratch here (pardon the pun). I have a way to measure how long the drives have a tape mounted versus not, and I can also add up, from the TSM database, the total duration of tape-using activities (backup, archive, reclamation and so on). What I'm not sure about is what to look for that would tell me when a given amount of tape activity is too much.
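A minimal sketch of turning mount/dismount times into a per-drive duty cycle, assuming they can be exported as (mount time, dismount time) pairs for one drive at a time; the function name busy_fraction and the sample data are hypothetical. Intervals are clipped to the reporting window and overlapping ones are merged, so no time is counted twice.

```python
from datetime import datetime, timedelta

def busy_fraction(intervals, window_start, window_end):
    """Fraction of [window_start, window_end) during which ONE drive had a tape mounted.

    intervals: list of (mount_time, dismount_time) datetime pairs (hypothetical export).
    """
    # Clip each interval to the window and drop those entirely outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if e > window_start and s < window_end
    )
    busy = timedelta(0)
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # A gap: close out the previous merged run, start a new one.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlap: extend the current merged run.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return busy / (window_end - window_start)
```

Averaging the per-drive figures over a day or a week gives a utilization number that can be tracked over time.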

Regards,
Jim Lane
 
Bottom line: you can't. Probably the most effective approach is to look at mediawait (from the summary table), but that won't really tell you how much additional workload you could handle with additional tape drives, nor would it really give you an idea of how close you are to the edge, or whether more drives would help you or a different setup would do better without new drives. Hire a good consultant (somebody who doesn't sell hardware ;) ) and let them analyse it.

PJ
 
PJ: thanks for this, but I'm afraid you've hit one of my hot buttons here. Consultants put their pants on one leg at a time just like everybody else. What do they know that you or I don't? If it's a question of "trust me, I'm an expert", then I don't need outside help. After 34 years in IT I have more gravitas than any consultant I've seen in quite a while.

Now that I've gotten that off my chest :) Seriously though, there has to be a way of tracking this. How can I in good conscience ask the powers that be to spend real money without a way of backing it up? I know how I'd react if I were on the other side of the desk.

/JL
 
I have a script that runs q mount every 15 minutes. I can then query the activity log for message number 8334. This gives me a rough estimate of how many drives are in use in each 15-minute interval. It's not exact, but if I see that I'm hitting my maximum for a good portion of the day, I can use that to help justify the need for more drives.
 
Sounds like an interesting starting point. Would you be willing to share your script?
 
It's just a one-line script that cron calls every 15 minutes to run q mount. Then, when you want to do reporting, just issue q actl msgn=8334 begindate=-X, where X is the retention period of your activity log.

Here is the script:

dsmadmc -id=admin -pa=admin q mount
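A sketch of the reporting side, assuming the activity log lines for message 8334 carry the mounted-volume count in a form like "ANR8334I 3 matches found." (check this against your own output). The helper names are hypothetical. It builds a histogram of mounted volumes per sample and reports how often every drive was in use.

```python
import re
from collections import Counter

# Assumed message text, e.g. "ANR8334I         3 matches found." Verify against your actlog.
MOUNT_COUNT = re.compile(r"ANR8334I\s+(\d+)\s+match")

def mount_histogram(actlog_lines, expected_samples=None):
    """Histogram of mounted-volume counts seen in the 15-minute q mount samples."""
    counts = [int(m.group(1)) for m in map(MOUNT_COUNT.search, actlog_lines) if m]
    hist = Counter(counts)
    if expected_samples is not None and expected_samples > len(counts):
        # Assumption: a sample taken while nothing was mounted logs no ANR8334I
        # line, so missing samples are counted as zero mounts.
        hist[0] += expected_samples - len(counts)
    return hist

def fraction_at_capacity(hist, drives):
    """Share of samples in which at least `drives` volumes were mounted."""
    total = sum(hist.values())
    full = sum(n for mounted, n in hist.items() if mounted >= drives)
    return full / total if total else 0.0
```

With 96 samples a day, expected_samples would be 96 times the number of days covered by begindate.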
 
I don't expect our consultants to know more about TSM (especially about my own servers) than I do either. It's more the effect of somebody with experience of many different installations taking a fresh, unspoiled look at what I may no longer see because I've been looking at it for too long. Sometimes this leads to surprisingly simple, obvious and really helpful results (and leaves me wondering why the h*ll I didn't think of it myself ;) )

PJ
 
You don't really need the "q mount" script, by the way, if you want to know about your drive utilization. Mounts are in the summary table, and while they don't reflect mediawaits, the backup/archive/restore/retrieve/migration etc. entries in the summary do. Just feeding the output of "select * from summary where ... (whatever you're looking for)" into Excel and putting each entity on a timeline graph will probably already reveal quite a lot visually. Or you can pull in the summary table over ODBC, create a pivot table and visualize that on a timeline.
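As a sketch of the timeline idea, assuming the select output has been saved as CSV with a header row containing START_TIME, END_TIME and ACTIVITY, and timestamps like 2008-06-02 10:15:00.000000 (the function names are hypothetical): each 15-minute bucket gets the average number of overlapping operations, a series that can then be charted in Excel.

```python
import csv
import io
from datetime import datetime, timedelta

def read_summary_csv(text, activities=("TAPE MOUNT",)):
    """Return (start, end) pairs for rows whose ACTIVITY is in `activities`."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        (datetime.fromisoformat(r["START_TIME"]), datetime.fromisoformat(r["END_TIME"]))
        for r in rows
        if r["ACTIVITY"] in activities
    ]

def average_concurrency(intervals, start, end, bucket=timedelta(minutes=15)):
    """Average number of overlapping intervals in each time bucket."""
    series = []
    t = start
    while t < end:
        t_next = min(t + bucket, end)
        # Total overlap of all intervals with [t, t_next), divided by the bucket length.
        overlap = sum(
            ((min(e, t_next) - max(s, t)) for s, e in intervals if s < t_next and e > t),
            timedelta(0),
        )
        series.append((t, overlap / (t_next - t)))
        t = t_next
    return series
```

Passing other ACTIVITY values (BACKUP, RECLAMATION and so on) gives one series per activity for the same chart.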

PJ
 
Hi,

I use the following to determine how much the drives in the library are used:

Drive utilization in the previous 24 hours:
select sum(cast((end_time-start_time) minutes as decimal)) as "Tape Usage (min)", drive_name as "Drive Name" from summary where start_time >= current_timestamp - 24 hours and start_time <= current_timestamp and activity='TAPE MOUNT' group by drive_name

This gives the drive utilization in the last 24 hours.
Of course, drive utilization differs from day to day; on Friday and Saturday it is generally considerably higher than on Sunday.
I also change the select to cover a week and a month, which balances out the fluctuations.
To get a percentage you will have to do some math yourself.
For a little more accuracy you can cast to seconds instead of minutes.
I have noticed that a drive utilization of 65% is perceived as the drives being constantly busy.
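The percentage math is just busy time divided by the drive-time available in the window. A minimal sketch, with hypothetical drive names and minute totals standing in for the query output:

```python
MINUTES_PER_HOUR = 60

def utilization_percent(busy_minutes, drives=1, window_hours=24):
    """Busy drive-minutes as a percentage of the drive-minutes available in the window."""
    available = drives * window_hours * MINUTES_PER_HOUR
    return 100.0 * busy_minutes / available

def per_drive_percent(minutes_by_drive, window_hours=24):
    """Apply utilization_percent to each row of the per-drive query output."""
    return {drive: utilization_percent(m, 1, window_hours)
            for drive, m in minutes_by_drive.items()}

# Hypothetical query output: minutes of TAPE MOUNT time per drive over 24 hours.
usage = {"DRIVE01": 936, "DRIVE02": 720}
print(per_drive_percent(usage))  # {'DRIVE01': 65.0, 'DRIVE02': 50.0}
print(utilization_percent(sum(usage.values()), drives=len(usage)))  # 57.5
```

For the weekly version of the select, pass window_hours=168.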

Regards. Wim.
 
It sounds to me like they're more concerned about a lack of mount points when they say "it's busy all the time". If they have to perform an on-demand operation such as a restore, they have to wait for mount points and can't get the work done until a previous operation has completed. This can be frustrating and hurts productivity.

If this is the case, you might want to consider moving your busier pools to disk-based storage pools at some point. I have a TSM help desk, and this is what I did to address the exact same complaint from them. This way you can set a mount limit of 30+ without having to worry too much about available tape drives.

Just a thought. :)
 
If you don't mind using a third-party tool:

TSMManager can produce a nice graphic showing when each of your tape drives was mounted over a given time span and which process was using the tape. You can zoom in and out to get a good idea of which times are busy and whether your drives sit idle at any point, and it colour-codes different activities (backups, reclamations, etc.) so you can easily see what is going on.

You can get a 30-day trial from their website and install it on any PC in just a few minutes; the drive utilisation report can then be run straight away. It just goes back through the summary table on your server and presents it in an easy-to-read way.
 
PJ: I've arranged to get CSV files extracted from the summary table with all the mediawait numbers in them. One problem I've encountered is outliers: for example, a whole slew of backups from one instance starting at the same time with media waits of 50-60 hours. The admins say this was caused by a rogue client that "wouldn't back up". I'm wondering if it's fair to just exclude media waits longer than some arbitrary value. Also, what statistic should I be looking at: total media wait, average per operation, or some percentile?
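One way to see whether an arbitrary cut-off matters is to compute the statistics with and without it. This sketch assumes the mediawait values have already been pulled out of the CSV as numbers (seconds, assumed); the cap is whatever limit you pick, and the helper names are hypothetical. The median and a high percentile are much less sensitive to a handful of 50-60 hour waits than the total or the mean.

```python
import math
import statistics

def nearest_rank_percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list."""
    ordered = sorted(values)
    k = math.ceil(p * len(ordered) / 100)
    return ordered[max(k, 1) - 1]

def mediawait_stats(waits, cap=None):
    """Summarize per-operation media waits, optionally dropping values above `cap`."""
    kept = [w for w in waits if cap is None or w <= cap]
    return {
        "count": len(kept),
        "dropped": len(waits) - len(kept),  # report this so the exclusion stays visible
        "total": sum(kept),
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        "p95": nearest_rank_percentile(kept, 95),
    }
```

If the median and 95th percentile barely move when the cap is applied, the choice of cap matters little for trend tracking.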

-Jim
 
Ypcat: the total amount of disk data my shop needs to back up is measured in petabytes, so disk-only isn't really an option. I should be so lucky!

-Jim
 
Wim: I'm a bit weak on SQL. When I run your query on my system I get an error as follows

ANR2905E Unexpected SQL identifier token - 'USAGE'.

                             |
.............................V.................................
minutes as decimal)) as Tape Usage (min), drive_name as Drive N

ANS8001I Return code 3.

I wonder if you have any idea what I'm doing wrong here?
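Judging from the echoed statement in the error, the double quotes around the column aliases never reached the server, so TSM read "as Tape" and then stopped at "Usage". One likely cause is running the select through dsmadmc from a Unix shell, which removes the quote characters before dsmadmc sees them. The sketch below assumes Python 3.8+ and dsmadmc on the PATH, and passes the whole statement as a single argument so no shell touches it. The helper names and credentials are hypothetical. The query is Wim's, with the end of the window set to current_timestamp instead of current_timestamp - 24 hours (as posted, the two conditions only match rows that started exactly 24 hours ago).

```python
import shlex
import subprocess

# The quoted aliases must reach TSM intact.
QUERY = (
    'select sum(cast((end_time-start_time) minutes as decimal)) as "Tape Usage (min)", '
    'drive_name as "Drive Name" from summary '
    "where start_time >= current_timestamp - 24 hours "
    "and start_time <= current_timestamp "
    "and activity='TAPE MOUNT' group by drive_name"
)

def dsmadmc_argv(admin_id, password, command=QUERY):
    """Build the dsmadmc argument list with the SQL statement as ONE element."""
    return ["dsmadmc", f"-id={admin_id}", f"-password={password}", command]

def run_select(admin_id, password, command=QUERY):
    """Run the select without a shell, so no quote characters are stripped."""
    result = subprocess.run(dsmadmc_argv(admin_id, password, command),
                            capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    # Print an equivalent, correctly quoted command line for pasting into a shell.
    print(shlex.join(dsmadmc_argv("admin", "secret")))
```

Typing the select at the interactive dsmadmc prompt instead of on the shell command line should avoid the same problem.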

-Jim
 
I'd divide the mediawait for backups by the number of sessions (column: PROCESSES). It will still be affected by extremely misbehaving clients, but the figure would probably give a better estimate, and you won't see mediawaits exceeding the length of the session any more.
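A sketch of that normalization, assuming each summary row supplies MEDIAW and PROCESSES plus an elapsed time computed from END_TIME - START_TIME, all in seconds (assumed). The function names are hypothetical. Dividing by the process count should keep the per-row figure within the session's own length, and dividing by elapsed time turns it into a 0-1 wait fraction that can be compared across days.

```python
def wait_fraction(mediaw, processes, elapsed):
    """Share of one operation's elapsed time spent waiting for media, per process."""
    if elapsed <= 0:
        return 0.0
    return (mediaw / max(processes, 1)) / elapsed

def daily_wait_fraction(rows):
    """Elapsed-time-weighted wait fraction over many (mediaw, processes, elapsed) rows."""
    waited = sum(m / max(p, 1) for m, p, _ in rows)
    elapsed = sum(e for _, _, e in rows)
    return waited / elapsed if elapsed else 0.0

# Hypothetical row: 3000 s of raw media wait in a 1200 s session with 5 processes.
# The raw figure exceeds the session length; the normalized one does not.
print(wait_fraction(3000, 5, 1200))  # 0.5
```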

PJ
 
PJ: I'm not sure I follow the rationale for this. Why divide by the number of processes? That number seems to vary in the range 1-15 on my test system. Would mediawait/processes track the demand for tape drives more closely than mediawait on its own? What I'm looking for is a number that I can plot as a time series and project into the future, so as to say "quantity x will exceed threshold y in so-many months, at which time more tape drives will be required".
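For the projection itself, a straight-line least-squares fit is the simplest option. This sketch assumes one value per month of whichever metric is chosen (utilization percentage, wait fraction, etc.) and a threshold picked by you; the linear trend is itself an assumption worth checking against the data, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x (needs two distinct xs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def months_until(threshold, xs, ys):
    """Months after the last observation until the fitted trend reaches `threshold`.

    Returns None if the trend is flat or falling, 0.0 if it is already past the threshold.
    """
    intercept, slope = fit_line(xs, ys)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(crossing - xs[-1], 0.0)

# Hypothetical monthly utilization (%) rising 2 points a month, threshold 70%.
print(months_until(70, [0, 1, 2, 3, 4, 5], [50, 52, 54, 56, 58, 60]))  # 5.0
```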

-Jim
 