ADSM-L

Re: Looking for Help/Alternatives/Suggestions/Ideas

2003-04-08 09:23:32
From: Richard Sims <rbs AT BU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 8 Apr 2003 09:23:12 -0400
>...
>We are now a week into our restore processing. Unfortunately, no matter
>what we do, it will still take 10-15 additional days to restore this
>system.  There are 8000+ users without their old email.
>...
>Yes, I have called IBM/Tivoli and gone over various TSM server settings,
>etc (when I described the situation and the size of the TSM backups, the
>response I got was WOW!).  There is nothing more that can be done from the
>TSM server perspective, that would speed things up.  In fact, most of the
>slowness seems to be on the AIX server disk subsystem (seems to be 100%
>busy most of the time).
>...

I'm concerned that TSM may be getting a bad rep at your shop because of this
situation when it may not be its fault.  And I'm much more concerned that there
seems to be no real analysis being done by your operating system people to get
to the root of the performance problem.  After all, a systems programmer does
not need to know a product in order to analyze where its resource utilization is
occurring.  The observation on "the AIX server disk subsystem" busyness is
awfully broad, and needs to be fully pursued.  Is it, for example, reflective of
high paging, which is in turn the result of a server system with inadequate
real memory?  Is the file system directory topology that is utilized by the
mail facility seriously imbalanced, and thus impeding throughput?
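
As a rough illustration of the paging question, here is a minimal Python sketch
that averages the page-in (pi), page-out (po) and I/O-wait (wa) columns from
captured "vmstat" output.  The sample figures are invented for illustration
(they are not measurements from the system under discussion), the column layout
is assumed to match typical AIX vmstat output and may differ by AIX level, and
the threshold of 5 pages per second is an arbitrary rule of thumb rather than a
documented limit.

```python
# Sketch only: the sample text below is made up, and the threshold is an
# assumed rule of thumb, not a vendor-documented value.
SAMPLE_VMSTAT = """\
kthr    memory              page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 2  5 412345   128   0  42  67 310 1200   0 850 4200 900 10 15  5 70
 1  6 413001   120   0  55  80 400 1500   0 870 4300 950  8 14  3 75
 3  4 412900   131   0  38  59 290 1100   0 840 4100 880 11 16  6 67
"""


def paging_summary(vmstat_text, threshold=5.0):
    """Average the pi/po/wa columns and flag sustained paging."""
    rows = [ln.split() for ln in vmstat_text.splitlines() if ln.strip()]
    # The column-name row is the one that carries both "pi" and "po" fields.
    header = next(f for f in rows if "pi" in f and "po" in f)
    pi_col = header.index("pi")
    po_col = header.index("po")
    wa_col = len(header) - 1  # "wa" is assumed to be the last column
    # Data rows have the same field count as the header and are all numeric.
    data = [f for f in rows
            if len(f) == len(header) and all(x.isdigit() for x in f)]
    if not data:
        raise ValueError("no vmstat data rows found")
    n = len(data)
    avg_pi = sum(int(f[pi_col]) for f in data) / n
    avg_po = sum(int(f[po_col]) for f in data) / n
    avg_wa = sum(int(f[wa_col]) for f in data) / n
    return {
        "samples": n,
        "avg_pi": avg_pi,
        "avg_po": avg_po,
        "avg_wa": avg_wa,
        "paging_suspected": avg_pi > threshold or avg_po > threshold,
    }


if __name__ == "__main__":
    print(paging_summary(SAMPLE_VMSTAT))
```

If the averages stay high across many intervals while the restore runs, memory
shortage becomes a plausible contributor to the disk load; if they stay near
zero, the disk activity more likely comes from the restore I/O itself or from
the file system layout.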

There is clearly something VERY wrong with the configuration there, as
multi-week restorals are by no means normal or expected.  I would strongly
advise having the technical staff at your site (network, opsys, TSM) engage in
a team effort to analyze the problem, find the bottlenecks, and get them
resolved.  In that the restoral is ongoing, you have the unrivaled opportunity
now to pursue the issue while it evidences itself.  If you have to bring in
consultants to review things, then so be it.  What you're experiencing now is a
"worst case" system load, and you have to gauge whether your environment was
properly outfitted to accommodate it.

>Due to this situation, TSM is fast losing face as a viable DR tool...

TSM may, of course, not be the issue in this throughput problem, and so any
backup/restore product may be as bogged down in such a task.  It's always the
case that the last thing invoked takes the blame for whatever happens, but it's
the job of the technical talent at the site to determine actual causes and
communicate results so that everyone concerned realizes what's involved...and
so that you can take credit for resolution and get that pay raise.

  Richard Sims, BU