ADSM-L

Re: Slow restore for large NT client outcome

2000-09-20 15:55:22
Subject: Re: Slow restore for large NT client outcome
From: Richard Sims <rbs AT BU DOT EDU>
Date: Wed, 20 Sep 2000 15:55:32 -0400
Jeff - I more than sympathize with your predicament as a storage
administrator dealing with NT systems and their administrators.
But be careful about going after a vendor to fix a problem you perceive
to be with their product without first establishing baseline values for
your configuration and otherwise analyzing it to determine just where and
what the problem is.
    Nick's posting today about NTFS performance, and last week's
recommendations to run FTP baseline tests, will help to define your
problem.  Such measures give you baseline values, as close as possible
to optimal, against which you can compare the performance of more
involved applications like TSM, which run on top of that amalgam.
    Being a long-time ADSM guy I'm sure you remember back to postings
where people would wail on IBM about poor performance backing up and
restoring, asking "What's wrong with ADSM???" - when their implementation
choices resulted in 20,000 or more files in one directory, which is
deadly for anything entering such a directory.  That is to say, the way
in which systems are configured and implemented, plus networking problems
and operating system defects and design shortcomings can thwart performance in
any package implemented on them.  Some customers unknowingly implement tape
technologies with poor start-stop performance, see slow restoral performance,
and then blame the restoral software.  We have to be aware of what these
things can do to and for us.  Know thy technologies, lest they bite thee.
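    To check a client for the overpopulated-directory condition described
above, something like the following Python sketch could be run against a file
system before blaming the backup product.  It is only an illustration: the
function name and the default threshold (taken from the 20,000 figure above)
are my own choices, not anything supplied by TSM.

```python
import os


def crowded_directories(root, threshold=20000):
    """Return (path, file_count) pairs for directories under `root`
    that directly contain at least `threshold` files, largest first.

    `threshold` defaults to the 20,000-file figure mentioned in the
    post; it is an illustrative cutoff, not a documented limit.
    """
    crowded = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Count only the files directly in this directory.
        if len(filenames) >= threshold:
            crowded.append((dirpath, len(filenames)))
    crowded.sort(key=lambda pair: pair[1], reverse=True)
    return crowded
```

Calling crowded_directories() on the root of a drive and printing the result
gives a quick list of directories worth reorganizing.  Expect the walk itself
to be slow in exactly the directories it is looking for.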
    It's a classic situation in data processing that users blame performance
problems on the first thing between them and the computer system, but of
course that's just convenient blame assignment.  After all - they have to
blame someone or something, and that's the one thing they know.  This is not
to say that TSM is perfect or necessarily blameless in this situation.  But as
customer technicians it's our responsibility to determine where the problem
lies.  And for that to succeed the various experts in the environment
(networking, opsys, application) have to work together to analyze it.
    Your NT people say that TSM seems like a UNIX product trying to make it in
the NT space.  The irony is that it's a mainframe product that did even better
in a Unix environment because of that environment's minimized overhead.  You
have the unenviable situation of an MVS server and NT clients, with a lineage
and history of TCP/IP performance shortcomings, high overhead, and file system
inefficiencies.  Your shop is looking at an AIX-based TSM server system, which
is a good move.  Whether "TSM can make it in the NT space" is more up to NT
than TSM: if Microsoft wants to be a serious contender, they have to make
Windows a serious operating system.  Many shops won't implement NTs as
enterprise servers because their performance is ridiculously inferior.  Tivoli
gets blamed for numerous things not its fault, like its TDP being unable to
restore individual mail boxes in a certain vendor mail system, when that other
vendor fails to provide an API to make it possible.  Certainly Tivoli would
agree that they should take responsibility for their own failings, but we
should be careful to attribute to them only what is actually theirs.
    The situation you're in is the familiar one so many of us find ourselves
in, having to address complaints about why things are so bad, when we don't
have measurements to know how much that deviates from how good they can be.
I strongly encourage everyone to get such numbers during off-peak times so
that you have something to compare against, in each area (disk activity, tape
throughput, tape search time, network capacity, CPU load capability, etc.).
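    As one small example of collecting such a number, the Python sketch below
times a sequential read of a file and expresses a later measurement as a
percentage of the recorded baseline.  The function names and the 1 MB block
size are assumptions of mine for illustration.  It covers only the disk-read
area, and the operating system's file cache can inflate the result if the file
was read recently, so treat the figure as a rough reference rather than a
benchmark.

```python
import time


def read_throughput_mb_per_s(path, block_size=1024 * 1024):
    """Read `path` sequentially and return the observed rate in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        # Timer resolution too coarse for a tiny file; no meaningful rate.
        return float("inf")
    return (total_bytes / (1024 * 1024)) / elapsed


def percent_of_baseline(observed, baseline):
    """Express an observed rate as a percentage of the baseline rate."""
    return 100.0 * observed / baseline
```

Run read_throughput_mb_per_s() on a large file during off-peak hours and keep
the number.  When a restore is later said to be slow, the restore's effective
rate can be passed to percent_of_baseline() along with the saved figure to
show how far it falls short of what the disk alone can do.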
    From what you describe, your shop is undergoing an unusual number of full
file system recoveries.  With the size of today's disks and the need for data
to be current, I would very much avoid dependence on any backup package for
such recoveries.  I would recommend some form of disk mirroring for
first-level recovery, to render recovery immediate and current.  Rely upon a
package such as TSM for second-tier recovery when the mirror can't do it, and
for the spot restoral of individual files.
    In summary, get those baseline numbers and use them to help isolate the
problem areas.  Identifying problems is 90% of their solution.  And stay a
huge fan of the product.  :-)

      Richard Sims, BU