ADSM-L

Re: Slow restore for large NT client outcome.. appeal to Tivoli

2015-10-04 17:27:38
Subject: Re: Slow restore for large NT client outcome.. appeal to Tivoli
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]On Behalf Of
To: ADSM-L AT VM.MARIST DOT EDU
Jeff,

I have the same concerns.  My company is in the process of server
consolidation.  I am concerned that when the time comes to consolidate
these servers into cluster servers, I (TSM) will not be able to restore 1.2
TB of data to a cluster server in a timely manner.  I am quickly losing
the battle in defending TSM.  My NT admins are moving toward hardware
mirroring.  If improvements to small-file restores do not come soon, I may
lose the battle.  I typically achieve a 1-3 GB per hour restore rate on
file servers.
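
To put those figures in perspective, here is a rough back-of-the-envelope sketch of how long 1.2 TB would take at 1-3 GB per hour. It assumes 1 TB = 1024 GB and a single restore stream at a constant rate; the function name is only illustrative.

```python
def restore_hours(data_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to restore data_gb at a sustained, constant rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_gb / rate_gb_per_hour


# Assumption: 1 TB = 1024 GB, restored as one stream.
DATA_GB = 1.2 * 1024

for rate in (1.0, 3.0):
    hours = restore_hours(DATA_GB, rate)
    print(f"{rate:.0f} GB/hr -> {hours:.0f} hours (~{hours / 24:.0f} days)")
```

Under these assumptions the restore would run for somewhere between roughly 17 and 51 days.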


Arturo Lopez


        -----Original Message-----
        From:   Jeff Connor [SMTP:connorj AT NIAGARAMOHAWK DOT COM]
        Sent:   Wednesday, September 20, 2000 11:22 AM
        To:     ADSM-L AT VM.MARIST DOT EDU
        Subject:        Slow restore for large NT client outcome.. appeal to Tivoli Development/Support

        I posted the memo below to this listserv last week when we were
        having trouble with the performance restoring a large NT drive.
        This memo is for the people who wanted to know how we made out in
        the end.  I am also writing to bring to the attention of TSM
        development what I feel is a pretty big problem in the area of
        performance for clients with lots of small files.  I am pursuing
        this issue with Tivoli through other channels but thought others
        on this listserv might have the same concern. For a summary of
        our TSM config see my first memo below.

        First, let's get a couple of things out of the way.  I have been
        working with TSM/ADSM for approximately five years since
        version 1.  I am a HUGE fan of the product and have fought very
        hard to get our company to standardize on TSM and leave Arcserve,
        Backup Exec, Legato, and the like.  I am pleased with
        improvements in TSM functionality over the years.  The second
        thing is we run TSM on OS/390. Over time I've seen many posts on
        the listserv about users that have achieved better performance
        with UNIX based TSM servers.  We are currently piloting TSM on
        AIX to test the performance.

        Now that we've established my loyalties, back to my concern about
        backup, and more importantly restore, performance for TSM clients
        with lots of small files.  Most of our UNIX servers are database
        servers so my concerns about small files really pertain mostly to
        Windows NT server clients.  Others may have issues with other
        platforms.  The NT clients I have restore issues with are big
        file and print servers.  The data partition is typically the D:
        drive and can be anywhere from 20GB to 160GB in size.   The best
        restore time we can achieve for the file and print servers is
        somewhere between 1.5GB and 3.5GB per hour generally on the lower
        side.  Now, we could go through the usual checklist (is your
        network performing, is your database cache hit ratio high enough,
        TCP window sizes, transaction sizes, and so on), but assume for a
        moment that we are optimally configured and have done all "the
        right stuff".  To make a performance comparison, we have a couple
        of NT clients that contain a small number of large files.  We
        restored 20GB of data on one of those servers recently
        in 1hr 45mins.  The restore of the one directory on the D:
        partition for the client mentioned in my first memo below, with an
        average file size of 64K, ran for 6hrs 5mins and transferred
        4.8GB.  The whole drive took 45hrs.
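
The two restores above can be compared directly. The sketch below is only arithmetic on the figures quoted in the memo; it assumes binary units (1 GB = 1024 * 1024 KB) and treats 64K as the exact average file size, so the file count is an estimate.

```python
def gb_per_hour(gb: float, hours: int, minutes: int = 0) -> float:
    """Average throughput in GB per hour."""
    return gb / (hours + minutes / 60)


def files_per_second(gb: float, avg_file_kb: float, hours: int, minutes: int = 0) -> float:
    """Estimated files restored per second, assuming 1 GB = 1024 * 1024 KB."""
    files = gb * 1024 * 1024 / avg_file_kb
    seconds = hours * 3600 + minutes * 60
    return files / seconds


large = gb_per_hour(20, 1, 45)           # large-file server: 20GB in 1hr 45mins
small = gb_per_hour(4.8, 6, 5)           # small-file directory: 4.8GB in 6hrs 5mins
rate = files_per_second(4.8, 64, 6, 5)   # 64K average file size

print(f"large files: {large:.1f} GB/hr")
print(f"small files: {small:.2f} GB/hr ({large / small:.1f}x slower)")
print(f"small files: about {rate:.1f} files per second")
```

That works out to roughly 11.4 GB/hr for the large-file restore against about 0.8 GB/hr for the small-file directory, or only about 3.6 files per second. The per-file rate, rather than the byte rate, looks like the more telling number here.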

        Our NT group was a hard sell for replacing Arcserve with TSM.
        Since the switch, I have taken quite a beating about TSM restore
        performance.  Our NT admins take the position, "we'll try TSM but
        if the performance doesn't improve we are going with a tried and
        true solution like Compaq Enterprise Backup.  TSM seems to us
        like a UNIX product trying to make it in the NT space.  It is not
        typically selected by companies for NT backup and recovery".
        Not a word for word quote but generally sums up their position.
        The Compaq solution would use Arcserve from what I've been told.

        I know Tivoli/IBM have tried to address the small files issue
        with things like small file aggregation but I haven't noticed
        much improvement from version to version for big restores of
        servers with small files.  I've heard different reasons for slow
        performance with small files over the years like the amount of
        TSM database lookups, NT file system processing/inefficiencies,
        etc.  When looking at future directions for SAN backups I can
        understand the argument that the SAN pipes will be faster and
        TCP/IP overhead will be eliminated, leading to faster
        restores/backups.  But if the poor performance for small files
        has a lot to do with TSM database lookups/overhead then how will
        performance be different when the data travels over the SAN
        versus the LAN/WAN?  The database processing of file
        information will be pretty much the same, won't it?  I have
        suggested to our NT admins that we break that big D: partition
        into multiple smaller partitions so I can collocate by filespace
        and restore multiple drives concurrently.  Frankly, they are not
        interested in changing the way they configure their servers to
        accommodate the backup software.  They feel they would not have
        to do this with Arcserve or other more common NT backup products.
        I've tried tests using share names for folders and performing
        backups/restores using the UNC name, collocating the data by
        filespace and running concurrent restores.  My tests showed
        improved elapsed time but this scheme would be tough to maintain.
        In a full server restore scenario, I'd need to create the folders
        and shares for the target restore, which means we'd need to keep
        track of that info somewhere.  I'd constantly have to monitor
        growth in all the folders to make sure I've carved up the drive
        in fairly equal parts to optimize for restore, etc.  Not a good
        solution either.
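
To illustrate the bookkeeping this scheme implies, here is a minimal sketch of splitting top-level folders into roughly equal restore streams with a greedy largest-first assignment. This is not a TSM feature; the function, the folder names other than "groups", and all of the sizes are hypothetical.

```python
import heapq


def balance_folders(folder_sizes_gb: dict, streams: int):
    """Assign folders to restore streams: largest folder first, always to the lightest stream."""
    if streams < 1:
        raise ValueError("need at least one stream")
    heap = [(0.0, i) for i in range(streams)]   # (total GB so far, stream index)
    heapq.heapify(heap)
    groups = [[] for _ in range(streams)]
    by_size = sorted(folder_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        total, idx = heapq.heappop(heap)        # lightest stream so far
        groups[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    totals = [0.0] * streams
    for total, idx in heap:
        totals[idx] = total
    return groups, totals


# Hypothetical folder sizes in GB for a 160GB D: drive.
folders = {"groups": 60, "users": 45, "apps": 30, "projects": 15, "archive": 10}
groups, totals = balance_folders(folders, streams=2)
for g, t in zip(groups, totals):
    print(f"{t:5.1f} GB: {', '.join(g)}")
```

Even with a helper like this, the split would have to be recomputed as folders grow, which is the maintenance burden described above.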

        Does anyone else see the poor performance for restoring clients
        with lots of small files and feel that this is a problem Tivoli
        needs to address?  I do.  If this issue is not resolved then I
        won't be able to keep using TSM to back up our NT servers.

        Thanks,
        Jeff Connor
        Niagara Mohawk Power Corp.


        ---------------------- Forwarded by Jeffrey P Connor/IT/NMPC on
        09/20/2000 10:32 AM ---------------------------


        Jeffrey P Connor
        09/13/2000 01:20 PM

        To:   ADSM-L AT VM.MARIST DOT EDU
        cc:

        Subject:  Slow restore for large NT client.. help!


             We are in the process of restoring a subdirectory of a very
        large NT client file space (D:) and it is running really slow.  I
        thought I'd see if any of you have some ideas as to where we can
        look for bottlenecks.
        The client config is:
             Compaq ProLiant 5500
             400MB RAM
             two 400MHz Xeon processors.
             ~160GB of disk in a Compaq disk array made up of 18.2GB drives
             Windows NT 4.0 SP6a
             TSM client for NT 3.7.2.01
             Applicable TSM client options:
                  tcpwindowsize   63
                  tcpbuffsize     31
                  tcpnodelay      yes
                  txnbytelimit    25600

        TSM server config
             TSM for OS/390 V3.7.1.0
             OS/390 2.6
             9672-R55
             TSM server DB cache hit ratio 98.5%
             Applicable TSM server options:
                  TXNGROUPMAX 256
                  Databufferpoolsize  262144



        Network path:
             NT Client -- 100Mbit Ethernet --> Switch -- 100Mbit Ethernet -->
             Cisco 7513 router -- 155Mbit ATM --> Cisco 5500 ATM switch -->
             IBM 2216 -- ESCON --> S/390 TSM Server


        Now that you have the background here's what we are seeing.

        Only 4.7GB have been transferred in 4 hours.  We are attempting to
        restore one subdirectory on the D: drive first.
        TSM command line client command entered was:
             RES -subd=y \\filecluster2\d$\groups\ugitoper\*
        The D: drive has approximately 2,000,000 files.  Lots of small
        files.  NT client is a file and print server.
        A network sniffer trace shows mostly large chunks of data sent,
        no retransmits, then the NT client appears to throttle back,
        decreasing the TCP window size as if it could not accept the data
        as fast as TSM was sending it.  Window sizes go to zero at times,
        then bounce back to a large window size (64512).
        NT perfmon shows plenty of memory and CPU with minimal disk
        queueing.
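
As a rough sanity check on the numbers above, the sketch below converts the observed transfer (4.7GB in 4 hours) to Mbit/s and asks what round-trip time would be needed for a 64512-byte window alone to cap throughput at that rate. It assumes 1 GB = 1024^3 bytes and uses the simple window/RTT bound for one TCP connection; the 5 ms RTT is a made-up example value, since no latency was measured in the memo.

```python
def observed_mbit_per_s(gb: float, hours: float) -> float:
    """Average throughput in Mbit/s, assuming 1 GB = 1024**3 bytes."""
    bits = gb * 1024**3 * 8
    return bits / (hours * 3600) / 1e6


def window_limited_mbit_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound for one TCP connection: one full window per round trip."""
    return window_bytes * 8 / rtt_s / 1e6


def rtt_for_rate(window_bytes: int, mbit_per_s: float) -> float:
    """Round-trip time (seconds) at which the window alone would cap throughput."""
    return window_bytes * 8 / (mbit_per_s * 1e6)


observed = observed_mbit_per_s(4.7, 4)
print(f"observed: {observed:.2f} Mbit/s of a 100 Mbit/s link")

# Hypothetical 5 ms round trip; not measured in the memo.
print(f"window bound at 5 ms RTT: {window_limited_mbit_per_s(64512, 0.005):.0f} Mbit/s")
print(f"RTT needed to explain it: {rtt_for_rate(64512, observed) * 1000:.0f} ms")
```

The observed rate is under 3 Mbit/s, and a full 64512-byte window would only hold throughput that low at a round-trip time near 184 ms. If the real latency is much smaller, the full-size window is probably not the limit, and the zero-window stalls on the client side would be the more likely place to look.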



        This brings me to my question.  What tools can I use, or what
        metrics in perfmon can I check, to see "under the covers"
        and determine what is slowing us down?  The network support staff
        feel the network bandwidth is there and that the NT client is
        throttling things back.  The NT support staff say the NT
        client machine is not overwhelmed in terms of CPU, memory, disk,
        etc.; they feel TSM is the problem.

        What could be the bottleneck on the NT client and what tool can I
        use to find it?

        Thanks in advance for your assistance,
        Jeff Connor
        Niagara Mohawk Power Corp