ADSM-L

Re: Slow restore for large NT client outcome.. appeal to Tivoli Development/Support

2000-09-20 14:19:43
Subject: Re: Slow restore for large NT client outcome.. appeal to Tivoli Development/Support
From: "Farris, Raeana" <RFarris AT LRS DOT COM>
Date: Wed, 20 Sep 2000 13:19:24 -0500
Just a thought - take a look at the new TSM Implementation Redbook.  They
set up separate storage pools for NT, one for directory info and one for
files.  According to the Redbook, restoring the directory structure first
allows for faster restore times.  I happened to be in a TSM session last
week, and the presenter said there was no need to separate the storage
pools (which confused me).  Please let us know if you get it resolved.

Good Luck.

> -----Original Message-----
> From: Jeff Connor [SMTP:connorj AT NIAGARAMOHAWK DOT COM]
> Sent: Wednesday, September 20, 2000 11:22 AM
> To:   ADSM-L AT VM.MARIST DOT EDU
> Subject:      Slow restore for large NT client outcome.. appeal to Tivoli
> Development/Support
>
> I posted the memo below to this listserv last week when we were
> having trouble with the performance restoring a large NT drive.
> This memo is for the people who wanted to know how we made out in
> the end.  I am also writing to bring to the attention of TSM
> development what I feel is a pretty big problem in the area of
> performance for clients with lots of small files.  I am pursuing
> this issue with Tivoli through other channels but thought others
> on this listserv might have the same concern. For a summary of
> our TSM config see my first memo below.
>
> First, let's get a couple of things out of the way.  I have been
> working with TSM/ADSM for approximately five years, since
> version 1.  I am a HUGE fan of the product and have fought very
> hard to get our company to standardize on TSM and leave Arcserve,
> Backup Exec, Legato, and the like.  I am pleased with
> improvements in TSM functionality over the years.  The second
> thing is we run TSM on OS/390. Over time I've seen many posts on
> the listserv about users that have achieved better performance
> with UNIX based TSM servers.  We are currently piloting TSM on
> AIX to test the performance.
>
> Now that we've established my loyalties, back to my concern about
> backup, and more importantly restore, performance for TSM clients
> with lots of small files.  Most of our UNIX servers are database
> servers so my concerns about small files really pertain mostly to
> Windows NT server clients.  Others may have issues with other
> platforms.  The NT clients I have restore issues with are big
> file and print servers.  The data partition is typically the D:
> drive and can be anywhere from 20GB to 160GB in size.   The best
> restore time we can achieve for the file and print servers is
> somewhere between 1.5GB and 3.5GB per hour generally on the lower
> side.  Now we could go through the common questions (is your
> network performing, is your database cache hit ratio high enough,
> TCP window sizes, txn sizes, and the usual things), but assume for a
> moment that we are optimally configured and have done all "the right
> stuff".  To make a performance comparison, we have a couple of NT
> clients that contain a small number of large files.
> We restored 20GB of data on one of those servers recently
> in 1hr 45mins.  The restore of the one directory on the D:
> partition for the client mentioned in my first memo below, with an
> average file size of 64K, ran for 6hrs 5mins and transferred
> 4.8GB.  The whole drive took 45hrs.
>
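To put the two restores quoted above side by side, here is a minimal sketch of the arithmetic.  The figures come from the memo; treating GB as 10**9 bytes and 64K as 64 * 1024 bytes is an assumption, so the file count is only approximate.

```python
# Restore figures quoted in the memo. GB is treated as 10**9 bytes and
# 64K as 64 * 1024 bytes; both unit choices are assumptions.
def gb_per_hour(gigabytes, hours, minutes=0):
    """Average restore throughput in GB per hour."""
    return gigabytes / (hours + minutes / 60)

large_files = gb_per_hour(20, 1, 45)   # 20GB restored in 1hr 45mins
small_files = gb_per_hour(4.8, 6, 5)   # 4.8GB restored in 6hrs 5mins

# Approximate file count for the small-file restore, from the 64K average.
approx_files = 4.8e9 / (64 * 1024)
files_per_second = approx_files / ((6 * 60 + 5) * 60)

print(f"large files: {large_files:.1f} GB/hr")
print(f"small files: {small_files:.2f} GB/hr")
print(f"slowdown:    {large_files / small_files:.1f}x")
print(f"small files: ~{approx_files:,.0f} files at ~{files_per_second:.1f} files/sec")
```

On these numbers the small-file restore moves data roughly 14 times slower than the large-file one, and it works out to only a few files per second, which suggests per-file cost rather than raw bandwidth is what dominates.
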
> Our NT group was a hard sell for replacing Arcserve with TSM.
> Since the switch, I have taken quite a beating about TSM restore
> performance.  Our NT admins take the position, "we'll try TSM but
> if the performance doesn't improve we are going with a tried and
> true solution like Compaq Enterprise Backup.  TSM seems to us
> like a UNIX product trying to make it in the NT space.  It is not
> typically selected by companies for NT backup and recovery".
> Not a word for word quote but generally sums up their position.
> The Compaq solution would use Arcserve from what I've been told.
>
> I know Tivoli/IBM have tried to address the small files issue
> with things like small file aggregation but I haven't noticed
> much improvement from version to version for big restores of
> servers with small files.  I've heard different reasons for slow
> performance with small files over the years like the amount of
> TSM database lookups, NT file system processing/inefficiencies,
> etc.  When looking at future directions for SAN backups I can
> understand the argument that the SAN pipes will be faster and
> TCP/IP overhead will be eliminated, leading to faster
> restores/backups.  But if the poor performance for small files
> has a lot to do with TSM database lookups/overhead then how will
> performance be different when the data travels over the SAN
> versus the LAN/WAN?  The database processing of file
> information will be pretty much the same, won't it?  I have
> suggested to our NT admins that we break that big D: partition
> into multiple smaller partitions so I can collocate by filespace
> and restore multiple drives concurrently.  Frankly, they are not
> interested in changing the way they configure their servers to
> accommodate the backup software.  They feel they would not have
> to do this with Arcserve or other more common NT backup products.
> I've tried tests using share names for folders and performing
> backups/restores using the UNC name, collocating the data by
> filespace and running concurrent restores.  My tests showed
> improved elapsed time but this scheme would be tough to maintain.
> In a full server restore scenario, I'd need to create the folders
> and shares for the target restore, which means we'd need to keep
> track of that info someplace.  I'd constantly have to monitor
> growth in all the folders to make sure I've carved up the drive
> in fairly equal parts to optimize for restore, etc.  Not a good
> solution either.
>
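The "carve the drive into fairly equal parts" chore described above could in principle be scripted.  The following is only a sketch of one way to do it, a greedy largest-first split; it is not anything TSM provides.  The function name `balance_folders` and the folder names and sizes are hypothetical, not taken from the memo.

```python
import heapq

def balance_folders(folder_sizes_gb, streams):
    """Greedy largest-first split: place each folder into the group that
    currently holds the least data, so concurrent restores finish at
    roughly the same time."""
    heap = [(0.0, i) for i in range(streams)]   # (total GB, group index)
    groups = [[] for _ in range(streams)]
    by_size = sorted(folder_sizes_gb.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in by_size:
        total, i = heapq.heappop(heap)          # least-loaded group so far
        groups[i].append(name)
        heapq.heappush(heap, (total + size, i))
    totals = [0.0] * streams
    for total, i in heap:
        totals[i] = total
    return groups, totals

# Hypothetical folder sizes in GB, for illustration only.
folders = {"groups": 60, "users": 45, "apps": 30, "scans": 15, "archive": 10}
groups, totals = balance_folders(folders, streams=3)
for names, total in zip(groups, totals):
    print(f"{total:5.1f} GB  {names}")
```

Even with a script like this, the split would have to be recomputed as folders grow, and the share definitions would still need to be recorded somewhere for a full server restore, so it does not remove the maintenance burden described above.
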
> Does anyone else see the poor performance for restoring clients
> with lots of small files and feel that this is a problem Tivoli
> needs to address?  I do.  If this issue is not resolved then I
> won't be able to keep using TSM to back up our NT servers.
>
> Thanks,
> Jeff Connor
> Niagara Mohawk Power Corp.
>
>
> ---------------------- Forwarded by Jeffrey P Connor/IT/NMPC on
> 09/20/2000 10:32 AM ---------------------------
>
>
> Jeffrey P Connor
> 09/13/2000 01:20 PM
>
> To:   ADSM-L AT VM.MARIST DOT EDU
> cc:
>
> Subject:  Slow restore for large NT client.. help!
>
>
>      We are in the process of restoring a subdirectory of a very
> large NT client file space (D:) and it is running really slow.  I
> thought I'd see if any of you have some ideas as to where we can
> look for bottlenecks.
> The client config is:
>      Compaq ProLiant 5500
>      400MB RAM
>      two 400MHz Xeon processors.
>      ~160GB of disk in a Compaq disk array made up of 18.2GB drives
>      Windows NT 4.0 SP6a
>      TSM client for NT 3.7.2.01
>      Applicable TSM client options:
>           tcpwindowsize 63
>           tcpbuffsize         31
>           tcpnodelay       yes
>           txnbytelimit       25600
>
> TSM server config
>      TSM for OS/390 V3.7.1.0
>      OS/390 2.6
>      9672-R55
>      TSM server DB cache hit ratio 98.5%
>      Applicable TSM server options:
>           TXNGROUPMAX 256
>           Databufferpoolsize  262144
>
>
>
> Network path:
>      NT Client --100Mbit Ethernet--> Switch --100Mbit Ethernet-->
>      Cisco 7513 rtr --155Mbit ATM--> Cisco 5500 ATM switch -->
>      IBM 2216 --ESCON--> S/390 TSM Server
>
>
> Now that you have the background here's what we are seeing.
>
> Only 4.7GB has been transferred in 4 hours.  We are attempting to
> restore one subdirectory on the D: drive first.
> TSM command line client command entered was:
>      RES -subd=y \\filecluster2\d$\groups\ugitoper\*
> The D: drive has approximately 2,000,000 files.  Lots of small
> files.  NT client is a file and print server.
> A network sniffer trace shows mostly large chunks of data sent,
> no retransmits, then the NT client appears to throttle back,
> decreasing the TCP window size as if it could not accept the data
> as fast as TSM was sending it.  Window sizes go to zero at times,
> then bounce back to a large window size (64512).
> NT perfmon shows plenty of memory and CPU with minimal disk
> queueing.
>
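A quick check of the window numbers above, as a rough sketch.  The tcpwindowsize value is the one from the client options listed earlier (the option is specified in kilobytes); the round-trip times are hypothetical, since the memo does not report one, and GB is again taken as 10**9 bytes.

```python
# tcpwindowsize from the client options in the memo, in kilobytes.
TCPWINDOWSIZE_KB = 63
window_bytes = TCPWINDOWSIZE_KB * 1024
print(window_bytes)  # same value as the large window seen in the trace

def max_throughput_mbit(window_bytes, rtt_ms):
    """Upper bound for one TCP session: at most one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6

# Round-trip times below are hypothetical; the memo does not give one.
for rtt_ms in (1, 5, 10):
    bound = max_throughput_mbit(window_bytes, rtt_ms)
    print(f"RTT {rtt_ms:2d} ms -> at most {bound:6.1f} Mbit/s")

# Observed rate: 4.7GB in 4 hours (GB taken as 10**9 bytes, an assumption).
observed_mbit = 4.7e9 * 8 / (4 * 3600) / 1e6
print(f"observed: {observed_mbit:.1f} Mbit/s")
```

If the real round-trip time is anywhere near these assumed values, the configured window allows far more than the observed rate, which fits the trace: the limit looks like the receiver closing its window, not the window size setting itself.
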
>
>
> This brings me to my question.  What tools can I use or what
> metrics in perfmon can I check to see "under the covers"
> to determine what is slowing us down?  The network support staff
> feel the network bandwidth is there and that the NT client is
> throttling things back.  The NT support staff say the NT
> client machine is not overwhelmed in terms of CPU, memory, disk,
> etc.; they feel TSM is the problem.
>
> What could be the bottleneck on the NT client and what tool can I
> use to find it?
>
> Thanks in advance for your assistance,
> Jeff Connor
> Niagara Mohawk Power Corp