ADSM-L

Re: Slow restore for large NT client outcome.. appeal to Tivo

2000-09-20 16:20:47
Subject: Re: Slow restore for large NT client outcome.. appeal to Tivo
From: "Keith E. Pruitt" <kpruitt AT MAYERBROWN DOT COM>
Date: Wed, 20 Sep 2000 13:02:50 -0500
Jeff, we too have a problem with small files. At first I thought it was a
Netware thing because the servers that hold our greatest number of files are
the Netware servers.
But reading emails from several users, I see that I may have a future problem on
the NT side. We store Word and WordPerfect docs on two Netware 5 machines, and
each server holds about 1.8 million files. Needless to say, these files
are not that big. It took over 11 hours to back up each of the servers, and they
total around 30GB per server. We were forced to perform a full backup because
our director and other new admins don't understand or feel comfortable with the
"incremental forever" logic. I would hate to see what a restore would look like.
In contrast, we just backed up a directory on an NT server we are using for our
Backoffice conversion and that dir totals 35GB. That took 2h20m. We also
performed a large restore from one AIX machine to another one of about 25GB.
Less than 2 hours to restore. We have tweaked our Netware and AIX ADSM server
according to performance guides and other suggestions and still have issues with
small files.
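
To put the figures above side by side, here is a rough back-of-the-envelope
calculation. It is only a sketch: it assumes 1 GB = 1024 MB, treats "over 11
hours" as exactly 11 hours (so the Netware rate is an upper bound) and "less
than 2 hours" as exactly 2 hours (so the AIX rate is a lower bound). The
function and variable names are made up for illustration.

```python
# Figures quoted in the message above; 1 GB = 1024 MB is an assumption.
MB_PER_GB = 1024


def throughput_mb_per_s(gigabytes, hours, minutes=0):
    """Average transfer rate in MB/s for a job of the given size and duration."""
    seconds = hours * 3600 + minutes * 60
    return gigabytes * MB_PER_GB / seconds


netware_backup = throughput_mb_per_s(30, 11)    # ~1.8 million small files, "over 11 hours"
nt_backup = throughput_mb_per_s(35, 2, 20)      # NT directory, 2h20m
aix_restore = throughput_mb_per_s(25, 2)        # "less than 2 hours"

files = 1_800_000
avg_file_kb = 30 * MB_PER_GB * 1024 / files     # average file size in KB
files_per_s = files / (11 * 3600)               # files handled per second

print(f"Netware backup: {netware_backup:.2f} MB/s")
print(f"NT backup:      {nt_backup:.2f} MB/s")
print(f"AIX restore:    {aix_restore:.2f} MB/s")
print(f"Netware average file size: {avg_file_kb:.1f} KB")
print(f"Netware files per second:  {files_per_s:.1f}")
```

On these assumptions the small-file servers move data several times more
slowly than the NT directory or the AIX restore, which is consistent with the
per-file overhead, rather than raw bandwidth, being the limiting factor.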

We will be moving our documents from Netware to NT soon and our NT guys like to
refer to ADSM as crap. They are used to Arcserve but are now raving about
BackupExec. It is going to be extremely difficult to explain if our huge machine
can't keep up with their backup server. I know that overall ADSM is a better and
more stable product but what do you do when you have a mixture of servers with
large databases (ADSM's favorite) and (the more common) servers with small files
that Arcserve and others like? I'm hoping another ADSM/TSM user has some tricks
or tweaks that can help in this area. Anyone from any universities out there?

____________________Reply Separator____________________
Subject:    Slow restore for large NT client outcome.. appeal to Tivoli

Author: Jeff Connor <connorj AT NIAGARAMOHAWK DOT COM>
Date:       09/20/2000 12:21 PM

Our NT group was a hard sell for replacing Arcserve with TSM.
Since the switch, I have taken quite a beating about TSM restore
performance.  Our NT admins take the position, "we'll try TSM but
if the performance doesn't improve we are going with a tried and
true solution like Compaq Enterprise Backup.  TSM seems to us
like a UNIX product trying to make it in the NT space.  It is not
typically selected by companies for NT backup and recovery".
Not a word for word quote but generally sums up their position.
The Compaq solution would use Arcserve from what I've been told.

I know Tivoli/IBM have tried to address the small files issue
with things like small file aggregation but I haven't noticed
much improvement from version to version for big restores of
servers with small files.  I've heard different reasons for slow
performance with small files over the years like the amount of
TSM database lookups, NT file system processing/inefficiencies,
etc.  I have suggested to our NT admins that we break that big
D: partition into multiple smaller partitions so I can collocate
by filespace
and restore multiple drives concurrently.  Frankly, they are not
interested in changing the way they configure their servers to
accommodate the backup software.  They feel they would not have
to do this with Arcserve or other more common NT backup products.
I've tried tests using share names for folders and performing
backups/restores using the UNC name, collocating the data by
filespace and running concurrent restores.  My tests showed
improved elapsed time but this scheme would be tough to maintain.
In a full server restore scenario, I'd need to create the folders
and shares for the target restore, which means we'd need to keep
track of that info someplace.  I'd constantly have to monitor
growth in all the folders to make sure I've carved up the drive
in fairly equal parts to optimize for restore, etc.  Not a good
solution either.
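
For anyone weighing the same trade-off, the "carve up the drive in fairly
equal parts" bookkeeping could be sketched roughly as below. This is a generic
greedy partitioning heuristic, not anything TSM provides; the folder names and
sizes are hypothetical, and partition_folders is a made-up helper.

```python
import heapq


def partition_folders(folder_sizes, groups):
    """Greedy split: place each folder, largest first, into the lightest group.

    Returns a list of (total_size, group_index, folder_names) tuples,
    ordered by group index.
    """
    heap = [(0, i, []) for i in range(groups)]
    heapq.heapify(heap)
    by_size = sorted(folder_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        total, idx, members = heapq.heappop(heap)   # lightest group so far
        members.append(name)
        heapq.heappush(heap, (total + size, idx, members))
    return sorted(heap, key=lambda g: g[1])


# Hypothetical top-level folders on D: with sizes in GB.
sizes = {"groups": 40, "users": 35, "apps": 20,
         "projects": 15, "scans": 10, "archive": 5}
result = partition_folders(sizes, 3)
for total, idx, members in result:
    print(f"restore stream {idx}: {total} GB -> {members}")
```

The split would have to be recomputed as folders grow, which is exactly the
maintenance burden described above.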

Does anyone else see the poor performance for restoring clients
with lots of small files and feel that this is a problem Tivoli
needs to address?  I do.  If this issue is not resolved then I
won't be able to keep using TSM to back up our NT servers.

Thanks,
Jeff Connor
Niagara Mohawk Power Corp.


---------------------- Forwarded by Jeffrey P Connor/IT/NMPC on
09/20/2000 10:32 AM ---------------------------


Jeffrey P Connor
09/13/2000 01:20 PM

To:   ADSM-L AT VM.MARIST DOT EDU
cc:

Subject:  Slow restore for large NT client.. help!


     We are in the process of restoring a subdirectory of a very
large NT client file space (D:) and it is running really slow.  I
thought I'd see if any of you have some ideas as to where we can
look for bottlenecks.
The client config is:
     Compaq ProLiant 5500
     400MB RAM
     two 400MHz Xeon processors.
     ~160GB of disk in a Compaq disk array made up of 18.2GB drives
     Windows NT 4.0 SP6a
     TSM client for NT 3.7.2.01
     Applicable TSM client options:
          tcpwindowsize  63
          tcpbuffsize    31
          tcpnodelay     yes
          txnbytelimit   25600

TSM server config
     TSM for OS/390 V3.7.1.0
     OS/390 2.6
     9672-R55
     TSM server DB cache hit ratio 98.5%
     Applicable TSM server options:
          TXNGROUPMAX 256
          Databufferpoolsize  262144



Network path:
     NT Client --100Mbit Ethernet--> Switch --100Mbit Ethernet-->
     Cisco 7513 rtr --155Mbit ATM--> Cisco 5500 ATM switch -->
     IBM 2216 --ESCON--> S/390 TSM Server


Now that you have the background here's what we are seeing.

Only 4.7GB have been transferred in 4 hours.  We are attempting to
restore one subdirectory on the D: drive first.
TSM command line client command entered was:
     RES -subd=y \\filecluster2\d$\groups\ugitoper\*
The D: drive has approximately 2,000,000 files.  Lots of small
files.  NT client is a file and print server.
A network sniffer trace shows mostly large chunks of data sent,
no retransmits, then the NT client appears to throttle back,
decreasing the TCP window size as if it could not accept the data
as fast as TSM was sending it.  Window sizes go to zero at times,
then bounce back to a large window size (64512).
NT perfmon shows plenty of memory and cpu with minimal disk
queueing.
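
As a rough sanity check on these numbers, the arithmetic below works out the
observed restore rate and compares it with the 100Mbit link and with the TCP
window. It is only a sketch: it assumes 1 GB = 1024**3 bytes, that
tcpwindowsize is given in KB (63 * 1024 matches the 64512 seen in the trace),
and it ignores protocol overhead. The round-trip time is not known from
anything above, so the code only computes what it would have to be for the
window alone to explain the rate.

```python
# Values taken from the description above; 1 GB = 1024**3 bytes is an assumption.
restored_bytes = 4.7 * 1024**3
elapsed_s = 4 * 3600
observed_rate = restored_bytes / elapsed_s       # bytes per second

# tcpwindowsize 63, assumed to be in KB; matches the 64512 in the sniffer trace.
window_bytes = 63 * 1024

# With one full window in flight per round trip, throughput is capped at
# window / RTT.  This is the RTT at which that cap equals the observed rate.
rtt_needed_s = window_bytes / observed_rate

# 100Mbit Ethernet expressed in bytes per second, ignoring protocol overhead.
link_bytes_per_s = 100_000_000 / 8
utilization = observed_rate / link_bytes_per_s

print(f"observed rate: {observed_rate / 1024:.0f} KB/s")
print(f"share of 100Mbit link: {utilization:.1%}")
print(f"RTT needed for the window to be the limit: {rtt_needed_s * 1000:.0f} ms")
```

If the real round-trip time on this path is far below the figure printed, the
configured window size by itself would not account for the slow restore, which
fits the zero-window observations pointing at the receiving side.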



This brings me to my question.  What tools can I use or what
metrics in perfmon can I check to see "under the covers"
and determine what is slowing us down?  The network support staff
feel the network bandwidth is there and that the NT client is
throttling things back.  The NT support staff say the NT
client machine is not overwhelmed in terms of CPU, memory, disk,
etc.; they feel TSM is the problem.

What could be the bottleneck on the NT client and what tool can I
use to find it?

Thanks in advance for your assistance,
Jeff Connor
Niagara Mohawk Power Corp