Re: Solaris backup performance

Ben/Bill,

        We have had several performance PMRs in this area recently, after
customers upgraded to AIX 5.2. Could you please post the output from a
netstat -p tcp command to the listserv for me to look at? Thanks.

        Also, please provide a description of the disk subsystems being
used for the client and server.


At 11:45 AM 2/13/2004 -0700, you wrote:

        Bill,
        It sounds like we are fighting a similar problem. We upgraded
the last of our TSM servers from AIX4.3.3/TSM5.1.1.0 to
AIX5.2ML2/TSM5.2.1.3 on Tuesday.  This TSM server has the largest load
of our servers. Both the TSM server and this particular client have GB
interfaces (the Solaris host has a SysKonnect GB interface).

        We are having ~very~ serious problems, as a 1.5TB database backup
from the Solaris host now takes twice as long to complete. We too see
the RecvW state on the sessions, which is atypical. We are sweating
bullets, and I have opened calls with IBM (for the OS) and Tivoli, but
have yet to get a resolution.

        We have the Solaris client push the data to the TSM server in 6
threads, each of which goes directly to a 3595 tape drive, so it is a
real blast of data from one client. Before the upgrade we were hitting
almost 70MB/second, but now we flatline at about 45MB/second. We did not
have jumbo frames on, just a regular 1500 MTU size.
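
        As a rough check on what those rates mean for a backup window,
here is a minimal Python sketch. It assumes decimal units (1 TB = 10**12
bytes, 1 MB = 10**6 bytes) and that the whole 1.5TB moves at the sustained
aggregate rate; backup_hours is a hypothetical helper, not anything from
TSM.

```python
# Rough transfer-time estimate.
# Assumptions: decimal units, and a constant sustained rate for the whole job.
def backup_hours(size_tb, rate_mb_per_s):
    size_bytes = size_tb * 10**12
    rate_bytes_per_s = rate_mb_per_s * 10**6
    return size_bytes / rate_bytes_per_s / 3600

before = backup_hours(1.5, 70)  # rate quoted for before the upgrade
after = backup_hours(1.5, 45)   # rate quoted for after the upgrade
print(f"{before:.1f} h at 70 MB/s, {after:.1f} h at 45 MB/s")
```

        On these assumptions the job takes roughly 6.0 hours at 70MB/second
and 9.3 hours at 45MB/second, about 1.6 times as long.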

        Since we upgraded both the OS and the TSM server at the same
time, we are unsure which change introduced the problem. At this
point, my gut feeling is that the OS somehow cannot run packets through
the TCP stack as fast as before. With an "entstat -d en? " command on
the TSM GB interface, I see a great many "No Resource errors" being
logged when the GB is running hard. IBM network support says that those
errors may be caused by the application not handling the packets from
the TCP stack properly.

....

        Just as I was composing this, we turned on jumbo frames on the
TSM server, the switches in between, and the client. Our 6-thread push
is now running almost as fast as before, but we are still getting those
"No Resource errors" on the TSM GB interface.

        My feeling is still an OS problem with the TCP stack when it is
being pushed hard.
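
        One way to see why jumbo frames could relieve a stack that
struggles per packet is to estimate the packet rate at each MTU. The
Python sketch below is illustrative only: the 9000-byte jumbo MTU and the
40 bytes of IPv4 plus TCP headers (no options) are assumptions, not
values taken from this setup, and packets_per_second is a hypothetical
helper.

```python
# Estimate full-sized TCP segments per second needed to sustain a data rate.
# Assumptions: 40 bytes of IPv4 + TCP headers per packet (no options),
# every packet filled to the MTU, decimal megabytes.
def packets_per_second(rate_mb_per_s, mtu, header_bytes=40):
    payload_bytes = mtu - header_bytes
    return rate_mb_per_s * 10**6 / payload_bytes

standard = packets_per_second(70, 1500)  # regular 1500 MTU
jumbo = packets_per_second(70, 9000)     # assumed jumbo MTU of 9000
print(f"{standard:.0f} pkt/s at MTU 1500, {jumbo:.0f} pkt/s at MTU 9000")
```

        Under these assumptions, 70MB/second needs roughly 47,900
packets per second at a 1500 MTU but only about 7,800 at a 9000 MTU, so
the stack handles about six times fewer packets for the same data.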

        Ben


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Bill Boyer
Sent: Thursday, February 12, 2004 6:28 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Solaris backup performance


TSM client 5.2.2.0 on a Solaris 7 server, gigabit fibre Ethernet
adapter. TSM Server 5.2.2.1 on AIX 5.2 p630 with fibre gigabit Ethernet.
Different VLAN.

During the backup of this client (actually 4 out of 11 Solaris clients)
the backup just drags along. Looking at the sessions on the server we
see the session in RecvW state, but the wait time just goes up and
up, sometimes into double digits, before more data goes in.

Doing a large FTP to that server (130MB) took just a few seconds. The
other 7 Solaris clients, same TSM client, same OS level, back up as you
would expect over gigabit.

According to the switch, the ports, which are configured to
autonegotiate, are at full speed and full duplex.

Anyone got ideas? Searching the archive brings hits on 10/100 ethernet
adapters not being set correctly.

Bill Boyer
"Some days you are the bug, some days you are the windshield." - ??


Dave Canan
TSM Performance
IBM Advanced Technical Support
ddcanan AT us.ibm DOT com