Subject: Re: Solaris backup performance
From: Ben Bullock <bbullock AT MICRON DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Sun, 15 Feb 2004 22:05:59 -0700
        Sure,
        Here is the output from our TSM server (AIX 5.2 ML 2):

root:># netstat -p tcp 
tcp:
        1465151373 packets sent
                11229967 data packets (3524761987 bytes)
                2304 data packets (2143444 bytes) retransmitted
                17728969 ack-only packets (3592646 delayed)
                0 URG only packets
                0 window probe packets
                1435932298 window update packets
                258179 control packets
                0 large sends
                0 bytes sent using largesend
                0 bytes is the biggest largesend
        157281361 packets received
                4012588 acks (for 3524781775 bytes)
                111150 duplicate acks
                0 acks for unsent data
                139610354 packets (3352535089 bytes) received in-sequence
                293632 completely duplicate packets (1423234164 bytes)
                23 old duplicate packets
                2725 packets with some dup. data (8004787 bytes duped)
                13568100 out-of-order packets (3552467717 bytes)
                4990 packets (81703 bytes) of data after window
                4849 window probes
                54031 window update packets
                6332 packets received after close
                0 packets with bad hardware assisted checksum
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
                3268 discarded by listeners
                0 discarded due to listener's queue full
                2549691 ack packet headers correctly predicted
                117391230 data packet headers correctly predicted
        110040 connection requests
        107496 connection accepts
        217339 connections established (including accepts)
        231585 connections closed (including 98514 drops)
        0 connections with ECN capability
        0 times responded to ECN
        192 embryonic connections dropped
        3151572 segments updated rtt (of 1778235 attempts)
        0 segments with congestion window reduced bit set
        0 segments with congestion experienced bit set
        0 resends due to path MTU discovery
        0 path MTU discovery terminations due to retransmits
        429 retransmit timeouts
                8 connections dropped by rexmit timeout
        365 fast retransmits
                0 when congestion window less than 4 segments
        1439 newreno retransmits
        12 times avoided false fast retransmits
        2 persist timeouts
                0 connections dropped due to persist timeout
        146 keepalive timeouts
                138 keepalive probes sent
                8 connections dropped by keepalive
        0 times SACK blocks array is extended
        0 times SACK holes array is extended
        325 packets dropped due to memory allocation failure
        0 connections in timewait reused
        0 delayed ACKs for SYN
        0 delayed ACKs for FIN
        0 send_and_disconnects
        0 spliced connections
        0 spliced connections closed
        0 spliced connections reset
        0 spliced connections timeout
        0 spliced connections persist timeout
        0 spliced connections keepalive timeout
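
        For anyone skimming that wall of counters, a rough filter like
the one below pulls out the lines that usually matter for this kind of
problem. The egrep pattern is just my guess at the interesting counters
in the AIX output format above:

# Sketch only: surface the TCP counters that usually flag trouble.
netstat -p tcp | egrep 'retransmit|out-of-order|duplicate|memory allocation'

The number that jumps out at me in our output is the out-of-order
count: 13568100 out-of-order packets against 157281361 packets received
is a far bigger fraction than I would expect on a clean gigabit path.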

        As for the disk subsystems: on the Solaris host pushing the
data, it comes off an EMC Symmetrix 8730 over 6 fibre paths to the
host. Lots of bandwidth.

        On the TSM server we have SSA 7133-D40 disk drawers, but they
are pretty much irrelevant in our situation, since the data goes
straight to 7 IBM 3495E1A tape drives. Yes, that makes the tape drives
the bottleneck in the path, but with 7 tapes running we were only
getting about half the throughput we should have been.
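
        For anyone who wants to compare numbers, here is a rough sketch
of one way to sample per-session throughput from the server side. The
admin ID and password are placeholders, and the byte counters the query
reports are cumulative, so you take two samples and difference them:

# Sketch only: sample cumulative session byte counts twice, 60s apart.
dsmadmc -id=admin -password=xxxxx "query session format=detailed"
sleep 60
dsmadmc -id=admin -password=xxxxx "query session format=detailed"
# (delta in bytes received) / 60 = per-session throughput in bytes/sec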

        Hope that's informative enough.

Ben
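
P.S. For the archives, since turning on jumbo frames is what bought
back most of our throughput (details in my earlier note quoted below):
on AIX that change boils down to settings along these lines. The device
names and buffer values here are illustrative only, not our exact
config, so check them against your own environment.

# Illustrative only; the adapter must be offline for the chdev calls.
chdev -l ent1 -a jumbo_frames=yes    # adapter: allow 9000-byte frames
chdev -l en1 -a mtu=9000             # interface: match the jumbo MTU
no -o rfc1323=1                      # TCP window scaling for GigE
no -o tcp_sendspace=262144           # bigger socket send buffer
no -o tcp_recvspace=262144           # bigger socket receive buffer
no -o sb_max=1048576                 # ceiling on socket buffer sizes

The switches in between have to pass the large frames too, or you get
nothing out of it.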

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Dave Canan
Sent: Sunday, February 15, 2004 10:27 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Solaris backup performance


Ben/Bill,

         We have had several performance PMRs in this area recently
after customers have upgraded to AIX 5.2. Could you please post the
output from a netstat -p tcp command to the listserv for me to look at?
Thanks.

         Also, please provide a description of the disk subsystems being
used for the client and server.


At 11:45 AM 2/13/2004 -0700, you wrote:
>         Bill,
>         It sounds like we are fighting a similar problem. We upgraded 
>the last of our TSM servers from AIX4.3.3/TSM5.1.1.0 to 
>AIX5.2ML2/TSM5.2.1.3 on Tuesday.  This TSM server has the largest load 
>of our servers. Both the TSM server and this particular client have GB 
>interfaces (the Solaris host has a SysKonnect GB interface).
>
>         We are having ~very~ serious problems: a 1.5TB database
>backup from the Solaris host now takes twice as long to complete. We
>too see the RecvW state on the sessions, which is atypical. We are
>sweating bullets, and I have opened calls with IBM (for the OS) and
>Tivoli, but have yet to get a resolution.
>
>         We have the Solaris client push the data to the TSM server in 
>6 threads, each of which goes directly to a 3595 tape drive. So it is
>a real blast of data from one client. Before the upgrade we were
>hitting almost 70MB/second, but now we flatline at about 45MB/second.
>We did not have jumbo frames on, just the regular 1500-byte MTU.
>
>         Since we upgraded both the OS and the TSM server at the same 
>time, we are unsure which change introduced the problem. At this
>point, my gut feeling is that the OS somehow cannot run packets
>through the TCP stack as fast as before. With an "entstat -d en?"
>command on the TSM GB interface, I see many "No Resource errors"
>being logged when the GB interface is running hard. IBM network
>support says that those errors may be caused by the application not
>handling the packets from the TCP stack properly.
>
>....
>
>         Just as I was composing this, we turned on jumbo frames on the
>TSM server, on the switches in between, and on the client. Our 6-thread
>push is now running almost as fast as before, but we are still getting 
>those "No resource errors" on the TSM GB interface.
>
>         My feeling is still that this is an OS problem with the TCP
>stack when it is being pushed hard.
>
>         Ben
>
>
>-----Original Message-----
>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf 
>Of Bill Boyer
>Sent: Thursday, February 12, 2004 6:28 PM
>To: ADSM-L AT VM.MARIST DOT EDU
>Subject: Solaris backup performance
>
>
>TSM client 5.2.2.0 on a Solaris 7 server, gigabit fibre Ethernet 
>adapter. TSM Server 5.2.2.1 on AIX 5.2 p630 with fibre gigabit 
>Ethernet. The client and server are on different VLANs.
>
>During the backup of this client (actually 4 out of 11 Solaris
>clients), the backup just drags along. Looking at the sessions on the
>server, we see the session in RecvW state, but the wait time just
>goes up and up, sometimes into double digits before more data comes
>in.
>
>Doing a large FTP to that server (130MB) took just a few seconds. The 
>other 7 Solaris clients, same TSM client, same OS level, back up as
>you would expect over gigabit.
>
>According to the switch, the ports, which are configured to
>autonegotiate, are at full speed and full duplex.
>
>Anyone got ideas? Searching the archives brings up hits about 10/100
>Ethernet adapters not being set correctly.
>
>Bill Boyer
>"Some days you are the bug, some days you are the windshield." - ??

Dave Canan
TSM Performance
IBM Advanced Technical Support
ddcanan AT us.ibm DOT com
