ADSM-L

Re: [ADSM-L] SQLLiteSpeed backups hanging when moved to TSM server on RHEL5.

2011-02-15 17:07:32
Subject: Re: [ADSM-L] SQLLiteSpeed backups hanging when moved to TSM server on RHEL5.
From: Robert Clark <robert.clark7 AT USBANK DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 15 Feb 2011 14:06:42 -0800
Hi Andy,

Yes, we did get a tcpdump from the TSM server running RHEL5, and we can see
the server sending a window size of 0 in the ack. This precedes the session
ceasing to make backup progress, and is repeatable.

I've also been told that this failure does not happen while tracing is
enabled in the TSM client.

This text snippet has had its IP addresses changed. 192.168.1.116 is the
TSM server. 10.1.1.235 is the client running LiteSpeed.

16:01:18.507558 IP 192.168.1.116.1505 > 10.1.1.235.2735: F 2235:2235(0) ack 38543 win 61320
16:01:18.507801 IP 10.1.1.235.2735 > 192.168.1.116.1505: . ack 2236 win 63767
16:01:31.559579 IP 10.1.1.235.2640 > 192.168.1.116.1505: . 149397126:149397127(1) ack 5 win 64473
16:01:31.559687 IP 192.168.1.116.1505 > 10.1.1.235.2640: . ack 149397126 win 0
16:02:01.635149 IP 10.1.1.235.2640 > 192.168.1.116.1505: . 149397126:149397127(1) ack 5 win 64473
16:02:01.635231 IP 192.168.1.116.1505 > 10.1.1.235.2640: . ack 149397126 win 0

Would this be Windows closing the socket when it should be only backing
off instead?
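
For anyone who wants to scan a longer text capture for the same pattern, here
is a rough Python sketch that picks out the zero-window acks. It assumes the
older tcpdump text format shown above with one packet per line (unwrapped); the
function name zero_window_acks and the regular expression are my own, not
anything from tcpdump or TSM.

```python
import re

# Sample lines in the same format as the capture above (one packet per line).
SAMPLE = """\
16:01:31.559579 IP 10.1.1.235.2640 > 192.168.1.116.1505: . 149397126:149397127(1) ack 5 win 64473
16:01:31.559687 IP 192.168.1.116.1505 > 10.1.1.235.2640: . ack 149397126 win 0
16:02:01.635149 IP 10.1.1.235.2640 > 192.168.1.116.1505: . 149397126:149397127(1) ack 5 win 64473
16:02:01.635231 IP 192.168.1.116.1505 > 10.1.1.235.2640: . ack 149397126 win 0
"""

# Timestamp, source host.port, destination host.port, advertised window.
RECORD = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r".*?\bwin (?P<win>\d+)"
)

def zero_window_acks(text):
    """Return (timestamp, source, destination) for every packet advertising win 0."""
    hits = []
    for line in text.splitlines():
        m = RECORD.match(line)
        if m and int(m.group("win")) == 0:
            hits.append((m.group("ts"), m.group("src"), m.group("dst")))
    return hits

for ts, src, dst in zero_window_acks(SAMPLE):
    print(ts, src, "->", dst, "advertised a zero window")
```

In the sample, both hits come from 192.168.1.116.1505 (the TSM server), about
30 seconds apart, each answering a 1-byte segment from the client.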

I also have a pcap file for at least one of the failures, and will dig into
that if necessary.

A correction: The text below that mentions "commtcp.cpp" is from the TSM
client side, not from the server side as I indicated.
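
For reference, the two errno values in that client trace are standard Winsock
error codes, and the second socket number is what a 32-bit INVALID_SOCKET looks
like when printed as an unsigned integer. A small Python sketch to decode them
follows; the describe helper and the table are illustrative only, and reading
the second error as a write on an already-invalidated handle is my assumption,
not something the trace confirms.

```python
# Winsock error codes that appear in the client trace.
WINSOCK_ERRORS = {
    10038: "WSAENOTSOCK",      # operation attempted on something that is not a socket
    10053: "WSAECONNABORTED",  # connection aborted by software in the host machine
}

# INVALID_SOCKET is (SOCKET)(~0); as a 32-bit unsigned value that is 0xFFFFFFFF.
INVALID_SOCKET_32 = 0xFFFFFFFF

def describe(sock, errno):
    """Return (symbolic error name, whether sock equals the 32-bit INVALID_SOCKET)."""
    name = WINSOCK_ERRORS.get(errno, "unknown")
    return name, sock == INVALID_SOCKET_32

print(describe(9300, 10053))        # first write error in the trace
print(describe(4294967295, 10038))  # second write error in the trace
```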

Thanks,
[RC]




From:
Andrew Raibeck <storman AT US.IBM DOT COM>
To:
ADSM-L AT VM.MARIST DOT EDU
Date:
02/15/2011 11:54 AM
Subject:
Re: [ADSM-L] SQLLiteSpeed backups hanging when moved to TSM server on
RHEL5.
Sent by:
"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>



Robert,

On its face, this sounds like something in the network, with "network"
being between the TSM client side TCP stack and the TSM server side TCP
stack. Have you done any kind of packet tracing to see what's going on?

Best regards,

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Product Development
Level 3 Team Lead
Internal Notes e-mail: Andrew Raibeck/Hartford/IBM@IBMUS
Internet e-mail: storman AT us.ibm DOT com

IBM Tivoli Storage Manager support web page:
http://www.ibm.com/support/entry/portal/Overview/Software/Tivoli/Tivoli_Storage_Manager


"ADSM: Dist Stor Manager" <ADSM-L AT vm.marist DOT edu> wrote on 2011-02-15
14:21:57:

> From: Robert Clark <robert.clark7 AT USBANK DOT COM>
> To: ADSM-L AT vm.marist DOT edu
> Date: 2011-02-15 14:24
> Subject: SQLLiteSpeed backups hanging when moved to TSM server on RHEL5.
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT vm.marist DOT edu>
>
> We're running into a problem when trying to change SQLLiteSpeed backup
> clients to point to new TSM servers.
>
> The old TSM servers are RHEL4 (on Intel) running TSM server 5.5.5.0 with
> efix for APAR IC71586.  (kernel 2.6.9-89.0.11.ELsmp)
>
> The new TSM servers are RHEL5 (on Intel) running TSM server 5.5.5.0 with
> efix for APAR IC71586. (kernel 2.6.18-164.11.1.el5)
>
>
> We've made sure all the relevant values are set the same on the new
> servers, as on the old. (Management classes, disk storage pools,
> maxnummp, and everything displayed in "q opt" output on the TSM server.)
>
> The two SQLLiteSpeed clients we've used for testing are:
>
> GENERICSYSTEMNAME1
> O/S: 2008
> SQL version: - 10.0.4000.0 (2008 SP2)
> SQL litespeed version: - 5.0.2.0
> TSM Client: 6.1.3.0
>
> GENERICSYSTEMNAME2
> O/S: 2003
> SQL version: - 9.00.4207.00 (2005 SP3)
> SQL litespeed version:- 5.0.2.0
> TSM Client: 6.1.3.0
>
> We have gathered client side trace, and it appears to indicate the
> socket is being closed:
>
> 02/09/2011 15:07:51.192 : commtcp.cpp (2525): ANS1006I TCP/IP write error
> on socket = 9300, errno = 10053, reason : An established connection was
> aborted by the software in your host machine.
>
> 02/09/2011 15:07:51.192 : apisend.cpp (1175):
> Contents of verb (0x7) Data, length: 32768:
>
> 02/09/2011 15:07:51.192 : commtcp.cpp (2525): ANS1006I TCP/IP write error
> on socket = 4294967295, errno = 10038, reason : An operation was attempted
> on something that is not a socket.
>
> We have also gathered server side trace, but nothing unusual has been
> noted there.
>
> The symptom on the TSM server is that the backup session stops making
> progress after a few minutes, and ultimately must be canceled to be
> cleaned up.
>
> We've opened a case with Tivoli support, and are working with the
> sysadmins of the TSM server.  We're not making much progress. My hope is
> to jog the memory of the list and see if anyone has seen window size or
> other stack weirdness with RHEL 5 that is triggered by LiteSpeed backups.
>
> Thanks,
> [RC]



