ADSM-L

Re: [ADSM-L] The continuing saga of TSM 5.5.5.0 on RedHat 5.4 on x86_64.

2011-09-14 19:03:06
From: Andrew Raibeck <storman AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 14 Sep 2011 19:00:07 -0400

Hi Robert,

The RecvW state suggests that the TSM server is waiting on data from the
TSM client, and isn't receiving it. Either the server sent something to the
client and is awaiting a response but the client never got the information;
or the client sent something back, but the server didn't get it. Off-hand
this sounds like an underlying networking-related issue.
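As a rough aid for spotting such sessions, here is a minimal Python sketch that filters session rows for the RecvW state and a long wait time. It assumes the Wait Time column uses the suffixes S, M and H for seconds, minutes and hours; the function names and the 30-minute threshold are hypothetical, and the sample rows are taken from the "q ses" output quoted below. A long wait in RecvW is only a hint, not proof of a stalled session.

```python
# Unit suffixes as they appear in the Wait Time column (assumed: S, M, H).
UNIT_SECONDS = {"S": 1, "M": 60, "H": 3600}

def wait_seconds(wait_time):
    """Convert a Wait Time value such as '40.5 M' or '0 S' to seconds."""
    value, unit = wait_time.split()
    return float(value) * UNIT_SECONDS[unit.upper()]

def long_recvw(sessions, threshold_seconds=1800):
    """Return (session number, client name) for RecvW sessions at or over the threshold."""
    return [
        (number, client)
        for number, state, wait, client in sessions
        if state == "RecvW" and wait_seconds(wait) >= threshold_seconds
    ]

# Rows are (session number, state, wait time, client name).
sessions = [
    ("56,511", "RecvW", "0 S", "FRED"),
    ("56,844", "RecvW", "40.5 M", "BARNEY"),
    ("64,759", "IdleW", "21 S", "KLAPAUCIUS"),
]
print(long_recvw(sessions))
```
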

I know you are a long-time TSM user, so I imagine that it hasn't been like
this for years. :-) Do you have any history on when this problem started to
manifest? And if so, what environmental changes might have coincided with
that time?

The first thing that came to mind for me is an issue with the Windows
Scalable Networking Pack (SNP) that is installed and activated when you
install Windows 2003 SP2. See this document:

https://www-304.ibm.com/support/docview.wss?uid=swg21460285

In addition, do you specify the TCPWINDOWSIZE option in dsm.opt? And if so,
do you deviate from the default value of 63? If so, I recommend commenting
it out, restarting the scheduler service, and seeing how that works.
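To check several clients quickly, a small Python sketch like the following could report any active TCPWINDOWSIZE setting in the text of an options file. It assumes that comment lines start with "*", that the option is spelled out in full rather than abbreviated, and that the value is a plain integer; the function name and the sample file contents are hypothetical.

```python
DEFAULT_TCPWINDOWSIZE = 63  # default value mentioned above

def tcpwindowsize_values(opt_text):
    """Return every active (uncommented) TCPWINDOWSIZE value found in the text."""
    values = []
    for line in opt_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (assumed to start with "*").
        if not stripped or stripped.startswith("*"):
            continue
        parts = stripped.split()
        if parts[0].upper() == "TCPWINDOWSIZE" and len(parts) > 1:
            values.append(int(parts[1]))
    return values

# Hypothetical dsm.opt contents, for illustration only.
sample = """\
* client options
NODENAME          FRED
TCPSERVERADDRESS  tsm.example.com
TCPWINDOWSIZE     512
"""

for value in tcpwindowsize_values(sample):
    if value != DEFAULT_TCPWINDOWSIZE:
        print(f"TCPWINDOWSIZE is set to {value}; default is {DEFAULT_TCPWINDOWSIZE}")
```

In practice the text would be read from the client's actual dsm.opt, and a file with no active TCPWINDOWSIZE line yields an empty list.
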

Best regards,

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Product Development
Level 3 Team Lead
Internal Notes e-mail: Andrew Raibeck/Hartford/IBM@IBMUS
Internet e-mail: storman AT us.ibm DOT com

IBM Tivoli Storage Manager support web page:
http://www.ibm.com/support/entry/portal/Overview/Software/Tivoli/Tivoli_Storage_Manager

"ADSM: Dist Stor Manager" <ADSM-L AT vm.marist DOT edu> wrote on 2011-09-14
17:05:35:

> From: Robert Clark <robert.clark7 AT USBANK DOT COM>
> To: ADSM-L AT vm.marist DOT edu
> Date: 2011-09-14 17:07
> Subject: The continuing saga of TSM 5.5.5.0 on RedHat 5.4 on x86_64.
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT vm.marist DOT edu>
>
> Hi List,
>
> We have a number of TSM clients (5.5.2.0 on mostly Windows 2003 SP2) that
> stop making forward progress in the busy part of the night, and are just
> sitting in RecvW when we check in the morning.
>
> These stalled clients appear to be involved, as cause or effect, in the
> log getting pinned and reaching 80% full, at which point dsmserv starts
> killing the oldest session. (The cynic in me wonders if this pain was
> added to get people to move from 5.5 to 6.x.)
>
> For the problem sessions below, there is only one session per node and,
> based on the received byte count, each appears to correspond to the data
> session and not the metadata session.
>
> Stopping the backup on the client side 'dsmcutil q /name:"TSM Scheduler"',
> and then running a manual "dsmc i" finishes a manual incremental
> (typically) in under 5 minutes.
>
> So far, I've upped the MAXNUMMP value for all the affected nodes from 2 to
> 5, and have removed "queryschedperiod 12" and "quiet yes" from the client
> option set.
>
> I have a PMR open with Tivoli support, but wanted to check with the list
> to see if anyone has seen this particular weirdness before.
>
>
> tsm: TSMXX15>q ses
>
>   Sess  Comm.   Sess     Wait  Bytes    Bytes  Sess  Platform  Client Name
> Number  Method  State    Time   Sent    Recvd  Type
> ------  ------  -----  ------  -----  -------  ----  --------  -----------
>
> THESE SYSTEMS WERE EXPERIENCING THE PROBLEM ON THE MORNING THIS OUTPUT WAS
> GATHERED:
>
> 56,511  Tcp/Ip  RecvW     0 S  7.4 K  135.8 G  Node  WinNT     FRED
> 56,844  Tcp/Ip  RecvW  40.5 M    760  640.1 M  Node  WinNT     BARNEY
> 58,047  Tcp/Ip  RecvW  46.3 M  1.5 K  150.7 M  Node  WinNT     WILMA
> 58,907  Tcp/Ip  RecvW   9.1 M  4.7 K   56.0 G  Node  WinNT     DINO
> 59,677  Tcp/Ip  RecvW  33.1 M    736  946.8 M  Node  WinNT     BAMBAM
>
> THESE SYSTEMS WERE NOT EXPERIENCING THE PROBLEM ON THE MORNING THIS OUTPUT
> WAS GATHERED:
>
> 64,606  Tcp/Ip  RecvW  37.7 M  1.8 K  156.3 M  Node  SQLLite-  TRURL_SQL
>                                                      Speed
> 64,725  Tcp/Ip  RecvW  10.0 M  2.5 K    2.2 G  Node  SQLLite-  TRURL_SQL
>                                                      Speed
> 64,759  Tcp/Ip  IdleW    21 S  8.1 M      530  Node  AIX       KLAPAUCIUS
> 64,760  Tcp/Ip  RecvW     0 S    352    1.1 G  Node  AIX       KLAPAUCIUS
>
> THE NODE NAMES IN THIS EMAIL WERE CHANGED TO CONFORM TO POLICY.
>
> Thanks,
> [RC]