ADSM-L

Re: Very Puzzling Sessions lost

2003-01-17 10:29:59
Subject: Re: Very Puzzling Sessions lost
From: Andrew Raibeck <storman AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 17 Jan 2003 08:28:04 -0700
If you are getting errno=32, then that means EPIPE ("broken pipe"). This
means that you are almost certainly dealing with some kind of network
problem. Note that by "network", I am referring to any/all the software
and hardware that sits between the TSM client and TSM server software:
operating systems, network drivers, network adapters, cables, routers,
etc. These are among the most frustrating kinds of problems to diagnose
because hunting down the root cause is usually a challenge.
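
For anyone who wants to see what errno 32 looks like outside of TSM, here is a minimal Python sketch. It is an illustration, not TSM code: the helper names describe_errno and provoke_broken_pipe are made up for this example. It assumes a Unix-like system where Python ignores SIGPIPE (the default), and it uses a local socket pair rather than a real network connection. Errno numbers are platform-specific, so the lookup goes through the errno module instead of hard-coding the meaning.

```python
import errno
import os
import socket


def describe_errno(code):
    """Return the symbolic name and OS message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"errno={code} ({name}): {os.strerror(code)}"


def provoke_broken_pipe():
    """Write to a socket whose peer has already gone away.

    Returns the errno raised by the failed send, or None if the
    send unexpectedly succeeded.
    """
    ours, peer = socket.socketpair()
    peer.close()  # the other end disappears without warning
    try:
        ours.send(b"data after the peer closed")
    except OSError as exc:
        return exc.errno
    finally:
        ours.close()
    return None


print(describe_errno(32))
print("send to closed peer failed with errno", provoke_broken_pipe())
```

The point of the sketch is that EPIPE is reported to the side that is still writing; it says the other end of the connection is gone, not why it went away.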

Suggestions to consider:

- If the problem is happening only with a single machine, try swapping the
network card with that of a different machine. Does the problem move with
the network adapter? How about the network cable?

- Put a sniffer (or similar device) on the client machine and trace the
network traffic. That should show you what is happening at the IP layer;
your network folks should be able to analyze this data.

- Check with AIX support to see how they can help. I don't work on AIX,
but I suspect there is some kind of tool for analyzing TCP/IP traffic on
the machine. Also, they can look for any known TCP/IP issues at the OS
level.
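
As a rough companion to the suggestions above, a small script like the following can be run from both the failing client and a working client to compare basic TCP behavior toward the server. This is only a sketch under assumptions: probe_connects is a hypothetical helper, and the host and port you pass in are whatever your client's options file uses for the server. It tests connection setup only, so it will not reproduce a stall in the middle of a backup, and it is no substitute for a packet trace.

```python
import socket
import time


def probe_connects(host, port, attempts=5, timeout=5.0, pause=0.0):
    """Open and immediately close a TCP connection several times.

    Returns a list of (ok, seconds, errno_or_None) tuples, one per
    attempt, so results from two machines can be compared.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection resolves the host and connects with a timeout
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start, None))
        except OSError as exc:
            # refused, unreachable, timed out, name lookup failure, etc.
            results.append((False, time.monotonic() - start, exc.errno))
        if pause:
            time.sleep(pause)
    return results
```

If the failing client shows slow or intermittent connects while the working client on the same subnet does not, that points at the machine itself (adapter, cable, switch port, driver) rather than at the path between the subnets.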

Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
Internet e-mail: storman AT us.eyebm DOT com (change eye to i to reply)

The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.




"Conko, Steven" <sconko AT ADT DOT COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
01/17/2003 07:42
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        Subject:        Very Puzzling Sessions lost



We have some very strange session lost errors showing up on our client.
First the particulars:

Server is AIX 4.3.3 ML 10, TSM V4.2.2
Client is AIX 4.3.3 ML 10, TSM V5.1.5.5 (upgraded several times from
V4.2.2 at TSM Support's request to deal with this problem)

Client and server are on separate 100 Mb full duplex subnets.

There are other AIX 4.3.3 ML 10 clients on the same subnet as the failing
client that do not get any errors. Network option (no) settings are the
same on both clients. The dsm options on the failing client have been
changed to match those on the successful client, to rule out differing
options as the cause.

The client having the problem will start a backup (incremental or archive,
scheduled or manual) and suddenly "freeze" for several minutes before
severing the socket connection, and will then reconnect and time out almost
continuously. Sometimes it appears to stop in the same spot, other times
not. There doesn't appear to be any reason why certain files would cause a
problem (i.e., they are not open at the time).

Any idea what I can check? I have tried everything: all sorts of settings
in dsm.sys, and no settings at all. No errors are reported on the system
itself, just "session lost" on the client and "session terminated" on the
server. I've been working with IBM TSM Support for quite some time and
they just keep wanting more traces and "upgrade to the latest client."

I'm at my wits' end.
