ADSM-L

Re: Possibly OT: How to diagnose 3494 ATL "communications" failures

2004-06-11 11:46:43
Subject: Re: Possibly OT: How to diagnose 3494 ATL "communications" failures
From: "Coats, Jack" <Jack.Coats AT BANKSTERLING DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 11 Jun 2004 10:46:41 -0500
Sounds like more of a networking issue.  You might consider some subnet
routing of the traffic, so you don't blast it everywhere on your local
subnet.

something like

route add net 172.4.0.0/16 10.23.21.2 1

That is, traffic for the 172.4.x.x subnet gets routed through the
router/machine at 10.23.21.2 (the exact route syntax varies by platform).
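
As a rough illustration of what such a route does, the sketch below (plain
Python, not part of any AIX tooling) models a routing table holding the one
entry from the command above and picks the gateway by longest-prefix match.
The names `ROUTES` and `next_hop` are hypothetical, invented for this
example; only the network and gateway values come from the command.

```python
import ipaddress

# Hypothetical in-memory routing table: destination network -> gateway.
# The single entry mirrors the example route command above.
ROUTES = {
    ipaddress.ip_network("172.4.0.0/16"): ipaddress.ip_address("10.23.21.2"),
}


def next_hop(dest, routes=ROUTES):
    """Return the gateway for dest by longest-prefix match, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in routes if addr in net]
    if not matches:
        # No specific route: the real stack would fall back to the
        # default route or deliver on the local subnet.
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


if __name__ == "__main__":
    for dest in ("172.4.10.5", "172.5.0.1"):
        print(dest, "->", next_hop(dest))
```

Anything inside 172.4.0.0/16 resolves to the gateway; anything else returns
None, meaning this particular route would not apply to it.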

... LOL ... JC

-----Original Message-----
From: Zoltan Forray/AC/VCU [mailto:zforray AT VCU DOT EDU]
Sent: Friday, June 11, 2004 10:41 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: Possibly OT: How to diagnose 3494 ATL "communications"
failures


Well, after further analysis, this topic has had some strange twists.

Per some suggestions, I had IBM replace the NIC in the LM.  This had
absolutely no effect on the problem.

I have been discussing this issue with our networking folks. Their initial
review showed massive amounts of BROADCAST traffic on this subnet, which
is a private, GIG-E internal network, with connections to the outside
world, primarily for TSM backup traffic.

I updated ATAPE, which was a bit behind. I could not see anything in the
change history that would address this kind of situation.

Now, my networking folks have put a sniffer on this private subnet.

Imagine my surprise when the biggest source of broadcast traffic turned out
to be the TSM AIX server itself !!!!!

They also said that the peak in broadcast traffic correlated to backup
traffic from a box that is outside the private subnet (i.e. does not have
a direct connection to the same switch).
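
For anyone wanting to repeat this kind of tally from a sniffer export, here
is a minimal sketch, assuming the capture can be reduced to (source,
destination) address pairs. The function name `broadcast_talkers`, the
sample addresses, and the 10.23.21.0/24 subnet are all hypothetical
placeholders, not values from this network.

```python
import ipaddress
from collections import Counter


def broadcast_talkers(packets, subnet):
    """Count broadcast packets per source address, busiest first.

    packets: iterable of (src, dst) IPv4 address strings from a capture.
    subnet:  the local subnet in CIDR form (an assumed value here).
    """
    net = ipaddress.ip_network(subnet)
    # Subnet-directed broadcast plus the limited broadcast address.
    broadcast = {net.broadcast_address,
                 ipaddress.ip_address("255.255.255.255")}
    counts = Counter()
    for src, dst in packets:
        if ipaddress.ip_address(dst) in broadcast:
            counts[src] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        ("10.23.21.5", "10.23.21.255"),
        ("10.23.21.5", "255.255.255.255"),
        ("10.23.21.9", "10.23.21.255"),
        ("10.23.21.9", "10.23.21.5"),   # unicast, not counted
    ]
    for src, n in broadcast_talkers(sample, "10.23.21.0/24"):
        print(src, n)
```

This only looks at IP-level broadcast destinations; ARP and other
layer-2 broadcasts would need the capture's Ethernet addresses instead.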

Does anyone have any suggestions on why the TSM server would be doing this?
This is an AIX 5.1, TSM 5.2.1.3 system that is *EXCLUSIVELY* used for TSM
backups.

Could this have anything to do with the HLADDRESS parms on the node
definitions?  Possibly a bug that has been fixed in a later release of
the TSM server?

*NOTHING* has changed on this system in the past 6 months when it comes
to the AIX and TSM server software itself!  The last upgrade was from
AIX 4.3.3 to 5.1, with the TSM server upgraded at the same time.




Richard Sims <rbs AT BU DOT EDU>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
05/24/2004 07:41 PM
Please respond to: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
cc:
Subject: Re: Possibly OT: How to diagnose 3494 ATL "communications" failures

>This hardware/system has been in place for years, without change.
>
>My bets are on a problem with the LM itself.
>
>This weekend, the connection died, again. No non-disruptive attempts to
>reestablish the connection with the LM worked. Yes, both boxes could PING
>each other.  As I told IBM, this is *NOT* a connectivity issue, in the
>lan/network sense. This is a "the LM is not responding as an LM".
>
>The only way we got it to work was to reinitialize the LAN ports from the
>LM/ATL.

Well, there has been change: it's gotten older!  (I've become an expert on
the subject.)  You could be seeing the effects of a deteriorating network
card or the like...which could be aggravated if the library is not on UPS
or power that is otherwise conditioned.

Watch out also if the library is not behind a firewall.  I worry about these
"embedded system" computers in that they rarely see any updates, and yet we
know that "holes" in operating systems are periodically found.  In the right
network circumstances, odd behavior from such a system may be the result of
someone trying to hack the box.

    me again,  Richard Sims
