ADSM-L

Re: "Lanfree path failed: using lan", reason ?

2004-05-13 16:17:36
Subject: Re: "Lanfree path failed: using lan", reason ?
From: Robert Clark <raclark AT REGENCE DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 13 May 2004 13:12:49 -0700
Are tape paths on 3583 based on tape drive serial number or just the rmt
numbers?

We've had some problems with human error on tape paths.  Usually because of
1) the presence or absence of an internal 8mm drive at rmt0, or 2) the
lanfree system's low- and high-numbered HBAs being connected to the two
SANs in the opposite order from the TSM server's.
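One way to catch that kind of mix-up is to compare the serial numbers the client OS currently reports for its rmt devices against the serials the TSM paths were defined with. A minimal sketch of that check, with entirely made-up device names and serials (on AIX the real data would be gathered by hand from `lscfg -vpl rmtN` on the client and the path definitions on the TSM server):

```python
# Hypothetical sketch: detect the rmt-number shuffle described above.
# The dicts stand in for data collected by hand; no serials here are real.

# Serial numbers as the client OS currently sees its rmt devices.
client_view = {
    "/dev/rmt1": "LTO-SER-0001",
    "/dev/rmt2": "LTO-SER-0002",
}

# Serial number each path definition *assumes* sits behind the device name.
path_definitions = {
    "/dev/rmt1": "LTO-SER-0002",  # stale: rmt numbers shifted after a reconfig
    "/dev/rmt2": "LTO-SER-0001",
}

def mismatched_paths(client, paths):
    """Return device names whose defined serial no longer matches reality."""
    return sorted(dev for dev, serial in paths.items()
                  if client.get(dev) != serial)

print(mismatched_paths(client_view, path_definitions))
# → ['/dev/rmt1', '/dev/rmt2']
```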

[RC]



                      "Pawel Wozniczka"
                      <pwozniczka AT KGHM DOT PL>        To:    ADSM-L AT 
VM.MARIST DOT EDU
                      Sent by: "ADSM: Dist       cc:
                      Stor Manager"              Subject:      "Lanfree path 
failed: using lan", reason ?
                      <ADSM-L AT VM.MARIST DOT EDU
                      >


                      05/13/2004 12:22 PM
                      Please respond to
                      "ADSM: Dist Stor
                      Manager"

                      |-------------------|
                      | [ ] Secure E-mail |
                      |-------------------|





Hello all,

I would like to ask the more experienced TSM admins for help with possible
reasons for lanfree path failures during oracle-tdpo backups.

First some info about configuration details:

Client(s):

pSeries 660 / 670, OS: AIX 5.1 ML-04 (32-bit kernel), Running: Oracle 8.1.7.4

StorageAgent: 5.1.9
Application Client:  TDP Oracle AIX 32bit
TDP Version:  2.2.1.0
Compile Time API Header:  4.2.1.0
Run Time API Header:  4.2.1.0
Client Code Version:  4.2.1.25
Atape: 7.1.5.0

TSM Server:

pSeries 610, OS: AIX 5.1 ML-04 (32-bit kernel), TSM Version: 5.1.9

SAN & Storage

HBA: 6228 2Gb (identical firmware on every one)
SAN Switches: IBM 2109-16
Library: IBM 3583 with two SCSI LTO-1 drives connected through a SAN Data
Gateway

All backups are bound to tape storage pools.

Problem:

I've tried to configure two clients to use the lanfree path during oracle
tdp backups. The two clients are very similar in software components
(identical OS, TSM packages, etc.); the only difference is that the first
one is a pSeries 660 and the other is a p670. So far the p660 works great
with lanfree backups (not a single failover to the lan path), but the p670
is quite the opposite: during every oracle backup, after initially quite
good san transfers, the following errors appear in its tdpoerror.log,
which results in a failover to lan:

05/13/04   16:00:09 session.cpp         (1956): sessOpen: Failure in
communications open call. rc: -1
05/13/04   16:00:09 ANS9201W Lanfree path failed: using lan path.
05/13/04   16:01:38 session.cpp         (1956): sessOpen: Failure in
communications open call. rc: -1
05/13/04   16:01:38 ANS9201W Lanfree path failed: using lan path.
05/13/04   16:01:47 session.cpp         (1956): sessOpen: Failure in
communications open call. rc: -1
05/13/04   16:01:47 ANS9201W Lanfree path failed: using lan path.
05/13/04   16:03:11 session.cpp         (1956): sessOpen: Failure in
communications open call. rc: -1
05/13/04   16:03:11 ANS9201W Lanfree path failed: using lan path.
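When the failovers repeat like this, it can help to pull the ANS9201W timestamps out of tdpoerror.log and look at the intervals between them, e.g. to see whether they line up with rman channel starts. A small sketch, assuming only the `MM/DD/YY HH:MM:SS` log format shown above (the sample lines are taken from this log):

```python
import re
from datetime import datetime

# Matches lines like: 05/13/04   16:00:09 ANS9201W Lanfree path failed: ...
FAILOVER = re.compile(r"^(\d{2}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s+ANS9201W")

def failover_times(lines):
    """Extract a datetime for every ANS9201W lanfree-failover message."""
    out = []
    for line in lines:
        m = FAILOVER.match(line)
        if m:
            out.append(datetime.strptime(m.group(1) + " " + m.group(2),
                                         "%m/%d/%y %H:%M:%S"))
    return out

log = [
    "05/13/04   16:00:09 ANS9201W Lanfree path failed: using lan path.",
    "05/13/04   16:01:38 session.cpp         (1956): sessOpen: Failure ...",
    "05/13/04   16:01:38 ANS9201W Lanfree path failed: using lan path.",
]
times = failover_times(log)
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print(len(times), gaps)   # → 2 [89.0]
```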

In order to find the reason for this, I took the following actions:

a) checked fabric port statistics on the san switches, but found no
significant errors.

b) added "traceflag api api_detail pid tid" to dsm.opt. That produced a
huge trace file, but again no clear errors were found. During the first
few rman backupsets the trace showed that the lanfree path was being used,
but at exactly the time tdpoerror.log reported the "lanfree path failed"
error, a new rman session began in the trace file with no information
about using (or failing over from) the lanfree path.

c) added "tdpo_trace_flags orclevel0 orclevel1 orclevel2" to the tdpo.opt
used to allocate the rman channel; despite a quite detailed trace file,
everything again looked good, with no errors.

d) tried to route the data traffic to another 3583 library, connected to
another san switch (a few kilometers away); exactly the same scenario was
observed.
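For reference, the trace settings from steps b) and c) amount to option-file fragments along these lines (the flag names are the ones quoted above; the trace-file paths are only examples, and the TRACEFILE / TDPO_TRACE_FILE destination options are my assumption about how the output was directed):

```
# dsm.opt (API client options)
TRACEFLAG  api api_detail pid tid
TRACEFILE  /tmp/dsm_api.trace

# tdpo.opt (referenced by the rman channel allocation)
TDPO_TRACE_FLAGS  orclevel0 orclevel1 orclevel2
TDPO_TRACE_FILE   /tmp/tdpo.trace
```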

Has anyone faced similar problems ?

What else can I do to investigate the problem thoroughly ?

The only thing in my current configuration (that I'm aware of) that goes
against IBM recommendations is mixing san disk and san tape traffic
through the same HBA adapters, but that is rather hard to overcome.

I'm especially curious about the fact that one client works fine, while
the other (set up, to the best of my knowledge, in the same way) fails
every time.

Thanks in advance for ANY hints.

Pawel Wozniczka
