[Veritas-bu] Very ODD problem with NB5.1MP3a + 2 Solaris Clients

2005-11-08 13:44:45
Subject: [Veritas-bu] Very ODD problem with NB5.1MP3a + 2 Solaris Clients
From: Mark.Donaldson AT cexp DOT com (Mark.Donaldson AT cexp DOT com)
Date: Tue, 8 Nov 2005 11:44:45 -0700
Once you do something successfully, the addresses may be stored for a while
in the arp cache making later tests behave differently.

You might want to flush the arp cache entries for these boxes' addresses
between tests.
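
A rough sketch of that flush step, assuming the standard "arp -d <host>"
delete command and the two client addresses from the transcript in this
thread. The helper names are made up for illustration, and the script only
prints the commands unless dry_run is turned off:

```python
import subprocess


def arp_delete_commands(hosts):
    """Build one 'arp -d <host>' argv list per host."""
    return [["arp", "-d", host] for host in hosts]


def flush_arp_entries(hosts, dry_run=True):
    """Print, and optionally run, the delete command for each host."""
    commands = arp_delete_commands(hosts)
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # Needs root privileges; a failure for one host does not stop
            # the loop, so the remaining entries are still attempted.
            subprocess.run(argv, check=False)
    return commands
```

Calling flush_arp_entries(["172.16.0.135", "172.16.0.136"]) before each
telnet test should make every run start from the same cache state.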

-M
-----Original Message-----
From: Piszcz, Justin [mailto:jpiszcz AT servervault DOT com]
Sent: Tuesday, November 08, 2005 11:31 AM
To: Mark.Donaldson AT cexp DOT com; pkeating AT bank-banque-canada DOT ca
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Very ODD problem with NB5.1MP3a + 2 Solaris
Clients


Good suggestion. I tried REQUIRED_INTERFACE, but no luck there, as the master
is still not able to reach the open port where bpcd is listening.
The traceroute works fine, however, and I can ping the box for hours with 0%
packet loss. It seems to affect only that port. Hmm, I wonder if I changed
the bpcd port?
 



From: Mark.Donaldson AT cexp DOT com [mailto:Mark.Donaldson AT cexp DOT com] 
Sent: Tuesday, November 08, 2005 1:25 PM
To: Piszcz, Justin; pkeating AT bank-banque-canada DOT ca
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Very ODD problem with NB5.1MP3a + 2 Solaris
Clients
 
Try using the traceroute facility from master to client and the reverse.
It'll tell you which interface the traffic is leaving on.
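
As a complement to traceroute, here is a minimal sketch that asks the kernel
which local address it would pick as the source for a given destination. It
relies on the fact that connect() on a UDP socket sends no packets and only
performs the route lookup. The function name is made up for illustration, and
13782 is the standard bpcd port (an assumption that it has not been remapped
locally):

```python
import socket


def local_source_address(remote_host, remote_port=13782):
    """Return the local IP the kernel would use to reach remote_host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No traffic is sent here: connecting a UDP socket only makes the
        # kernel choose a route and bind a matching local address.
        sock.connect((remote_host, remote_port))
        return sock.getsockname()[0]
    finally:
        sock.close()
```

Run it on the master against each client and on each client against the
master. If the two ends disagree about which addresses belong to the
conversation, that points toward the asymmetric path described elsewhere in
this thread. It only shows the outgoing side, so it does not replace
traceroute.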
 
If there's any doubt about the route to be taken, then use the
REQUIRED_INTERFACE entry in the bp.conf file to force traffic from the client
to the master over a specific interface.
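
To confirm what a client is actually configured with, a small sketch like the
following can read bp.conf-style "KEY = value" lines and report the
REQUIRED_INTERFACE entry. The sample contents are hypothetical: "box1-bkp" is
an invented name for the client's address on the backup network, and skipping
"#" lines is only a defensive assumption:

```python
def parse_bp_conf(text):
    """Parse 'KEY = value' lines into a dict of key -> list of values."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Keys such as SERVER may appear more than once, so keep every value.
        entries.setdefault(key.strip(), []).append(value.strip())
    return entries


# Hypothetical client configuration, for illustration only.
SAMPLE = """\
SERVER = backup01
CLIENT_NAME = box1
REQUIRED_INTERFACE = box1-bkp
"""

conf = parse_bp_conf(SAMPLE)
print(conf.get("REQUIRED_INTERFACE", ["<not set>"]))
```

On a real client the text would come from the bp.conf file itself, usually
/usr/openv/netbackup/bp.conf on UNIX hosts.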
 
-M
-----Original Message-----
From: veritas-bu-admin AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu]On Behalf Of Piszcz, 
Justin
Sent: Tuesday, November 08, 2005 10:35 AM
To: Paul Keating
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Very ODD problem with NB5.1MP3a + 2 Solaris
Clients
They are multi-homed but with completely different subnet masks and routes.
Again, there are 5 machines with only 1 path to the backup server, there are
other nics but with separate interfaces+routes+Subnets.
 
I see what you are saying but this is a completely vlan+server-isolated
backup network.
 
I've checked with the network guys, and they are telling me it is a box
problem.
 
Any other ideas?
 
Justin.
 



From: Paul Keating [mailto:pkeating AT bank-banque-canada DOT ca] 
Sent: Tuesday, November 08, 2005 12:16 PM
To: Piszcz, Justin
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Very ODD problem with NB5.1MP3a + 2 Solaris
Clients
 
Ummm....are any of your systems multi homed?
 
looks like an asymmetrical routing issue we had here recently.
 
backup server talks to client on NIC A, but client replies back to server
from NIC B via an alternate path.
 
the ARP table on the switch that the backup server and NIC A are connected
to doesn't get populated with the MAC of the client's NIC A until the client
ARPs or tries to talk to the server via NIC A....then communication works
for 5-10 minutes or until your ARP cache refreshes, then communication is
broken till the client tries to talk to the server again.
 
I'm thinking it's likely not an issue if your clients all are single NIC or
connected to a single switch along with the backup server.
 
check with your network guys to see if they're seeing any broadcasting on
the switches your backup server and clients are attached to.
 
Paul
-----Original Message-----
From: veritas-bu-admin AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu] On Behalf Of Piszcz, 
Justin
Sent: November 8, 2005 11:59 AM
To: Piszcz, Justin; Patrick Whelan
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Very ODD problem with NB5.1MP3a + 2 Solaris
Clients
Name resolution is set up through /etc/hosts on the backup server; no
external DNS servers are used.
The subnet is the same, yes.
The bp.conf is the same for the most part; the error occurs on both systems.
 
MASTER -> CLIENTS:
jpiszcz@backup01# telnet 172.16.0.135 bpcd
Trying 172.16.0.135...
^C
jpiszcz@backup01# telnet 172.16.0.136 bpcd
Trying 172.16.0.136...
 
It HANGs and HANGs. How would I debug this problem?
 
CLIENTS -> MASTER
box1# telnet backup01 bpcd
Trying 172.16.0.2...
Connected to backup01.
Escape character is '^]'.
 
box2# telnet backup01 bpcd
Trying 172.16.0.2...
Connected to backup01.
Escape character is '^]'.
 
Now the EXTREMELY WEIRD PART!!! (now I can MASTER->CLIENTS) - NO ISSUE
jpiszcz@backup01# telnet 172.16.0.135 bpcd
Trying 172.16.0.135...
Connected to 172.16.0.135.
Escape character is '^]'.
^]
telnet> Connection to 172.16.0.135 closed.
jpiszcz@backup01# telnet 172.16.0.136 bpcd
Trying 172.16.0.136...
Connected to 172.16.0.136.
Escape character is '^]'.
 
Any freaking clue what is going on here?
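
One way to make these telnet checks repeatable is a small probe that
classifies each connection attempt instead of hanging. This is only a sketch:
probe_tcp is a made-up helper name, and 13782 is the standard bpcd port,
assuming it has not been remapped in the local services file:

```python
import socket

BPCD_PORT = 13782  # standard bpcd port; assumption: not changed locally


def probe_tcp(host, port=BPCD_PORT, timeout=5.0):
    """Classify one TCP connect attempt: open, refused, timeout, or error."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except socket.timeout:
        # No answer at all within the timeout: the equivalent of the
        # telnet session that sits at "Trying ..." forever.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        return "error: %s" % exc
    finally:
        sock.close()
```

Calling probe_tcp("172.16.0.135") and probe_tcp("172.16.0.136") from the
master before and after the clients connect to backup01 would record the
same behaviour as the transcript above. A "timeout" that turns into "open"
after the client initiates traffic fits the ARP or asymmetric routing
explanation better than a bpcd problem, which would more typically show up
as "refused".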
 
