Veritas-bu

[Veritas-bu] Throughput problem any ideas?

2005-05-16 09:41:16
From: Greg.Hindle AT constellation DOT com (Hindle, Greg)
Date: Mon, 16 May 2005 09:41:16 -0400

Hello all,

We have been having a problem with our backup jobs basically hanging, with a
failure rate of 50-80%. This came on all of a sudden, and we have been working
on it for almost a week now with many people involved.

Our setup: we currently back up over 1,200 servers each day. We have two data
centers, which I will call site 1 and site 2, with a media server at each
location and one master server at one of them. All servers at site 1 back up
to site 2, and all servers at site 2 back up to site 1.
Both sites have two L700 tape units with about 30+ drives. Our media and
master servers each have two gigabit NICs and use round-robin IP addressing,
meaning an IP address is not tied to a particular card; rather, traffic
bounces back and forth between the cards in order to maximize throughput. We
are using EtherChannel at the one site that has the master.

This setup had worked great since January of this year. Then one night it all
stopped, and failure rates jumped to 50-80%. The media servers would connect
to the client machines, and then no data would pass, while other servers
worked fine. We struggled and looked at everything to find the cause. No
changes had been made to the Veritas network or the data network. Veritas
would not help us because they said we were in an unsupported network
configuration. We did send them some logs, and they said we have a packet
reordering problem; that was the extent of the help.
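Veritas's only diagnosis was packet reordering. As an illustration only (not a reproduction of the hang), here is a minimal Python toy model of what that means when one stream's packets are sent alternately over two paths with slightly different delays, compared with pinning the whole stream to one path. The delays, send interval and function names are hypothetical; real NIC drivers, switches and the Solaris 8 IP stack are not modeled. Out-of-order segments make a TCP receiver send duplicate ACKs, and reordering deep enough to produce three of them can trigger a spurious fast retransmit, which is one reason per-packet striping tends to hurt TCP throughput.

```python
"""Toy model: per-packet round robin over two paths vs. pinning a flow to one.

All numbers are hypothetical and chosen only to illustrate reordering.
"""


def arrival_order(num_segments, send_interval_us, link_delays_us, pick_link):
    """Return segment numbers in the order they reach the receiver.

    pick_link(seq) chooses which link carries segment `seq`.
    """
    arrivals = []
    for seq in range(num_segments):
        sent_at = seq * send_interval_us
        link = pick_link(seq)
        arrivals.append((sent_at + link_delays_us[link], seq))
    arrivals.sort()  # earliest arrival first; ties broken by segment number
    return [seq for _, seq in arrivals]


def count_out_of_order(order):
    """Count segments that arrive after a higher-numbered segment."""
    highest = -1
    late = 0
    for seq in order:
        if seq < highest:
            late += 1
        else:
            highest = seq
    return late


if __name__ == "__main__":
    delays = [100, 130]  # hypothetical one-way delay per link, microseconds
    # Per-packet round robin: alternate links segment by segment.
    rr = arrival_order(10, 12, delays, lambda seq: seq % 2)
    # Per-flow pinning: every segment of this flow uses link 0.
    pinned = arrival_order(10, 12, delays, lambda seq: 0)
    print("round robin:", rr, "late:", count_out_of_order(rr))
    print("pinned:     ", pinned, "late:", count_out_of_order(pinned))
```

With these numbers the round-robin run arrives as 0, 2, 1, 4, 3, 6, 5, 8, 7, 9 (four late segments), while the pinned run arrives in order.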

So over the weekend we reduced the NICs in our master and media servers to one
and removed one IP address as well, in order to stabilize the backup network.
It worked to a point; however, our backup times have doubled.

I am sending this here in the hope that others can share their setup, and also
to ask whether anyone has a setup that IS approved by Veritas and can get more
than a gigabit of throughput on the media and master servers. We want to
understand the best way to set up our Solaris 8 master and media servers
according to Veritas.
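On the question of getting more than a gigabit: link aggregation such as EtherChannel usually balances per flow rather than per packet, typically by hashing addresses (and sometimes ports), so each flow stays on one link and keeps its order. The sketch below assumes that model, using CRC-32 of the address/port 4-tuple as a stand-in hash (actual EtherChannel and IEEE 802.3ad hash policies are switch- and driver-specific), with hypothetical addresses and ports. Under this model a single backup stream is capped at one link's speed, while many concurrent client streams can spread across both links. The ceiling helper is plain arithmetic: 1 Gb/s is 125 MB/s (10^6 bytes per MB) before protocol overhead.

```python
"""Sketch: per-flow link selection under an assumed hash-based policy."""
import zlib
from collections import Counter


def pick_link(src_ip, dst_ip, src_port, dst_port, num_links):
    """Map one flow (its address/port 4-tuple) to a link index.

    Stand-in hash (an assumption): CRC-32 of the 4-tuple. Every packet of
    the same flow yields the same key, so the flow never leaves its link.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode("ascii")
    return zlib.crc32(key) % num_links


def link_ceiling_mb_per_s(num_links, link_gbit=1.0):
    """Raw wire-speed ceiling in MB/s (10**6 bytes), ignoring overhead."""
    return num_links * link_gbit * 1000 / 8


if __name__ == "__main__":
    media_server = "192.0.2.10"  # hypothetical media server address
    # 1200 hypothetical client flows, one backup stream per client;
    # port 5000 is a hypothetical destination port.
    flows = [(f"10.1.{i // 250}.{i % 250 + 1}", media_server, 40000 + i, 5000)
             for i in range(1200)]
    per_link = Counter(pick_link(*flow, num_links=2) for flow in flows)
    print("flows per link:", dict(per_link))
    # A single flow is pinned to one link, so one stream is capped at one link.
    print("single-stream ceiling:", link_ceiling_mb_per_s(1), "MB/s")
    print("two-link aggregate ceiling:", link_ceiling_mb_per_s(2), "MB/s")
```

Whether a given Solaris 8 configuration behaves per flow or per packet depends on how the interfaces and the switch are set up, so this only frames the question; it does not describe the configuration in this post.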

Greg Hindle



>>> The information contained in this e-mail transmission is privileged and/or
confidential intended solely for the exclusive use of the individual addressee.
If you are not the intended addressee you are hereby notified that any
retention, disclosure or other use is strictly prohibited. If you have received
this notification in error, please immediately contact the sender and delete the
material.

