Veritas-bu

[Veritas-bu] NB5.0MP3 on Sol8 with Win2003 clients.. backups stalling

2005-03-10 10:27:51
Subject: [Veritas-bu] NB5.0MP3 on Sol8 with Win2003 clients.. backups stalling
From: kris.williams AT hp DOT com (Williams, Kristopher L)
Date: Thu, 10 Mar 2005 09:27:51 -0600

Paul,

I have the same problem too, but I have a Win2k Master (NBU5.0MP3),
Win2k Media Server (NBU5.0MP3) and various clients (NT4.0, Win2k and
Win2k3) that have this problem. We are going to an STK L700 with LTO1
drives, all FC connected. You are correct, it does seem pretty random: a
different box each night throttles down to a crawl. When I try to
kill the job the next morning, nothing happens. I have to restart the
media servers to get the jobs to die.

Has anyone seen and fixed this?

Thanks,

Kris

________________________________

From: veritas-bu-admin AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu] On Behalf Of Paul
Keating
Sent: Wednesday, March 09, 2005 3:18 PM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] NB5.0MP3 on Sol8 with Win2003 clients.. backups
stalling


I have several machines that seem to stall during their backups.
In all cases they are Win 2003, backing up to a Sunfire V880 running
Solaris 8 and NB5.0MP3, to an STK L700 with FC-connected LTO2 drives.
Backups start out fine, then the throughput gradually drops until it is
in the neighbourhood of 25KB/s.
Basically, at that point, the choice is to wait 3 days for it to finish,
or kill the job.
In every case so far, if I kill the job and run a manual, the manual
will run fine.
This is an issue in several cases since the machines have databases that
are backed up cold (DBA's preference).
Because of this, when the backup doesn't complete, bpend_notify.bat
doesn't kick off, the DBs don't restart, and the clients get in to find
the service is down. Also, manuals can't be run on the DB servers during
the day, since the job will shut down the DB.

All the DB stuff is kind of secondary, however, since there are dozens
of ways of remediating that situation.
I do want to treat the root cause, which is these stalling backup jobs.

There are actually 4 machines on which this stalling is an issue (only 2
of them happen to be SQL servers).
The machines all have 100FD connections. The switch and NIC are both
hard set to 100FD.
NB client config is perfect. These 4 machines are backing up using vnetd
through a firewall (as are about 30+ others in the same configuration,
but without issue).
It seems that probably 3 days a week, one of these machines will
stall, at random, rarely the same machine twice in a row, and it is very
unpredictable when one will stall. At this point, I have a VERY
clean environment: with the exception of these machines, I have a 100%
success rate.

Any ideas?

Paul
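
One way to catch a stalling job before the next morning is to watch the
throughput between progress samples and flag a job once it stays below a
floor. The following is only a minimal sketch, not anything NetBackup
provides: the sample format (elapsed seconds, kilobytes written), the
100 KB/s floor, the 3-interval window and the function names are all
assumptions made for illustration. How the samples are collected is left
out entirely.

```python
from typing import List, Tuple

# Assumed floor, chosen only because the thread reports stalls near 25 KB/s.
STALL_KB_PER_SEC = 100.0

# One sample is (elapsed_seconds, kilobytes_written_so_far); hypothetical format.
Sample = Tuple[float, float]


def throughput_kb_per_sec(samples: List[Sample]) -> List[float]:
    """Return the KB/s rate between each pair of consecutive samples."""
    rates = []
    for (t0, kb0), (t1, kb1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        rates.append((kb1 - kb0) / elapsed)
    return rates


def is_stalled(samples: List[Sample],
               threshold: float = STALL_KB_PER_SEC,
               window: int = 3) -> bool:
    """True if the last `window` intervals all ran below `threshold` KB/s."""
    rates = throughput_kb_per_sec(samples)
    if len(rates) < window:
        return False  # not enough history to judge
    return all(rate < threshold for rate in rates[-window:])


# A job that starts at 10,000 KB/s and then drops to 25 KB/s.
healthy = [(0, 0), (60, 600000), (120, 1200000), (180, 1800000)]
stalling = [(0, 0), (60, 600000), (120, 601500), (180, 603000), (240, 604500)]
print(is_stalled(healthy), is_stalled(stalling))
```

Requiring several slow intervals in a row, rather than one, avoids
flagging a job that only pauses briefly.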
