[Veritas-bu] Netbackup hung

2004-09-17 01:31:43
Subject: [Veritas-bu] Netbackup hung
From: hampus.lind AT rps.police DOT se (Hampus Lind)
Date: Fri, 17 Sep 2004 07:31:43 +0200

Hi!

I don't know if my problem is the same as yours, but we also had problems with vmd earlier.
It hung for no apparent reason. The workaround at the time was to kill vmd and start it again, but the problem kept coming back.
I'm still not exactly sure what fixed it. It seemed to occur when someone used the NetBackup Remote Administration Console to manage media.
I asked all the admins at that site to uninstall their admin console and reinstall the correct console version and patch level. I think a mismatch between some admin console and the master server caused the problem.
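
Before killing and restarting vmd, it may help to confirm that it is really unresponsive rather than just slow. A minimal sketch in Python, assuming that a Media Manager query such as `vmquery -a` has to be answered by vmd (the path `/usr/openv/volmgr/bin/vmquery` and the choice of query are assumptions; substitute whatever command your installation uses): run the query with a deadline and treat a timeout as a hang. The names `VMD_PROBE` and `responds_within` are hypothetical.

```python
import subprocess
from typing import Sequence

# ASSUMPTION: a Media Manager query that has to go through vmd.
# The path and arguments are illustrative; adjust them for your installation.
VMD_PROBE = ["/usr/openv/volmgr/bin/vmquery", "-a"]


def responds_within(cmd: Sequence[str], deadline: float) -> bool:
    """Return True if `cmd` exits with status 0 within `deadline` seconds.

    subprocess.run() kills the child when the timeout expires, so a hung
    probe does not linger after this function returns.
    """
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=deadline,
        )
    except subprocess.TimeoutExpired:
        return False  # still running at the deadline: treat the daemon as hung
    except OSError:
        return False  # command missing or not executable
    return result.returncode == 0
```

For example, `responds_within(VMD_PROBE, 60.0)` returning False would suggest vmd is stuck. A real request is used instead of a bare TCP connect on purpose: the kernel can complete the handshake on a daemon's listening socket even while the daemon itself is hung, so a successful connect proves little.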

At the same time, though, I upgraded HP-UX 11.00 to 11.11.
HP-UX 11.00 is REALLY bad at handling memory and network traffic, so many of our problems went away after the upgrade to HP-UX 11.11.

Good Luck!

Best regards / Hampus Lind
Rikspolisstyrelsen
Tel. (work): +46 (0)8 - 401 99 43
Tel. (mobile): +46 (0)70 - 217 92 66
E-mail: hampus.lind AT rps.police DOT se

  ----- Original Message -----
  From: Arif Budiman
  To: veritas-bu AT mailman.eng.auburn DOT edu
  Sent: Friday, September 17, 2004 3:54 AM
  Subject: [Veritas-bu] Netbackup hung


  Does anyone know what the following entry in the log (/usr/openv/volmgr/debug/daemon) means? Is it critical? Our NetBackup processes seem to be hung; many of them are just sitting in the mounting state.

  vmd: could not set TCP_NODELAY
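
TCP_NODELAY is a standard per-socket option, set with setsockopt() at the IPPROTO_TCP level, that disables Nagle's algorithm so small request/reply messages are sent immediately instead of being coalesced. The vmd message presumably means that this setsockopt() call failed on a connection vmd was handling; that reading is an assumption, since the log line does not include the error code. The Python sketch below (the function name `try_set_nodelay` and its message text are hypothetical, not NetBackup's code) shows the call and one way it can fail: once the socket's descriptor is no longer valid, the option can no longer be set.

```python
import socket
from typing import Optional


def try_set_nodelay(sock: socket.socket) -> Optional[str]:
    """Disable Nagle's algorithm on `sock`.

    Returns None on success, or an error string on failure. The message text
    is modeled loosely on the vmd log line; it is not NetBackup's own code.
    """
    try:
        # TCP_NODELAY lives at the IPPROTO_TCP level; 1 means "send small
        # segments immediately instead of coalescing them".
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    except OSError as exc:
        return f"could not set TCP_NODELAY: {exc.strerror or exc}"
    return None
```

On a freshly created TCP socket the call succeeds and `try_set_nodelay()` returns None; after the socket has been closed, setsockopt() fails with a bad-file-descriptor error and an error string is returned instead.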

  From the Device Monitor I can't see any devices, and I get the error message: network protocol error (MM status 39).
  Veritas support suggested that we increase the client_connect_timeout and client_read_timeout variables, but I don't believe that solves the problem: it still happens.
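
As their names suggest, a connect timeout and a read timeout bound different phases of a client/server exchange: the first limits how long establishing the TCP connection may take, the second how long the client waits for a reply on an established connection. The Python sketch below illustrates that general idea with plain sockets; it is not how NetBackup implements client_connect_timeout or client_read_timeout, and the function name `query_with_timeouts` is hypothetical.

```python
import socket


def query_with_timeouts(host: str, port: int, request: bytes,
                        connect_timeout: float, read_timeout: float) -> bytes:
    """Send `request` and return the first chunk of the reply.

    connect_timeout bounds only the TCP handshake; read_timeout bounds each
    wait for reply data once connected. Either expiry raises socket.timeout.
    """
    # Phase 1: establish the connection, waiting at most connect_timeout seconds.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    try:
        # Phase 2: switch to the read timeout for the request/reply exchange.
        sock.settimeout(read_timeout)
        sock.sendall(request)
        return sock.recv(4096)
    finally:
        sock.close()
```

Pointed at a listening socket whose owner never replies, the connect succeeds immediately and the call fails with `socket.timeout` after `read_timeout` seconds. A larger read timeout only makes that failure arrive later, which is consistent with the observation that raising the timeouts did not make the problem go away.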

  If I cancel active jobs, they don't respond. From the bpsched log I get:

  08:42:46.180 [18814] <2> correct_drive_statuses: datamover1-hcart-robot-tld-0_MPX-incomplete mounts=2, available drives=12 aj=3 aj_cm=1
  08:42:46.180 [18814] <2> correct_drive_statuses: datamover2_MPX-incomplete mounts=2, available drives=12 aj=2 aj_cm=0
  08:42:46.180 [18814] <2> correct_drive_statuses: jktgrhxmedia_MPX-incomplete mounts=5, available drives=12 aj=6 aj_cm=1
  08:42:46.180 [18814] <2> correct_drive_statuses: test_DM1-incomplete mounts=0, available drives=0 aj=0 aj_cm=0
  08:42:46.181 [18814] <2> correct_drive_statuses: xl-file02-hcart-robot-tld-0-MPX-incomplete mounts=0, available drives=16 aj=0 aj_cm=0
  08:42:46.181 [18814] <2> correct_drive_statuses: xl-library-hcart-robot-tld-0-MPX-incomplete mounts=0, available drives=15 aj=0 aj_cm=0
  08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
  08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
  08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
  08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
  08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
  08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
  08:42:47.040 [19990] <2> set_job_details: Sending jobData jobid (90418)
  08:42:47.040 [19990] <2> send_structure_data: Index 34 Field m_nKilobytes Value <200735040>
  08:42:47.040 [19990] <2> send_structure_data: Index 37 Field m_nKbPerSec Value <7542>
  08:42:47.041 [19990] <2> set_job_details: Sending jobRunData jobid (90418)
  08:42:47.041 [19990] <2> send_structure_data: Index 47 Field m_nCompletion Value <12>
  08:42:47.041 [19990] <8> read_bpbrm_stderr: WROTE xl-file01_1095393040 50048 0 7542.663 0
  08:42:49.040 [19990] <8> read_bpbrm_stderr: CURRENT POSITION STK724 1135 0
  08:42:50.680 [8405] <2> salarm: got signal 14
  08:42:51.490 [8352] <2> salarm: got signal 14
  08:42:53.500 [5148] <2> salarm: got signal 14
  08:43:02.040 [19990] <2> set_job_details: Sending jobData jobid (90418)
  08:43:02.040 [19990] <2> send_structure_data: Index 35 Field m_nFiles Value <134000>
  08:43:02.041 [19990] <2> set_job_details: Sending jobRunData jobid (90418)
  08:43:02.041 [19990] <2> send_structure_data: Index 46 Field m_szPathname Value </M/Directorat/NetworkOperation/Network Assurance/5-Monitoring Financial/CER Files/BUDGET DIST 08.01.04.xls>
  08:43:02.041 [19990] <8> read_bpbrm_stderr: ADDED FILES TO DB FOR xl-file01_1095393040 500 /M/Directorat/NetworkOperation/Network Assurance/5-Monitoring Financial/CER Files/BUDGET DIST 08.01.04.xls
  08:43:06.040 [19990] <8> read_bpbrm_stderr: WROTE xl-file01_1095393040 50048 0 7539.643 0
  08:43:09.040 [19990] <8> read_bpbrm_stderr: CURRENT POSITION STK724 1136 0
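
When many jobs sit in the mounting state, it can help to pull the per-storage-unit counters out of a long bpsched log rather than reading it by eye. A minimal Python sketch, assuming the line format shown in the excerpt stays the same throughout the file: it keeps the latest correct_drive_statuses counters per storage unit and counts the salarm entries by signal number (signal 14 is SIGALRM on HP-UX and Linux, i.e. an alarm timer firing). The log does not explain aj and aj_cm, so they are kept as raw numbers; the names `summarize_bpsched`, `units_with_pending_mounts` and the dictionary keys are our own.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Matches lines such as:
# "... correct_drive_statuses: datamover2_MPX-incomplete mounts=2, available drives=12 aj=2 aj_cm=0"
# The storage-unit name is everything before "-incomplete mounts=".
DRIVE_RE = re.compile(
    r"correct_drive_statuses:\s+(?P<stu>\S+)-incomplete\s+mounts=(?P<incomplete_mounts>\d+),"
    r"\s+available drives=(?P<available_drives>\d+)\s+aj=(?P<aj>\d+)\s+aj_cm=(?P<aj_cm>\d+)"
)
# Matches "salarm: got signal 14" style entries.
ALARM_RE = re.compile(r"salarm: got signal (?P<sig>\d+)")


def summarize_bpsched(lines: Iterable[str]) -> Tuple[Dict[str, Dict[str, int]], Counter]:
    """Return (latest counters per storage unit, count of salarm entries by signal)."""
    units: Dict[str, Dict[str, int]] = {}
    alarms: Counter = Counter()
    for line in lines:
        m = DRIVE_RE.search(line)
        if m:
            fields = m.groupdict()
            stu = fields.pop("stu")
            # Later lines overwrite earlier ones, so the dict holds the newest snapshot.
            units[stu] = {k: int(v) for k, v in fields.items()}
            continue
        a = ALARM_RE.search(line)
        if a:
            alarms[int(a.group("sig"))] += 1
    return units, alarms


def units_with_pending_mounts(units: Dict[str, Dict[str, int]]) -> List[str]:
    """Storage units whose latest snapshot reports at least one incomplete mount."""
    return sorted(stu for stu, c in units.items() if c["incomplete_mounts"] > 0)
```

For a bpsched debug log file, `with open(path) as f: units, alarms = summarize_bpsched(f)` followed by `units_with_pending_mounts(units)` lists the storage units that still report incomplete mounts.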


  Does anyone have a suggestion for how to solve such a nagging problem?
  Regards,
  Arif Budiman
  PT Excelcomindo Pratama

