Veritas-bu

[Veritas-bu] Netbackup hung

2004-09-16 21:54:49
Subject: [Veritas-bu] Netbackup hung
From: ArifB AT xl.co DOT id (Arif Budiman)
Date: Fri, 17 Sep 2004 08:54:49 +0700

Does anyone know what the following log entry (from /usr/openv/volmgr/debug/daemon) means? Is it critical? Our NetBackup processes seem to be hung; many of them are just sitting in the mounting state.

vmd: could not set TCP_NODELAY

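For readers unfamiliar with the option: TCP_NODELAY is the standard socket option that disables Nagle's algorithm on a TCP connection, so the message reads like vmd's `setsockopt` call for that option failing. vmd's internals aren't shown here; the Python sketch below only illustrates what setting the option involves, and the helper name `enable_nodelay` is hypothetical. It uses a closed socket to force a failure, which is just one possible way such a call can be rejected.

```python
import socket

def enable_nodelay(sock: socket.socket) -> bool:
    """Try to disable Nagle's algorithm on a TCP socket (hypothetical helper).

    Returns True on success and False if the OS rejects the call, which is
    roughly the situation a "could not set TCP_NODELAY" message describes.
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return True
    except OSError as exc:
        print(f"could not set TCP_NODELAY: {exc}")
        return False

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_nodelay(s))  # expected: True on a fresh TCP socket
    # Read the option back; any non-zero value means Nagle is disabled.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
    print(enable_nodelay(s))  # expected: False, the descriptor is closed
```
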
From the Device Monitor I can't see any devices, and I get the error message: network protocol error (MM status 39).
Veritas support suggested that we increase the client_connect_timeout and client_read_timeout variables, but I don't believe that solves the problem: it still happens.
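Judging only by their names, client_connect_timeout bounds how long to wait while establishing a connection, and client_read_timeout bounds how long to wait for data on a connection that is already up; NetBackup's exact semantics are not shown in this post. The Python sketch below illustrates that general distinction with ordinary socket timeouts. The `probe` function and the silent local listener are hypothetical stand-ins, not NetBackup code. In general, raising either value only changes how long the caller waits before giving up on a peer that never answers.

```python
import socket

def probe(host: str, port: int, connect_timeout: float, read_timeout: float) -> str:
    """Connect, then wait for the peer to send something (hypothetical helper).

    The two timeouts are independent: one bounds connection setup,
    the other bounds each wait for data once connected.
    """
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except socket.timeout:
        return "connect timed out"
    except OSError as exc:
        return f"connect failed: {exc}"
    with sock:
        sock.settimeout(read_timeout)
        try:
            data = sock.recv(1024)
        except socket.timeout:
            return "read timed out"
        return "got data" if data else "peer closed connection"

if __name__ == "__main__":
    # A local listener whose kernel backlog completes the TCP handshake,
    # but which never accepts or sends anything: a "hung" peer.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    print(probe("127.0.0.1", port, connect_timeout=2.0, read_timeout=0.5))
    # expected: read timed out
    server.close()
```
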

If I cancel the active jobs, they don't respond. From the bpsched log I get:

08:42:46.180 [18814] <2> correct_drive_statuses: datamover1-hcart-robot-tld-0_MPX-incomplete mounts=2, available drives=12 aj=3 aj_cm=1
08:42:46.180 [18814] <2> correct_drive_statuses: datamover2_MPX-incomplete mounts=2, available drives=12 aj=2 aj_cm=0
08:42:46.180 [18814] <2> correct_drive_statuses: jktgrhxmedia_MPX-incomplete mounts=5, available drives=12 aj=6 aj_cm=1
08:42:46.180 [18814] <2> correct_drive_statuses: test_DM1-incomplete mounts=0, available drives=0 aj=0 aj_cm=0
08:42:46.181 [18814] <2> correct_drive_statuses: xl-file02-hcart-robot-tld-0-MPX-incomplete mounts=0, available drives=16 aj=0 aj_cm=0
08:42:46.181 [18814] <2> correct_drive_statuses: xl-library-hcart-robot-tld-0-MPX-incomplete mounts=0, available drives=15 aj=0 aj_cm=0
08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
08:42:46.181 [18814] <2> invalidate_a_m_c_entry: cached threshold = 50, invalidate skip count = 5
08:42:47.040 [19990] <2> set_job_details: Sending jobData jobid (90418)
08:42:47.040 [19990] <2> send_structure_data: Index 34 Field m_nKilobytes Value <200735040>
08:42:47.040 [19990] <2> send_structure_data: Index 37 Field m_nKbPerSec Value <7542>
08:42:47.041 [19990] <2> set_job_details: Sending jobRunData jobid (90418)
08:42:47.041 [19990] <2> send_structure_data: Index 47 Field m_nCompletion Value <12>
08:42:47.041 [19990] <8> read_bpbrm_stderr: WROTE xl-file01_1095393040 50048 0 7542.663 0
08:42:49.040 [19990] <8> read_bpbrm_stderr: CURRENT POSITION STK724 1135 0
08:42:50.680 [8405] <2> salarm: got signal 14
08:42:51.490 [8352] <2> salarm: got signal 14
08:42:53.500 [5148] <2> salarm: got signal 14
08:43:02.040 [19990] <2> set_job_details: Sending jobData jobid (90418)
08:43:02.040 [19990] <2> send_structure_data: Index 35 Field m_nFiles Value <134000>
08:43:02.041 [19990] <2> set_job_details: Sending jobRunData jobid (90418)
08:43:02.041 [19990] <2> send_structure_data: Index 46 Field m_szPathname Value </M/Directorat/NetworkOperation/Network Assurance/5-Monitoring Financial/CER Files/BUDGET DIST 08.01.04.xls>
08:43:02.041 [19990] <8> read_bpbrm_stderr: ADDED FILES TO DB FOR xl-file01_1095393040 500 /M/Directorat/NetworkOperation/Network Assurance/5-Monitoring Financial/CER Files/BUDGET DIST 08.01.04.xls
08:43:06.040 [19990] <8> read_bpbrm_stderr: WROTE xl-file01_1095393040 50048 0 7539.643 0
08:43:09.040 [19990] <8> read_bpbrm_stderr: CURRENT POSITION STK724 1136 0
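To see at a glance which storage units report incomplete mounts, bpsched lines like these can be split into fields. The Python sketch below assumes only the layout visible in this excerpt (`HH:MM:SS.mmm [pid] <level> function: message`) and reads `<unit>-incomplete mounts=N` as a per-storage-unit count; that reading, and the helper name `parse_line`, are assumptions. The meaning of `aj` and `aj_cm` is not documented here, so they are carried through unchanged. It also maps `got signal N` to a name with Python's `signal` module; on Linux and Solaris, signal 14 is SIGALRM, i.e. an alarm timer expiring.

```python
import re
import signal

# Layout seen in the excerpt: "08:42:46.180 [18814] <2> function: message"
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <(?P<level>\d+)> "
    r"(?P<func>\w+): (?P<msg>.*)$"
)
# Message body printed by correct_drive_statuses in the excerpt.
STATUS_RE = re.compile(
    r"^(?P<unit>\S+)-incomplete mounts=(?P<mounts>\d+), "
    r"available drives=(?P<drives>\d+) aj=(?P<aj>\d+) aj_cm=(?P<aj_cm>\d+)$"
)
SIGNAL_RE = re.compile(r"got signal (\d+)")

def parse_line(line):
    """Split one bpsched debug line into a dict, or return None (hypothetical helper)."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    entry = m.groupdict()
    if entry["func"] == "correct_drive_statuses":
        s = STATUS_RE.match(entry["msg"])
        if s:
            entry["unit"] = s["unit"]
            entry.update({k: int(s[k]) for k in ("mounts", "drives", "aj", "aj_cm")})
    sig = SIGNAL_RE.search(entry["msg"])
    if sig:
        num = int(sig.group(1))
        try:
            # Signal numbers are platform-specific; use the local table.
            entry["signal"] = signal.Signals(num).name
        except ValueError:
            entry["signal"] = f"unknown ({num})"
    return entry

SAMPLE = """\
08:42:46.180 [18814] <2> correct_drive_statuses: jktgrhxmedia_MPX-incomplete mounts=5, available drives=12 aj=6 aj_cm=1
08:42:50.680 [8405] <2> salarm: got signal 14
"""

if __name__ == "__main__":
    for raw in SAMPLE.splitlines():
        e = parse_line(raw)
        if e and "mounts" in e:
            print(f"{e['unit']}: incomplete mounts={e['mounts']}, available drives={e['drives']}")
        elif e and "signal" in e:
            print(f"pid {e['pid']} at {e['time']}: {e['signal']}")
```
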

Does anyone have any suggestions on how to solve this nagging problem?

Regards,
Arif Budiman
PT Excelcomindo Pratama
