Back in November of 2003, I posted the message below. It took a while to
finally resolve this issue, and for some reason I buried all the email and
docs. Since then, many people on this list have emailed me directly asking
what the fix was. I finally found all the information on how I fixed this
problem, and the steps I took to narrow down what was directly causing
this error.
Here is the original email:
Hi,
I am having a problem with a client. I have a NetBackup 4.5FP4 master, which
is Solaris 8, a Solaris 8 media server at same NB patch level, and a Solaris
8 client, which is backing up to the media server. I started getting socket
errors Monday night. This is what's in the bpbkar logs on the client:
04:12:21.975 [6289] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe
04:12:21.975 [6289] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 24: socket write failed
04:12:21.975 [6289] <4> bpbkar Exit: INF - EXIT STATUS 24: socket write failed
04:12:21.976 [6289] <2> bpbkar Exit: INF - Close of stdout complete
Any help would be greatly appreciated!
Description of problem:
I had a very high-profile client residing on an E10k using 5 boards, with
over a terabyte of EMC storage. On a filesystem 5 levels deep,
/app/www/webpg/dev/docs, there were 2 files that caused NB to die as soon as
they were next in line to be backed up. The backups of this client went
through the media server, which had a Sun/StorageTek L700 library directly
attached to it. The bpbkar logs on this client showed exactly the error and
the file NB died on:
23:11:31.927 [11700] <2> bpbkar resolve_path: INF - Actual mount point of /dev/vx/rdsk/db4 is /dev/vx/rdsk/db4
23:11:31.927 [11700] <2> bpbkar SelectFile: INF - Resolved_path = /dev/vx/rdsk/db4/V282crbatrl1
23:11:31.932 [11700] <4> bpbkar PrintFile: /dev/vx/rdsk/db4/
23:11:31.934 [11700] <8> bpbkar process_file: WRN - /dev/vx/rdsk/db4/V282crbatrl1 is a character special file. Backing up the raw partition.
23:12:31.570 [11707] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe
23:12:31.570 [11707] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 24: socket write failed
23:12:31.570 [11707] <4> bpbkar Exit: INF - EXIT STATUS 24: socket write failed
23:12:31.570 [11707] <2> bpbkar Exit: INF - Close of stdout complete
23:13:34.573 [11665] <8> bpbkar process_file: WRN - Short read at byte 1407188992. Read 131072 bytes when attempting to read 524288 bytes, in file /dev/vx/rdsk/db10/V282cprlvl1X.
If I put the entire directory into an exclude list, backups worked fine;
when I took it out of the exclude list, the socket errors returned.
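For anyone unfamiliar with the error text: Errno 32 is EPIPE, which a
process gets when it writes to a pipe or socket whose other end has already
gone away. The sketch below is only an illustration of that condition on a
local pipe, not of anything NetBackup does internally; the function name is
made up for the example.

```python
import errno
import os


def write_after_reader_closed():
    """Write to a pipe whose read end is closed; return the errno raised."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the "other side" of the connection goes away
    try:
        os.write(write_fd, b"backup data")
    except OSError as exc:  # BrokenPipeError is a subclass of OSError
        return exc.errno
    finally:
        os.close(write_fd)
    return None  # only reached if the write unexpectedly succeeded


if __name__ == "__main__":
    code = write_after_reader_closed()
    print(code, errno.errorcode.get(code))
```

In other words, the message tells you the writer lost its peer. It does
not tell you why the peer or the path to it disappeared.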
Steps I went through to narrow the problem down:
-ran netstat -ian on master/media/client during backups to check for packet
failures; checked out fine
-ran large (greater than 2 gig) ftp tests between all 3; also checked out
fine
-cleared out the logs so they held only that day's test backup of that
client:
master: bpsched
media: bpcd, bpbrm and bptm
client: bpcd, bpbkar
Once I started with fresh logs for that day, I realized that I had 2
clients failing with socket errors; another client had failed that I had
missed. Once I scanned these logs, I realized that more than one file, on
many different file systems, was failing:
19:54:19.094 [10661] <8> bpbkar process_file: WRN - /dev/vx/rdsk/db4/V282crbprfl1
log.112603:19:59:00.802 [10661] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe
19:46:02.730 [10663] <8> bpbkar process_file: WRN - /dev/vx/rdsk/db1/V282cvnjrni2
log.112603:19:59:01.011 [10663] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe
18:51:36.225 [6601] <8> bpbkar process_file: WRN - /dev/vx/rdsk/db7/V282cvnjrni1
log.112703:19:07:34.615 [6601] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe
18:35:10.155 [1866] <8> bpbkar process_file: WRN - /dev/vx/rdsk/db7/V282cvnjrni1 is a character special file. Backing up the raw partition.
log.112703:18:52:07.961 [1866] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe
00:33:47.956 [1865] <8> bpbkar process_file: WRN - /dev/vx/rdsk/db3/V282cbm6X
log.112703:00:47:56.402 [1865] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe
04:04:29.011 [7022] <8> bpbkar process_file: WRN - /dev/vx/rdsk/db4/V282cbm5X
log.112803:04:13:46.732 [7022] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe
07:29:56.334 [11288] <8> bpbkar process_file: WRN - /dev/vx/rdsk/db3/V282cbm4X
log.112803:07:38:38.399 [11288] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe
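The pairing above (last file a bpbkar process was working on, then the
broken pipe for the same PID) can be pulled out of the logs mechanically.
This is a rough sketch, assuming each log entry sits on one line in the
"time [pid] <level> message" layout shown above, optionally prefixed with
the log file name as in the grep output. The function and pattern names are
my own and are not part of NetBackup.

```python
import re

# "HH:MM:SS.mmm [pid] <level> message", with an optional "log.MMDDYY:" prefix.
ENTRY = re.compile(
    r"^(?:log\.\d{6}:)?(\d\d:\d\d:\d\d\.\d{3}) \[(\d+)\] <(\d+)> (.*)$"
)
# Path named in a process_file warning.
PATH = re.compile(r"process_file: WRN - (/\S+)")


def files_before_broken_pipe(lines):
    """Return (pid, last_path_seen, time_of_failure) for each broken pipe."""
    last_path = {}  # pid -> most recent path from a process_file warning
    failures = []
    for line in lines:
        entry = ENTRY.match(line.strip())
        if not entry:
            continue  # not a log entry in the assumed layout
        timestamp, pid, _level, message = entry.groups()
        path = PATH.search(message)
        if path:
            last_path[pid] = path.group(1)
        elif "flush_archive()" in message and "Broken pipe" in message:
            # The path is None if this PID logged no process_file warning first.
            failures.append((pid, last_path.get(pid), timestamp))
    return failures


SAMPLE = [
    "19:54:19.094 [10661] <8> bpbkar process_file: WRN - /dev/vx/rdsk/db4/V282crbprfl1",
    "log.112603:19:59:00.802 [10661] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe",
    "19:46:02.730 [10663] <8> bpbkar process_file: WRN - /dev/vx/rdsk/db1/V282cvnjrni2",
    "log.112603:19:59:01.011 [10663] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 32: Broken pipe",
]

if __name__ == "__main__":
    for pid, path, when in files_before_broken_pipe(SAMPLE):
        print(pid, path, when)
```

If the failing paths are spread across many file systems, as they were
here, that points away from any one file being the cause.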
More steps:
-changed the backup policy to back up to the silo directly off of the NB
Master, so the NB media server was no longer in the picture. Both backups
ran without ANY failures.
-changed out the NIC in the media server (replaced the old GE card with a
brand new GE card), updated drivers from JNI and Sun, and applied all
relevant patches; still made no difference.
Backing up the NB client through the NB media server still produced
frequent socket errors over random filesystems.
-changed ports on the switch; backups still failed going to the NB media
server every time, but backups to the NB Master were successful, even after
switching ports and switching cables on both switches
Throughout this entire ordeal, we (the Backup Team) knew it was a networking
problem, but had to eliminate Sun/Veritas completely before pointing the
finger elsewhere. Well, now was the time. Here was the fix:
NB Master: connected to switch
NB Media server: connected to a separate switch, same subnet, but switches
linked with one crossover cable; both switches 3Com
Replaced the crossover cable and power cycled both switches. Backups now
worked going through the media server 100% of the time for both clients.
This is the only time I have ever had this type of error here, so I feel
safe saying to those who have emailed me asking for the fix: look at your
network architecture/hardware first. There was never anything wrong with
Veritas.
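One cheap first look at the network is the per-interface error counters
that netstat -i reports, which is the check used in the steps above. Here
is a small sketch that flags interfaces with nonzero counters. It assumes
the usual header with Name, Ierrs, Oerrs and Collis columns, which may
differ by OS release, and the sample output in it is made up for
illustration. Keep in mind that in this case the counters looked fine, so a
clean result does not clear the switches or cabling between hosts.

```python
def interfaces_with_errors(netstat_output):
    """Return {interface: {counter: value}} for nonzero error counters."""
    lines = [ln for ln in netstat_output.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    # Only look at the counters that are actually present in the header.
    watched = [c for c in ("Ierrs", "Oerrs", "Collis") if c in header]
    problems = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows that do not match the header layout
        row = dict(zip(header, fields))
        bad = {
            c: int(row[c])
            for c in watched
            if row[c].isdigit() and int(row[c]) > 0
        }
        if bad:
            problems[row["Name"]] = bad
    return problems


# Hypothetical output, invented for the example (not from the hosts above).
SAMPLE = """\
Name  Mtu  Net/Dest   Address    Ipkts   Ierrs Opkts   Oerrs Collis Queue
lo0   8232 127.0.0.0  127.0.0.1  1200    0     1200    0     0      0
ge0   1500 10.0.0.0   10.0.0.5   987654  12    876543  0     3      0
"""

if __name__ == "__main__":
    print(interfaces_with_errors(SAMPLE))
```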
For those who have emailed me in the past few months, I apologize for not
finding this sooner but I hope what I have provided helps.
Dwayne J. Brzozowski
Department of Veterans Affairs
Austin Automation center
(512)326-6728 work
dwayne.brzozowski AT mail.va DOT gov