[Veritas-bu] RE: Error 24 problems and trouble shooting

2005-03-09 12:54:55
Subject: [Veritas-bu] RE: Error 24 problems and trouble shooting
From: Charles Ballowe <cballowe AT gmail DOT com> (Charles Ballowe)
Date: Wed, 9 Mar 2005 11:54:55 -0600
I applied the updated kernel parameters for Solaris 9 from a technote,
but I still encountered the problem afterwards. Here are the log entries
from bpbrm on the media server that was affected yesterday. These repeat
until all jobs associated with the process are killed, then everything
returns to normal. Any more thoughts?

-Charlie

12:29:11.837 [27194] <2> sighdl: pipe signal
12:29:11.837 [27194] <2> put_long: (11) network write() error: Broken pipe (32); socket = 5
12:29:11.837 [27194] <16> bpbrm send_keepalive: could not write KEEPALIVE to COMM_SOCK
12:29:11.839 [27194] <2> logconnections: BPJOBD CONNECT FROM xxx.xxx.xxx.xxx.40738 TO yyy.yyy.yyy.yyy.13723
12:29:11.841 [27194] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
12:29:11.843 [27194] <2> job_connect: Connected to the host nbmaster-b contype 10 jobid <51195> socket <7>
12:29:11.843 [27194] <2> job_connect: Connected on port 40738
12:29:11.843 [27194] <2> set_job_details: Done
12:29:11.887 [27194] <2> job_monitoring_exex: ACK disconnect
12:29:11.887 [27194] <2> job_disconnect: Disconnected
12:29:11.888 [27194] <2> logconnections: BPDBM CONNECT FROM xxx.xxx.xxx.xxx.40739 TO yyy.yyy.yyy.yyy.13721
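
For anyone who wants to see which bpbrm processes are stuck in this loop,
here is a rough Python sketch that tallies the KEEPALIVE failures per PID.
It assumes the entry layout seen in the excerpt above (timestamp, [pid],
<severity>, text) and treats any line that does not start with a timestamp
as a wrapped continuation of the previous entry. The function names are
made up for illustration; nothing here comes from NetBackup itself.

```python
import re
from collections import Counter

# Assumed entry layout, inferred only from the excerpt above:
#   HH:MM:SS.mmm [pid] <severity> text
ENTRY = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (.*)$")


def parse_entries(lines):
    """Yield (time, pid, severity, text) tuples, rejoining wrapped lines."""
    current = None
    for raw in lines:
        line = raw.rstrip()
        match = ENTRY.match(line)
        if match:
            if current:
                yield tuple(current)
            current = [match.group(1), int(match.group(2)),
                       int(match.group(3)), match.group(4)]
        elif current and line:
            # No timestamp: treat as the continuation of a wrapped entry.
            current[3] += " " + line.strip()
    if current:
        yield tuple(current)


def keepalive_failures(lines):
    """Count 'could not write KEEPALIVE' entries per bpbrm PID."""
    counts = Counter()
    for _time, pid, _severity, text in parse_entries(lines):
        if "could not write KEEPALIVE" in text:
            counts[pid] += 1
    return counts
```

To use it, open a bpbrm log file and pass the file object to
keepalive_failures(); the PIDs with the highest counts should be the
processes whose jobs need to be killed.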

On Thu, 17 Feb 2005 13:39:43 -0500, Kevin Zhang
<Kevin.Zhang AT rci.rogers DOT com> wrote:
> I would suggest checking the network-related performance for this media
> server. You may also want to look into fine-tuning some kernel
> parameters.
> 
> Kevin
> 
> Date: Thu, 17 Feb 2005 11:28:28 -0600
> From: Charles Ballowe <cballowe AT gmail DOT com>
> Reply-To: Charles Ballowe <cballowe AT gmail DOT com>
> To: veritas-bu AT mailman.eng.auburn DOT edu
> Subject: [Veritas-bu] Fwd: Error 24 problems and trouble shooting
> 
> It seems that when this starts happening, I find at least one job whose
> detailed status is many lines of "Error bpbrm could not write KEEPALIVE
> to COMM_SOCK". Maybe that gives a clue to what's going on? I'm still
> looking for thoughts on this.
> 
> -Charlie
> 
> ---------- Forwarded message ----------
> From: Charles Ballowe <cballowe AT gmail DOT com>
> Date: Wed, 16 Feb 2005 13:34:35 -0600
> Subject: Error 24 problems and trouble shooting
> To: veritas-bu AT mailman.eng.auburn DOT edu
> 
> It seems that every couple of weeks one of my media servers will
> completely forget how to talk to the world and every backup that tries
> to use it will fail with a 24. Reboots can clear this, but there has to
> be a better way.
> 
> A network sniff at the time a backup gets kicked off doesn't show any
> traffic to the clients involved so I believe that the problem is on the
> server. Outside of backup processes, I'm able to send traffic through
> the interface, but backups stop working. In-progress backups seem to
> continue on to completion.
> 
> The environment is NB 5.1 MP2; many of the clients are still 4.5 of some
> form. This problem seemed to exist in 4.5 as well. The servers are all
> Solaris, and the clients are a mix of Unix and Windows. Any idea where
> to look to start troubleshooting this one?
> 
> -Charlie
> 
