ADSM-L

Re: journal errors - return code 121 \\.\pipe\jnl

2003-02-11 13:28:09
Subject: Re: journal errors - return code 121 \\.\pipe\jnl
From: Pete Tanenhaus <tanenhau AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 11 Feb 2003 12:51:34 -0500
I understand the concern; the point I was trying to make was that in some
cases it's difficult to determine
if particular return codes merit something being written to the errorlog.

In the case of a named pipe error (on either the journal daemon or the
backup client) it isn't easy to
determine if the error occurs because the other process has terminated (or
timed out) either normally or abnormally.

All that having been said, the error messages in question really provide
diagnostic information that
probably isn't all that useful to the end user and is really only useful
to development or support
when attempting to track down problems in the field.

So I agree that the messages should either be more meaningful or shouldn't
be in the errorlog
at all (probably more appropriate for the client tracing facility).

By the way, I tracked down the specific message in this thread (return
code 121 \\.\pipe\jnl).

There are two types of named pipes used to communicate between the two
processes (journal daemon and b/a client).

One named pipe (\\.\pipe\jnl) is created by the journal daemon during
startup and is used to receive inbound requests from
the b/a client.

The b/a client determines if the journal daemon is running by attempting to
connect to this pipe.

Return code 121 indicates that the b/a client timed out attempting to
connect to the pipe, which most likely means
that the journal daemon isn't running.

I think a more accurate message in the errorlog would be something like
"Connection to Journal Daemon timed out"
(and the specific return code could be documented with tracing on).
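As a rough illustration of that suggestion, here is a minimal Python sketch of
splitting one pipe failure into a user-facing errorlog line and a detailed
trace line. The function and constant names are hypothetical and are not taken
from the client source; the only outside fact used is that 121 is
ERROR_SEM_TIMEOUT in the Windows system error code table.

```python
from typing import Tuple

# Windows system error 121: "The semaphore timeout period has expired."
ERROR_SEM_TIMEOUT = 121

# Pipe created by the journal daemon at startup (name from this thread).
JOURNAL_PIPE = r"\\.\pipe\jnl"


def describe_connect_failure(pipe_name: str, return_code: int) -> Tuple[str, str]:
    """Return (errorlog_message, trace_message) for a failed pipe connect.

    Hypothetical helper: the errorlog text is meant for the end user, while
    the raw return code is only kept in the trace text.
    """
    trace = f"connect to {pipe_name} failed, return code {return_code}"
    if pipe_name == JOURNAL_PIPE and return_code == ERROR_SEM_TIMEOUT:
        return "Connection to Journal Daemon timed out", trace
    return "Journal Daemon communication error", trace


errorlog_msg, trace_msg = describe_connect_failure(JOURNAL_PIPE, 121)
print(errorlog_msg)
print(trace_msg)
```

The point of the split is that the return code stays available for support
without being the only thing the end user sees.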

The other type(s) of named pipes are created by the b/a client and used to
receive responses from the journal daemon.

There is at least one response pipe created by the b/a client for each
backup session (and with multi-threaded backup
possibly more, since there can be multiple backup sessions).

Any request sent to the journal daemon which requires a response includes
this response pipe name.

When the journal daemon processes the request, it connects to the supplied
response pipe and sends response
data on it as needed.

The b/a client backup session waits for data to arrive on the response pipe
and then reads it.

An example of this would be a request from the b/a client to query all of
the objects in the journal for a particular file system.

The b/a client creates a pipe for the journal daemon to send the list of
matching objects on, and supplies the name of this
pipe to the journal daemon in the query request.

The journal daemon processes the query request, connects to the supplied
pipe, performs the query, and sends the responses
on the pipe.
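The request/response flow described above can be sketched in Python roughly as
follows. This is only an in-memory simulation: queues stand in for the named
pipes, and the response pipe naming, the request fields, and the
end-of-response marker are all assumptions made for illustration, not the
actual wire format.

```python
import itertools
import queue

JOURNAL_PIPE = r"\\.\pipe\jnl"

# In-memory stand-ins for named pipes, keyed by pipe name (illustration only).
pipes = {JOURNAL_PIPE: queue.Queue()}
_session_ids = itertools.count(1)


def client_send_query(file_system):
    """b/a client: create a response pipe and name it in the query request."""
    response_pipe = rf"\\.\pipe\jnl-resp-{next(_session_ids)}"  # hypothetical name
    pipes[response_pipe] = queue.Queue()
    pipes[JOURNAL_PIPE].put(
        {"op": "query", "fs": file_system, "reply_to": response_pipe}
    )
    return response_pipe


def daemon_process_one(journal):
    """Journal daemon: take one request, connect to its response pipe, reply."""
    request = pipes[JOURNAL_PIPE].get_nowait()
    reply = pipes[request["reply_to"]]
    for name in journal.get(request["fs"], []):
        reply.put(name)
    reply.put(None)  # end-of-response marker (assumption)


def client_read_all(response_pipe):
    """b/a client: read responses until the end-of-response marker."""
    results = []
    pipe = pipes.pop(response_pipe)
    while True:
        item = pipe.get_nowait()
        if item is None:
            return results
        results.append(item)


journal = {"C:": ["C:\\a.txt", "C:\\b.txt"]}
rp = client_send_query("C:")
daemon_process_one(journal)
print(client_read_all(rp))
```

Because each request carries its own response pipe name, several backup
sessions can have queries outstanding at once without their responses mixing.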

The b/a client looks (peeks) at the response pipe to determine if
there is any response data to read and continues
to read until it is empty.

Usually data will be available on the response pipe as soon as the b/a
client requests it, but in some cases (as with very
long running queries) the journal daemon doesn't post the data on the
response pipe in time and the b/a client must continue
to look (peek) at the pipe until it arrives.

I believe the above condition is the source of the "NpPeek: No data"
message in the errorlog.

Since the b/a client will eventually time out the session if the data isn't
received on the response pipe in
a reasonable amount of time, I don't see any use in logging the above
message, so I will remove it in a future release/ptf.
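To make the peek-and-wait behavior concrete, here is a minimal Python sketch of
a polling loop in which "no data yet" is only traced and the timeout is the
single real error. The function name, the peek callable, and the timeout and
poll interval values are hypothetical; the real client's wait time is not
stated in this thread.

```python
import time
from typing import Callable, Optional


def wait_for_response(
    peek: Callable[[], Optional[bytes]],
    timeout_s: float,
    poll_interval_s: float = 0.05,
    trace: Callable[[str], None] = lambda msg: None,
) -> bytes:
    """Poll `peek` until it returns data or `timeout_s` elapses.

    `peek` is a stand-in for peeking at the response pipe: it returns None
    when nothing has been posted yet. An empty peek goes to `trace` only;
    the timeout is the one condition raised to the caller.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        data = peek()
        if data is not None:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("journal daemon response timed out")
        trace("NpPeek: No data")  # diagnostic only, not an errorlog entry
        time.sleep(poll_interval_s)


# Usage: a fake pipe that has data on the third peek.
attempts = {"n": 0}


def fake_peek() -> Optional[bytes]:
    attempts["n"] += 1
    return b"journal entry" if attempts["n"] >= 3 else None


print(wait_for_response(fake_peek, timeout_s=1.0, poll_interval_s=0.001))
```

With this shape, a slow query produces trace output at most, and the errorlog
only sees an entry if the session actually times out.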

Anyway, I hope this helps .....

Pete Tanenhaus
Tivoli Storage Solutions Software Development
email: tanenhau AT us.ibm DOT com
tieline: 320.8778, external: 607.754.4213

"Those who refuse to challenge authority are condemned to conform to it"

---------------------- Forwarded by Pete Tanenhaus/San Jose/IBM on
02/11/2003 11:58 AM ---------------------------

"Magura, Curtis" <curtis.magura AT LMCO DOT COM>@VM.MARIST.EDU> on 02/11/2003
11:32:26 AM

Please respond to "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>

Sent by:    "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>


To:    ADSM-L AT VM.MARIST DOT EDU
cc:
Subject:    Re: journal errors - return code 121 \\.\pipe\jnl



Pete, think you said it all with the statement below! Very confusing in the
current state to decide if there is a problem or not.

"That being said, I think development (myself) needs to look at  the np
error logging on both sides and try to eliminate logging messages which
aren't really errors, but in some situations it's difficult to determine if
an error condition is legitimate and should be logged or if it is innocuous
and can be ignored."

Curt Magura
Lockheed Martin EIS
Orlando, Fla.
321-235-1203


-----Original Message-----
From: Pete Tanenhaus [mailto:tanenhau AT US.IBM DOT COM]
Sent: Tuesday, February 11, 2003 10:27 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: journal errors - return code 121 \\.\pipe\jnl


Np errors in the backup client errorlog indicate the opposite condition,
that is, the backup client
is trying to read a response sent from the journal daemon which isn't
available at the moment
the read is being done.

This error can happen if the journal daemon ends (obviously a problem) or
(I believe) if the
response the backup client is looking for from the journal daemon is still
in progress, meaning
that the journal daemon hasn't finished processing/sending it.

In most cases the response is ready when the backup client goes to read it,
but if it isn't
the backup client will keep trying to read the response until it either
arrives or a timeout occurs
(don't know the exact wait time off the top of my head).

That being said, I think development (myself) needs to look at the np
error logging on both sides
and try to eliminate logging messages which aren't really errors, but in
some situations it's difficult
to determine if an error condition is legitimate and should be logged or if
it is innocuous and can be ignored.

Hope this helps ......

Pete Tanenhaus
Tivoli Storage Solutions Software Development
email: tanenhau AT us.ibm DOT com
tieline: 320.8778, external: 607.754.4213
