ADSM-L

Monitoring ADSM for errors.

1997-05-21 12:10:17
Subject: Monitoring ADSM for errors.
From: Chuck Tomlinson 793-0730 <chuckt AT AUSTIN.IBM DOT COM>
Date: Wed, 21 May 1997 11:10:17 -0500
Automatic digest processor <LISTSERV AT VM.MARIST DOT EDU>  writes:
>Date:    Mon, 19 May 1997 23:55:28 -0400
>From:    Wayne Gorton <wayneg AT AU1.IBM DOT COM>
>Subject: Monitoring ADSM for errors.
>
>Howdy All,
>I am currently revisiting the way we monitor ADSM's activity.
>Our environment is 20 nodes backing up to 1 server (all on an SP) overnight.
>We use NetView to monitor the file dsmerror.log. This issues all sorts of
>messages that don't need to be passed onto NetView.
>The manual "ADSM for AIX: Advanced Topics" recommends monitoring the console
>log instead of the dsmerror.log (but doesn't say why).
>The console log is piped out to /dev/null. Is this the norm?
>Is there a mechanism for pruning the console log or dsmerror.log?
>It also says to monitor for specific messages & ignore the rest.
>I think it would be better to take a sample of the most common messages &
>filter out the messages you don't need to see. This way should ADSM issue any
>new messages (in future versions) we won't be ignoring them.
>
>What's the consensus, monitor the console log or dsmerror.log?
>What does everyone else do?
>
>I'd welcome any opinions/advice before I start scripting.
>
What I do here is run the dsmserv process from a Perl script that captures the
console output and writes it to a file.  At the same time the script checks
each message to decide which ones to notify on, whether by e-mail, by page, or,
as in your case, by passing them to NetView.  The basic rule for classifying
messages is to use ADSM's own severity types (information, warning, error, and
critical error), which show up as the last letter of the message ID.  Specific
message IDs are then handled on an exception basis.
 Information            ANR....I
 Warnings               ANR....W
 Errors                 ANR....E
 Critical Errors        ANR....D  &  ANR....S
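
A minimal sketch of that default classification, written in Python rather than
the Perl used here. The names `default_class` and `watch` are made up for
illustration, and the sketch assumes each console line carries its message ID
in the ANRnnnnX form shown above. The `stream` argument is assumed to be any
iterable of text lines, for example the standard output of the server process.

```python
import re

# "ANR", four digits, then one severity letter (I, W, E, D or S).
MSG_ID = re.compile(r"\b(ANR\d{4})([IWEDS])\b")

# Default class for each severity letter, as in the table above.
SEVERITY = {
    "I": "information",
    "W": "warning",
    "E": "error",
    "D": "critical",
    "S": "critical",
}


def default_class(line):
    """Return the default class of a console line, or None if no ANR ID is found."""
    match = MSG_ID.search(line)
    if match is None:
        return None
    return SEVERITY[match.group(2)]


def watch(stream, console_log):
    """Copy every line to console_log and yield (class, line) pairs."""
    for line in stream:
        console_log.write(line)
        yield default_class(line), line
```

Lines without a recognizable message ID come back with a class of None, so the
caller can decide whether to ignore them or flag them.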

I have set up the following action for each class:
 Information            - take no action.
 Warnings               - only log in a warnings log file.
 Errors                 - log in an error log file and send e-mail of problem.
 Critical Errors        - log in an error log file, send e-mail, & send page.
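
The four actions can be written as a small lookup table. This is only a sketch
in Python: the log file names `warnings.log` and `errors.log`, the function
name `handle`, and the `send_mail` / `send_page` callables are all assumptions
standing in for whatever mail, paging, or NetView hook a site actually uses.

```python
import os

# class -> (log file name or None, send e-mail?, send page?)
ACTIONS = {
    "information": (None, False, False),
    "warning": ("warnings.log", False, False),
    "error": ("errors.log", True, False),
    "critical": ("errors.log", True, True),
}


def handle(message_class, line, log_dir, send_mail, send_page):
    """Apply the action for one already-classified console line."""
    log_name, mail, page = ACTIONS[message_class]
    if log_name is not None:
        # Append so that earlier entries in the log are kept.
        with open(os.path.join(log_dir, log_name), "a") as log:
            log.write(line.rstrip("\n") + "\n")
    if mail:
        send_mail(line)
    if page:
        send_page(line)
```

Keeping the actions in one table means a change of policy for a class is a
one-line edit rather than a change to the control flow.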

Now, as I stated, I do move specific ANR messages into one of the above classes
when that differs from their default class.  In case you are wondering, my
breakdown is as follows:

Critical Errors Class:
ANR0130E ANR0131E ANR0132E ANR0359E ANR0360E ANR0361E ANR2700E ANR2707E
ANR2708E ANR4565E ANR4570E ANR4571E ANR4573E ANR4575E ANR4576E
ANR4577E ANR4578E ANR4579E ANR4580E ANR4582E ANR4583E ANR7823E ANR8469E

ANR0202W ANR0204W ANR0205W ANR0362W ANR0437W ANR0438W ANR0439W ANR0485W
ANR0522W ANR1025W ANR2574W ANR2575W ANR4581W

ANR4561I ANR4562I ANR4563I ANR4564I

Errors Class:
ANR2572W ANR0206W ANR0208W ANR0214W ANR0215W ANR4550I ANR8326I

Information Class: (basically these will be ignored)
ANR5124E ANR5233E ANR5241E ANR5307E ANR5308E ANR5311E ANR7821E ANR8304E
ANR8447E ANR2000E
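
The exception handling amounts to checking a per-message override before
falling back to the severity letter. A sketch in Python, with only a handful
of the IDs listed above copied into the table to keep it short; the names
`OVERRIDES` and `classify` are made up for illustration.

```python
import re

# "ANR", four digits, then one severity letter.
MSG_ID = re.compile(r"\bANR\d{4}[IWEDS]\b")

# Default class by severity letter.
DEFAULT = {
    "I": "information",
    "W": "warning",
    "E": "error",
    "D": "critical",
    "S": "critical",
}

# A few of the exceptions listed above; a real table would hold them all.
OVERRIDES = {
    "ANR0130E": "critical",
    "ANR0202W": "critical",
    "ANR4561I": "critical",
    "ANR2572W": "error",
    "ANR4550I": "error",
    "ANR5124E": "information",
    "ANR2000E": "information",
}


def classify(line):
    """Return the class for a console line, or None if it has no ANR ID."""
    match = MSG_ID.search(line)
    if match is None:
        return None
    msg_id = match.group(0)
    # An explicit override wins; otherwise use the severity letter.
    return OVERRIDES.get(msg_id, DEFAULT[msg_id[-1]])
```

Because unknown IDs fall through to their severity letter, a message added in
a later ADSM release is still classified rather than silently dropped.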

Now this is the setup that works for me.  You may have other ANR messages that
you wish to take a different action on than what I am taking.  But hopefully
this will give you a start in looking at the message groupings.

Hope this helps;
Chuck T.
