ADSM-L

Re: Ref.: [ADSM-L] generic monitoring question

2005-11-04 08:09:35
Subject: Re: Ref.: [ADSM-L] generic monitoring question
From: Mike <mikee AT MIKEE.ATH DOT CX>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 4 Nov 2005 07:09:10 -0600
On Fri, 04 Nov 2005, Vincent RATAJSZCZAK might have said:

> Mike,
>
> I'm very interested in your TSM monitoring solution, based on NetIQ
> products.
>
> Could you describe for us the minimal NetIQ products/modules you are using
> to achieve this, and give a short description of the solution, please?

Happily. I have written a monitoring system for watching my unix
boxes. Management directed that we have a monitoring system that would
watch all our platforms (unix, intel, network, excluding mainframe).
There have been problems with NetIQ, though of late I have become
very familiar with how it works from the unix side.

In my original message I included the items for my initial TSM
module. I could buy a TSM module from NetIQ, though I'd need to fix
their code afterwards, so I am just writing my own. Here is a snip
from the first message:

> I'm writing a module for the NetIQ monitoring system. I'm happy
> to pass on this module if anyone is interested when I'm finished.
> So far I have checks for these things:
>
> Administrative Schedules Errors
> Administrative Schedules Failed
> Administrative Schedules Missed
> Schedules Completed with Errors
> Client Schedules Failed
> Client Schedules Missed
> Last Database Backup
> % Database Utilization
> Database Cache Hit Ratio
> % Maximum Recovery Log Utilization
> % Disk Pool Utilization
> Disk Volume Offline
> Number of Drives
> Number of Scratch Volumes
> Number of Read-Only Volumes
> License Compliance
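To make the GOOD/WARN/CRIT idea concrete, here is a minimal Python sketch of how one percentage check might be turned into a report line. It is not Mike's actual code, which is Perl and isn't shown. The thresholds, the `classify`/`report_line`/`append_report` names, and the `<timestamp> <STATUS> <check>: <value>` line format are all assumptions for illustration. It only handles "higher is worse" metrics, so checks such as Database Cache Hit Ratio or Number of Scratch Volumes would need the comparison inverted.

```python
from datetime import datetime

# Hypothetical (warn_at, crit_at) thresholds for "higher is worse" percentages.
THRESHOLDS = {
    "% Database Utilization": (80.0, 90.0),
    "% Maximum Recovery Log Utilization": (70.0, 85.0),
    "% Disk Pool Utilization": (85.0, 95.0),
}

def classify(value, warn_at, crit_at):
    """Return GOOD, WARN or CRIT for a metric where higher values are worse."""
    if value >= crit_at:
        return "CRIT"
    if value >= warn_at:
        return "WARN"
    return "GOOD"

def report_line(check, value, now=None):
    """Format one (assumed) report line: '<timestamp> <STATUS> <check>: <value>'."""
    if now is None:
        now = datetime.now()
    warn_at, crit_at = THRESHOLDS[check]
    status = classify(value, warn_at, crit_at)
    return f"{now.isoformat(timespec='seconds')} {status} {check}: {value:.1f}"

def append_report(path, lines):
    """Append report lines to the log file that the monitoring agent reads."""
    with open(path, "a", encoding="utf-8") as log:
        for line in lines:
            log.write(line + "\n")
```

For example, `report_line("% Database Utilization", 82.5)` yields a `WARN` line under these made-up thresholds. A cron job would call something like this for each check and then `append_report` once per run.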

Since then I have added a section for Unavailable Volumes and plan
in the next few days to add some of the other suggestions that have
been offered by this list. My implementation is designed to limit
how much I must interact with NetIQ due to some of the problems I've
had with the company. The main program is a Perl script that runs
from cron, performs the checks above, and appends the results as
a report to a log file. The NetIQ module simply reads the log file,
separating entries marked 'GOOD' from those marked 'WARN' or
'CRIT'. Anything that is not 'GOOD' generates an event. This
approach keeps any funky or advanced Perl coding outside the
NetIQ code, so no one can ever blame my 'custom code' for causing
a problem in the monitoring system. It also lets me monitor the
main program (the external Perl script) and its output from outside
NetIQ, as a sanity check that all is well.
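The reader side of this split can be sketched in a few lines. Below is a minimal Python illustration, not the NetIQ module itself. It assumes each report line looks like `<timestamp> <STATUS> <check text>`, which is a hypothetical format, and it uses hypothetical function names. Every non-GOOD entry is returned as a would-be event. A separate staleness check on the log file's modification time stands in for the "monitor it from outside" sanity check.

```python
import os
import time

STATUSES = {"GOOD", "WARN", "CRIT"}

def parse_report(lines):
    """Split assumed '<timestamp> <STATUS> <text>' lines into (status, text)."""
    entries = []
    for raw in lines:
        parts = raw.strip().split(maxsplit=2)
        # Skip blank or malformed lines rather than raising an event for them.
        if len(parts) == 3 and parts[1] in STATUSES:
            entries.append((parts[1], parts[2]))
    return entries

def events_from_report(lines):
    """Return every entry that is not GOOD; each would become a monitoring event."""
    return [(status, text) for status, text in parse_report(lines) if status != "GOOD"]

def log_is_stale(path, max_age_seconds, now=None):
    """External sanity check: has the cron job written the log recently?"""
    if now is None:
        now = time.time()
    try:
        return now - os.path.getmtime(path) > max_age_seconds
    except FileNotFoundError:
        return True  # a missing log is treated the same as a stale one
```

Keeping the parser this dumb is the point of the design. All of the TSM-specific logic lives in the cron script, and the monitoring side only has to recognize three status words.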

Does this help?

Mike
