RE: [nv-l] Netview Redundancy / Failover

2002-08-02 14:07:28
Subject: RE: [nv-l] Netview Redundancy / Failover
From: "Barr, Scott" <Scott_Barr AT csgsystems DOT com>
To: <nv-l AT lists.tivoli DOT com>
Date: Fri, 2 Aug 2002 13:07:28 -0500
As a topic for discussion: why do folks back up and restore the database instead of 
maintaining two active servers at the same time? I use the same discovery 
methods on both my servers, so the databases are largely identical. Same 
thing with data collection. I realize that doubles the network traffic, but 
backing up and restoring databases seems to move about as much data. 
Comments? I am wondering if my approach is wrong. Again, in my environment, we 
use two geographically separated servers and we have very few client users (so 
maps are really a non-issue here). 

-----Original Message-----
From: Stephen Hochstetler [mailto:shochste AT us.ibm DOT com]
Sent: Friday, August 02, 2002 11:01 AM
To: nv-l AT lists.tivoli DOT com
Subject: RE: [nv-l] Netview Redundancy / Failover

There are several "best practices" that an IBM team of services people has
created over the years for NetView redundancy.    There are two basic
designs that are slightly tweaked.


      - Only administer one map.

Best Practices:
1.  Use a pair of MLMs for receiving unsolicited traps (a pair for
redundancy).  When your NetView servers discover them, NetView automatically
adds itself to each MLM as a trap receiver.    With a single SNMP set you
can turn trap forwarding from the backup MLM on or off, which is easy to
automate in your MLM failover script.  This also opens the door to setting
trap filters at these MLMs to block all unsolicited traps you don't care about.
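
As a rough illustration of the "single SNMP set" step, here is a minimal
Python sketch that only builds the Net-SNMP snmpset command line a failover
script might run. The host name, community string, OID and the 1/0 integer
encoding are all placeholders I made up for the example; the real OID and
value must come from the MLM MIB documentation.

```python
from typing import List

# Placeholder only: NOT a real MLM OID. Look up the actual
# trap-forwarding object in the MLM MIB before using this.
PLACEHOLDER_OID = "1.3.6.1.4.1.0.0"


def build_forwarding_command(mlm_host: str, community: str,
                             oid: str, enable: bool) -> List[str]:
    """Return the snmpset argument list that toggles trap forwarding.

    Assumes (hypothetically) that the object is an integer where
    1 means "forward traps" and 0 means "do not forward".
    """
    value = "1" if enable else "0"
    return ["snmpset", "-v", "1", "-c", community,
            mlm_host, oid, "i", value]


if __name__ == "__main__":
    # Print the command that would enable forwarding on the backup MLM.
    cmd = build_forwarding_command("backup-mlm.example.com", "private",
                                   PLACEHOLDER_OID, enable=True)
    print(" ".join(cmd))
```

The function returns the argument list instead of running it, so the failover
script can log or review the command before handing it to subprocess.run.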

2.  If you need failover in a matter of 1 minute -- If you are a service
provider that MUST be managing customer networks 24x7 with no downtime,
then a set of 3 NetView servers is recommended.   This gives you an
Administrative server, an Operations peer1 server and an Operations peer2
server.    This allows your administrative team to add and remove managed
devices without impacting operations.   At the end of the day, the database
is copied from the Administrative server to the peer server not currently
being used.  At shift change, operations can move to the new database by
opening maps on the updated server.    This means that peer1 will be
production on Monday, peer2 on Tuesday, etc.   The other peer is always
available for redundancy.  In a disaster you can fall back to the
Administrative server.
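
The daily peer rotation described above can be sketched in a few lines of
Python. This is only an illustration: the anchor date (a day on which peer1
is known to be production) and the function names are my own assumptions,
not part of any NetView tooling.

```python
from datetime import date

PEERS = ("peer1", "peer2")


def production_peer(day: date, anchor: date) -> str:
    """Peer serving operations on `day`.

    `anchor` is any date on which peer1 was production; the peers
    then alternate every day.
    """
    return PEERS[(day - anchor).days % 2]


def copy_target(day: date, anchor: date) -> str:
    """Idle peer that receives the Administrative database at end of `day`."""
    return PEERS[((day - anchor).days + 1) % 2]


if __name__ == "__main__":
    anchor = date(2002, 8, 5)          # a Monday, assumed to be a peer1 day
    for offset in range(3):
        day = date.fromordinal(anchor.toordinal() + offset)
        print(day, production_peer(day, anchor), copy_target(day, anchor))
```

The copy target for a given day is always the peer that becomes production on
the following day, which matches the "copy at end of day, switch at shift
change" routine.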

In this scenario we have also created a process to move many NetView
clients from one peer server to another in a designed fashion.   This is
now easier with Web clients since you don't have the map sync process using

3.  If you need failover in a matter of 10 minutes -- If you are not a
service provider, then a set of 2 NetView servers is recommended.  Again,
single map administration is important.   Also, if you have event
automation, rules or forwarding to TEC,  those activities must check to see
if the NetView server is in "production" or in "backup" mode.   Scripts can do
that easily.    Using MLMs to control trap flow helps.  In this case, you
have to decide how often to update the backup NetView.   At most customers
I found this to be a weekly maintenance window:
      - backup server kicked into production mode and verified
      - primary server brought down
      - backup of primary server database, config files
      - primary server brought up and verified
      - backup server brought down
      - restore of primary server database on backup server, config files
      - backup server kicked into backup mode
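
The weekly sequence above lends itself to a small driver that runs the steps
in order and stops at the first failure, so the primary is never brought down
before the backup has been verified. The sketch below is Python; the step
runner and the `do` callback are hypothetical stand-ins for whatever site
scripts actually stop, start, back up and restore NetView.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]

WEEKLY_STEPS = [
    "backup server kicked into production mode and verified",
    "primary server brought down",
    "backup of primary server database, config files",
    "primary server brought up and verified",
    "backup server brought down",
    "restore of primary server database on backup server, config files",
    "backup server kicked into backup mode",
]


def weekly_refresh_steps(do: Callable[[str], bool]) -> List[Step]:
    """Pair each step name with an action; `do(name)` returns True on success."""
    return [(name, lambda name=name: do(name)) for name in WEEKLY_STEPS]


def run_sequence(steps: List[Step]) -> List[str]:
    """Run steps in order; raise on the first failure, return completed names."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"step failed: {name}")
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Dry run: every step just prints its name and reports success.
    run_sequence(weekly_refresh_steps(lambda name: print("would run:", name) is None))
```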

4. If you need failover in a matter of 30 minutes (more or less) -- You can
configure NetView in an HA environment, using a shared disk to share the
NetView database between two servers.   The failover time will equal
detection time + NetView startup time.   The risk is high, because a database
problem (for example, one triggered by a new device) is the most likely cause
of a NetView failure.    Thus any solution that relies on a single NetView
database is at the greatest risk of not working when your production NetView
has a problem.
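
To make the "detection time + NetView startup time" estimate concrete, here
is a small Python calculation. All the numbers (heartbeat interval, missed
heartbeats before failover, startup time) are invented for illustration;
measure your own cluster and NetView startup to get real values.

```python
def detection_seconds(heartbeat_interval_s: float, missed_beats: int) -> float:
    """Time for the HA software to declare the primary dead (assumed model)."""
    return heartbeat_interval_s * missed_beats


def failover_seconds(detection_s: float, startup_s: float) -> float:
    """Failover time = detection time + NetView startup time."""
    return detection_s + startup_s


if __name__ == "__main__":
    detect = detection_seconds(heartbeat_interval_s=30, missed_beats=3)  # 90 s
    total = failover_seconds(detect, startup_s=25 * 60)                  # 1590 s
    print(f"estimated failover: {total / 60:.1f} minutes")               # 26.5 minutes
```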

Option 1 can and should be used in all cases.
Options 2 and 3 both work well; choose between them based on your needs.  Either
takes custom work from yourself or from experienced NetView consultants.
Option 4 is less recommended than Option 3.

Kind regards,
Stephen Hochstetler              shochste AT us.ibm DOT com
International Technical Support Organization  - Austin
Office - 512-436-8564                      FAX - 512-436-9326

ITSO redbooks at  http://www.redbooks.ibm.com

