Re: Capacity Planning ....

Ray,

I have worked with customers who manage large networks from AIX.  Here are
some tips to use as you grow.  This does not cover everything that a true
capacity planning exercise would, nor is it a whitepaper with all the
answers.  I am just covering some important points.

1. Will you be expanding operations as well as your network?
- Memory on the server is also directly related to how many native clients
(NetView maps) you are running.  Keep that in mind as you think about the
future.  Will you be expanding operations or the number of operators as the
network grows?  Will you also be moving them to web-based NetView clients
or leaving them on native clients?
- Base memory for non-native usage is higher for NetView 7.1, but the
system can then support more web clients once NetView 7.1 is installed.


2. Will you be CPU constrained?
- Your 4-way should handle the additional interfaces with no problem.
Each NetView process is single threaded, so a 4-way machine will let 4
NetView processes run at once.  You will usually find netmon, ovwdb, and
trapd to be the top processes.
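
To see how busy those three daemons are, a quick ps filter is usually
enough.  This is only a rough sketch: it assumes the POSIX "ps -eo" output
fields are available on your AIX level, and the helper name nv_top is made
up for illustration.

```shell
# Hypothetical helper: from "pcpu vsz comm" lines on stdin, keep only the
# NetView daemons named above and list the busiest first.
nv_top() {
    awk '$3 ~ /(^|\/)(netmon|ovwdb|trapd)$/' | sort -k1,1nr
}

# Usage on a live system (pcpu, vsz and comm are POSIX ps field names):
#   ps -eo pcpu,vsz,comm | nv_top
```

If one of these sits near 100% of a single CPU for long periods, that
process, not the machine as a whole, is likely the bottleneck.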


3. How can I improve my database performance?
- I think you already have /usr/OV/databases as a separate file system.
Once this starts growing, it will be important to manage it more closely.
There are some things you can do to greatly improve database processing.

   The following is quite involved.  Print it out, read it, and send me
questions if you have them.

  Step 1 - make sure you have a GOOD backup of /usr/OV/databases

  Step 2 - find out the size of the real data in your database files.
Remember, this data is stored in "sparse" databases, which means the *.pag
files can grow a lot, but the real data will be smaller than the file size.
This allows NetView to hash into the file to find the data quickly.
This step verifies that specific files are hashed properly for the
amount of data.  In /usr/OV/service/nvTurboDatabase you will find the
following line:
    /usr/OV/service/dbmcompress -h o -m 8 -b 3 -o /usr/OV/databases/openview/current/value_info

Run the following command:
   du -k /usr/OV/databases/openview/current/value_info.pag

This should give you the size of the actual data in KB.  The "-m 8"
corresponds to a capacity of "8000" as reported by "du -k".  If the
"du -k" result is larger than 8000, then you will want to change the value.
I suggest you copy nvTurboDatabase to something like nvTurboDatabase.tuned
and make all your changes in the "tuned" version.  This will keep them from
being replaced by NetView PTFs in the future.  So if your "du -k" result
was 10,000, I would suggest you change the "-m 8" to a larger number like
"-m 11", or even 12 if you know your network is growing.  This is for
increased performance of larger databases.  Increasing this number will
increase the size of the value_info.pag file when you run nvTurboDatabase,
so make sure your filesystem is big enough for this.
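
The arithmetic above can be sketched as a small shell helper.  It assumes
the rule of thumb from this note, that "-m N" covers roughly N * 1000 KB,
and adds one unit of headroom.  The name suggest_m is hypothetical and is
not part of NetView.

```shell
# Hypothetical helper: suggest a "-m" value from a "du -k" reading,
# assuming "-m N" covers about N * 1000 KB of real data.
suggest_m() {
    kb=$1
    # Round up to the next whole thousand, then add 1 for headroom.
    echo $(( (kb + 999) / 1000 + 1 ))
}

# Example: a "du -k" reading of 10000 KB
suggest_m 10000    # prints 11
```

On a live system you could feed it the real reading, for example
suggest_m "$(du -k /usr/OV/databases/openview/current/value_info.pag | awk '{print $1}')".
Treat the result as a starting point and add more headroom if you expect
the network to keep growing.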

Repeat the commands for /usr/OV/databases/openview/current/obj_info.pag.
Note the default value is "-m 2".
Repeat the commands for /usr/OV/databases/openview/current/name.pag.
Note the default value is "-m 4".

Read the documentation about running nvTurboDatabase before you run the
tuned script /usr/OV/service/nvTurboDatabase.tuned.
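
To check all three files in one pass, something like the following could
be used.  It is a sketch only: it assumes the same "-m N is about
N * 1000 KB" rule for every file and uses the default "-m" values noted
above.  If you have already changed them in your tuned script, adjust the
list.  The helper name check_pag is made up for illustration.

```shell
# Directory holding the .pag files; override DB_DIR to test elsewhere.
DB_DIR=${DB_DIR:-/usr/OV/databases/openview/current}

# Hypothetical helper: compare one "du -k" reading against its -m setting,
# assuming "-m N" covers about N * 1000 KB of real data.
check_pag() {
    name=$1; kb=$2; m=$3
    if [ "$kb" -gt $(( m * 1000 )) ]; then
        echo "$name: ${kb} KB exceeds -m $m, consider raising it"
    else
        echo "$name: ${kb} KB fits within -m $m"
    fi
}

# Defaults noted above: value_info=8, obj_info=2, name=4.
for entry in value_info:8 obj_info:2 name:4; do
    file=$DB_DIR/${entry%%:*}.pag
    m=${entry##*:}
    [ -f "$file" ] || continue      # skip files not present on this box
    kb=$(du -k "$file" | awk '{print $1}')
    check_pag "${entry%%:*}" "$kb" "$m"
done
```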


4.   You want to stripe a large /usr/OV/databases filesystem over a number
of hard drives.  Once a minute a process in AIX flushes all cached disk
files to disk.  ovwdb caches the database, so once a minute the database
gets written to disk.  While AIX does that, it also stops the process that
owns the cache, in this case ovwdb.  Since anything that happens in NetView
needs information from ovwdb, it is like AIX pausing NetView processing
once a minute.  To alleviate this, stripe the filesystem so that AIX can
write it to disk very quickly across multiple drives.  In the past I have
used 100 MB per drive (600 MB of data on 6 drives) and found that the delay
disappeared.  Your hardware may give you different numbers than this.
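
As a rough sizing aid, the 100 MB per drive ratio can be turned into a
drive count.  This is only a sketch of that arithmetic.  The helper name
drives_needed is hypothetical, and the ratio is a past observation, not a
rule for your hardware.

```shell
# Hypothetical helper: number of drives to stripe over for a database of
# a given size in MB, defaulting to the 100 MB per drive ratio above.
drives_needed() {
    db_mb=$1
    per_drive_mb=${2:-100}
    # Integer division rounded up.
    echo $(( (db_mb + per_drive_mb - 1) / per_drive_mb ))
}

drives_needed 600    # prints 6
```

You can get the current size in KB with "du -sk /usr/OV/databases" and
divide by 1024 before passing it in.  Pass a second argument if your
drives handle more or less than 100 MB comfortably.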


5. There are documented commands that let you look at the netmon
polling queue to see if it is behind in its polling cycle.  You can tune
the netmon polling and SNMP queue sizes to let netmon do more at once.
Increase these queue sizes in small increments; making them huge numbers
may introduce other problems.


6.  The most common cause of a NetView failure is some sort of database
corruption.  Since most HA systems share the drives, HA does not help you
with the most common failure.  Having two boxes, where the NetView database
is copied occasionally from one system to the backup, provides a better
highly available solution for most customers.  It could be a nightly
copy, or some other appropriate time period.
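
A nightly copy could be as simple as a tar pipe run from cron.  The sketch
below is an assumption-laden example, not a tested procedure: the helper
name copy_db, the wrapper script path, and the target directory are all
made up, and it assumes the target is a filesystem the backup box can
reach.  Note that plain tar does not preserve sparseness, so the copied
*.pag files may take more disk space than the originals, and a copy taken
while the NetView daemons are writing may not be consistent.

```shell
# Hypothetical helper: copy a database directory tree into a target
# directory, preserving layout and permissions.
copy_db() {
    src=$1; dest=$2
    mkdir -p "$dest" || return 1
    (cd "$src" && tar -cf - .) | (cd "$dest" && tar -xpf -)
}

# Example crontab entry for a 02:00 nightly copy, assuming a wrapper
# script (hypothetical path) that calls copy_db with its two arguments:
#   0 2 * * * /usr/local/bin/nv_copy_db.sh /usr/OV/databases /backup/OV/databases
```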

Kind regards,
Stephen Hochstetler              shochste AT us.ibm DOT com
International Technical Support Organization  - Austin
Office - 512-436-8564                      FAX - 512-436-8701