Re: copy storage pools

My name got mentioned in this dual thread so I guess I should say a few
things to see if I can help.  I have decided to attach to Tom's response
because I think his first statement is the most important issue. A PLAN!

We are deep into developing a comprehensive recovery plan and progressing to
testing right now.  I have about 10TB of active data on TSM right now.  We
are not doing daily offsites yet.  But, we are doing weekly and will be
moving to daily in probably less than a month.

There is so much information missing from your request, Rob, it is difficult
to help you.  I will give you some basic guidance.

What were the company findings for the following:

        BIA (Business Impact Analysis)
        RPO (Recovery Point Objective)
        RTO (Recovery Time Objective)
        CPH (Cost Per Hour)

If your auditors do not have this information then they are asking you to
aim at a target that has not been put on the firing range yet.  The cost per
hour is likely confidential information.  But, let's take some hypothetical
situations.  A financial institution: if it is down for more than a certain
period without access to its records, the FDIC gets involved (legal
ramifications).  A defense system: down 1 minute and it may be all over.  A
manufacturer of a product that takes 5 years to build: maybe a couple of
weeks.  You see the point.  What all of them will tell you is that the RPO
needs to be tight, i.e., very close to the time of failure.  For financial
institutions, probably the last transaction, but they have to keep the till
tape for the day so they can recreate that day.

In a retail business, it really depends.  In the footwear business, what is
likely most important is inventory, orders, and shipping.  Everyone focuses
on payroll, but the reality is you can have a policy that says we pay you
the same as your last check, with a manual process to cover new hires.  Then
retroactively correct either way.

Now, to focus on your question: how do you do it?  Once you have identified
which applications are required to continue the business, you must set up
your storage pools like Tom says so that your "backup stg" commands only
back up what is needed each day to send offsite.  Remember, "backup stg" only
copies what has not already been put in the copy storage pool.  The database
knows what is there, so only the differences are copied.  Unless you are
doing absolute incrementals or selectives on everything, I have to believe
you can get this done.  "Backup stg" is a misnomer just like "incremental"
is.  As Nick Cassimatis says, incremental is an update of the full backup.
"Backup stg" is an update of the Copy Pool to match the Primary Pool.  Follow
the rules: you have to do the backup commands in order, starting with the
primary pool in the management class, then the primary pool's next pool, its
next pool, etc.  Example:

        Primary_Disk_Pool1 has a next pool of Primary_Tape_Pool1

        Offsite_Tape_Pool1 is the target.

Issue:
        Backup stg Primary_Disk_Pool1 Offsite_Tape_Pool1 maxpr=? Wait=yes
        Backup stg Primary_Tape_Pool1 Offsite_Tape_Pool1 maxpr=?

The "maxpr" can help save lots of time if you have the tape drives to do it,
but it can cause more tapes to be generated.  Tapes are cheap; time is lost
forever.  Notice the Wait=Yes.  You want to wait until the disk has finished
before you do the tape pool.
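
To make the ordering rule concrete, here is a minimal sketch in Python (not
part of TSM) that walks a primary pool's next-pool chain and prints the
"backup stg" commands in the order they should be issued.  It assumes the
positional form of the command (primary pool, then copy pool).  The function
name, the next-pool table, and the maxpr value of 2 are illustrative
assumptions; pick maxpr to match the drives you actually have.

```python
def backup_stg_commands(first_pool, next_pool, copy_pool, maxpr=2):
    """Build 'backup stg' commands for a primary pool and its next pools.

    next_pool maps each primary pool name to its next pool (hypothetical
    table; in practice this comes from your storage pool definitions).
    """
    # Walk the chain: primary pool, its next pool, that pool's next pool...
    chain = []
    pool = first_pool
    while pool is not None and pool not in chain:  # stop on end or loop
        chain.append(pool)
        pool = next_pool.get(pool)

    commands = []
    for i, pool in enumerate(chain):
        cmd = f"backup stg {pool} {copy_pool} maxpr={maxpr}"
        # Wait for every pool except the last, so each copy finishes
        # before the next pool in the chain is started.
        if i < len(chain) - 1:
            cmd += " wait=yes"
        commands.append(cmd)
    return commands


next_pool = {"Primary_Disk_Pool1": "Primary_Tape_Pool1"}
for line in backup_stg_commands("Primary_Disk_Pool1", next_pool,
                                "Offsite_Tape_Pool1"):
    print(line)
```

The generated lines could then be pasted into a server script or macro; how
you feed them to the server is up to your own scheduling.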

A little tidbit for everyone.  We use Shark disk for the disk storage pool
and have both an onsite and offsite copy pool.  The onsite is there so that
if our primary tape fails we can rebuild it without having to recall our
offsite tapes.  When we do the Disk "backup stg" we do it for the onsite and
offsite at the same time (2 tape drives).  What happens is the cache in the
Shark paces the two backups: the first one causes a read cache stage, the
second reads from cache.  And, it is funny: you can start two 150GB dumps of
the disk pool 5 minutes apart and they will finish at the same time because
of this.  And, it does not slow them down measurably.  Most important, try to
make sure your disk pools are large enough so that you do not do much tape
to tape, though it is fast on the Magstar.  Just, in my environment with the
2 copies, I lose the speed benefit if migrations move the data to tape
before I get it on the copies.

Big files (databases, Exchange, Oracle, SQL Server, SAP) go straight to
tape.  Tape to tape copies on these will fly on TSM, especially if you have
FC tape.

Our pool setup is very similar to Tom's.  We do some things that are strange
compared with traditional TSM sites.  Several times at night we copy the
storage pools to have them up to date by the morning.  Our last run takes
less than 2 hours.  This is what is going to surprise you.  We make 2 copies:
one for onsite and one for offsite.  We have 12 Exchange servers with stores
of 20GB to 35GB each, over 6TB on other servers that are not databases, and
about 2TB of databases that we back up just on NT.  There is a 1.5 TB Unix
File System SGI machine with about 35GB changed every day.  Ten AIX servers
with about 150GB changed out of 2TB each day.  In total we are sending about
7TB of tapes offsite a week.  We are doing this with 8 Magstar drives
attached to a P660-6H1, with Gigabit on some servers, and it is idling most
of the time.
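
A rough back-of-the-envelope check shows why the drives can idle.  The sketch
below spreads the 7TB a week evenly over 7 days and over all 8 drives, and
assumes the drives are busy only 4 hours a day; that 4 hour window is purely
an assumption for illustration, and the figure covers the offsite copy only
(the onsite copy would roughly double it).

```python
def mb_per_sec_per_drive(tb_per_week, drives, busy_hours_per_day):
    """Average sustained rate each drive must hold, in MB/s (decimal units)."""
    mb_per_day = tb_per_week * 1_000_000 / 7           # weekly volume per day
    drive_seconds = drives * busy_hours_per_day * 3600  # total drive time per day
    return mb_per_day / drive_seconds


# 7TB a week and 8 drives are from the numbers above; 4 hours is assumed.
rate = mb_per_sec_per_drive(7, 8, 4)
print(f"{rate:.1f} MB/s per drive")
```

Compare the result with what your own drives sustain in practice to judge how
much headroom you have.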

We have 38 Magstar drives for open systems in our NetBackup implementation;
half that many will be required as we migrate to the TSM environment.  But,
new requirements hit and will chew most of this up.

You may wonder how we could get so much done with just 8 drives.  Heck, they
sit idle most of the day.  Going back to Tom: PLANNING.

I did not give you the whole story.  But, basically, we solved the
incremental forever problem for file systems with some elaborate Perl
script extensions to TSM that keep the offsite pool small and always refresh
the tapes (no tape is offsite over 8 weeks).  We never have to do offsite
reclamation and do not require open storage as Don France referred to.
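
The scripts themselves are not included in this note.  Purely as an
illustration of the 8 week rule, and in Python rather than Perl, a sketch
like the following could flag offsite volumes that are due for refresh.  The
volume names, dates, and function name are hypothetical, and how such a list
would be obtained from or fed back to TSM is left out entirely.

```python
from datetime import date, timedelta

# The 8 week limit comes from the text; everything else is illustrative.
MAX_OFFSITE_AGE = timedelta(weeks=8)


def tapes_due_for_refresh(offsite_since, today):
    """Return volumes that have been offsite longer than MAX_OFFSITE_AGE.

    offsite_since maps a volume name to the date it was sent offsite.
    """
    return sorted(vol for vol, sent in offsite_since.items()
                  if today - sent > MAX_OFFSITE_AGE)


# Hypothetical inventory of offsite volumes.
inventory = {
    "A00001": date(2002, 1, 1),   # 73 days before the check date
    "A00002": date(2002, 3, 1),   # 14 days before the check date
}
print(tapes_due_for_refresh(inventory, date(2002, 3, 15)))
```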

What you will find is that the $30K in tapes, some Gigabit infrastructure,
and maybe a new TSM server or some tuning and planning are all you need to
be very successful.  An environment of your size can easily be handled with
6 drives (Fabric attached), a 6H1 (TSM Server), Gigabit to any servers that
have more than 25GB a night, and Gigabit to the TSM server.
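
To see what Gigabit buys a server with 25GB a night, here is a small sketch
comparing transfer times on a 100 Mbit and a 1000 Mbit link.  The 60% usable
bandwidth figure is only an assumption for illustration; real throughput
depends on the client, the network, and the TSM server.

```python
def transfer_minutes(gigabytes, link_mbit_per_sec, efficiency=0.6):
    """Estimate minutes to move `gigabytes` over a link (decimal units).

    efficiency is the assumed fraction of line rate actually achieved.
    """
    usable_mb_per_sec = link_mbit_per_sec / 8 * efficiency  # Mbit/s to MB/s
    return gigabytes * 1000 / usable_mb_per_sec / 60


for mbit in (100, 1000):
    print(f"{mbit} Mbit: {transfer_minutes(25, mbit):.1f} minutes for 25GB")
```

The same function can be rerun with your own nightly volumes to decide which
servers are worth a Gigabit card.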

Again, I do not have enough information about your environment to help
answer this question.  But, I will be glad to help.  If you feel it is of a
confidential nature and want to do it off the list, call me or email me
directly.

Yes, I have a lot of resources, but you do not need all of this.  You have
to run a profitable business, not a disaster recovery site, so get a BIA
done, plan, develop your processes, identify your shortfalls, correct them,
implement, test, test, test.  Then, demonstrate that a 9-11 means no impact
to your business except an unforgettable sentiment for those lost and that
those left will be taken care of.

And as Tom said, it is a big job, not an afterthought.

Paul D. Seay, Jr.
Technical Specialist
Naptheon, INC
757-688-8180