ADSM-L

Re: [ADSM-L] Why virtual volumes?

2007-08-23 12:41:53
From: Richard Rhodes <rrhodes AT FIRSTENERGYCORP DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 23 Aug 2007 12:43:33 -0400
I said . . .

>Let me see if I fully understand . . . .

>Bloomington
>  TSM-a
>    local clients backup to disk pool
>    disk pool migrates to TSM-b/VV at Indianapolis
>      ==> primary tape pool is on TSM-b/VV at Indianapolis
>    vv server - contains primary pool from TSM-b

>Indianapolis
>  TSM-b
>    local clients backup to disk pool
>    disk pool migrates to TSM-a/VV at Bloomington
>      ==> primary tape pool is on TSM-a/VV at Bloomington
>    vv server - contains primary pool from TSM-a

>Summary:
>  - All clients backup to local TSM server to disk pool.
>  - Disk pool gets MIGRATED to VV primary pool at other site.
>  - There is NO copy pool at either site.


While I'm munching on lunch, let me say a few words about
this architecture.

Our setup was very similar, except that instead of backing up locally
and migrating to the other site, we just backed up all servers
directly to the remote site.  All backups were by
definition OFFSITE.  It looked like this . . .

Site A
  TSM-A
    nodes for servers at SiteB
      disk pool
      primary tape pool
      copy pool

Site B
  TSM-B
    nodes for servers at SiteA
      disk pool
      primary tape pool
      copy pool

At the beginning, Site A was our main computer center
and Site B was an old datacenter that had very few servers.  In other
words, it was hardly used.  At this point the design made
sense.  What happened over time is that the old datacenter, SiteB,
grew and grew and grew, to become a peer datacenter.  The
two sites also became DR sites for each other.  We realized
that what looked and worked well at first had MAJOR problems
when they became peer sites and DR for each other.

Here are just some of the problems . . .

- If you lose SiteA, then you lose ALL backups for SiteB - both primary
and copy pools.  Site B is up and running, but it has NO BACKUPS at all.
Of course, the opposite is true also.

- In a DR, since you lost ALL backups for the surviving data center, you
MUST run FULL backups ASAP.  That is a big, heavy load of initial/full
backups right in the middle of the DR, and it is in contention with any
restores you need to do!!!!

- Again, if you lose SiteA, at Site B you have to define all those nodes
to the surviving SiteB TSM server, and change the dsm.sys/dsm.opt files on
the client nodes.  This is lots of work.
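
To give a feel for the client-side half of that last item, here is a minimal
Python sketch that repoints a client options file at the surviving server.
TCPSERVERADDRESS is the client option that names the server; the function
name `repoint_client` and the hostnames are hypothetical, and the sketch
assumes the option is spelled out in full and that every server stanza in
the file should be repointed. Defining the nodes on the surviving server is
a separate job that this does not cover.

```python
import re

# Matches a TCPSERVERADDRESS line (any letter case) and captures everything
# up to the address so indentation and spacing are preserved.
_ADDRESS_LINE = re.compile(
    r"^([ \t]*TCPSERVERADDRESS[ \t]+)\S+",
    re.IGNORECASE | re.MULTILINE,
)


def repoint_client(options_text, new_address):
    """Return options_text with every TCPSERVERADDRESS value replaced."""
    return _ADDRESS_LINE.sub(lambda m: m.group(1) + new_address, options_text)


if __name__ == "__main__":
    # Hypothetical dsm.sys stanza; the hostnames are made up.
    sample = (
        "SErvername tsm_a\n"
        "   COMMMethod TCPip\n"
        "   TCPPort 1500\n"
        "   TCPServeraddress tsm-a.example.com\n"
    )
    print(repoint_client(sample, "tsm-b.example.com"))
```

Multiply that by every client node at the surviving site, plus the
scheduler restarts that go with it, and the amount of work becomes clear.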

Summary:  What we found with this architecture is that we would spend
as much time doing "stuff" to the surviving TSM server and client nodes
as we would for the nodes that needed DR recovery.  NOT a pretty picture.
Now, everyone knew this.

We had preached these problems for a long
time, but couldn't get action to change.  Finally, we called a big
meeting with all the managers, and I stood up front and told them
our TSM DR plan (that is, the DR plan for any servers that rely on
TSM for DR recovery) DIDN'T work.  Again, they already knew this, but
hadn't heard it like that before.  They took it very well, and came up
with the money we requested to correct the problem.

So, we changed to the classic design - onsite backups to disk, migrated
to an onsite primary tape pool, with an offsite copy pool.  Since our sites
are close enough, we extended our SAN between the sites via DWDM and
implemented library sharing.  The offsite copy pool is created straight
to remote tape.  In a disaster, we have to restore the lost TSM server and
start restores from the copy pool.  The surviving TSM server just keeps
humming along performing backups for the surviving datacenter servers.
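
The difference between the two designs can be sketched in a few lines of
Python. The site names come from the description above; the dictionaries
and the function name `backups_after_loss` are purely illustrative and are
not anything TSM provides.

```python
# Hypothetical model: for each design, record the site where the backup
# pools for each site's servers physically live.

ALL_OFFSITE = {
    # Servers at SiteA back up to TSM-B at SiteB, and vice versa.
    "SiteA": {"primary": "SiteB", "copy": "SiteB"},
    "SiteB": {"primary": "SiteA", "copy": "SiteA"},
}

CLASSIC = {
    # Onsite primary tape pool, offsite copy pool.
    "SiteA": {"primary": "SiteA", "copy": "SiteB"},
    "SiteB": {"primary": "SiteB", "copy": "SiteA"},
}


def backups_after_loss(design, lost_site):
    """For each site's servers, list the pools that survive losing lost_site."""
    return {
        site: sorted(pool for pool, location in pools.items()
                     if location != lost_site)
        for site, pools in design.items()
    }


if __name__ == "__main__":
    for name, design in (("all-offsite", ALL_OFFSITE), ("classic", CLASSIC)):
        print(name, backups_after_loss(design, "SiteA"))
```

With SiteA lost, the all-offsite design leaves the surviving SiteB servers
with an empty list (no backups at all), while the classic design leaves
SiteB its primary pool and leaves the SiteA servers a copy pool to restore
from.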

Our Oracle people didn't like this change.  What they liked about the old
design was that Oracle archive log backups went directly offsite, quickly!
They had a small RPO due to this.  Now, their RPO is much longer due to
the time it takes to get the data into the offsite copy pool.
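
As a rough illustration of that RPO trade-off, here is a small Python
sketch. The formula is a simplifying assumption (worst case is one full
interval between archive log backups plus the delay before that backup
exists offsite), and the numbers are hypothetical, not measurements from
either site.

```python
from datetime import timedelta


def worst_case_rpo(log_backup_interval, offsite_delay):
    """Worst-case age of the newest archive log that is safely offsite.

    Assumes a log written just after one backup waits a full interval for
    the next backup, then waits offsite_delay before it exists offsite.
    """
    return log_backup_interval + offsite_delay


if __name__ == "__main__":
    interval = timedelta(minutes=30)  # hypothetical archive log backup interval

    # Old design: the backup itself lands offsite, so no extra delay.
    print("all-offsite:", worst_case_rpo(interval, timedelta(0)))

    # Classic design: hypothetical 6 hour wait for the copy pool to catch up.
    print("classic:    ", worst_case_rpo(interval, timedelta(hours=6)))
```

Whatever the real numbers are, the second term is the part the classic
design adds, and it is the one to measure before committing to an RPO.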

So, if you are considering an all-offsite architecture, PLEASE think through
the DR implications completely.  This can be especially hard when initial
assumptions change gradually over time and come back to bite.

Rick





