ADSM-L

Re: Partial page writes - TSM mirroring vs OS/hardware mirroring

2003-03-12 11:46:17
Subject: Re: Partial page writes - TSM mirroring vs OS/hardware mirroring
From: bbullock <bbullock AT MICRON DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 12 Mar 2003 09:45:21 -0700
        My 2-cents....

        I've always read notes about OS mirroring vs. TSM mirroring with 
interest. I understand the arguments; they seem logical, and they scare me 
into thinking I need to do TSM mirroring on my systems.

        However, we have been running 8 TSM servers on AIX hosts for over 7 
years. We mirror the DB and recovery logs at the OS level and have never had 
the problem described in this "scenario". We've had the systems crash because 
of power outages, crash in the middle of DB backups, crash in the middle of 
expirations, crash in the middle of a busy backup window, crash during 
filespace deletes, crash during a "delete dbvolume". You name it, we've crashed 
one of our servers during that activity. None of the "disaster scenarios" have 
appeared.

        Sure, you might say "well you've just been lucky". Maybe so, but with 
luck like this, perhaps I need to go to Vegas ;-). Perhaps credit is due to our 
environment: AIX's LVM (logical volume manager), SSA disks, fast-write cache 
with battery backup.

        Why have I resisted mirroring using TSM? The main reason is that I find 
it cumbersome. At our site, we have many people on the oncall rotation, some 
with very little TSM experience, but all with AIX experience. Since we use OS 
mirroring on all other hosts (Sybase, Oracle, etc.), replacing a failed mirror 
on the TSM servers is much more familiar and straightforward with OS mirrors 
than with TSM mirroring.

        Also, with OS mirroring, when I want to move DB volumes around (for 
load balancing across SSA adapters, upgrading to larger disks, replacing failed 
disks, etc.), I run 1 "delete dbvol" command and it all moves to the new disk 
(previously defined). When I used TSM mirroring, it took 3 or more steps 
(delete dcopy, define dbcopy) and more than twice as long to accomplish the 
same task.

        There has also been discussion about better performance using one 
mirroring over the other. Although I have no data to substantiate it, my gut 
feeling (right after I switched from TSM mirroring back to OS mirroring) was 
that TSM mirroring was slightly slower than OS mirroring.

        In my case, I trust AIX mirroring, it works better with our oncall 
support model, and it's simpler. If it works well and it's not broke, I won't 
mess with it. Your mileage may vary....

Ben Bullock
Unix system admin
Micron Technology Inc.
Boise, Id.

-----Original Message-----
From: Jurjen Oskam [mailto:jurjen AT QUADPRO.STUPENDOUS DOT ORG]
Sent: Wednesday, March 12, 2003 2:05 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Partial page writes - TSM mirroring vs OS/hardware mirroring


Hi everybody,

I have read several discussions in the archive about the TSM mirroring
versus OS/Hardware mirroring of the database and/or log.

Both those discussions and the Administrator's Guide mention "partial
page writes". To see if I understand correctly:

- When writing a database or log page to disk, there is a point
  in time when the on-disk structure of a volume is invalid. If the
  process of writing that page is interrupted (e.g. power outage)
  at the "wrong" time, the on-disk structure remains invalid.

- The TSM server can be configured to create a mirrored database or
  log, and, when updating a page on disk, to first update the page on the
  first mirrored copy and then update the page on the second mirrored copy.
  This way, a partial page write can still occur, but by sequentially
  updating the mirrored copies there is at most one mirrored copy that
  is invalid due to the partial page write. The other copy is valid.

- When starting the TSM server, it cannot use an invalid copy of
  a database volume. If no valid mirror is available, the TSM server
  cannot start and a database restore is necessary.

- A partial page write is a shortcoming of TSM; the on-disk structure
  should always be valid. Page writes should happen atomically. (Of course,
  the responsibility of TSM doesn't need to go further than the "sync"
  procedures of the OS. If the OS says the data is synced to disk, TSM
  can assume it *is* synced to disk. Otherwise, the OS/drivers/hardware
  should be fixed.)
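
To make the scenario in the points above concrete, here is a small Python
simulation. It is only a sketch and says nothing about how TSM actually lays
out or validates its pages: the 4096-byte page size, the CRC32 trailer, and
every function name are assumptions made for illustration. It models a "crash"
as a write that stops after some number of bytes, and compares updating two
mirror copies one after the other with updating both at the same moment.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size, chosen only for this illustration


def make_page(payload: bytes) -> bytes:
    """Build a page: zero-padded payload followed by a 4-byte CRC32 trailer."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return body + zlib.crc32(body).to_bytes(4, "big")


def is_valid(page: bytes) -> bool:
    """A page counts as valid when its trailer matches the CRC32 of its body."""
    body, trailer = page[:-4], page[-4:]
    return zlib.crc32(body).to_bytes(4, "big") == trailer


def torn_write(old: bytes, new: bytes, bytes_written: int) -> bytes:
    """Simulate a page write that was interrupted after bytes_written bytes."""
    return new[:bytes_written] + old[bytes_written:]


def sequential_mirror_write(copies, new, crash_after):
    """Update the copies one after another; the crash hits after crash_after
    bytes in total, so at most one copy is caught mid-write."""
    result = []
    remaining = crash_after
    for old in copies:
        n = min(PAGE_SIZE, remaining)
        result.append(torn_write(old, new, n))
        remaining -= n
    return result


def parallel_mirror_write(copies, new, crash_after_each):
    """Update all copies at the same moment; a crash can tear every copy."""
    return [torn_write(old, new, crash_after_each) for old in copies]


old_page = make_page(b"old")
new_page = make_page(b"new")

# Crash 100 bytes into the first copy: copy 1 is torn, copy 2 still holds the old page.
seq = sequential_mirror_write([old_page, old_page], new_page, 100)
# Crash 100 bytes into both copies at once: neither copy passes the check.
par = parallel_mirror_write([old_page, old_page], new_page, 100)

print("sequential:", [is_valid(p) for p in seq])
print("parallel:  ", [is_valid(p) for p in par])
```

In this model the sequential update always leaves at least one copy that
passes the check, wherever the crash falls, while the simultaneous update can
leave none. Whether a given OS or hardware mirror really behaves like the
"parallel" case depends on the product and its write-ordering guarantees.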


My question is: in recent versions of TSM, do page writes happen atomically
or not? I would like to use the mirroring in our Symmetrix, but if TSM is
still vulnerable to the problem of partial page writes invalidating volumes
I would have to use TSM mirroring.

Thanks,
--
Jurjen Oskam

PGP Key available at http://www.stupendous.org/
