TSM and PowerHA (HACMP)

DanGiles

We will soon be implementing TSM on AIX with PowerHA. The environment will consist of two servers - one library manager and one client - plus a number of LAN-free clients. There is not much documentation out there, and I am mostly concerned with what happens to drives and tapes that are in use during a fail-over.

I went through the redbook for TSM 5.5 & HACMP, and it looks like TSM should clean itself up on start-up. The problem is that I've seldom experienced this 'clean-up' in real life! There's usually a lot of manual resetting of drives and volumes after a non-graceful shut-down.

So, anyone using this configuration in real life? Any caveats or recommendations?
 
Sorry Dan,

I have no experience with HACMP deployment of TSM.

But, if all that is advertised is true, device fail-over should be 'automatic' with AIX. The reason I am saying this is that IBM's DS8000 series SAN arrays use AIX boxes to control the disk arrays. Failover works almost magically.

If you apply the same logic to a HACMP-defined TSM environment, the device is really under the control of the 'virtual' cluster environment which TSM sees. Therefore, TSM does NOT care which node holds the devices. It just interfaces with the 'virtual' machine.
 
I have no concern over the actual transfer of resources between machines. My concerns refer to my other post: what happens on fail-over when drives are in use? Will TSM actually clean up the library (unload drives, move cartridges) when it restarts?
 
I believe the answer is 'it will stay as is and keep on running'.

In HACMP settings, it is mandatory to have two fiber paths for the devices - one path to each node. This then takes care of the failover. Likewise, the library should have two paths - one for each node.

The switch over is handled by the virtual environment.

It is also my understanding that the devices are presented to TSM under their virtual device names and not under their real physical names.
 
As I said, transferring resources is not my concern: it's transferring resources that are not in their default state that's the concern!
 
This is what I mean.

If the cluster is true to its form, TSM would not care where the resource is, whether it is in its default state or not, and whether it falls back to default or not. As far as TSM is concerned, nothing happened behind the scenes, since the virtual environment presents the devices consistently. Thus, things will keep on running.
 
Nyet. Forget about HA for now. A client is backing up to a tape drive on the TSM server. The server goes down ungracefully, leaving the tape in the drive. TSM comes back up and the client reconnects to finish its backup. What happens? Does TSM remember its state from when it went down and give the drive and cartridge back to the client, or does TSM go "okay, you'll want this tape. Wait, the tape isn't in its right element - I'll give you a new one and load it into this drive. Wait, there's a cartridge in there that shouldn't be - spit out an error!"

According to the TSM/HA redbook, it does the former (more or less). According to experience, it does the latter.
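
To make the two outcomes concrete, here is a toy model in Python of the clean-up I would hope to see at start-up. It is only a sketch: `LibraryState`, `reconcile`, and the drive, slot, and volume names are all made up for illustration, and this is not how TSM is actually implemented. It compares where the server thinks each volume lives with what the library physically holds, and lists either an unload step or an error.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LibraryState:
    """Physical state of the library as an inventory scan would report it."""
    drives: Dict[str, Optional[str]]   # drive name -> loaded volume, or None
    slots: Dict[str, Optional[str]]    # storage element -> volume, or None


def reconcile(home_slots: Dict[str, str], lib: LibraryState) -> List[str]:
    """Return the steps needed to put every mounted volume back in its home slot.

    home_slots maps volume -> the element the server believes it lives in.
    """
    actions = []
    for drive, volume in sorted(lib.drives.items()):
        if volume is None:
            continue  # drive is already empty, nothing to do
        home = home_slots.get(volume)
        if home is None:
            actions.append(f"ERROR: unknown cartridge {volume} in {drive}")
        elif lib.slots.get(home) is not None:
            actions.append(f"ERROR: home slot {home} for {volume} is occupied")
        else:
            actions.append(f"unload {volume} from {drive} to {home}")
    return actions


# Hypothetical crash scenario: VOL001 was left in DRIVE1, its home slot is empty.
lib = LibraryState(
    drives={"DRIVE1": "VOL001", "DRIVE2": None},
    slots={"1025": None, "1026": "VOL002"},
)
print(reconcile({"VOL001": "1025", "VOL002": "1026"}, lib))
```

The redbook behaviour corresponds to the `unload` branch; what I usually see in practice looks more like the `ERROR` branches, followed by manual work.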
 
I see your point from the get go.

The latter statement is true for standalone systems, and I won't argue with you on that since that is and always has been my experience.

However, you introduced the HACMP picture. In all my work with clusters - not with TSM in particular - the well set up ones behave properly as advertised, and the app chugs along well. Thus, I am convinced that the former scenario will be true. TSM (as an application sitting on top of the HACMP environment) will pick up where it left off after a failover has occurred.

A power loss is another issue :)
 
I guess this is one of those "I'll believe it when I see it" scenarios. I'll probably set up a string of tests, but if the first two actually work as advertised, I'll save myself a lot of time and trust them. ;p

These servers are in major data centres, so we don't have to worry about power failures XO <tongue planted firmly in cheek>
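
For the tests, something like the following Python helper could capture the server's view of mounts, drives, paths, and library volumes before and after each fail-over, so the two snapshots can be compared. This is a rough sketch under assumptions: the admin ID, password, and library name `LIB1` are placeholders, `build_checks` is a made-up helper, and it only builds `dsmadmc` command lines with standard query commands rather than running them.

```python
from typing import List


def build_checks(admin_id: str, password: str, library: str) -> List[List[str]]:
    """Build dsmadmc argument lists for a before/after fail-over snapshot."""
    # -dataonly=yes suppresses headers so the output is easier to diff.
    base = ["dsmadmc", f"-id={admin_id}", f"-password={password}", "-dataonly=yes"]
    commands = [
        "query mount",                  # volumes the server thinks are mounted
        "query drive format=detailed",  # drive state and online status
        "query path format=detailed",   # path online status
        f"query libvolume {library}",   # inventory and home elements
    ]
    return [base + [cmd] for cmd in commands]


# Placeholder credentials and library name, for illustration only.
for argv in build_checks("admin", "secret", "LIB1"):
    print(argv[-1])
```

Each list could be handed to `subprocess.run` on the active node and the output saved to a file per test case.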
 