[ADSM-L] Battles with TSM for VE

2013-01-08 14:38:28
Subject: [ADSM-L] Battles with TSM for VE
From: Neil Schofield <neil.schofield AT YORKSHIREWATER.CO DOT UK>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 8 Jan 2013 19:24:07 +0000
We've been using TSM for VE in production for about 6 months now and
although it generally works well, there have been a number of minor issues
which remain unresolved. However, one problem stands out above the others
and, despite extensive discussions with IBM support, we've been unable to
achieve a resolution. I just wanted to run it past the ADSM-L community to
gain their perspective.

There are about 500 VMs (predominantly Windows Server) on ESX 4.1 that we
back up on a nightly basis, with each VM getting one full backup a week and
incrementals (using changed block tracking) on the other nights.

Consider the following scenario:
- A backup proxy server accessing the ESX disk LUNs over a SAN running TSM
for VE and with LAN-free (Storage Agent) access to tape library A
- A dedicated primary disk storage pool (VMCTLDISK) on the TSM server for
storing the CTL data
- A dedicated primary tape storage pool (VMDATATAPE) using a device class
in library A for the VM backup data
- A dedicated copy tape storage pool (VMCTLTAPE) using a device class in
library A to provide a backup for the CTL data on disk
- VMDATATAPE is collocated by filespace so each VM's backup data is on the
smallest number of tapes
- VMCTLTAPE contains only one or two tape volumes to hold the copy of all
the CTL data for all VMs (no collocation)
- A daily admin schedule backs up CTL data in the primary storage pool to
the copy storage pool after the backup window for the TSM for VE clients

Now a full backup of a VM works fine. The backup proxy server sends CTL
data over the LAN to disk on the TSM server and VM backup data over the SAN
to library A.

Incremental VM backups work less well. Under the covers, incremental
backups involve a significant amount of restore processing by the client as
it restores previously backed up CTL data. In the scenario above, we
naively expected the process for restoring the CTL data to be the reverse
of the backup process, i.e. the CTL data would be accessed over the LAN from
the primary disk storage pool on the TSM server.

However it quickly became evident that the TSM for VE client was favouring
the far slower tape volume in the copy storage pool when it came to
restoring CTL data for every incremental backup (presumably on the basis
that the tape volume could be mounted LAN-free while the disk volume
couldn't). For the relatively small amount of data involved when restoring
the CTL files (compared to the size of the backup data), the overhead of
mounting the tape was significant. Even worse though, those one or two copy
storage pool volumes became a massive source of contention when running
multiple concurrent incremental VM backups.

I can't find an easy way of inhibiting LAN-free access to the copy storage
pool volumes by the backup proxy server without affecting its ability to
store (and restore) the VM backup data using LAN-free.

When we discovered this behaviour the only relief I could find was to put
in place a spectacularly ugly work-around which involved running for 99% of
the day with the volumes in the copy storage pool holding the CTL data
updated to have an access mode of unavailable. This forces the TSM for VE
client to restore the CTL files from the primary disk storage pool during
incremental VM backups. The script which performs the storage pool backup
first updates the copy storage pool volumes to read/write and then changes
them back to unavailable upon completion. This doesn't sit well with me
because volumes which aren't read/write typically indicate trouble and it
means mods to our monitoring to ignore these volumes.
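
For anyone wanting to picture the sequence, here is a minimal sketch of
what such a script might do. It is not our actual script. The function
names and the run_admin_command callable are hypothetical; the callable
stands in for whatever submits a single command to the TSM server (for
example a wrapper around the administrative command-line client). The pool
names are the ones from the scenario above, and the sketch assumes it is
acceptable to address every volume in the copy pool with a wildcard.

```python
from typing import Callable, List


def ctl_copy_commands(primary_pool: str = "VMCTLDISK",
                      copy_pool: str = "VMCTLTAPE") -> List[str]:
    """Build the three admin commands for the workaround, in order."""
    return [
        # 1. Make the copy pool volumes usable for the storage pool backup.
        f"update volume * access=readwrite wherestgpool={copy_pool}",
        # 2. Copy the CTL data from disk to tape and wait for it to finish.
        f"backup stgpool {primary_pool} {copy_pool} wait=yes",
        # 3. Hide the copy pool volumes again so that restores of CTL data
        #    fall back to the primary disk pool.
        f"update volume * access=unavailable wherestgpool={copy_pool}",
    ]


def run_ctl_copy(run_admin_command: Callable[[str], object],
                 primary_pool: str = "VMCTLDISK",
                 copy_pool: str = "VMCTLTAPE") -> None:
    """Run the sequence using a caller-supplied (hypothetical) command runner."""
    unlock, backup, lock = ctl_copy_commands(primary_pool, copy_pool)
    run_admin_command(unlock)
    try:
        run_admin_command(backup)
    finally:
        # Put the volumes back to unavailable even if the backup step fails.
        run_admin_command(lock)


if __name__ == "__main__":
    # Dry run: print the commands instead of sending them to a server.
    run_ctl_copy(print)
```

The try/finally is a design choice in the sketch rather than something
described above: it means a failed storage pool backup does not leave the
copy pool volumes read/write, which would bring the original problem back
for that night's incrementals.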

At the same time as we introduced this, we logged a PMR with IBM. This has
now been relegated to the status of a documentation APAR to describe the
behaviour, so it looks like the kludge we implemented could become
permanent.

Am I being unreasonable in thinking the way IBM have implemented this is
flawed?

For info, TSM for VE is v6.2 and TSM Server is v5.5.6.

Regards
Neil Schofield
Technical Leader
Yorkshire Water


