Issues with TSM 6.2.2 and vStorage API (VADP) full image backups

wpinegar

We have successfully installed the TSM 6.2.2 release for Windows 64-bit on a virtual machine designed as a TSM vStorage API backup proxy within our VMware vSphere 4.1 environment (and yes, this configuration is supported by both VMware/EMC and IBM). The backup proxy is a Windows Server 2008 R2 64-bit guest configured as required in the TSM 6.2.2 documentation and uses the vStorage API to back up guests. We are using TSM to take full image backups of guests, all of which are running version 7 hardware. At this point we are not performing file-level backups using VADP.

Our testing with 6.2.2 revealed that TSM is capable of backing up some virtual machines, but with others the TSM client may crash (generating a dump file with an unhandled exception) while attempting to back up an image of the machine. The backup failure rate with TSM 6.2.2 in our environment is 41%: of the 219 machines tested, that share failed to back up and crashed the TSM client. There are several issues that appear to cause the TSM client to crash when using the vStorage API to back up virtual machines. They are as follows:

1. TSM has a higher probability of crashing when backing up machines on a vSphere cluster that has been configured to use virtual switches and port groups. Usually removing the NIC card on the virtual machine resolves the issue and the machine can then be backed up without issue. TSM may also crash on a machine with multiple NIC cards.

2. TSM has a higher probability of crashing when backing up a machine with a large number of SCSI disks. Removing several SCSI disks may resolve the issue. We have two (2) virtual machines with 9 SCSI disks, and the TSM client consistently crashed when attempting to back up each machine until 3 SCSI disks were removed from each configuration or, strangely, the NIC cards were removed from the configuration.

3. TSM will crash the backup proxy guest if you attempt to take a full image backup of the TSM proxy (from the proxy). At the end of the full image backup TSM will inadvertently remove the C: drive from the proxy virtual machine rendering the machine un-bootable. We're certain this is a side-effect of running the TSM proxy as a vSphere guest, however this is a supported configuration so taking an image backup of the proxy (from the proxy) should also be supported.

The behavior that causes the TSM 6.2.2 client to crash appears to be highly dependent on the configuration of the virtual machine. Some machines back up just fine while others crash the TSM client every time. A given virtual machine either backs up or it does not, and it typically stays that way until a configuration change is made on the virtual machine.

There are two (2) other minor issues:

1. Although the TSM client enables Changed Block Tracking (CBT) in the guest during the backup, there are currently no provisions in TSM to take incremental image backups of virtual machines.

2. There is a minor visual issue in the GUI client that causes virtual machine backups with similar names to be grouped under one virtual machine while attempting to perform a VM restore through the GUI. For instance, if we have virtual machines named test, test2, test3, test4 and back up all of these virtual machines to the same TSM proxy node, then when we open the GUI client and attempt to restore the virtual machine named 'test' we see the following virtual machines listed under the 'test' machine: test, test2, test3, test4, as if they were backed up to one filesystem. This is just a simple visual issue in the GUI. The command-line TSM client correctly shows that the backups for these virtual machines are all being stored under separate filesystems on the TSM server.
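
To make the grouping symptom concrete, here is a purely illustrative sketch. We have no visibility into how the GUI actually builds its restore tree, so the prefix match below is only a guess at the kind of comparison that would produce this display; both function names are hypothetical and nothing here is TSM code.

```python
def group_by_prefix(selected, vm_names):
    """Hypothetical faulty lookup: keeps every name that starts with the selection."""
    return [name for name in vm_names if name.startswith(selected)]


def group_exact(selected, vm_names):
    """Expected lookup: keeps only the name that matches the selection exactly."""
    return [name for name in vm_names if name == selected]


vms = ["test", "test2", "test3", "test4"]
print(group_by_prefix("test", vms))  # ['test', 'test2', 'test3', 'test4']
print(group_exact("test", vms))      # ['test']
```

The first result matches what the GUI displays under 'test'; the second matches what the command-line client reports.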

PMRs have been opened with IBM to resolve these issues.
 
1. TSM has a higher probability of crashing when backing up machines on a vSphere cluster that has been configured to use virtual switches and port groups. Usually removing the NIC card on the virtual machine resolves the issue and the machine can then be backed up without issue. TSM may also crash on a machine with multiple NIC cards.

The above issue was found to be a defect, and TSM APAR IC74021 was opened today. The issue only occurs when the virtual machine is configured to use a VMware distributed virtual switch port.

2. TSM has a higher probability of crashing when backing up a machine with a large number of SCSI disks. Removing several SCSI disks may resolve the issue. We have two (2) virtual machines with 9 SCSI disks, and the TSM client consistently crashed when attempting to back up each machine until 3 SCSI disks were removed from each configuration or, strangely, the NIC cards were removed from the configuration.

This problem is still being looked at with TSM Support.

3. TSM will crash the backup proxy guest if you attempt to take a full image backup of the TSM proxy (from the proxy). At the end of the full image backup TSM will inadvertently remove the C: drive from the proxy virtual machine rendering the machine un-bootable. We're certain this is a side-effect of running the TSM proxy as a vSphere guest, however this is a supported configuration so taking an image backup of the proxy (from the proxy) should also be supported.

TSM Support has recreated this issue and it is being investigated with development.

2. There is a minor visual issue in the GUI client that causes virtual machine backups with similar names to be grouped under one virtual machine while attempting to perform a VM restore through the GUI. For instance, if we have virtual machines named test, test2, test3, test4 and back up all of these virtual machines to the same TSM proxy node, then when we open the GUI client and attempt to restore the virtual machine named 'test' we see the following virtual machines listed under the 'test' machine: test, test2, test3, test4, as if they were backed up to one filesystem. This is just a simple visual issue in the GUI. The command-line TSM client correctly shows that the backups for these virtual machines are all being stored under separate filesystems on the TSM server.

This problem has been recreated and APAR IC74022 has been opened to address it.

Robert DeSelle
TSM Level2 support
 
1. Although the TSM client enables Changed Block Tracking (CBT) in the guest during the backup, there are currently no provisions in TSM to take incremental image backups of virtual machines.

As I understand from IBM presentations for business partners, CBT used for incremental image backups will be available in TSM client soon, but as a feature that is separately licensed, something called TDP for VM.
 
Update for #3 above.

3. TSM will crash the backup proxy guest if you attempt to take a full image backup of the TSM proxy (from the proxy). At the end of the full image backup TSM will inadvertently remove the C: drive from the proxy virtual machine rendering the machine un-bootable. We're certain this is a side-effect of running the TSM proxy as a vSphere guest, however this is a supported configuration so taking an image backup of the proxy (from the proxy) should also be supported.

This problem has been recreated by support and development. VMware support has found and acknowledged that there is a defect in VMware's hotadd transport method.
TSM development is currently waiting to hear from VMware support on when the fix will be available.
To work around this particular issue, the only option is to disable the hotadd backup by renaming the diskLibPlugin.dll file.

This file can be found in the \plugins directory.

Robert DeSelle
TSM Level 2 support
 
As I understand from IBM presentations for business partners, CBT used for incremental image backups will be available in TSM client soon, but as a feature that is separately licensed, something called TDP for VM.

That's good to know. It looks like all of the plumbing to perform an incremental backup of a VM is there but it isn't enabled in the current client...
 
Update for #3 above.

3. TSM will crash the backup proxy guest if you attempt to take a full image backup of the TSM proxy (from the proxy). At the end of the full image backup TSM will inadvertently remove the C: drive from the proxy virtual machine rendering the machine un-bootable. We're certain this is a side-effect of running the TSM proxy as a vSphere guest, however this is a supported configuration so taking an image backup of the proxy (from the proxy) should also be supported.

This problem has been recreated by support and development. VMware support has found and acknowledged that there is a defect in VMware's hotadd transport method.
TSM development is currently waiting to hear from VMware support on when the fix will be available.
To work around this particular issue, the only option is to disable the hotadd backup by renaming the diskLibPlugin.dll file.

This file can be found in the \plugins directory.

Robert DeSelle
TSM Level 2 support

Excellent. Thank you. Yes, we can confirm that forcing the transport to nbdssl by renaming the 'diskLibPlugin.dll' file does resolve this issue. We've enjoyed working with you to resolve the various issues that we've encountered. Let us know if you need anything else.
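
For anyone who wants to script the rename workaround on the proxy, here is a minimal sketch. It assumes you pass in the full path of the \plugins directory yourself (the thread does not give the full path, so none is hard-coded). The function names and the ".disabled" suffix are our own choices for illustration, not anything defined by TSM or VMware.

```python
from pathlib import Path

PLUGIN_NAME = "diskLibPlugin.dll"  # file name given by TSM Level 2 support


def disable_hotadd(plugins_dir, suffix=".disabled"):
    """Rename diskLibPlugin.dll so it can no longer be loaded.

    Returns the new path, or None if the file is not present
    (for example, because it was already renamed).
    """
    plugin = Path(plugins_dir) / PLUGIN_NAME
    if not plugin.exists():
        return None
    target = plugin.with_name(plugin.name + suffix)
    plugin.rename(target)
    return target


def restore_hotadd(plugins_dir, suffix=".disabled"):
    """Undo disable_hotadd() once a fix is available."""
    renamed = Path(plugins_dir) / (PLUGIN_NAME + suffix)
    if not renamed.exists():
        return None
    target = renamed.with_name(PLUGIN_NAME)
    renamed.rename(target)
    return target
```

Call disable_hotadd() with the plugins directory before running backups, and restore_hotadd() with the same directory to put the file back.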
 
Here are a few other minor issues that we've encountered that may need to be addressed by IBM at some point along with some suggestions for additional changes IBM could make based on our experiences using TSM to backup virtual machines:

1. When TSM crashes while performing a VSTOR backup, any hot-add disks mounted by TSM are left connected to the proxy server, so we usually have to spend time cleaning up behind TSM whenever it crashes while attempting to back up a VM. We discovered that we could move the proxy to our testing cluster -- these ESX servers do not have access to the LUNs for our production servers -- which, of course, forces TSM to use an NBDSSL backup (network backup instead of mounting the SAN disks). If TSM then crashed while attempting to back up a virtual machine, no mounted disks remained connected and we had less cleanup after a failed VM backup (chiefly any snapshots left over by TSM, which could easily be scripted). Would it be possible to add an option similar to VCB that would allow us to override the transport type for a VSTOR backup instead of allowing TSM to select the best transport type? Also, would it be possible to develop a method to direct TSM to clean up any hot-add disks gracefully after a failed backup, such as calling an application that would clean up hotadd disks from the VM proxy? While renaming the diskLibPlugin.dll file, suggested in an earlier discussion, is a potential workaround to force the TSM client to always use the 'NBDSSL' transport, it would be nice for TSM to have a configurable option instead, such as -VMBackTransport with "NBDSSL", "NBD", "HOTADD", "RETRY".

2. The TSM 6.2.2 documentation states that VMBACKDIR isn't used with the VSTOR backup type, but we've discovered that TSM does indeed create and use VMBACKDIR during a VSTOR virtual machine backup. It doesn't use it in the same way as VCB to store a copy of the virtual machine, but it does create temporary directories and files during a backup and then deletes them afterwards. It creates the following directories: {VMBACKDIR}\fullvm, {VMBACKDIR}\CDF_Local\{machinename} with several more files and directories underneath. There are also additional directories and files created in the user's %TEMP% directory as "vmware-{username}". Files or directories in %TEMP% may be left behind after a successful or failed backup. Miscellaneous files or directories that TSM creates should be automatically cleaned up, including those in %TEMP%.

3. When using the VCB backup type to back up virtual machines, it appears that TSM leaves an empty directory behind for each machine that is backed up: "%VMBACKDIR%\fullvm\{machinename}". The VSTOR backup type doesn't do this but instead deletes all files and directories that it creates during a backup. It would be nice for VCB to do this as well unless directed otherwise by VMBACKNODELETE.

4. On the TSM 6.2.2 backup proxy we changed the system PATH, added "C:\Program Files\Tivoli\TSM\baclient" and set the DSM_DIR environment variable so TSM can be started from within any directory or command prompt on the system. This works very well; however, we've discovered that dsmc.exe doesn't honor the VDDK log settings specified in "dsmvddk.opt" in the baclient directory if you start dsmc.exe from any directory other than the baclient directory. For example, I can open a command prompt on the proxy and then start dsmc.exe from the root of the C: drive. Now if I run the command "backup vm {machinename}" the client will display debugging information as if vixDiskLib.transport.LogLevel and vixDiskLib.nfc.LogLevel were set to "6". The only way we found to correct this issue is to copy the dsmvddk.opt file to the same directory we start dsmc from (such as the root of the C: drive in the prior example). This shouldn't be necessary and the file should be picked up from DSM_DIR.

5. We occasionally come across some virtual machines where we cannot take an application-consistent quiesced snapshot in vSphere. This is typically a guest issue with high I/O, a driver issue or an installed application -- it would be helpful if there was a way to direct TSM to take a "crash consistent" snapshot, which is essentially a snapshot without quiescing the filesystem. This would require having the ability to tell TSM that we don't want it to request a "quiesced" snapshot during the backup of a virtual machine. An option such as -VMQuiescedsnapshot=no or -VMQuiescedsnapshot=yes would potentially do the trick. In addition, an option for taking a "memory" snapshot may in some instances resolve other infrequent snapshot issues by pausing the entire virtual machine during a quiesce operation. So adding another option, -VMMemorysnapshot=yes or -VMMemorysnapshot=no, would be good as well, even though TSM may not use the memory snapshot for anything. The default snapshot settings would, of course, still remain as they are today with -VMQuiescedsnapshot=yes and -VMMemorysnapshot=no. The defaults that TSM uses work the vast majority of the time, but there are some occasions where these two options would be helpful in addressing a backup of a virtual machine while issues are investigated.
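
Until the cleanup in item 2 is handled by the client, a small check like the following can report what a VSTOR backup left behind. It is only a sketch built from the locations we observed ({VMBACKDIR}\fullvm, {VMBACKDIR}\CDF_Local\{machinename}, and "vmware-{username}" under %TEMP%). The function name and parameters are hypothetical, and it only lists paths rather than deleting anything.

```python
import getpass
import os
from pathlib import Path


def find_leftovers(vmbackdir, machine_name, temp_dir=None, username=None):
    """Return the known temporary locations that still exist after a backup."""
    # Fall back to the current user's %TEMP% and login name when not supplied.
    temp_dir = Path(temp_dir or os.environ.get("TEMP", "."))
    username = username or getpass.getuser()
    candidates = [
        Path(vmbackdir) / "fullvm",
        Path(vmbackdir) / "CDF_Local" / machine_name,
        temp_dir / f"vmware-{username}",
    ]
    return [path for path in candidates if path.exists()]
```

Run it after a backup with the VMBACKDIR value from your options file and the machine name; an empty list means none of the three locations remain.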
 