Backup method for big VMware infrastructures

tpesselier


What backup method do you use for large VMware infrastructures (more than 1000 VMs)? NBD? Hotadd?

I think NBD is not fast enough and not suitable for large VMs.
 

I've been backing up about 850 VMs with hotadd from 5 datamovers, with dedup/compression enabled.
From IBM's doc https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.4/client/r_opt_vmmaxparallel.html
"Adjust the values of vmmaxparallel, vmlimitperhost, and vmlimitperdatastore to find the values that provide optimal performance for parallel backups, in your environment."
The datamover opt file has:
VMMAXParallel 8
VMLIMITPERDATASTORE 12
VMMaxVirtualDisks 8
vmlimitperhost 8

This is using TDP4VE 8.1.1 on VMware 6.5.
When I upgraded the DMs to 8.1.4.1, the VM team upgraded to 6.5 U2, and due to a bug in VDDK 6.5 U2 (I think it's in 6.5 U2 and 6.7) I had to change everything from hotadd to NBD. And yes, there's a performance hit. With NBD, I've had to adjust the settings above to account for NFC limits within vSphere.
VDDK 6.7 U1 includes the fix, which should allow switching back to hotadd. There just isn't a released upgrade path to 6.7 U1 yet: it has been announced, but it's not GA.
https://www-01.ibm.com/support/docview.wss?uid=swg1IT25208 for the above hotadd issue.

I've yet to try out Spectrum Protect Plus.

Hope it helps.
 

I am disappointed with TSM: VMware limits NBD to 20% of the bandwidth. My problem with NBD is that I have many VMs with 300 GB/day of delta, so it is very complicated. With hotadd I go 6 times faster, but I do not know if it works with parallelism, and I have plenty of independent persistent disks. Do you know which method Spectrum Protect Plus uses?
 

If you are writing backup directly to a block device, then SPP might work for you.

We ran a POC of SPP writing to a Data Domain and the results were far from what was advertised. The initial code had issues running inventories when other ESXi clusters were remote: the initial inventory took days!

Newer code should have solved this.

Also, using SPP will be a total mind shift as you really do not need TSM (Spectrum Protect) to do your backups and restores.
 

For SPP, IBM only talks about offload, but I have a hard time figuring out what that means. Are they talking about SAN backup?
 

I'm trying to implement an SP4VE setup to back up a large enterprise of 8 vCenters and 6100 VMs spread across 3 datacentres. The primary target will be disk, then tiering off to tape.

This is gonna be fun.
 

Yep. How big is big? We have lots of disk, decent 16G FC and a decent network. That should be enough, right?

I've managed individual TSM servers before but this is my first foray into VE. No cloud happening either, it will all be stored locally.
 

If you use the SAN method it will work, and it can be even easier if you have SVC :) I'm forced to use NBD; in my case it's a disaster.
 

Can you explain SVC and NBD? I'm not familiar with those acronyms.
 

I'm currently using hotadd, with nbdssl, then nbd as fallback. Unable to use the SAN transport method currently.
Check this link: https://www.ibm.com/support/knowled.../t_client_tuning_tsm4ve_select_transport.html
If I recall correctly, NBD isn't just limited by TSM, it's actually limited by VMware.
"
vSphere 5 and vSphere 6, connecting to an ESXi host: limited by a transfer buffer for all NFC connections, enforced by the host; the sum of all NFC connection buffers to an ESXi host cannot exceed 32 MB.
52 connections through vCenter Server, including the above per-host limit.
"
Likely out of date, but here's the link I pulled the above info from:
https://pubs.vmware.com/vsphere-6-0/index.jsp?topic=/com.vmware.vddk.pg.doc/vddkDataStruct.5.5.html

I posted above with my current settings in the datamover opt file and have since changed them to:
VMMAXParallel 10
VMLIMITPERDATASTORE 0
vmmaxbackupsessions 20
and removed the vmlimitperhost option.
My datamover is virtual and has 6 cores with 6 GB of memory.
I was able to reduce the backup window from 4 hours to 2 hours 28 mins by modifying those settings alone.
My VM farm and TSM server sit on 10 Gb Ethernet. Only using 8 Gb SAN links.
Not sure if this helps, but here's a snippet from one of my recent backups after I made the change:
Code:
Total objects deduplicated:                 101
Total number of bytes inspected:          22.42 TB
Total number of bytes processed:           1.64 TB
Total bytes before deduplication:          1.64 TB
Total bytes after deduplication:         278.94 GB
Total number of bytes transferred:        99.67 GB
Data transfer time:                   36,572.19 sec
Network data transfer rate:           48,242.34 KB/sec
Aggregate data transfer rate:        197,767.65 KB/sec
Objects compressed by:                       65%
Deduplication reduction:                  83.43%
Total data reduction ratio:               99.57%
Elapsed processing time:               02:28:41
I could likely push it harder...Just haven't yet. I want to make my VM and Storage team yell at me in the near future however :)
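
The summary figures above can be cross-checked with a little arithmetic. Here is a minimal Python sketch, assuming the client reports binary units (1 TB = 1024 GB) and that the total reduction ratio compares bytes transferred to bytes inspected. Both are assumptions inferred from the numbers in the snippet, not something taken from IBM's documentation:

```python
# Cross-check of the backup summary above.
# Assumptions (not from IBM docs): binary units, and
# total reduction = 1 - transferred / inspected.

GB_PER_TB = 1024
KB_PER_GB = 1024 * 1024

inspected_gb = 22.42 * GB_PER_TB      # "Total number of bytes inspected"
transferred_gb = 99.67                # "Total number of bytes transferred"
aggregate_kb_per_sec = 197_767.65     # "Aggregate data transfer rate"

# "Elapsed processing time" 02:28:41 expressed in seconds
elapsed_sec = 2 * 3600 + 28 * 60 + 41

total_reduction_pct = (1 - transferred_gb / inspected_gb) * 100

# Aggregate rate times elapsed time should land near "bytes processed".
processed_tb = aggregate_kb_per_sec * elapsed_sec / KB_PER_GB / GB_PER_TB

print(f"Total data reduction: {total_reduction_pct:.2f}%")
print(f"Processed (rate x elapsed): {processed_tb:.2f} TB")
```

Under those assumptions the two results line up with the reported 99.57% reduction and the 1.64 TB processed, which suggests the aggregate rate is measured against wall-clock time rather than the much larger summed "Data transfer time".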


You mentioned independent disks above. Check this: https://www-01.ibm.com/support/docview.wss?uid=swg21626104


Just come by and grab those disks with a traditional client. Had to do that for a few workloads as well.
Beyond what I posted, my environment is small compared with 6100 VMs. I'd guess lots of datamovers, enough CPU/memory given to the datamovers if they are virtual, and a nice large pipe to the TSM server?

"I'm forced to use NBD in my case it's a disaster."
Ouch. How come you can't use hotadd?

Anyhow, I hope the above has helped at least somewhat.
 

For your initial VE backup, how did you ensure that all of your VMs are backed up?

Did you stage it per ESX host, or host cluster?

Say for example, in one vCenter I have ~40 clusters and ~240 ESX hosts supporting ~4800 VMs.

What would be the best way to schedule and organize my VE backups for these? How many datamovers should I go for? Can I have more than one datamover per VE server, or do those sit on VMs themselves?

Thanks.. and sorry for the noob questions, this is all new to me.
 

We opted to use a folder-based structure: X number of VMs per folder, with a datamover assigned to its folders via "domain.vmfull VMFolder=folder1,folder2" for one datamover, then vmfolder=folder3,folder4 for the next, and so on.
We have a few smaller clusters and we just lumped them into the folder structure.
Right now, each datamover is handling about 100 VMs. Your results may vary.
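
The folder-per-datamover split described above can be sketched in a few lines of Python. This is only an illustration: the folder names, the datamover names and the `assign_folders` / `domain_line` helpers are hypothetical, and the generated lines simply mirror the `domain.vmfull vmfolder=...` form quoted above, so check the exact option syntax against your client level.

```python
import math


def assign_folders(folders, datamovers):
    """Give each datamover a contiguous chunk of folders (hypothetical helper)."""
    chunk = math.ceil(len(folders) / len(datamovers))
    return {
        dm: folders[i * chunk:(i + 1) * chunk]
        for i, dm in enumerate(datamovers)
    }


def domain_line(folders):
    """Build an option line in the form quoted in the post above."""
    return "domain.vmfull vmfolder=" + ",".join(folders)


# Made-up names, for illustration only.
folders = ["folder1", "folder2", "folder3", "folder4"]
datamovers = ["dm1", "dm2"]

plan = assign_folders(folders, datamovers)
for dm, assigned in plan.items():
    if assigned:  # skip datamovers that ended up with no folders
        print(f"{dm}: {domain_line(assigned)}")
```

Splitting by folder count is the simplest choice; in practice you would probably want to balance by VM count or daily change rate per folder instead.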

It may be worthwhile to get AVP support or open a PMR to assist with this. I just don't have any experience at such a scale and would hate to send you down the wrong path.

Also check out Jeremy Brovage's series. He does a great job explaining the ins and outs of TDP4VE. It's a 3-part series he posted; start with the first part.

First backup was painful :)

I would be cautious on database / storage space and hope the TSM side of things is sized correctly to handle the size of your environment.
 