[HOWTO] Restore 1,000's of VMs in Hours

rowl

ADSM.ORG Senior Member
#1
I am being asked how we can restore 1,000's of VMs in a few hours (no more details than that). This is for recovery from a patching cycle gone wrong, ransomware, or other large-scale software corruption.

My first thought is that this problem would be best solved by crash-consistent storage-level snapshots, augmented with a VM backup solution that offers instant access for those VMs that can't be booted from the snapshots.

Curious whether others have had this sort of vague requirement in their environments, and what you came up with.

Thanks,
-Rowl
 

RecoveryOne

ADSM.ORG Senior Member
#2
Wow. That's a tall order.
I'm honestly not sure how that could be done without a massive overhaul of the entire infrastructure. 10 or 100 Gb Ethernet? 32G or InfiniBand SAN? SSDs for all your storage, for both TSM and the VMs?
Even then, the TSM server would likely have to be pretty beefy. Heck, would it make sense to have 5+ servers and storage so they could each process a subset of the workload?

Perhaps Spectrum Protect Plus? I know there are some new kids on the backup block, like Zerto, that claim to be able to do just that.

I'd be interested, rowl, if you do manage to come up with a way.
 

rowl

ADSM.ORG Senior Member
#3
I started looking into this: 1,000 average-sized VMs would be around 100 TB. The network infrastructure required to move that much data in "hours" doesn't seem realistic, not to mention whether the source and target could support that sort of I/O load. I think we need a way to bring the VMs up live on the backup storage, then start a long process of vMotion jobs to get everything back where it belongs on production storage.
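For a rough sense of scale, here is a back-of-the-envelope sketch of the sustained throughput needed to move 100 TB in a given window. The function name and the 2/4/8 hour windows are illustrative assumptions; it uses decimal terabytes and ignores deduplication, compression, and protocol overhead, so treat the numbers as ballpark only.

```python
def required_throughput_gbps(total_tb: float, window_hours: float) -> float:
    """Sustained throughput (Gbit/s) needed to move total_tb decimal TB in window_hours."""
    total_bits = total_tb * 1e12 * 8      # TB -> bytes -> bits
    seconds = window_hours * 3600
    return total_bits / seconds / 1e9     # bits/s -> Gbit/s


# Assumed restore windows; "hours" was never defined in the requirement.
for hours in (2, 4, 8):
    print(f"{hours} h: {required_throughput_gbps(100, hours):.1f} Gbit/s")
# 2 h: 111.1 Gbit/s
# 4 h: 55.6 Gbit/s
# 8 h: 27.8 Gbit/s
```

Those are end-to-end sustained rates, so the backup storage, the network, and the production datastores would all have to keep up at the same time.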

I wonder what 1 PB of solid state storage for my disk pools will cost :)
 

RecoveryOne

ADSM.ORG Senior Member
#4
Right.
Short of 'hot standby'? Or maybe delayed replication, or replication with versions.
Perhaps an HA cluster for every workload in a VM? But that doesn't stop ransomware, at least.
Everything should be NOW right? :)

I feel your pain. We have 700ish VMs, and it's been stated to my face many times that we couldn't rely on TSM for a true DR scenario because of the amount of time it takes to restore a single VM. So when that comes up, I start down the path of infrastructure requirements at the physical layer and the costs associated with that upgrade: server requirements for the VM farm and TSM, and the fact that Corporate made us buy 7200 rpm drives for 'capacity not speed'. I kindly point out those facts and the limitations that come with them.

What gets me is that these cloud vendors claim they are able to deploy hundreds of VMs in minutes, and executive leadership seems to think that it also translates to 'restoring hundreds of VMs in minutes'. I would love to see a product that can restore 700 VMs in minutes when each VM has over 1 TB of storage associated with it. From scratch, and not just simply reattaching the VMDKs that were present on disk.
Just from the VM host CPU and I/O limitations alone, I'm not entirely sure it's feasible within normal budgetary means.
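To put numbers on that, a minimal sketch of how long a from-scratch restore of 700 VMs at 1 TB each would take at a few aggregate throughputs. The 10/40/100 Gbit/s figures are assumptions picked for illustration, not measurements from any environment, and since each VM has "over 1 TB" these are lower bounds that ignore host CPU, datastore I/O, and protocol overhead.

```python
def restore_hours(vm_count: int, tb_per_vm: float, aggregate_gbps: float) -> float:
    """Hours to move vm_count * tb_per_vm decimal TB at a sustained aggregate_gbps."""
    total_bits = vm_count * tb_per_vm * 1e12 * 8
    seconds = total_bits / (aggregate_gbps * 1e9)
    return seconds / 3600


# Assumed end-to-end sustained rates across the whole restore path.
for gbps in (10, 40, 100):
    print(f"{gbps} Gbit/s: {restore_hours(700, 1.0, gbps):.1f} h")
# 10 Gbit/s: 155.6 h
# 40 Gbit/s: 38.9 h
# 100 Gbit/s: 15.6 h
```

Even at a sustained 100 Gbit/s the full copy is most of a day, which is why "minutes" only looks plausible when the data is not actually moved.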

Then again, I could be wrong and would love someone to prove me wrong.
 
