ADSM-L

Re: How do you back up 2 PB of data?

2002-11-19 12:29:35
Subject: Re: How do you back up 2 PB of data?
From: Orville Lantto <orville.lantto AT DATATREND DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 19 Nov 2002 11:21:08 -0600
A 72-drive, 10 I/O slot 3584 library will hold 2207 cartridges. With 175
GB/cartridge, that works out to 6 libraries.
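
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes 2 PB = 2,097,152 GB (the binary figure used in the quoted post below) and takes the 175 GB/cartridge and 2207-cartridge numbers from above; the function name is only illustrative.

```python
import math

TOTAL_GB = 2 * 1024 * 1024       # 2 PB expressed in GB (binary units, as in the quoted post)
GB_PER_CARTRIDGE = 175           # 100 GB LTO cartridge at an assumed 1.75:1 compression
CARTRIDGES_PER_LIBRARY = 2207    # 72-drive, 10 I/O slot 3584, per the figure above


def libraries_for_capacity(total_gb, gb_per_cartridge, cartridges_per_library):
    """Return (cartridges, libraries) needed to hold total_gb, rounding up."""
    cartridges = math.ceil(total_gb / gb_per_cartridge)
    libraries = math.ceil(cartridges / cartridges_per_library)
    return cartridges, libraries


cartridges, libraries = libraries_for_capacity(
    TOTAL_GB, GB_PER_CARTRIDGE, CARTRIDGES_PER_LIBRARY
)
print(f"{cartridges} cartridges, {libraries} libraries")
```

This covers capacity only; it says nothing about how long the backup takes.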

Orville L. Lantto
Datatrend Technologies, Inc.  (http://www.datatrend.com)
IBM Premier Business Partner
121 Cheshire Lane, Suite 700
Minnetonka, MN 55305
Email: Orville.Lantto AT datatrend DOT com
V: 952-931-1203
F: 952-931-1293
C: 612-770-9166




Dan Foster <dsf AT GBLX DOT NET>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
11/19/2002 11:06 AM
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        Subject:        How do you back up 2 PB of data?


2 PB is 2,048 TB, or 2,097,152 GB.

A fun thought exercise:

http://www.cnn.com/2002/TECH/biztech/11/19/ibm.supercomputerr.ap/index.html

Well, assuming several things:

        1. Using LTO (just because I know the numbers for this best off
           the top of my head) -- a 3584 library

        2. LTO delivers maximum of 30 MB/sec in compressed mode, but
           22-23 MB/sec is usually realistic. Let's use 22.5 MB/sec.

        3. Typically 1.7:1 to 1.8:1 ratio for hardware compression.
           Let's use 1.75, or 175 GB for a 100 GB uncompressed tape.

        4. 72 drives per maxed out LTO setup (1 base frame + 5 expansion
           frames) for about 2000 tapes in all frames?

        5. A single 3584 complex therefore holds (using hardware
           compression) a grand total of 175 GB * 72 = 12.6 TB of
           compressed data on its 72 mounted tapes at any one time, and
           assuming the client is constantly streaming data to the ITSM
           server at peak efficiency, each drive can back up 81 GB per
           hour (22.5 MB/sec * 3600) at max write-to-tape speeds.

        6. Assuming a 16 hour window for all backups to complete per
           day (so that you have time for other ITSM server processing),
           that's 81 * 16, or 1.3 TB per 3584 _drive_ per day. 72 * 1.3
           means a single 3584 complex can do about 94 TB per day.

        7. For a single full backup of 2 PB, that's 2048 TB, or 2,097,152
           GB... or about 12,000 maxed out LTO tapes. Since a single fully
           fleshed out 3584 library is about 2,000 tapes... that would
           mean 6 3584 libraries for tape capacity alone.

        8. 2048 TB divided by 94 TB per day yields about 22 3584 libraries
           to get the full backup done in a single day.

        9. Then you've got the small problem of having to come up with
           an appropriate ITSM server design... for starters, number
           of slots required would be incredible. You'd put max of 2
           3580 drives on a single Ultra HVD SCSI adapter... so 72 drives
           per complex would be 36 slots alone! 36 slots multiplied by
           22 complexes would be 792 slots!

        10. Not sure about a p690 but think it's got a couple hundred
            slots?

        11. Then you need more adapters for disk and network controllers.
            To support 22 MB/sec over 1,584 drives concurrently would
            be... 465 gigabit ethernet adapters assuming a perfectly
            tuned setup that can push 600 Mbps per adapter through.

        12. You'd probably kill the bus with so much data zipping around
            long before you max out the slots... more likely you would
            need multiple (6-10?) p690 Regatta systems *just* to deal
            with ITSM backups for 2 PB of data alone.

        13. The HVAC requirements for all these disks must be interesting ;)
            For the disks -- data, diskpool, db... total BTUs/hr would
            possibly be in the neighborhood of about 3 million BTUs/hr,
            which demands *seriously* beefy HVAC units for the disks alone,
            never mind the servers, routers, etc...!

        14. Probably has their own electrical substation for the computer
            room(s) alone. Run on a UPS? If they went to the extent of
            having their own electrical substation, they might as well...
            The disks alone are probably going to eat about 15,300 amps at
            the bare minimum... total for the entire room could be in the
            neighborhood of 30-40,000 amps when you consider the large
            network equipment, servers, and other supporting
            infrastructure.
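
The throughput and adapter arithmetic in items 5 through 11 can be replayed with a short Python sketch. It is only a rough check under the same assumptions as the list: it mixes decimal units (81,000 MB treated as 81 GB) with the binary 2,048 TB figure exactly as the list does, rounds at the same places, and uses 22 MB/sec rather than 22.5 for the network estimate, as item 11 does. The variable names are purely illustrative.

```python
import math

# Figures taken from the numbered list above; all are rough estimates.
DRIVE_MB_PER_SEC = 22.5        # realistic LTO streaming rate (item 2)
DRIVES_PER_LIBRARY = 72        # maxed-out 3584 complex (item 4)
WINDOW_HOURS = 16              # daily backup window (item 6)
TOTAL_TB = 2048                # 2 PB (item 7)
DRIVES_PER_SCSI_ADAPTER = 2    # 3580 drives per Ultra HVD SCSI adapter (item 9)
NETWORK_MB_PER_SEC = 22        # per-drive rate used for the network estimate (item 11)
GIGE_USABLE_MBPS = 600         # assumed usable throughput per gigabit adapter (item 11)

# Item 5: per-drive hourly throughput, treating 1 GB as 1000 MB.
gb_per_drive_hour = DRIVE_MB_PER_SEC * 3600 / 1000

# Item 6: 81 * 16 = 1,296 GB, rounded to 1.3 TB per drive per day,
# then multiplied across all 72 drives and rounded to whole TB.
tb_per_drive_day = round(gb_per_drive_hour * WINDOW_HOURS / 1000, 1)
tb_per_library_day = round(tb_per_drive_day * DRIVES_PER_LIBRARY)

# Item 8: libraries needed to move 2 PB within one daily window.
libraries_for_throughput = math.ceil(TOTAL_TB / tb_per_library_day)

# Item 9: SCSI adapter slots if every drive in every library is attached.
total_drives = libraries_for_throughput * DRIVES_PER_LIBRARY
scsi_slots = math.ceil(total_drives / DRIVES_PER_SCSI_ADAPTER)

# Item 11: gigabit adapters needed to feed all drives concurrently.
total_mbps = total_drives * NETWORK_MB_PER_SEC * 8
gige_adapters = math.ceil(total_mbps / GIGE_USABLE_MBPS)

print(f"{gb_per_drive_hour:.0f} GB/hour per drive")
print(f"{tb_per_drive_day} TB/day per drive, {tb_per_library_day} TB/day per library")
print(f"{libraries_for_throughput} libraries, {total_drives} drives")
print(f"{scsi_slots} SCSI slots, {gige_adapters} gigabit adapters")
```

Changing WINDOW_HOURS or DRIVE_MB_PER_SEC shows how sensitive the library count is to the backup window and to the sustained drive rate.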

I listed LTO and pSeries here just simply because I know the numbers and
hardware the best, but feel free to offer other possible approaches.

Keep in mind, all that is only a small part of the big picture... this
one is *just* for a single full backup, and doesn't take into account
the long-term needs such as ITSM db sizing or I/O loading of db or
diskpool disks; each hard drive has a finite amount of I/Os it can do at
any given
time. Then you've got other issues such as performance vs reliability,
which becomes even more tricky with the extremely large scale setups
because use of RAID-5 could become a *very* real serious bottleneck that
gums up the entire works.

I wonder if ITSM on zSeries hardware would actually be better in
this particular scenario because mainframes typically have superior I/O
management, far beyond simple tricks like I/O pacing that exists on
commercial UNIX OSes. Mainframes also have incredible I/O capabilities.
Saw a zSeries box, had about 500 I/O controllers, and was still humming
along just fine even under varying workloads. But I think that's balanced
somewhat by the extensive training and support requirements, along with
licensing and support contract costs.

I do imagine that if I was the data center manager for that site, I'd
be hiring an entire team of senior ITSM administrators with 20 years of
experience ;) Teams of operators to deal with tape loads/unloads alone!

I also can't imagine the vaulting requirements if that's 12,000 tapes for
a full backup and assuming 10% incremental change daily... 1,200 tapes a
day multiplied by, say, an 8-week (56-day) cycle... is 67,200, plus that
12k for a full backup... 79,200 tapes. That also assumes the data can be
recycled every 8 weeks... if there are special legal considerations (the
data sometimes involves very sensitive stuff such as nuclear test results),
it might have to be kept for years. In which case... 1,200 * 365 * 20 would
be 8.76 million tapes. ;)
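
A small Python sketch of the vaulting estimate, for anyone who wants to play with the retention period. It assumes 12,000 tapes per full backup and a 10% daily change rate as above, takes 8 weeks as exactly 56 days, and ignores tape reuse and partially filled tapes; the function name is hypothetical.

```python
FULL_BACKUP_TAPES = 12_000     # tapes for one full backup, from the estimate above
DAILY_CHANGE_PERCENT = 10      # assumed daily incremental change

# Incremental tapes written per day (integer arithmetic avoids float noise).
DAILY_INCREMENTAL_TAPES = FULL_BACKUP_TAPES * DAILY_CHANGE_PERCENT // 100


def tapes_for_retention(days, include_full=True):
    """Tapes accumulated over `days` of daily incrementals, optionally plus one full."""
    incrementals = DAILY_INCREMENTAL_TAPES * days
    return incrementals + (FULL_BACKUP_TAPES if include_full else 0)


eight_week_cycle = tapes_for_retention(8 * 7)
twenty_years = tapes_for_retention(365 * 20, include_full=False)

print(f"{DAILY_INCREMENTAL_TAPES} incremental tapes per day")
print(f"{eight_week_cycle} tapes for an 8-week cycle plus one full backup")
print(f"{twenty_years} incremental tapes over 20 years")
```

Stretching the cycle to 60 days, tapes_for_retention(60, include_full=False), gives 72,000 incremental tapes.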

Encryption might be required -- would 56 bit DES satisfy legal and site
requirements? Or you might also have to do network path-based encryption
such as IPSec and 3DES in addition to client side encryption; the network
encryption in such a large setup would probably incur a serious CPU hit.
You could install crypto accelerators, but that'd imply even more cards...

I'd also be concerned about potential for hitting some internal ITSM
limits that 99.9999% of the sites out there don't ever hit. Don't even
want to think about any disaster recovery requirements which would make
the entire setup *even* larger and more complex!

If I was the (DoE?) IT team looking at this purchase, I'd have put in a
condition in the vendor RFP indicating that a sale of such a large
system must also demonstrate how one would deal with backups. Hopefully
they did it as an integral part of the evaluation process, and not as
an afterthought.

Anybody want to do the hardware installation? Months, if not years,
of assembling and cabling up :-)

Where do I sign up for such a unique and extremely challenging job of
administering such a setup? ;)

-Dan
