ADSM-L

Subject: Re: How do you back up 2 PB of data? - done
From: Zlatko Krastev <acit AT ATTGLOBAL DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 20 Nov 2002 02:51:54 +0200
--> 2 PB is 2,048 TB, or 2,097,152 GB.

Not in this case. The article says 2 petabytes, which ought to mean 2 x 10^15
bytes, not 2^51. Not a big difference, just a little cheating to simplify
calculations :-) Please read the capital-letter units below as powers of ten
and forget for a while that you are a computer-minded person. Disk/tape
vendors prefer powers of ten, not powers of two, when quoting capacity.
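As a quick aside (my own sketch, not part of the original argument), the gap between the decimal and binary readings of "2 PB" is only about 12.6%:

```python
decimal_pb = 2 * 10**15          # 2 PB as disk/tape vendors quote it
binary_pb = 2 * 2**50            # 2 PiB = 2^51 bytes
print(binary_pb / decimal_pb)    # ~1.126: the binary figure is ~12.6% larger
```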

Contents:
0-4. Facts
10,11. A solution
12. An imaginary solution
I-V. Some remarks


0. Assume everything is IBM-brand (the article says IBM will build the
system, so forget about STK, EMC, etc.). So we have to count IBM
Anaconda (3584) libraries: 72 drives and 2206 slots (one reserved for the
cleaning tape). This means 72 x 15 MB/s = 1080 MB/s (or 3888 GB/h)
uncompressed, and a capacity of 220 TB.

1. 2000 TB / 220 TB = 9 (nine) libraries just to fit the data, whatever
time it takes, plus 40% for "filling" cartridges (4 libraries) plus at
least 10-20% scratch (another 1-2 libraries). Total of about 15
libraries. Compression is discussed further below.
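For anyone who wants to replay the arithmetic of points 0-1, here is a small Python sketch (all figures come from the post itself; the 40% "filling" and 20% scratch allowances are the post's assumptions, not measured values):

```python
import math

# Point 0: a maxed 3584 has ~2200 usable slots of 100 GB LTO-1 cartridges.
slots = 2200
cartridge_gb = 100
library_tb = slots * cartridge_gb / 1000   # decimal units, as vendors quote
print(library_tb)                          # 220.0 TB per maxed 3584

# Point 1: libraries needed just to hold 2000 TB, plus filling and scratch.
data_tb = 2000
base = data_tb / library_tb
print(round(base))                         # 9 libraries to fit the data
print(math.ceil(base * (1 + 0.4 + 0.2)))   # ~15 libraries with allowances
```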

2. Let's assume a 10-hour backup window, to leave enough time for
reclamation, backup to copypool, etc. This results in 200 TB/hour, or ~56
GB/s. So (still assuming uncompressed data) we will need 52 libraries (200
TB/h / 3888 GB/h ~= 51.44) to get all this backed up in time, or slightly
over 3700 drives.
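The throughput side of point 2 works out like this (my sketch, using the uncompressed LTO-1 rate from point 0):

```python
import math

drive_mb_s = 15                        # native LTO-1 data rate
drives_per_library = 72
library_gb_h = drive_mb_s * drives_per_library * 3600 / 1000
print(library_gb_h)                    # 3888.0 GB/h per maxed 3584

window_h = 10
rate_gb_h = 2000 * 1000 / window_h     # 200 TB/h needed for the window
libraries = math.ceil(rate_gb_h / library_gb_h)
print(libraries)                       # 52 libraries
print(libraries * drives_per_library)  # 3744 drives
```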

3. A pSeries 690 or the new p650 can have up to 8 I/O drawers, which
should mean 7 GB/s per drawer (read only; read+write should be 14 GB/s) -
8 x 7 GB/s barely equals the ~56 GB/s required. This is impossible in a
single server (at the moment!!)

4. A single ESS with 146 GB HDDs will hold 368 x 146 GB = 53,728 GB (384
disks, of which 16 are hot spares). This means we will need at least 38
Sharks to hold 2 PB.
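Point 4 as a sketch (figures from the post):

```python
import math

disks = 384 - 16                      # 16 of the 384 disks are hot spares
ess_gb = disks * 146                  # 146 GB drives
print(ess_gb)                         # 53728 GB usable per Shark
sharks = math.ceil(2_000_000 / ess_gb)
print(sharks)                         # 38 Sharks to hold 2 PB
```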


So let's design a solution:

10. Data is spread over at least 38 Sharks and is driven by at least
390-400 p690s in an IBM Cluster 1600 (the new name of the SP). There ought
to be a huge SAN fabric composed of many switches/directors between the
Sharks and the p690s. Now let's give each Shark a "SAN-edge" director with
256 ports: 16 for the ESS itself, another 16 for connection to the
SAN-core switches, 72+72 for two Anacondas, and we still have 80 ports
free.

11. (Solution A) Let's double the number of Sharks used and put FlashCopy
there. This will use 16 more ports on each switch plus another 16 for
SAN-core connectivity. Then let's put a TSM server on each pair of
FlashCopy Sharks and back up the mirrors. Each server will have to back up
~54 TB in 10 hours, or about 1.5 GB/s.
-       Two maxed Anaconda libraries will give us 2.16 GB/s (OK);
-       The TSM server can be a p650 with 32 2Gb FC HBAs (the maximum): 16
per direction at ~100 MB/s effective throughput gives 1.6 GB/s each way if
no adapter is used in full-duplex (OK);
-       Eight I/O drawers should provide 55 slots (OK; maybe 6 would be
enough too);
-       Four RIO loops, even at the same speed as in the p690, will
provide 1000 MB/s each, for a total of 4 GB/s (OK);
-       144 drives on 16 HBAs means 9 drives per adapter; 9 x 15 MB/s =
135 MB/s < 200 MB/s (OK);
-       The 32 FC HBAs will use 32 of the 48 (80-16-16) free ports on the
switch/director (OK).
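The checklist above can be re-run as a short sketch. Note the ~100 MB/s per-HBA figure is my reading of the 1.6 GB/s number in the post, not a measured value:

```python
# Sanity checks for solution A: one TSM server per FlashCopy Shark pair.
per_server_tb = 2000 / 38               # ~52.6 TB behind each Shark
need_gb_s = per_server_tb * 1000 / (10 * 3600)
print(round(need_gb_s, 2))              # ~1.46 GB/s must be sustained

tape_gb_s = 144 * 15 / 1000             # two maxed 3584s, 144 LTO-1 drives
print(tape_gb_s)                        # 2.16 GB/s of tape bandwidth (OK)

hba_gb_s = 16 * 0.1                     # 16 HBAs per direction x ~100 MB/s
print(hba_gb_s)                         # 1.6 GB/s per direction (OK)

per_hba_mb_s = 144 / 16 * 15            # 9 drives x 15 MB/s per adapter
print(per_hba_mb_s)                     # 135 MB/s, within a 2Gb FC link
```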

12. (Solution B) Imagine ITSM v5.2 or 6.0 is out and server-free backup is
available for AIX. Inside the switch, or connected to those 80 free ports
on the switch/director, we can have SAN data movers. A single mighty ITSM
server can dispatch all the data movers, which will move the data between
the corresponding Shark and the libraries.


I. Now remember I left compression for the end. All the calculations above
assume neither node nor drive compression. If we can get 1.75:1
compression, we can actually use only one 3584 library for each ESS.
II. Calculations were made assuming 1st-generation LTO. In 2004 at least
2nd-generation LTO should be available, doubling the speed. Also by that
time new 3590 drives based on the 1GB prototype ought to be available, and
I bet their speed will be higher than LTO's.
III. You forgot about "progressive backup". If we perform daily
incrementals, are we really talking about a 100% daily change? Only the
first incremental (or a selective backup over the weekend) may last more
than 10 hours.
IV. What about the ITSM limit on the database size of a single server?
Right now the limit is 530 GB, and it would take many hours to back it up
- server DB backup is still single-threaded. With the split across 38
servers the DBs become manageable, but the servers ....
V. If you sign up for this job, give me a hint how to get contracted too
:-))

Zlatko Krastev
IT Consultant


Dan Foster <dsf AT GBLX DOT NET>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
19.11.2002 19:06
Please respond to "ADSM: Dist Stor Manager"


        To:     ADSM-L AT VM.MARIST DOT EDU
        cc:
        Subject:        How do you back up 2 PB of data?


2 PB is 2,048 TB, or 2,097,152 GB.

A fun thought exercise:

http://www.cnn.com/2002/TECH/biztech/11/19/ibm.supercomputerr.ap/index.html

Well, assuming several things:

        1. Using LTO (just because I know the numbers for this best off
           the top of my head) -- a 3584 library

        2. LTO delivers maximum of 30 MB/sec in compressed mode, but
           22-23 MB/sec is usually realistic. Let's use 22.5 MB/sec.

        3. Typically 1.7:1 to 1.8:1 ratio for hardware compression
           Let's use 1.75, or 175 GB for a 100 GB uncompressed tape.

        4. 72 drives per maxed out LTO setup (1 base frame + 5 expansion
           frames) for about 2000 tapes in all frames?

        5. A single 3584 complex therefore delivers (using hardware
           compression) a grand total of 175 GB * 72 = 12.6 TB of
           compressed data *within* the library at any one time, and
           assuming the client is constantly streaming data to the ITSM
           server at peak efficiency, each drive can back up 81 GB per
           hour at max write-to-tape speeds.

        6. Assuming a 16 hour window for all backups to complete per
           day (so that you have time for other ITSM server processing),
           that's 81 * 16, or 1.3 TB per 3584 _drive_ per day. 72 * 1.3
           means a single 3584 complex can do about 94 TB per day.

        7. For a single full backup of 2 PB, that's 2048 TB, or 2,097,152
           GB... or about 12,000 maxed out LTO tapes. Since a single fully
           fleshed out 3584 library is about 2,000 tapes... that would
           mean 6 3584 libraries for tape capacity alone.

        8. 2048 TB divided by 94 TB yields about 22 3584 libraries.
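Points 5-8 above can be replayed with a short Python sketch (my illustration; it keeps Dan's mix of decimal MB/s and binary TB, so treat the results as rough):

```python
import math

drive_gb_h = 22.5 * 3600 / 1000            # 81 GB/h per drive at 22.5 MB/s
tb_per_drive_day = drive_gb_h * 16 / 1000  # 16-hour window -> ~1.3 TB/day
complex_tb_day = tb_per_drive_day * 72
print(round(complex_tb_day))               # ~93 TB/day per 3584 complex

libs_throughput = math.ceil(2048 / complex_tb_day)
print(libs_throughput)                     # 22 libraries to make the window

tapes = math.ceil(2_097_152 / 175)         # 175 GB per compressed cartridge
print(tapes)                               # ~12,000 tapes for one full
libs_capacity = math.ceil(tapes / 2000)    # ~2,000 slots per library
print(libs_capacity)                       # 6 libraries for capacity alone
```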

        9. Then you've got the small problem of having to come up with
           an appropriate ITSM server design... for starters, number
           of slots required would be incredible. You'd put max of 2
           3580 drives on a single Ultra HVD SCSI adapter... so 72 drives
           per complex would be 36 slots alone! 36 slots multiplied by
           22 complexes would be 792 slots!

        10. Not sure about a p690 but think it's got a couple hundred
            slots?

        11. Then you need more adapters for disk and network controllers.
            To support 22 MB/sec over 1,584 drives concurrently would
            be... 465 gigabit ethernet adapters assuming a perfectly tuned
            setup that can push 600 Mbps per adapter through.
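The adapter count in point 11 works out like this (the 600 Mbps of usable bandwidth per card is the post's own assumption):

```python
import math

drives = 72 * 22                        # 22 maxed 3584 complexes
print(drives)                           # 1,584 LTO drives in total
stream_mbps = 22 * 8                    # 22 MB/s per drive, in megabits
total_mbps = drives * stream_mbps
adapters = math.ceil(total_mbps / 600)  # ~600 Mbps usable per GigE card
print(adapters)                         # 465 gigabit ethernet adapters
```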

        12. You'd probably kill the bus with so much data zipping around
            long before you max out the slots... more likely you would
            need multiple (6-10?) p690 Regatta systems *just* to deal with
            ITSM backups for 2 PB of data alone.

        13. The HVAC requirements for all these disks must be
            interesting ;) For the disks -- data, diskpool, db... total
            BTUs/hr would possibly be in the neighborhood of about 3
            million BTUs/hr, which demands *seriously* beefy HVAC units
            for the disks alone, never mind the servers, routers, etc...!

        14. Probably has their own electrical substation for the computer
            room(s) alone. Run on a UPS? If they went to the extent of
            having their own electrical substation, they might as well...
            The disks alone are probably going to eat about 15,300 amps at
            the bare minimum... total for the entire room could be in the
            neighborhood of 30-40,000 amps when you consider the large
            network equipment, servers, and other supporting
            infrastructure.

I listed LTO and pSeries here just simply because I know the numbers and
hardware the best, but feel free to offer other possible approaches.

Keep in mind, all that is only a small part of the big picture... this
one is *just* for a single full backup, and doesn't take into account
the long-term needs such as ITSM db sizing or I/O loading of db or
diskpool
disks; each hard drive has a finite amount of I/Os it can do at any given
time. Then you've got other issues such as performance vs reliability,
which becomes even more tricky with the extremely large scale setups
because use of RAID-5 could become a *very* real serious bottleneck that
gums up the entire works.

I actually wonder if ITSM on zSeries hardware would be better in
this particular scenario because mainframes typically have superior I/O
management, far beyond simple tricks like I/O pacing that exists on
commercial UNIX OSes. Mainframes also have incredible I/O capabilities.
Saw a zSeries box, had about 500 I/O controllers, and was still humming
along just fine even under varying workloads. But I think that's balanced
somewhat by the extensive training and support requirements, along with
licensing and support contract costs.

I do imagine that if I was the data center manager for that site, I'd
be hiring an entire team of senior ITSM administrators with 20 years of
experience ;) Teams of operators to deal with tape loads/unloads alone!

I also can't imagine the vaulting requirements if that's 12,000 tapes for
a full backup and assuming 10% incremental change daily... 1,200 tapes
multiplied by, say, an 8 week cycle... is 72,000, plus that 12k for a full
backup... 84,000 tapes. That also assumes the data can be recycled every
8 weeks... if there are special legal considerations (this sometimes
involves very sensitive stuff such as nuclear test results), data could be
kept for years. In which case... 1,200 * 365 * 20 would be
8.76 million tapes. ;)
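The vaulting math above as a sketch (the 60-day cycle is my reading of the "8 week cycle" that yields the 72,000 figure; all other numbers are from the post):

```python
full_tapes = 12_000                # one full backup at ~175 GB/tape
daily = 1_200                      # 10% daily incremental change
cycle_days = 60                    # ~8-week cycle, as 72,000 implies

print(daily * cycle_days)               # 72,000 incremental tapes
print(daily * cycle_days + full_tapes)  # 84,000 tapes in rotation
print(daily * 365 * 20)                 # 8,760,000 if kept for 20 years
```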

Encryption might be required -- would 56 bit DES satisfy legal and site
requirements? Or you might also have to do network path-based encryption
such as IPSec and 3DES in addition to client side encryption; the network
encryption in such a large setup would probably incur a serious CPU hit.
You could install crypto accelerators, but that'd imply even more cards...

I'd also be concerned about potential for hitting some internal ITSM
limits that 99.9999% of the sites out there don't ever hit. Don't even
want to think about any disaster recovery requirements which would make
the entire setup *even* larger and more complex!

If I was the (DoE?) IT team looking at this purchase, I'd have put in a
condition in the vendor RFP indicating that a sale of such a large
system must also demonstrate how one would deal with backups. Hopefully
they did it as an integral part of the evaluation process, and not as
an afterthought.

Anybody want to do the hardware installation? Months, if not years,
of assembling and cabling up :-)

Where do I sign up for such a unique and extremely challenging job of
administering such a setup? ;)

-Dan
