ADSM-L

Re: Incremental forever -- any problems? (Scary thoughts)

2001-12-19 06:18:27
Subject: Re: Incremental forever -- any problems? (Scary thoughts)
From: Daniel Sparrman <daniel.sparrman AT EXIST DOT SE>
Date: Wed, 19 Dec 2001 12:09:51 +0100
The bottleneck in a solution like this is probably not the 3584 drives,
which, as you correctly suggest, should perform about 12-15MB/s.

But can the local area network sustain that kind of speed? A 100Mb/s
Ethernet link cannot carry 15MB/s (its line rate is 12.5MB/s).

So 4-5 hours for a restore of 700GB is a dream that won't come true.

All the test restores listed here show about 15-20GB/hour.

At that rate, a 700GB restore would take roughly 35-47 hours.
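A quick sketch of the arithmetic behind that estimate, assuming the observed 15-20GB/hour rate holds constant for the whole 700GB restore (real restores also pay for tape mounts and many small files, which this ignores):

```python
# Sketch: restore time from an observed, sustained restore rate.
# Assumes the rate stays constant for the entire restore.

def restore_hours(size_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to restore size_gb at a sustained rate_gb_per_hour."""
    if rate_gb_per_hour <= 0:
        raise ValueError("restore rate must be positive")
    return size_gb / rate_gb_per_hour


if __name__ == "__main__":
    # 700GB file server, 15-20GB/hour as seen in the test restores
    for rate in (15, 20):
        print(f"{rate}GB/hour -> {restore_hours(700, rate):.1f} hours")
```

Even the faster observed rate leaves the server down for well over a day.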

My suggestion was to look at other solutions, such as HSM, Application
Extenders and Bare Metal Restore.

With these products, you can at least cut the restore time in
half (according to Gartner, 60% of a file server contains archived data
that can be migrated. This data doesn't have to be restored in the event of a
disaster; only the stub files, which are normally 512 bytes, do).
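A minimal sketch of how much data an HSM-managed server would still need to restore, using the 60% figure and 512-byte stubs from the paragraph above. The number of migrated files is not given in the thread, so `migrated_files` is a hypothetical input:

```python
# Hypothetical sketch: data to restore when part of a file server is HSM-migrated.
# Migrated files come back only as stubs; resident data comes back in full.

def hsm_restore_gb(total_gb: float, migrated_fraction: float,
                   migrated_files: int, stub_bytes: int = 512) -> float:
    """GB to restore when migrated_fraction of total_gb lives in HSM as stubs.

    migrated_files is a hypothetical input; the thread gives no file count.
    """
    if not 0.0 <= migrated_fraction <= 1.0:
        raise ValueError("migrated_fraction must be between 0 and 1")
    resident_gb = total_gb * (1.0 - migrated_fraction)
    stubs_gb = migrated_files * stub_bytes / 1e9  # decimal GB
    return resident_gb + stubs_gb


if __name__ == "__main__":
    # 700GB server, 60% migrated, 2 million stub files (assumed for illustration)
    print(f"{hsm_restore_gb(700, 0.6, 2_000_000):.1f} GB to restore")
```

Even with millions of stubs, the stub overhead is around a gigabyte; the saving comes almost entirely from not restoring the migrated data.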

Best Regards

Daniel Sparrman
-----------------------------------
Daniel Sparrman
Exist i Stockholm AB
Bergkällavägen 31D
192 79 SOLLENTUNA
Växel: 08 - 754 98 00
Mobil: 070 - 399 27 51


                                                                                
                                   
Zlatko Krastev/ACIT <acit@ATTGLOBAL.NET> on 2001-12-19 11:19
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
Please respond to "ADSM: Dist Stor Manager"
To:     ADSM-L AT VM.MARIST DOT EDU
cc:
Subject:     Re: Incremental forever -- any problems? (Scary thoughts)




It is up to the (file) server administrator how he/she will set up, tune and
maintain the node.
1 TB of data can be put on a single filesystem; it is up to you. In my
practice I've even seen an Oracle server on NetWare with three (separate)
disks comprising a single very large SYS: volume!?! With all files,
both users' and Oracle's, there. You cannot say this is an odd design of
NetWare or Oracle, can you? And guess what happened when one of those disks
crashed with no backup of any kind.
So TSM, like many other software products, is very flexible. If tuned properly
(and fed with enough resources), it can do amazing things. The LTO drives
in your 3584 should be able to achieve about 12-15 MB/s, which is about 40-50
GB/hour per drive, or nearly 100 GB/hour with two drives. If you cannot achieve
this, then something is not tuned. And if an 8-hour restore is too long for you
(700 / 100 + 1 hour for OS installation), there is enough room for more
drives in the 3584. Management has to decide what is cheaper: 2, 3 or 4
tape drives, or several more hours of downtime in case of a disaster. This is
not a technical decision.
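The drive arithmetic above can be sketched as follows. This assumes decimal units, a restore that streams across all drives in parallel at full drive speed (real restores rarely split that evenly), and the fixed 1-hour OS installation from the estimate; `drives_needed` is a hypothetical helper for sizing the library against a target outage window:

```python
import math

# Sketch of the drive-count arithmetic. Assumes decimal units, perfect
# parallelism across drives, and a fixed OS-install time before the restore.

def drive_gb_per_hour(mb_per_s: float) -> float:
    """Convert a per-drive rate in MB/s to GB/hour (1GB = 1000MB)."""
    return mb_per_s * 3600 / 1000


def restore_window_hours(size_gb, drives, mb_per_s, os_install_hours=1.0):
    """Total outage: data restore across all drives plus OS installation."""
    return size_gb / (drives * drive_gb_per_hour(mb_per_s)) + os_install_hours


def drives_needed(size_gb, target_hours, mb_per_s, os_install_hours=1.0):
    """Smallest drive count that fits the restore into target_hours."""
    data_hours = target_hours - os_install_hours
    if data_hours <= 0:
        raise ValueError("target window must exceed the OS install time")
    return math.ceil(size_gb / (data_hours * drive_gb_per_hour(mb_per_s)))


if __name__ == "__main__":
    for rate in (12, 15):
        print(f"{rate}MB/s per drive -> {drive_gb_per_hour(rate):.0f}GB/hour")
    print(f"2 drives at 14MB/s: {restore_window_hours(700, 2, 14):.1f} hours")
    print(f"drives for a 5-hour window: {drives_needed(700, 5, 15)}")
```

With two drives the window lands just under 8 hours; shrinking it to 5 hours takes four drives at 15MB/s, which is the cost-versus-downtime trade-off management has to weigh.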
And do not forget about test restores. A sergeant does not start training
rookies only after the enemy has attacked. And even rookies do not load
a .357 Magnum with .22 bullets.

Zlatko Krastev
IT Consultant





Daniel Sparrman <daniel.sparrman AT EXIST DOT SE> on 18.12.2001 22:50:44
Please respond to "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To:     ADSM-L AT VM.MARIST DOT EDU
cc:

Subject:        Re: Incremental forever -- any problems? (Scary thoughts)

Hi
One of our customers is running a medium-sized site with 180 servers and about
10TB of storage.
They are using an IBM 3584 (Anaconda) with 2 fibre-attached drives.
The server is an IBM P-Series 640 with RAID 1+0.
One of the largest servers is about 700GB. It's the file server holding user
data and home directories.
Mount time, including searching for the files on the tape, is about 2 min.
Restoring 1GB takes about 8 min in total: 2 min to search and mount the tape
and 6 min to restore the data. At 6 min per GB, a total restore of the server
would take about 700 x 6 min = 4,200 min, or about 70 hours.
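A small sketch of that per-GB model: a mount/search cost per tape mount plus a fixed restore time per GB. How many mounts a full 700GB restore needs is not stated in the thread, so `mounts` is a hypothetical parameter (with one mount, its cost is negligible next to the data time):

```python
# Sketch of the per-GB restore model: mount/search cost per tape mount plus
# a fixed restore time per GB. `mounts` is a hypothetical parameter.

def restore_minutes(size_gb: float, data_min_per_gb: float = 6.0,
                    mount_min: float = 2.0, mounts: int = 1) -> float:
    """Total restore time in minutes under the mount + per-GB model."""
    return mounts * mount_min + size_gb * data_min_per_gb


def implied_mb_per_s(data_min_per_gb: float = 6.0) -> float:
    """Throughput implied by the per-GB restore time (1GB = 1000MB)."""
    return 1000 / (data_min_per_gb * 60)


if __name__ == "__main__":
    print(f"1GB:   {restore_minutes(1):.0f} min")
    print(f"700GB: {restore_minutes(700) / 60:.1f} hours")
    print(f"implied rate: {implied_mb_per_s():.1f}MB/s")
```

The 6 min/GB figure corresponds to roughly 2.8MB/s of sustained throughput.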
The customer's P-Series machine is equipped with two 100Mb/s Ethernet cards
and one IBM 100Mb/s Token-Ring card. The test was run over one of the Ethernet
cards.
Today, the customer is using OTG DiskXtender, for two reasons: one,
to save primary disk space; the other, to minimize the amount of data that
has to be restored in the event of a disaster.
The LTO drives can perform 15MB/s, or 30MB/s compressed.
The P-Series machine is not the bottleneck. Normally the network carries
about 3,000 packets, with a peak of 7,000. During backup, 27,000 packets are
sent, with a peak of 50,000. According to the communications guys, this is
very high.
The clients are Compaq ProLiant machines with about 4GB of memory and two
Xeon 750 processors (I think).
So there shouldn't be a bottleneck.
According to the communications guys, the maximum theoretical speed of
100Mb/s Ethernet is about 12.5MB/s, or 25MB/s in total when running at full
duplex. The problem is that a restore is one-way traffic (server to client),
so only 12.5MB/s applies.
At 12.5MB/s, a total restore of 700GB would take more than 15 hours.
Who has a primary file server that can be down for 15 hours?
And this is only theoretical.
With Gigabit Ethernet, the line rate is 125MB/s, but something like 30MB/s is
more realistic in practice, and even that is optimistic. At 30MB/s the restore
would take about 6.5 hours. What if this happened in the morning? Would the
users take a vacation and come back the next day?
Just some thoughts...
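A sketch of the bottleneck reasoning in the last two paragraphs: the restore rate is capped by the slowest component, and because restore traffic flows one way, full duplex adds nothing. It ignores protocol overhead, so the results are best-case. The Gigabit case assumes, purely for illustration, a single drive at the 30MB/s compressed LTO rate quoted earlier:

```python
# Sketch: restore throughput is capped by the slowest link in the chain.
# Assumes one-way traffic (server to client), so full duplex adds nothing,
# and ignores protocol overhead, so results are best-case.

def effective_mb_per_s(link_mbit_per_s: float, drives: int,
                       drive_mb_per_s: float) -> float:
    """Minimum of network line rate and aggregate drive rate, in MB/s."""
    link_mb_per_s = link_mbit_per_s / 8  # 8 bits per byte
    return min(link_mb_per_s, drives * drive_mb_per_s)


def hours_at_rate(size_gb: float, mb_per_s: float) -> float:
    """Hours to move size_gb at mb_per_s (decimal units)."""
    return size_gb * 1000 / mb_per_s / 3600


if __name__ == "__main__":
    fast = effective_mb_per_s(100, drives=2, drive_mb_per_s=15)
    print(f"100Mb/s Ethernet: {fast}MB/s -> {hours_at_rate(700, fast):.1f} hours")
    gig = effective_mb_per_s(1000, drives=1, drive_mb_per_s=30)
    print(f"Gigabit, one drive: {gig}MB/s -> {hours_at_rate(700, gig):.1f} hours")
```

On Fast Ethernet the network is the cap (12.5MB/s, about 15.6 hours); on Gigabit the cap moves to the tape side.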
Best Regards
Daniel Sparrman