ADSM-L

Re: Incremental forever -- any problems? (Scary thoughts)

2001-12-19 05:18:56
Subject: Re: Incremental forever -- any problems? (Scary thoughts)
From: Zlatko Krastev/ACIT <acit AT ATTGLOBAL DOT NET>
Date: Wed, 19 Dec 2001 12:19:03 +0200
It is up to the (file) server administrator how he/she will set up, tune and
maintain the node.
1 TB of data can be put on a single filesystem - it is up to you. In my
practice I've even seen an Oracle server on Netware with three (separate)
disks comprising a single very large SYS: volume !?! With all files -
both users' and Oracle's - there. You cannot say this is an odd design of
Netware or Oracle, can you? And guess what happened when one of those disks
crashed with no backup at all.
So TSM, like many other software products, is very flexible. If tuned properly
(and fed with enough resources) it can do amazing things. Your LTO drives
in the 3584 should be able to achieve about 12-15 MB/s, which works out to
40-50 GB/hour, or nearly 100 GB/hour with two drives. If you cannot achieve
this then things are not tuned. And if an 8-hour restore is too long for you
(700 / 100 + 1 hour for OS installation) there is enough room for more
drives in the 3584. The management has to decide what is cheaper - 2, 3 or 4
tape drives or several more hours of downtime in case of disaster. This is
not a technical decision.
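
The drive arithmetic above can be checked with a short sketch. It assumes
decimal units (1 GB = 1000 MB) and that throughput scales linearly with the
number of drives, which is an idealization. The function names are
illustrative only and are not part of TSM.

```python
def gb_per_hour(mb_per_sec, drives=1):
    """Convert a sustained per-drive rate in MB/s to aggregate GB/hour.

    Assumes 1 GB = 1000 MB and perfect scaling across drives.
    """
    return mb_per_sec * 3600 / 1000 * drives


def restore_hours(data_gb, rate_gb_per_hour, os_install_hours=1.0):
    """Estimated downtime: data transfer time plus OS installation time."""
    return data_gb / rate_gb_per_hour + os_install_hours


if __name__ == "__main__":
    for rate in (12, 15):
        print(f"{rate} MB/s -> {gb_per_hour(rate):.1f} GB/hour per drive")
    print(f"700 GB at 100 GB/hour: {restore_hours(700, 100):.0f} hours")
```

Raising the rate passed to restore_hours (for example 150 or 200 GB/hour for
three or four drives) shows how much downtime each extra drive would buy
under the same assumptions.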
And do not forget about test restores. A sergeant does not start training
rookies only after the enemy has attacked. And even the rookies do not load
a .357 Magnum with .22 bullets.

Zlatko Krastev
IT Consultant





Daniel Sparrman <daniel.sparrman AT EXIST DOT SE> on 18.12.2001 22:50:44
Please respond to "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
To:     ADSM-L AT VM.MARIST DOT EDU
cc:

Subject:        Re: Incremental forever -- any problems? (Scary thoughts)

Hi
One of our customers is running a medium site with 180 servers, with about
10TB of storage.
They're using an IBM 3584 Anaconda with 2 fibre-attached drives.
The machine is an IBM P-Series 640, with RAID 1+0.
One of the largest servers is about 700GB. It's the fileserver holding user
data and home directories.
Mount time, including search of files on the tape, is about 2 min. When
restoring 1GB, the total time is about 8 min: 2 min to search and mount the
tape, and 6 min to restore the data. At 6 min per GB, a total restore of the
700GB server would take about 70 hours to complete.
The customer's P-Series machine is equipped with 2 100Mb/s Ethernet cards,
and 1 IBM Token-Ring 100Mb/s card. The test was on one of the Ethernet
cards.
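
As a rough sketch of that formula: the 70-hour figure matches 6 min of
transfer per GB with the mount time counted only once. The `mounts` parameter
is a hypothetical knob added here to show how the estimate grows if the tape
has to be mounted or searched repeatedly. 1 GB = 1000 MB is assumed.

```python
def full_restore_hours(size_gb, transfer_min_per_gb=6.0, mount_min=2.0, mounts=1):
    """Estimate restore time from the measured per-GB transfer time.

    transfer_min_per_gb and mount_min come from the 1GB test restore;
    mounts is an assumed number of tape mount/search operations.
    """
    total_min = size_gb * transfer_min_per_gb + mounts * mount_min
    return total_min / 60


def effective_mb_per_sec(transfer_min_per_gb):
    """Sustained data rate implied by the per-GB transfer time (1 GB = 1000 MB)."""
    return 1000 / (transfer_min_per_gb * 60)


if __name__ == "__main__":
    print(f"One mount:       {full_restore_hours(700):.1f} hours")
    print(f"Mount per GB:    {full_restore_hours(700, mounts=700):.1f} hours")
    print(f"Implied rate:    {effective_mb_per_sec(6.0):.2f} MB/s")
```

The implied rate of under 3 MB/s is well below both the drive and the
100Mb/s link figures quoted in this thread, which is a useful number to have
when looking for the bottleneck.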
Today, the customer is using OTG DiskXtender. This is for two reasons: one
to save primary disk space, the other to minimize the amount of data that
has to be restored in the event of a disaster.
The LTO drives can perform 15MB/s, or 30MB/s compressed.
The P-Series machine is not the bottleneck. Usually, the network sends
about 3000 packets with a peak at 7000. During backup, 27.000 packets are
sent with a peak at 50.000. According to the communications guys, this is
very high.
The clients are Compaq Proliant machines with about 4GB of memory and two
Xeon 750 processors (I think).
So, there shouldn't be a bottleneck.
According to the communications guys, the maximum theoretical speed of
100Mb/s Ethernet is about 12.5MB/s, or running at full duplex, 25MB/s. The
first problem with this is that a restore is one-way communication (server
to client).
At a 12.5MB/s restore rate, the total restore of 700GB would take about 15
hours.
Who has a primary fileserver that can be down for 15 hours?
And this is only theoretical.
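
The link-speed estimate can be reproduced with a few lines. This is a sketch
that assumes 8 bits per byte, 1 GB = 1000 MB, no protocol overhead and a
fully saturated link, so it gives a lower bound rather than a prediction.
The function names are illustrative.

```python
def link_mb_per_sec(megabits_per_sec):
    """Theoretical one-way payload rate of a link, ignoring all overhead."""
    return megabits_per_sec / 8


def transfer_hours(data_gb, mb_per_sec):
    """Hours needed to move data_gb at a sustained rate (1 GB = 1000 MB)."""
    return data_gb * 1000 / mb_per_sec / 3600


if __name__ == "__main__":
    rate = link_mb_per_sec(100)  # 100Mb/s Ethernet, one direction
    print(f"100Mb/s link: {rate} MB/s")
    print(f"700GB restore: {transfer_hours(700, rate):.1f} hours")
```

Any other sustained rate, measured or assumed, can be passed to
transfer_hours to see what restore window it implies.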
With 1Gb Ethernet, the theoretical capacity is about 30MB/s. And this is
only theoretical. The restore would take about 7.5 hours. What if this
happened in the morning? Would the users take a vacation and come back the
next day?
Just some thoughts....
Best Regards
Daniel Sparrman
-----------------------------------
Daniel Sparrman
Exist i Stockholm AB
Bergkällavägen 31D
192 79 SOLLENTUNA
Tel: 08 - 754 98 00
Mobil: 070 - 399 27 51