ADSM-L

Re: Incremental forever -- any problems? (Scary thoughts)

2001-12-18 17:43:38
Subject: Re: Incremental forever -- any problems? (Scary thoughts)
From: Kelly Lipp <lipp AT STORSOL DOT COM>
Date: Tue, 18 Dec 2001 15:38:16 -0700
Excellent analysis.

No, the users would not tolerate this.

So what are people doing about this?

Nothing.  Hoping.  Praying.

The good news is that RAID will save the day most of the time.  But every
now and then something is going to break.

I am strongly advocating a smaller-is-better line of thought.  Big things
are very hard to manage.  This includes databases and fileservers.  Too many
eggs in one basket.

Kelly J. Lipp
Storage Solutions Specialists, Inc.
PO Box 51313
Colorado Springs, CO 80949
lipp AT storsol DOT com or kelly.lipp AT storserver DOT com
www.storsol.com or www.storserver.com
(719)531-5926
Fax: (240)539-7175

  -----Original Message-----
  From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]
  On Behalf Of Daniel Sparrman
  Sent: Tuesday, December 18, 2001 1:51 PM
  To: ADSM-L AT VM.MARIST DOT EDU
  Subject: Re: Incremental forever -- any problems? (Scary thoughts)


  Hi

  One of our customers is running a medium site with 180 servers, with
  about 10TB of storage.

  They're using an IBM 3584 Anaconda with 2 fibre-attached drives.

  The machine is an IBM P-Series 640, with RAID 1+0.

  One of the largest servers is about 700GB. It's the fileserver for user
  data and home directories.

  Mount time, including searching for files on the tape, is about 2 min.
  When restoring 1GB, the total time is about 8 min. This means that a
  total restore of the server would take about 70 hours to complete. The
  formula is 2 mins to search and mount the tape, plus 6 mins per GB to
  restore the data.
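
The arithmetic behind the 70-hour figure can be checked with a minimal sketch. The function name and parameters are illustrative only; the inputs (700GB, 6 min per GB, 2 min per mount) are taken from the message, and the 1GB = 1024MB conversion used for the effective rate is an assumption.

```python
def restore_hours(size_gb, minutes_per_gb, mount_minutes=0.0, mounts=1):
    """Estimate restore time in hours from a measured per-GB transfer time.

    mounts * mount_minutes is added on top of the pure data-transfer time.
    """
    total_minutes = size_gb * minutes_per_gb + mounts * mount_minutes
    return total_minutes / 60.0


# Figures from the message: 700 GB server, 6 min of transfer per GB,
# 2 min to mount the tape and locate the files.
transfer_only = restore_hours(700, 6)
with_one_mount = restore_hours(700, 6, mount_minutes=2, mounts=1)

# Effective data rate implied by 6 min per GB (assuming 1 GB = 1024 MB).
effective_mb_per_s = 1024 / (6 * 60)

print(f"transfer only:  {transfer_only:.1f} h")
print(f"with one mount: {with_one_mount:.2f} h")
print(f"effective rate: {effective_mb_per_s:.1f} MB/s")
```

Under these assumptions the measured rate works out to under 3MB/s, so the mount time is negligible next to the per-GB transfer time.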

  The customer's P-Series machine is equipped with two 100Mb/s Ethernet
  cards and one IBM 100Mb/s Token-Ring card. The test was run on one of
  the Ethernet cards.

  Today, the customer is using OTG DiskXtender. This is for two reasons:
  one is to save primary disk space, the other is to minimize the amount
  of data that has to be restored in the event of a disaster.

  The LTO drives can perform 15MB/s, or 30MB/s compressed.

  The P-Series machine is not the bottleneck. Usually, the network sends
  about 3,000 packets with a peak at 7,000. During backup, 27,000 packets
  are sent with a peak at 50,000. According to the communications guys,
  this is very high.

  The clients are Compaq ProLiant machines with about 4GB of memory and
  two Xeon 750 processors (I think).

  So, there shouldn't be a bottleneck.

  According to the communications guys, the maximum theoretical speed of
  100Mb/s Ethernet is about 12.5MB/s, or 25MB/s running at full duplex.
  The first problem with this is that a restore is one-way communication
  (server to client).

  At a restore rate of 12.5MB/s, the total restore of 700GB would take
  about 15 hours.

  Who has a primary fileserver that can be down for 15 hours?

  And, this is only theoretical.

  With 1Gb/s Ethernet, the theoretical capacity is about 30MB/s. And this
  is only theoretical. The restore would take about 7.5 hours. What if
  this happened in the morning? Would the users take a vacation and come
  back the next day?
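
As a rough check on the wire-speed estimates, the sketch below converts a sustained throughput into hours for the 700GB server. The helper name is hypothetical, and 1GB = 1000MB is assumed; the two rates are the ones quoted in the message, not measurements.

```python
def transfer_hours(size_gb, rate_mb_per_s, mb_per_gb=1000):
    """Hours needed to move size_gb at a sustained rate_mb_per_s."""
    seconds = size_gb * mb_per_gb / rate_mb_per_s
    return seconds / 3600.0


# Rates quoted in the message (theoretical, one-way server-to-client).
for label, rate in [("12.5 MB/s", 12.5), ("30 MB/s", 30.0)]:
    print(f"700 GB at {label}: {transfer_hours(700, rate):.1f} h")
```

With these inputs the result is about 15.6 hours at 12.5MB/s and about 6.5 hours at 30MB/s, the latter a little under the 7.5 hours quoted. Either way, both are best-case numbers that ignore tape mounts and protocol overhead.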

  Just some thoughts....

  Best Regards

  Daniel Sparrman

  -----------------------------------
  Daniel Sparrman
  Exist i Stockholm AB
  Bergkällavägen 31D
  192 79 SOLLENTUNA
  Växel: 08 - 754 98 00
  Mobil: 070 - 399 27 51


  Alex Paschal <AlexPaschal AT FREIGHTLINER DOT COM>
  Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
  2001-12-18 11:06 PST
  Please respond to "ADSM: Dist Stor Manager"

  To: ADSM-L AT VM.MARIST DOT EDU
  cc:
  bcc:
  Subject: Re: Incremental forever -- any problems?




  I agree with Wanda.  Any kind of modern library and tape technology adds
  very little time to the restore.  WELL, ok, the costly ones, anyway.  My
  1TB NT fileserver (I know, I know) lives on 99 primary pool tapes right
  now.  Collocate=filespace, so I'll be doing 3 restores at the same time.
  Assuming even distribution, and assuming every tape must be mounted
  during the restore, that's about 33 mounts per filespace or, assuming a
  60-second mount, an additional half hour due to mounts.  I'm willing to
  bet that's not my bottleneck.  STK 9840, STK Powderhorn 9310 (6000 slot
  library), ACSLS 6 (library manager), DTELM 6.1 (external library manager
  for TSM to talk to)
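
The mount-overhead estimate above can be written out as a small sketch. The function name is made up for illustration, and it assumes the tapes are spread evenly across restore streams that run fully in parallel, as the message does.

```python
import math


def mount_overhead_minutes(tapes, parallel_streams, mount_seconds):
    """Wall-clock minutes spent on mounts when restores run in parallel.

    Assumes an even spread of tapes across streams and that every tape
    must be mounted once during the restore.
    """
    mounts_per_stream = math.ceil(tapes / parallel_streams)
    return mounts_per_stream * mount_seconds / 60.0


# 99 tapes, 3 filespaces restored at once, 60-second mounts.
print(mount_overhead_minutes(99, 3, 60))  # 33.0
```

Restoring the same 99 tapes in a single stream would triple that to 99 minutes, which is why the collocation setting matters here.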

  I can really see no point in doing full backups except to give management
  a warm fuzzy and justify buying more network.

  How about the rest of you?  What mount times are you seeing with your
  libraries and how many tapes does your largest box live on?  SHOW
  VOLUMEUSAGE NODENAME is a quick way to eyeball it.  It's an unsupported
  command, so I assume no liability if it brings your server down.

  Alex Paschal
  Storage Administrator
  Freightliner, LLC
  (503) 745-6850 phone/vmail

  -----Original Message-----
  From: Prather, Wanda [mailto:Wanda.Prather AT JHUAPL DOT EDU]
  Sent: Tuesday, December 18, 2001 7:17 AM
  To: ADSM-L AT VM.MARIST DOT EDU
  Subject: Re: Incremental forever -- any problems?


  Hi Robin,

  We use STK 9840 drives in an STK9710 robot (forerunner of the L700, I
  think).
  Mount time for the 9840 is under 30 seconds; maybe 40 seconds to write for
  an append.
  Dismount is also very fast because they rewind to the middle of the tape
  instead of the beginning, I think.

  The faster drives make running collocation quite painless; even if you
  have to mount 10 tapes on a restore, that only adds 5 minutes total to
  the restore time.

  The 9840 is in the same class as the IBM 3590 drive; MUCH faster to mount
  and locate than DLT. (and yep, lots more $)

  We tried DLT drives in the 9710 first.  Worked OK for our low-volume TSM
  server; just couldn't take the pounding on our high-volume TSM server.  We
  especially got hurt by the DLT "false cleans".  Too much start/stop/append
  activity on the tapes made them subject to I/O errors on readback; each
  time you get an I/O error it triggers a false clean, so during a restore
  you get up to 3 minutes to mount, plus another 3-6 minutes to process the
  cleaning tape after the tape dismounts!  That was a killer.
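
To put the two drive types side by side, here is a minimal sketch of per-restore mount overhead for a 10-tape restore. The function and its `clean_fraction` parameter are hypothetical; the timings are the ones quoted in this message, and the DLT case assumes the worst case where every mount triggers a false clean.

```python
def restore_mount_overhead(mounts, mount_min, clean_min=0.0, clean_fraction=0.0):
    """Minutes added to a restore by mounts and cleaning cycles.

    clean_fraction is the share of mounts followed by a cleaning cycle
    (0.0 = never, 1.0 = every mount).
    """
    return mounts * (mount_min + clean_fraction * clean_min)


# 9840: roughly 30-second mounts, no false cleans.
fast = restore_mount_overhead(10, 0.5)

# DLT worst case (assumption): 3-minute mounts, every mount followed by
# a false clean taking between 3 and 6 minutes.
dlt_low = restore_mount_overhead(10, 3.0, clean_min=3.0, clean_fraction=1.0)
dlt_high = restore_mount_overhead(10, 3.0, clean_min=6.0, clean_fraction=1.0)

print(f"9840: {fast:.0f} min, DLT worst case: {dlt_low:.0f}-{dlt_high:.0f} min")
```

Lowering `clean_fraction` models a library where only some mounts hit an I/O error; the real share would have to come from your own activity logs.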

  Your mileage may vary....

  ************************************************************************
  Wanda Prather
  The Johns Hopkins Applied Physics Lab
  443-778-8769
  wanda_prather AT jhuapl DOT edu

  "Intelligence has much less practical application than you'd think" -
  Scott Adams/Dilbert
  ************************************************************************




  -----Original Message-----
  From: Robin Sharpe [mailto:Robin_Sharpe AT BERLEX DOT COM]
  Sent: Monday, December 17, 2001 1:59 PM
  To: ADSM-L AT VM.MARIST DOT EDU
  Subject: Re: Incremental forever -- any problems?


  That's interesting... what kind of tape drives?
  We have two DLT libraries:  An ATL P3000 and an HP 20/700 (rebadged STK
  L700).  The ATL has rather slow robotics, but is very reliable.  The HP
  has somewhat faster robotics, but for some reason takes much longer to
  label new tapes.

  I think the major bottleneck is the tapes... DLTs take a long time to
  mount... at least 90 seconds, usually more like 2 minutes.  If you are
  appending to the end of the tape, even longer... and that does not
  necessarily correspond to percentage full, because DLT is serpentine; it
  writes to the end of tape, then back to the front several times.

  So if you need, say, two dozen tape mounts for a restore (which is not
  uncommon), that could easily add an hour to the restore time.

  Robin Sharpe
  Berlex Labs



  "Prather, Wanda" <Wanda.Prather@JHUAPL.EDU>
  12/17/01 11:18 AM
  Please respond to "ADSM: Dist Stor Manager"

  To: ADSM-L AT VM.MARIST DOT EDU
  cc: (bcc: Robin Sharpe/WA/USR/SHG)
  Subject: Re: Incremental forever -- any problems?

  We have the opposite situation - we have fast robotics and use
  collocation.
  With collocation on fast tape, it doesn't matter whether you are doing 2
  weeks or 2 years of data, a restore takes the same amount of time.

  Doing periodic fulls doesn't "refresh" anything, from TSM's point of
  view - the original backups are still in the TSM DB and still available,
  even if they are 5 years old.  If you do periodic fulls, you have to
  retransmit everything over the network again, and you have to adjust your
  policies to make sure you allow those redundant versions to be kept; you
  increase the size of your DB and the amount of reclaims you have to do.

  Doing periodic "fulls" would do nothing whatever for us, except bog down
  the network.

  I suggest you try doing a large restore to test your own capabilities.  If
  you can't restore in a timely fashion, FIRST figure out what your
  bottleneck is before you decide to "fix" it by doing full backups.

  Then if you find out you still can't do restores in a timely fashion, at
  least check out the use of BACKUPSETS.  They give you all the client's
  active data on one tape, without retransmitting all the data, and without
  creating an extra zillion entries in your DB.