Re. War stories: Restores

There are several keys to speed when restoring a large number of files with 
TSM; they are:
  1.. If using WindowsNT/2000 or AIX, be sure to use DIRMC, storing its primary 
pool on disk, migrating it to a FILE device class on disk, then backing both up 
to a copy pool (this avoids tape mounts for the directory entries that, because 
of their ACLs, cannot be stored in the TSM db);
  I've seen *two* centralized ways to implement DIRMC: (1) using a client 
option set, or (2) making the DIRMC management class the one with the longest 
retention (in each affected policy domain); 
  2.. Restore the directories first, using -DIRSONLY (this minimizes NTFS 
db-insert thrashing); 
  3.. Consider multiple, parallel restores of high-level directories; despite 
potential contention for tapes the sessions share, you want to keep the data 
flowing on at least one session to maximize restore speed; 
  4.. Consider using CLASSIC restore, rather than no-query restore -- this will 
minimize tape mounts, as classic restore analyzes which files to request and 
has the server sort the tapes needed -- though tape mounts may not be an issue 
with your high-performance configuration; 
  5.. If you must use RAID-5, realize that you will spend TWO write cycles for 
every write; if using EMC RAID-S (or ESS), you may want to increase the 
write cache to the largest size allowed (or turn it off altogether).  Using 9 or 15 
physical disks will help.
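Item 3 leaves open how to split the high-level directories across the sessions you can run (typically one per tape drive). A minimal sketch, assuming you know the approximate size of each high-level directory: take the largest directories first and give each one to the currently least-loaded session, so the sessions tend to finish at about the same time. The function name and the directory sizes are hypothetical, and shared-tape contention between sessions is not modeled.

```python
import heapq


def assign_to_sessions(dir_sizes_gb, sessions):
    """Greedy largest-first split of high-level directories across restore sessions.

    dir_sizes_gb: hypothetical mapping of directory name -> approximate size in GB.
    sessions: number of parallel restore sessions (e.g. one per tape drive).
    Returns one (total_gb, [directories]) tuple per session.
    """
    # Min-heap of (current load, session index); an ascending list is already a valid heap.
    heap = [(0.0, i) for i in range(sessions)]
    plan = [[] for _ in range(sessions)]
    totals = [0.0] * sessions
    # Largest directories first, each to the least-loaded session so far.
    for name, size in sorted(dir_sizes_gb.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        plan[idx].append(name)
        totals[idx] = load + size
        heapq.heappush(heap, (totals[idx], idx))
    return list(zip(totals, plan))


# Hypothetical example: five high-level directories, two tape drives.
plan = assign_to_sessions({"A": 50, "B": 40, "C": 30, "D": 20, "E": 10}, sessions=2)
for total, dirs in plan:
    print(total, dirs)
# 80.0 ['A', 'D', 'E']
# 70.0 ['B', 'C']
```

Each session would then run its own list of restores; largest-first is a simple heuristic that avoids one session being left with all the big directories at the end.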
A client of mine just had a server disk failure last weekend; it had local 
disk configured with RAID-5 (hardware RAID controller attached to a Dell Win2000 
server). After addressing items 1 to 3 above, we were able to saturate the 
100Mbps network, achieving 10-15 GB/Hr for the entire restore. The only delays 
were attributable to tape mounts: this customer had an over-committed silo, so 
tapes not in the silo had to be checked in on demand.  316 GB were restored in 
approx. 30 hours.  Their data was stored under 10 high-level directories, so we 
ran two restore sessions in parallel (we only had two tape drives) and disabled 
other client schedules during this exercise.

For your situation (250 GB and millions of files), and assuming DIRMC (item #1 
above), you should be able to see 5-10 GB/Hr: 50 hours at 5 GB/Hr, 25 hours 
at 10 GB/Hr.  So you are typically looking at two or three days.
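The estimate above is simple division of data volume by sustained rate. A small sketch of that arithmetic (the function names are just for illustration); note it gives pure transfer time only, so tape mounts, on-demand check-ins, and similar delays come on top of it. The same arithmetic applied to the war story gives its average rate.

```python
def restore_hours(total_gb, rate_gb_per_hr):
    """Pure transfer time in hours at a sustained rate (no mount or check-in delays)."""
    if rate_gb_per_hr <= 0:
        raise ValueError("rate must be positive")
    return total_gb / rate_gb_per_hr


def average_rate(total_gb, elapsed_hours):
    """Average GB/hr actually achieved over an entire restore."""
    return total_gb / elapsed_hours


for rate in (5, 10):
    print(f"250 GB at {rate} GB/hr: {restore_hours(250, rate):.0f} hours")
# 250 GB at 5 GB/hr: 50 hours
# 250 GB at 10 GB/hr: 25 hours

print(f"316 GB in 30 hours: {average_rate(316, 30):.1f} GB/hr")
# 316 GB in 30 hours: 10.5 GB/hr
```

The 10.5 GB/hr average from the war story sits inside the 10-15 GB/Hr range quoted for it.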

Large numbers of small files are the Achilles' heel of any file-based 
backup/restore operation, and restore is the slowest part, since you are 
fighting with the file system of the client OS.  Because of the way file 
systems traverse directories and reorganize branches "on the fly", it's 
important to minimize the "re-org" processing (in NTFS, by first creating all 
the branches and only then populating them with leaves). We did some benchmarks 
and compared notes with IBM; on another client, we developed the basic 
expectation that 2-7 GB/Hr was the "standard" for comparison purposes; you can 
exceed that number by observing the first 3 recommended configuration items 
above.
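To illustrate the "branches first, leaves afterward" ordering, here is a minimal sketch in plain Python, not TSM code: it assumes a hypothetical inventory of (path, is_dir) entries and a hypothetical read_data callback that returns each file's contents. Pass 1 creates every directory, shallowest first; pass 2 writes the files into a tree that already exists. It shows the ordering only and makes no claim about NTFS timing.

```python
import os
import tempfile


def restore_order(entries):
    """Split hypothetical (relative_path, is_dir) entries into dirs (shallowest first), then files."""
    dirs = sorted((p for p, is_dir in entries if is_dir), key=lambda p: (p.count("/"), p))
    files = [p for p, is_dir in entries if not is_dir]
    return dirs, files


def restore_tree(root, entries, read_data):
    dirs, files = restore_order(entries)
    # Pass 1: create every branch of the tree before any leaf exists.
    for d in dirs:
        os.makedirs(os.path.join(root, *d.split("/")), exist_ok=True)
    # Pass 2: populate the leaves; every parent directory is already in place.
    for f in files:
        with open(os.path.join(root, *f.split("/")), "wb") as fh:
            fh.write(read_data(f))


entries = [("a/b/x.txt", False), ("a", True), ("c/y.txt", False), ("a/b", True), ("c", True)]
with tempfile.TemporaryDirectory() as root:
    restore_tree(root, entries, lambda rel: rel.encode())
    print(sorted(os.listdir(root)))  # ['a', 'c']
```

With TSM itself, the equivalent is item 2: a -DIRSONLY restore first, followed by the file restores.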

How to mitigate this:  (a) use image backup (now available for Unix, soon to be 
available on Win2000) in concert with file-level progressive incremental 
backup; and (b) limit your file server file systems to either 100 GB or "X" 
million files, whichever comes first, then start a separate file system or 
server upon reaching that threshold.  You need to test in your environment to 
determine the acceptable limit to implement.
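Item (b) amounts to a two-threshold check. A minimal sketch, assuming 100 GB as the size limit and a hypothetical placeholder of 5 million files for the "X" million limit, which you would replace with whatever your own testing supports:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileSystemLimits:
    max_gb: float = 100.0
    # Hypothetical placeholder for the "X million files" limit; set it from your own tests.
    max_files: int = 5_000_000


def needs_new_filesystem(used_gb, file_count, limits: Optional[FileSystemLimits] = None):
    """True once either limit is reached, whichever comes first."""
    limits = limits or FileSystemLimits()
    return used_gb >= limits.max_gb or file_count >= limits.max_files


print(needs_new_filesystem(80, 6_000_000))  # True: file-count limit reached first
print(needs_new_filesystem(80, 2_000_000))  # False: both limits still have headroom
```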

Hope this helps.

Don France

Technical Architect - Tivoli Certified Consultant



Professional Association of Contract Employees (P.A.C.E.)
San Jose, CA