ADSM-L

Re: Backup Strategy and Performance using a PACS Digital Imaging System

From: bbullock <bbullock AT MICRON DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 4 Mar 2003 15:11:52 -0700
        Hmmm, an issue we all deal with in varying degrees...

Answer #1:
        Yes, expire inventory will take longer. TSM scans through all the file 
entries looking for expiration candidates, and performance is typically reported 
as "X thousand files examined in Y seconds": the more files, the more 
seconds... no way around the math.
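To put rough numbers on that, here is a back-of-envelope sketch using the figures quoted in the original post below (18.6 million objects, ~70-minute expiration, ~7.5 million additional PACS objects); the linear-scaling model itself is an assumption, not a measurement:

```python
# Back-of-envelope: expiration runtime scales roughly linearly with the
# number of objects examined. The 18.6M objects / 70 minutes and the
# 7.5M additional PACS objects come from the original post; the linear
# model is an assumption.
current_objects = 18_600_000
current_minutes = 70
rate = current_objects / (current_minutes * 60)   # objects examined per second

pacs_objects = 7_500_000
projected_minutes = (current_objects + pacs_objects) / rate / 60
print(f"{rate:.0f} objects/sec, projected expiration ~{projected_minutes:.0f} min")
```

At the observed rate this projects expiration stretching from about 70 to roughly 98 minutes, assuming the examination rate holds as the database grows.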

Answer #2, 3 & 4:
        If the data is fairly static once written and will live on an NT 
host, the journal-based backup option sounds like the best solution to 
improve backup performance, IMHO.
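For reference, journal-based backup on the Windows client is driven by the journal daemon's configuration file; a minimal sketch might look like the following (drive letters and option values here are illustrative, so check the client manual for the exact settings at your TSM level):

```ini
; tsmjbbd.ini -- TSM journal daemon configuration (illustrative sketch)
[JournalSettings]
Errorlog=jbberror.log

[JournaledFileSystemSettings]
; journal only the volumes holding the static image/text trees
JournaledFileSystems=D: E:
```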

        I have a few monster hosts like yours, with millions of files and even 
more directories. They reside on Unix hosts, so journal-based backups are not 
available.

        On some of them, running multiple TSM client instances works well because 
the data sits on separate mount points. You can create an instance for each mount 
point and run multiple backups on the host at the same time. The downside is 
that the instances compete for CPU, memory, and I/O, so performance may suffer.
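On a Unix client that might look like the following dsm.sys sketch: two stanzas, one node name per mount point, each driven by its own dsmc process. All of the server, node, and host names here are invented examples, not an actual configuration from the list:

```
* dsm.sys -- two client stanzas, one per mount point (names are examples)
SErvername        tsm_mnt1
   COMMMethod        TCPip
   TCPServeraddress  tsmserver.example.com
   NODename          bighost_mnt1
   DOMain            /mnt1

SErvername        tsm_mnt2
   COMMMethod        TCPip
   TCPServeraddress  tsmserver.example.com
   NODename          bighost_mnt2
   DOMain            /mnt2
```

Each backup then runs independently, e.g. `dsmc incremental -servername=tsm_mnt1` and `dsmc incremental -servername=tsm_mnt2` in parallel.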

        One additional issue you will eventually encounter....

         While you are concerned with backing up the data, my 
concern has always been restoring the data in a timely manner. The 
more files in a filesystem, the longer restores take. I won't 
bore you with the details, but you may want to search the 2001 ADSM archives 
at http://search.adsm.org/ for a discussion called "Performance Large 
Files vs. Small Files" that went on about the type of data you have (static, 
small files, long term).

Thanks,
Ben

-----Original Message-----
From: Todd Lundstedt [mailto:Todd_Lundstedt AT VIA-CHRISTI DOT ORG]
Sent: Tuesday, March 04, 2003 11:38 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Backup Strategy and Performance using a PACS Digital Imaging
System


Heya *SMers,
This is a long one, grab a beverage.

I am running TSM 4.2.1.7 on AIX 4.3.3 on a dual-processor B80, backing up
around 112 nodes with 5.2TB capacity (2.3TB used), storing 18.6 million
objects (3.2TB) in the primary and copy storage pools.  The TSM database is
8.7GB, running around 80% utilization.  All but a few of the nodes are
server-class machines.  We back up about 250GB a night, and on weekends we
do full TDP for SQL database backups to the tune of an additional 450GB (and
growing).  Expiration processing runs daily and completes in
approximately 70 minutes.  The four Fuji PACS servers we have are included
in the above numbers, but only the application and OS, not the images and
clinical notes (less-than-1k text files).

FYI: where TSM and disk management are concerned, Fuji is the DEVIL!
Each image, and each 1k note file with text referencing an image, is stored
in its own directory: image_directory/imagefile and
text_directory/textfile, a one-to-one relationship.  Backing up the
directories/text files now takes over 12 hours to
complete, while incrementally backing up very little.  The backup has to scan
the files to see what needs to be backed up (this is not TSM yet, but some
other backup software).

The powers that be are asking what it would take to move all of the data
stored on DVDs in the DVD jukebox (images) to magnetic disk-based
storage, and then start backing all of that up with TSM.  I have some numbers
from the PACS administrator.  On the four PACS servers, the additional data
they would like TSM to back up tallies up to:
1.5+ million folders
1.0+ million files (yes... more folders than files...)
2.2+ TB storage (images and text files)

None of this data will change.  Once it is backed up, it will very likely
never need to be backed up again.  Because of that, I am recommending three
tape storage pools at a minimum: one primary, one on-site copy, and one
off-site copy.  I would actually like to have two off-site copy storage
pools.  Since this data doesn't change, and no additional backups will
occur for the files, there will be no need for reclamation.  The extra copy
storage pools are a safety net in case we have bad media spots/tapes;
without reclamation, we will never know if we have bad media.

So, at a minimum, three storage pools containing a total of 7.5+ million
objects ((directories + files) * 3) will use roughly 4.2GB of TSM database
(7.5 million * 600 bytes).  Growth is estimated at about 4+ GB of TSM
database per year, i.e. approximately another 2.3+ million files/folders
each year, and it will very likely be more.  (Daily estimates are 6,500
additional files/folders.)
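The arithmetic above, worked out (the 600-bytes-per-object figure is the rule of thumb quoted in the post, not an exact TSM database cost):

```python
# Database-size estimate from the post's own numbers. The 600 bytes per
# object is the poster's rule of thumb, not a measured TSM figure.
BYTES_PER_OBJECT = 600
POOLS = 3                                   # primary + on-site copy + off-site copy

objects = (1_500_000 + 1_000_000) * POOLS   # (folders + files) x 3 pools
db_gib = objects * BYTES_PER_OBJECT / 2**30

daily_new = 6_500                           # new files/folders per day
yearly_objects = daily_new * 365 * POOLS
yearly_gib = yearly_objects * BYTES_PER_OBJECT / 2**30

print(f"{objects:,} objects, ~{db_gib:.1f} GB now; ~{yearly_gib:.1f} GB/year growth")
```

That works out to 7.5 million objects and roughly 4.2GB of database today, with about 4GB of database growth per year, consistent with the estimates in the post.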
Keep in mind: this data will NOT be changed or deleted in the foreseeable
future, new data arrives daily, and NO data expires.  I don't know if Fuji
will ever change the way they store their images/text files.

So, here is what I am trying to figure out.
1.  Will adding the additional objects from the PACS servers significantly
increase my expiration processing run time?  Will TSM have to scan all of
those database objects during expiration processing?
2.  I have heard it is possible to run another instance of TSM server on
the same machine.  Would that be a good idea here?  It makes sense to this
novice user.  I wouldn't have to run expiration processing daily on the
PACS TSM instance.
3.  If a second TSM server instance is the recommended course, how
difficult is that to set up?  What issues will I have sharing my single 3584
LTO library (five LTO drives, and I am hoping to get more out of this
project) between two TSM server instances on the same machine?  Any redbooks
or how-tos out there covering either topic?
4.  Regardless of how many TSM server instances, journaling will have to be
set up on each of the NT 4.0 PACS servers.  What kind of overhead can we
expect from running journaling on the NT servers (I haven't set up journaling
anywhere yet)?  Three of the servers each have about 400-500K image
objects, and the fourth server (the one with all of the <1k text files) has
close to 1 million image/text objects (none of that includes the OS or
application files/databases, just image/text-report files).
5.  Since the directory and file objects will likely never change, is there
a pressing need for a non-migrating DIRMC storage pool?  I would suppose
not.
6.  Is there a better way to do this?  (probably should have asked that
question first.)
<:+)

Thanks in advance.

Todd Lundstedt
Technical Specialist
Via Christi Information Management Services
ofc.   (316) 261-8385
fax.   (316) 660-0036
Todd_Lundstedt AT via-christi DOT org
