ADSM-L

Subject: Re: Litigation! Wish
From: Steven Harris <steve AT STEVENHARRIS DOT INFO>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 27 Dec 2006 14:58:41 +1000
Speaking of needles in haystacks, a one-time colleague of mine was
working for a company that analyzed oil seismic survey data on a large
array of clustered IBM machines.  He was a very smart cookie and
understood the geophysics of it all (hi Stephen, if you are listening).

The data came in on reels of tape, each reel holding the survey data
from one surveyed line.  A full survey consisted of a series of evenly
spaced lines that mapped an area.

He took this data and, using the TSM API, stored it on 3590s
(this was back in the old ADSM 3 days).  The smart part was that oil
companies could ask for an analysis of an area and give the coordinates
they wanted.  His software would work out which pieces of data were
needed, have TSM mount the appropriate tapes, gather the data, and then
feed it into the machines for analysis.  This enabled his company to
leverage their investment in surveys effectively and also to deliver
the data to customers faster than anyone else could.
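For flavour, the lookup he describes might have been sketched along these lines.  Everything here is hypothetical - the line spacing, the naming scheme, and the `lines_covering` helper are all made up for illustration; the real system presumably consulted an index of what was actually archived:

```ruby
# Hypothetical setup: survey lines are evenly spaced, and each line's
# tape data was stored as one TSM object with a predictable name.
LINE_SPACING = 100  # metres between surveyed lines (assumed)

# Given the north-south extent of a requested area, return the object
# names of the survey lines needed to cover it.
def lines_covering(y_min, y_max, spacing: LINE_SPACING)
  first = (y_min / spacing.to_f).floor
  last  = (y_max / spacing.to_f).ceil
  (first..last).map { |n| format('survey/line-%04d', n) }
end

lines_covering(150, 420)
# => ["survey/line-0001", "survey/line-0002", "survey/line-0003",
#     "survey/line-0004", "survey/line-0005"]
```

Each returned name would then drive a TSM retrieve, which mounts whichever tape holds that object.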

At least that's what he told me :)

The TSM API could be used to do a whole stack of this sort of storage
work, but it is hampered by the lack of bindings for a language we can
actually use, e.g. Perl, Python, or my current favourite, Ruby.  I've
taken a brief look at writing a library interface for Ruby, but it is
somewhat difficult - especially for an old COBOL programmer like me.
Why does everyone have to write in C anyway?
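For what it's worth, modern Ruby ships a stdlib module, Fiddle, that can call into a C shared library without writing a C extension - the same technique a Ruby wrapper around the TSM API's dsm* entry points would use.  As a minimal sketch (I don't have the TSM library to hand, so libc's strlen stands in for it):

```ruby
require 'fiddle'
require 'fiddle/import'

# Bind one C function from a shared library.  For a real TSM wrapper
# you would dlload the TSM API shared library instead and declare its
# dsm* functions here.
module LibC
  extend Fiddle::Importer
  dlload Fiddle::Handle::DEFAULT   # libc symbols already in the process
  extern 'unsigned long strlen(const char *s)'
end

LibC.strlen('ADSM')   # => 4
```

The fiddly part with a real API is not the calls themselves but declaring the C structs the library expects; Fiddle's `struct` helper covers that, though it is tedious for a large header.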

Has anyone on the list done any work along these lines?

Regards

Steve

Steven  Harris
AIX and TSM Admin
Brisbane Australia

Robin Sharpe wrote:
Well, what I meant was "move data" or "move nodedata"... but now that I
think about it, those commands have no effect on retention.  They only
move the data to different volumes, perhaps in different storage pools.
A "generate backupset" would make a copy that has its own retention
criteria, but IMO backupsets are too hard to manage effectively... then
again, I haven't really used them that much.  Also, backupsets contain
only active data, and so may be incomplete in a litigation context.

I think the bottom line here, unfortunately, is that we're trying to make
TSM fulfill a need it was not designed for.  TSM is great for backing up a
system and getting it back to a known operational state.  It's also great
for restoring a single file, a set of files, directories, filesystems, etc.
It's not very useful for finding a "needle in a haystack", like "we need all
emails from John Doe to XYZ Corp regarding product X"... there are
archiving systems emerging that can perform that kind of search.  It would
be nice if TSM could serve as the back end for such a system, so you could
minimize the back-end data store.  I believe there are a couple that can do
that.  Of course, there's no free lunch... implementing an archive solution
like that will cost significant bucks.

-Robin
