Networker

Re: [Networker] HSM Modules (was Re: Migrating from Networker to Commvault galaxy)

2008-07-09 13:06:24
Subject: Re: [Networker] HSM Modules (was Re: Migrating from Networker to Commvault galaxy)
From: Bruce Breidall <Bruce.Breidall AT CONCUR DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 9 Jul 2008 10:01:01 -0700
I am basically referring to very large file systems, with many
millions of objects. Software-based HSM components that are typically
sold with the backup products you buy are tough to manage when the
numbers get really big. The internal HSM structures used to keep track
of the file systems they are controlling have upper limits, and you
typically won't find that out from the marketing manuals.

There are a lot of factors here that would dictate one direction or
another, and what I said was very vague - the subject is too
far-reaching for my brief comment to come anywhere near doing it
justice.

I have some experience with software-based HSM products that ended up
being unmanageable because of the sheer number of objects under HSM
control, and we had to go after a different solution that was
application controlled and hardware based. Fortunately, we had a good
application group to work with, probably because they were also
directly affected by some of the negative outcomes of dealing with
these large file systems and the pain involved in trying to get
backups.

The comment I was making is that if you have file systems like the
ones I described, and they are owned and populated by some type of
application, it makes great sense to somehow convince the application
owners to add policy logic around the data so that the application can
also move this data to an appropriate tier of storage that is
replicated on the backend for DR purposes. If the application already
has indexes to all the data, then why not put in logic to have it move
the data to where it makes the most sense?
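
As a rough illustration of what "policy logic in the application"
could look like, here is a minimal Python sketch. Everything in it is
hypothetical: the IndexEntry record, the tier names, and the age
thresholds are made up for the example, and a real application would
use its own index and whatever tiers its storage actually offers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IndexEntry:
    """One object as the application's own index might describe it."""
    path: str
    last_access: datetime


# Hypothetical policy: (maximum age, tier name), checked in order.
TIER_RULES = [
    (timedelta(days=30), "primary"),
    (timedelta(days=365), "nearline"),
]
DEFAULT_TIER = "archive"  # anything older than every rule above


def choose_tier(entry, now):
    """Pick a storage tier from how long ago the object was last used."""
    age = now - entry.last_access
    for max_age, tier in TIER_RULES:
        if age <= max_age:
            return tier
    return DEFAULT_TIER


def plan_moves(index, current_tier, now):
    """Return (path, target_tier) pairs for objects on the wrong tier.

    current_tier maps path -> tier name the object lives on today.
    """
    moves = []
    for entry in index:
        target = choose_tier(entry, now)
        if current_tier.get(entry.path) != target:
            moves.append((entry.path, target))
    return moves
```

The point is only that the application walks its own index, which it
already maintains, instead of a separate HSM layer having to sweep
the whole file system to find out what is there.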

If there is no hope of putting the intelligence in the application,
then you are forced to look at some other type of solution like the
one you are describing. Just be prepared to have a lot more headaches,
depending on the HSM product you end up choosing. Definitely ask for a
customer reference before buying.

This kind of application capability typically does not exist, but it
should, because inserting another layer (i.e. an HSM product like
Veritas Storage Migrator) typically just makes your life more
difficult - especially when you are dealing with millions and millions
of objects.

I think I was initially trying to make a NW connection to this topic,
but I forgot what it was.



-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of John Stoffel
Sent: Wednesday, July 09, 2008 11:00 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] HSM Modules (was Re: Migrating from Networker to
Commvault galaxy)

>>>>> "Bruce" == Bruce Breidall <Bruce.Breidall AT CONCUR DOT COM> writes:

Bruce> I would be cautious installing any third party HSM module,
Bruce> especially if the number of files reaches into the millions.

Are you talking in regards to Networker and third party HSM tools? 

Bruce> You will find that it will be impossible to finish your sweeps
Bruce> of the objects, and then you really have a mess. The management
Bruce> side will be a nightmare (unable to clean up orphaned objects,
Bruce> unable to finish migrations, etc...).

I can sorta see what you're saying, but I'd like to know more context
here, esp since I think HSM is a valid way to improve the
manageability of data and backups.  

For example, we have quite a number of NFS file systems on NetApps
which we'd love to do HSM on because it would allow us to migrate
un-used data off to cheaper storage.  And it would allow us to NOT
have to do full backups every month of data that never changes.

Maybe synthetic fulls are the answer here, so that we only run
Incrementals each night and then once a month we build a synthetic
full.  This could be a big help if it works properly.
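
To make the idea concrete, here is a minimal Python sketch of how a
synthetic full can be assembled in principle: start from the last
real full and replay each incremental on top of it, oldest first.
This is only the general concept, not how NetWorker or any other
product implements it, and the catalog layout (a dict of path to
backup copy, plus a set of deleted paths per incremental) is an
assumption made up for the example.

```python
def build_synthetic_full(last_full, incrementals):
    """Merge a full backup catalog with later incrementals.

    last_full:    dict mapping path -> identifier of the saved copy
    incrementals: list of (changed, deleted) tuples, oldest first,
                  where changed is a dict like last_full and deleted
                  is a set of paths removed since the previous run
    """
    synthetic = dict(last_full)  # do not modify the caller's catalog
    for changed, deleted in incrementals:
        synthetic.update(changed)      # newer copies replace older ones
        for path in deleted:
            synthetic.pop(path, None)  # drop files that no longer exist
    return synthetic
```

Data that never changes is read from the client once, for the
original full, and after that only shows up by reference in each new
synthetic full.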

Bruce> If the data is currently indexed and owned by an application,
Bruce> it is there that policies should be enforced and data movement
Bruce> is controlled.  Easier said than done....but start planting the
Bruce> seed now.

What if the applications are just plain dumb?  Or what if the users
who manage the data don't *have* any type of indexing tool that works
with their tools to help manage their data?

I find your statement confusing and I'd like to know more. 

Bruce> Archiving and backups should be completely separate entities,
Bruce> and they should never touch - especially when you are trying to
Bruce> address large file systems with 100's of millions of files. The
Bruce> vendors will have you thinking otherwise, because they know how
Bruce> difficult it is to get this accomplished.

Let's get our terminology straight, because we can't discuss stuff
properly if we don't agree on the basics.  *grin*


Backup:

Copying the data in a filesystem to some other media.  Can be a true
(full) copy or an incremental copy of just the changed files/dirs.
The idea is that this data is used for restoration of entire
filesystems, directories or single files.

Archive:

Copy and DELETION of data from the source to another form of Media.
Can be indexed to the file level or not.  Restore to the source file
system involves intervention.  This process grabs all files/dirs and
their contents when run.

HSM:

The movement of files/dirs from one storage device to another while
presenting a consistent and unified interface to the enduser so that
they do not know that anything has changed.  This process is designed
to move files/dirs not accessed in a configurable time frame to
cheaper media.
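
The selection step of that definition can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
function name and the max_idle_days parameter are made up, it relies
on the filesystem recording access times (many are mounted with
atime updates reduced or turned off), and it only lists candidates
rather than moving anything or leaving a stub behind.

```python
import os
import stat
import time


def find_migration_candidates(root, max_idle_days, now=None):
    """List regular files under root not accessed in max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Only plain files qualify; skip symlinks, devices, etc.
            if stat.S_ISREG(st.st_mode) and st.st_atime < cutoff:
                candidates.append(path)
    return sorted(candidates)
```

Note that this walks every file under root on each run, which is
cheap on a small tree but is exactly the kind of sweep that gets
painful once the object count reaches into the millions.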

Ideally HSM integrates with the Backup/Archive processes above so that
when you move files from primary to secondary storage, the backup does
NOT need to continue to back up the secondary storage.  But on restore,
the end-user visible filesystem can be restored easily and quickly.

John

To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
