Subject: Lots of newbie questions
From: Dave Mussulman <mussulma AT UIUC DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 10 Aug 2006 12:10:09 -0500
Hello,

Forgive the laundry list of questions, but here are a few things that,
as a newbie, I don't quite understand.  Each paragraph is a different
question/topic, so feel free to chime in on just a few, or whichever
ones you're comfortable answering.  Thanks!

I'm using Operational Reporting with the automatic notification turned
on for failed or missed schedules.  I have a node associated with a
schedule that no longer exists (it's not powered on), just to test
failures and notifications.  However, I never get notifications about
failed or missed schedules from it (no email, and no mention in the
daily report).  In the client schedules part of the report, it's always
in a Pending status.  At what point does Pending turn into Failed or
Missed?  How can I configure things so that I get notifications about
systems that missed their scheduled backup?
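
In case it matters, I've also been poking at this from the
administrative command line; if I have the syntax right, this should
list just the exceptions (failed or missed events) for the last day:

   query event * * begindate=today-1 enddate=today exceptionsonly=yes

My understanding, which may be wrong, is that an event only flips from
Pending to Missed after its startup window has closed without the
schedule ever starting; maybe that's why my test node never shows up.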

I'm using an administrative schedule to back up my DB to a FILE device
class twice a day, and then I do full backups of the DB to tape right
before my offsite rotation.  I read somewhere that since I'm using DRM,
I shouldn't use 'del volhist' to remove old db backups.  However, I
don't think the DRMDBBACKUPEXPIREDAYS setting is being applied to my
FILE backups.  Is that normal?  Should I be running both DRM expiration
and 'del volhist'?
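
For concreteness, here's the combination I'm wondering about (the
seven-day values are just placeholders):

   set drmdbbackupexpiredays 7
   delete volhistory type=dbbackup todate=today-7

If I understand right, DRM only expires database backups it manages
through 'move drmedia', which would explain why the onsite FILE backups
are untouched.  My worry with the 'delete volhistory' command is that
it doesn't discriminate between those FILE backups and the tape series
DRM is tracking, hence the question.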

I do my backups to a DISK pool that has an 85/40 migration threshold
and a tape next storage pool.  If everything (my disk and tape pool) is
synced up to my copypool before a backup runs, and the backup only goes
to disk, the 'backup stg' for the tape pool has nothing to do.  I
understand that.  If I back up the disk pool, manually migrate the data
from disk to tape, and then back up the tape pool, it has nothing to do
(since that data was already in the copypool).  I understand that too.
But if, during a backup, the disk pool starts an automatic migration to
tape, the next time I do a 'backup stg' for the tape pool, it has data
to copy to the copypool.  So what's happening?  Since the migration is
going on, does TSM route data from the node directly to tape?  (My
maxsize parameter for the disk pool is No Limit, so I would guess not.)
Or is TSM migrating the newest data from the disk pool, data that
hasn't yet been copied to the copypool?  In that case, why doesn't it
migrate the older data that's already been copied?  What are the
selection criteria for what gets migrated?  ...Or would best practice
say to manually migrate the disk pool daily to minimize the chance of
this condition?
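
For what it's worth, the workaround I'm considering is to force the
ordering explicitly; a rough sketch, assuming the 5.3 'migrate stgpool'
command and using my own pool names:

   backup stgpool diskpool copypool maxprocess=2 wait=yes
   migrate stgpool diskpool lowmig=0 wait=yes
   backup stgpool tapepool copypool wait=yes

Run in that order, the disk pool gets drained under my control rather
than by the 85% trigger, so the tape pool backup should always find
nothing new to copy.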

Best-practice-wise, do you try to define different private groups and
tape labels for onsite, archive and offsite storage?  Or do people
really just make one large 'pool' of tapes and not care if tape 0004
has archive data on it and stays for years, 0005 goes offsite, 0006 has
a two-week DB backup on it, and 0007 is a normal tape pool tape?  Since
there's no single (standard) unified view of volumes (DB backups,
offsite backups, checked-in scratch volumes, not-yet-checked-in scratch
volumes), I worry a little about keeping track of everything and
'losing something' if they're all in one group.  How do sites handle
that issue?
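
The closest I've come to a single view is stitching one together from
the SQL interface; a rough sketch, surely incomplete:

   select volume_name, stgpool_name, access, status from volumes
   select date_time, type, volume_name from volhistory where type='BACKUPFULL'
   select library_name, volume_name, status from libvolumes

That covers storage pool volumes, DB backups, and library-resident
scratch, but nothing tells me about tapes sitting on a shelf that TSM
has never seen.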

I have an LTO3 tape library and an external LTO3 drive.  In our
Networker environment, we found it a pretty good practice to have a
drive outside of the jukebox for one-off operations (old restores,
etc.) as well as some fault tolerance if the jukebox or its SCSI bus
went south.  How do I set up that environment in TSM?  It looks like I
cannot use the same device class across two libraries.  Doesn't that
hinder me if I want to use the external drive in the same way as the
jukebox drives, sharing storage pools, etc.?  My jukebox isn't very
large and I anticipate having to use overflow storage pools, which is
where being able to mix the manual library (external drive) and the
SCSI library would be nice.
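
For reference, here's how I think the external drive would have to be
defined, as its own manual library (the server name, device file, and
all the other names here are just my guesses):

   define library manlib libtype=manual
   define drive manlib ltodrive1
   define path server1 ltodrive1 srctype=server desttype=drive library=manlib device=/dev/rmt1
   define devclass manlto devtype=lto format=ultrium3 library=manlib

Since a device class points at exactly one library, it seems I'd need a
second device class, and therefore separate storage pools, for this one
drive, which is the limitation I'm asking about.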

Consolidating copypool tapes for offsite use.  I had my reclamation
threshold for my copypool at 40%.  I used 'backup stg' with maxpr=[# of
drives] to minimize the amount of time the backup took.  However, it
left me with many under-utilized offsite volumes that, as soon as I
moved them offsite, were then reclaimed (and then sat until the
reusedelay expired).  That seems inefficient: I move a
larger-than-necessary number of tapes each time I do an offsite
rotation (right now, weekly) and as soon as they're offsite, they're
reclaimed.  To fix this, I put the reclamation threshold back to 100%,
and set it down just before my offsite rotation.  I've also taken a
look at the to-be-offsited tapes and done some 'move data's as required
to try to minimize the number of offsite tapes.  Is that standard
practice?  I feel like I'm fighting the natural way TSM works, given
that it makes so many other decisions just fine without my direct
intervention.  (And that's a compliment - I can't say that for
Networker.)  Is there something I'm missing to make offsite tape usage
more streamlined?  My offsite rotation procedure is starting to look
like this:
- expire inventory
- back up all the local storage pools to the copypool
- reduce the copypool reclamation threshold to 40%
- wait for offsite reclamation to finish
- raise the copypool reclamation threshold back to 100%
- do a series of 'move data' commands to try to minimize the number of
  to-be-offsited volumes (not sure how I'd script this)
- do a database backup
- 'move drmedia' the copypool tapes and take them offsite
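
As a server macro, that sequence might look something like this (pool
names, device class, and thresholds are placeholders, and the
reclamation wait plus the 'move data' consolidation are still manual
steps in the middle):

   expire inventory wait=yes
   backup stgpool diskpool copypool wait=yes
   backup stgpool tapepool copypool wait=yes
   update stgpool copypool reclaim=40
   /* wait for reclamation to finish, run any 'move data' cleanup */
   update stgpool copypool reclaim=100
   backup db devclass=ltoclass type=full wait=yes
   move drmedia * wherestate=mountable tostate=vault wait=yes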

One of the selling points for me in switching from Networker to TSM is
that it gave me MacOS backups "for free."  I'm seeing more Intel-based
Macs, and I've heard the 5.3.3 client doesn't support them.  Does
anyone know the status of a client that works on Intel Macs?

I believe I understand the way a DR scenario should work with copypools
and a complete DB restore from an offsite tape.  (We're not in
production with TSM yet, and that's still on my list to test/learn.)
I'm not as confident about the scenario where the database is intact,
but I want to restore some data that still exists on copypool tapes and
has been expired from the active database.  Do I have to stage a full
DR scenario to get that data back, or is there a way to 'merge' a DB
backup containing data about a certain node or timeframe into a live,
working system?  Networker had tools to do this, but I don't think I've
found a similar facility in TSM.  (Or is the standard BOFH answer "We
don't do that," with the fix being to adjust the copygroups so that the
revisions people need are kept on the server, or to fall back on
archives or backupsets?)

Phew!  I think that's it -- for now.  Thanks again,

Dave
