ADSM-L

Re: strange reclaimation behavor

2005-01-04 13:52:40
Subject: Re: strange reclaimation behavor
From: "Mark D. Rodriguez" <mark AT MDRCONSULT DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 4 Jan 2005 12:52:26 -0600
Hi,

I will attempt to clarify these issues with regard to reclamation
and migration.  My explanation is based on several years of experience,
course material that I have taught, and conversations with other ITSM
experts.  In other words, I have not seen the actual code, and without
that no one can say for sure what it is doing.

Migration:

   * Migration process(es) start when the highmig threshold is passed.
     The number of processes started is controlled by the migprocess
     value.
   * ITSM will evaluate which node has the most data in the storage
     pool at that time.
   * It then begins to move all of that node's data on a per file space
     basis, i.e. each file space will get its own process.
   * If there are more processes available than there are file spaces
     for the node with the most data, then it will start to move the
     node with the second most data, again a file space at a time.
   * It will move all of the node's data before it re-evaluates its
     threshold values.  Therefore, once all of the node's data is moved
     it looks at the threshold, and if it needs to continue migrating
     it selects the node with the most data and starts to move it.
   * This continues until the low threshold is achieved.
   * Therefore, it is possible for you to adjust the values and have
     migration continue for some time, even to the point of emptying the
     pool beyond the low threshold value!  Or it may stop right away,
     depending on the situation.
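
The migration steps above can be sketched as a toy model.  This is only
an illustration of the behavior as described in this message, not the
server's actual algorithm: it assumes a single process, ignores file
spaces and the process count, and assumes migration has already been
triggered.  The function name and the sample figures are hypothetical.

```python
def simulate_migration(node_data, capacity, low_pct):
    """Toy model of the migration loop described above.

    node_data: dict mapping node name -> units of data in the pool.
    capacity:  total pool capacity in the same units.
    low_pct:   low migration threshold, as a percentage of capacity.
    Returns (nodes drained in order, final percent of pool used).
    """
    remaining = dict(node_data)
    drained = []

    def pct_used():
        return 100.0 * sum(remaining.values()) / capacity

    # The threshold is only re-checked between nodes, never mid-node.
    while remaining and pct_used() > low_pct:
        biggest = max(remaining, key=remaining.get)
        drained.append(biggest)
        del remaining[biggest]  # all of that node's data is moved
    return drained, pct_used()


# Hypothetical pool: 90% full, low threshold at 45%.
order, final_pct = simulate_migration(
    {"A": 50, "B": 30, "C": 10}, capacity=100, low_pct=45
)
print(order, final_pct)
```

In this made-up case node A is drained in full, which leaves the pool at
40%, below the 45% low threshold.  That is the overshoot the last bullet
describes.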

Reclamation of Primary Pools (note copy pools are discussed below)

   * When a reclamation value is adjusted down to trigger reclamation, a
     list of the tapes to be reclaimed is generated (visible in the
     activity log).
   * ITSM will start processing the tapes on the list.  Each tape in
     this reclamation will be a separate process.
   * After completing a tape ITSM will re-evaluate the reclamation
     threshold and decide whether to continue.
   * Therefore, for primary storage pools you may in fact see the
     reclamation process end without completing all the tapes on the
     list, but it will not end (unless you cancel it) until the current
     tape has finished reclaiming.
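
A minimal sketch of the per-tape re-evaluation described above, under
stated assumptions: the threshold is re-read only after a tape
completes, a tape is eligible when its reclaimable percentage is at or
above the threshold, and the most reclaimable tape is taken first.  The
selection order, the function name, and the volume names are
assumptions made for illustration; none of this comes from the product
code.

```python
def simulate_reclamation(tapes, thresholds):
    """Toy model of primary pool reclamation.

    tapes:      dict mapping volume name -> percent reclaimable space.
    thresholds: reclamation threshold in effect at each re-evaluation;
                the last value stays in effect once the list runs out.
    Returns the volumes reclaimed, in order.
    """
    pending = dict(tapes)
    reclaimed = []
    step = 0
    while True:
        # The threshold is only consulted between tapes.
        threshold = thresholds[min(step, len(thresholds) - 1)]
        eligible = [v for v, pct in pending.items() if pct >= threshold]
        if not eligible:
            break
        volume = max(eligible, key=pending.get)  # assumed ordering
        reclaimed.append(volume)  # the current tape always runs to completion
        del pending[volume]
        step += 1
    return reclaimed


# Threshold lowered to 60, then raised to 100 while the second tape runs.
reclaimed = simulate_reclamation({"T1": 90, "T2": 75, "T3": 65}, [60, 60, 100])
print(reclaimed)
```

With these made-up numbers T1 and T2 are reclaimed and T3 is left
alone, because the raised threshold is only noticed once T2 has
finished.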

Reclamation of Copy Pools

   * The process for copy pools is similar, with one big difference:
     it processes all the tapes at one time!
   * When reclaiming copy pool tapes that are off site, it uses a
     primary copy of the data that is local.  To improve the
     reclamation process, ITSM will mount the primary tape and move all
     the NECESSARY files off that primary tape once it is mounted, even
     if that means it is in fact reclaiming files from 2 or more
     different tapes.  As a result, there is no longer the concept of
     reclaiming a single copy pool tape; it is in fact reclaiming all
     eligible tapes at once!
   * Therefore, adjusting the reclamation threshold upwards will not
     stop the reclamation of copy pool tapes.  The only way that
     process ends is for it to finish all the tapes or for you to
     cancel it.

Again, I will reiterate: I have not seen the code for any of these
processes, but through observation and printed material I have
deduced this to be the methodology that ITSM is using.  I am open to
others' observations if they differ from mine.  I wish that IBM
would just clearly state what the algorithm truly is.

BTW, there are many techniques that have been talked about on this list
as ways of scripting around some of these difficulties.

Good Luck and I hope this helped clear things up.

--
Regards,
Mark D. Rodriguez
President MDR Consulting, Inc.

===============================================================================
MDR Consulting
The very best in Technical Training and Consulting.
IBM Advanced Business Partner
SAIR Linux and GNU Authorized Center for Education
IBM Certified Advanced Technical Expert, CATE
AIX Support and Performance Tuning, RS6000 SP, TSM/ADSM and Linux
Red Hat Certified Engineer, RHCE
===============================================================================



Roger Deschner wrote:

That's not right. I just proved it, inadvertently.

I wanted to deliberately empty out a Disk Storage Pool. I had set
lowmig=0 highmig=0 migprocess=4 and it had dutifully mounted 4 tapes and
started emptying things out in a hurry.

Then 10:00 came, and a daily schedule I had set months ago to shut down
migration at that time, happened, and set it to highmig=75 lowmig=25.
(Classic self-inflicted foot-shooting.) The effect was to cancel all of
the migration processes, even though the storage pool was still 10%
full. These process cancelations took effect when each process reached
the end of the current file, so they didn't all happen at once although
it was fairly quick. Fortunately all the tapes were still mounted so I
got them started again quickly. But what I proved (again) was that
Migration will stop as soon as you change the thresholds, not when the
original lowmig has been reached.

If you are seeing something different, I wonder if your client nodes
are backing up very large individual files?

Reclamation is different from migration. While it does not work on a
pre-built list, you are correct that it continues until the end of the
volume it is currently reclaiming. Once a reclamation process finishes
work on a volume, then the server examines the thresholds all over again
to see if it wants to reclaim another tape. So if the reclamation
threshold for that storage pool has been changed, it will take effect at
that time - at the end of the volume currently being processed, not
after all volumes in some list have been processed. As a result, if you
are doing this on some schedule, and you need the tape drives at a
particular time for something else such as database backup, you can
either change the reclamation threshold several hours in advance, or
develop an OS script to query the process, parse the result, and cancel
it by process number. Even then, the reclamation process cancelation
will not take effect until the end of the file currently being
processed.
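
The "query, parse, cancel" idea could look roughly like the sketch
below.  It only covers the parsing step and builds the command text; it
does not contact a server.  The input layout (comma-delimited, one
process per line: number, description, status) is an assumption, and
the sample text is invented for illustration, so check it against what
your own server actually prints before relying on it.

```python
import csv


def reclamation_cancel_commands(query_process_output):
    """Return 'cancel process N' commands for reclamation processes.

    ASSUMPTION: the input is comma-delimited text with one process per
    line, in the order: process number, description, status.
    """
    commands = []
    for row in csv.reader(query_process_output.splitlines()):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        number, description = row[0], row[1]
        if "reclamation" in description.lower():
            # Drop any thousands separator (e.g. "1,234") before use.
            commands.append("cancel process " + number.replace(",", "").strip())
    return commands


# Hypothetical sample text, invented for this sketch; not real server output.
SAMPLE_OUTPUT = '''12,Space Reclamation,"Volume A00001 (storage pool TAPEPOOL), Moved Files: 10"
13,Migration,"Disk Storage Pool DISKPOOL, Moved Files: 5"
'''

commands = reclamation_cancel_commands(SAMPLE_OUTPUT)
print(commands)
```

Each resulting string would then be handed to the administrative
client by whatever wrapper script you already use.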

The strategy I use to make sure I get the tape drives for database
backup at the scheduled time, is to run migration between reclamation
and DB backup, since migration can stop itself and free the drives
pretty quickly, in contrast to reclamation.

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
======I have not lost my mind -- it is backed up on tape somewhere.=====





Date:      Jan 04, 09:45
From:      Wheelock, Michael D <nobody at nowhere.com>

Hi,

When you start reclamation it begins a process and decides which tapes
it is going to reclaim based on the value given.  These tapes can be
seen in the activity log of the server.  It will continue to reclaim
until it has reclaimed all of the tapes in the list or you cancel the
reclamation process.

I have seen similar behavior in the migration process as well.  If you
set a storage pool to migrate (i.e., by setting high=0 and low=0) then it
will empty it out. If you subsequently reset the values (and the storage
pool is now above the low value and below the high value) it will
continue to migrate all of the data out.

The migration_stop is there to set up the storage pools for the next
day's backup.  The reclamation stop doesn't seem to be terribly useful.

I set my storage pools to a decent value all of the time (something like
rec=90).  Then on weekends (usually Friday afternoon) I set it to
something deeper (currently rec=75 especially for the storage pools on
the remote server).

Michael Wheelock
Integris Health



-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Tyree, David
Sent: Tuesday, January 04, 2005 8:18 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: strange reclaimation behavor

I have something strange going on with my reclamations. I have the
system issue a command in the afternoons to start reclaiming tapes by
dropping the levels to 60%. I then have it issue another command around
midnight to raise the levels back to 100%.

When I get here in the morning I will usually find that reclaiming is
still going on. I have verified that the stop_reclaim scripts are
actually running by looking at the actlog for the time period involved.
I end up doing a cancel process to get it to stop.

I could understand it still running if it was in the middle of a 50-60
gig file, but it's only moving small (several meg) files. It's had about
eight hours to stop on its own. I would have thought it would have found
an opportunity to stop after eight hours.

The contents of all of the scripts involved are correct. I have one to
start the tapepool and another to start the copypool and another couple
to raise the level back up. I alternate different pools each night.

          Any ideas here?

I'm running 5.2.2 on a Win2k server with 4 LTO2 drives in a PV136T library.





David Tyree
Enterprise Backup Administrator
South Georgia Medical Center
229.333.1155






