ADSM-L

Re: strange reclaimation behavor

2005-01-04 15:38:20
Subject: Re: strange reclaimation behavor
From: Steve Roder <spr AT REXX.ACSU.BUFFALO DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 4 Jan 2005 15:38:02 -0500
>
> I will attempt to clarify these issues here in regards to reclamation
> and migration.  My explanation is based on several years of experience,
> course material that I have taught, and conversation with other ITSM
> experts.  In other words, I have not seen the actual code, and without that
> no one can say for sure what it is doing.
>
> Migration:
>
>     * Migration process(es) start when the highmig threshold is passed.
>       The number of processes started is controlled by maxproc value.

The setting is migpr for migration.  maxproc is an option on the backup
stg command for how many processes to run.
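
To make the distinction concrete, here is a minimal sketch that only builds
the two administrative command strings.  The pool names (DISKPOOL, TAPEPOOL,
COPYPOOL) and the helper function names are hypothetical; only the
migprocess and maxprocess keywords come from the commands themselves.

```python
def migration_process_command(pool, processes):
    """Build the command that sets how many migration processes a pool may run."""
    # migprocess is a storage pool attribute, so it is set with update stgpool.
    return f"update stgpool {pool} migprocess={processes}"


def backup_stgpool_command(primary, copy, processes):
    """Build the command that backs up a primary pool with N parallel processes."""
    # maxprocess is an option on the backup stgpool command itself.
    return f"backup stgpool {primary} {copy} maxprocess={processes}"


if __name__ == "__main__":
    # Hypothetical pool names, for illustration only.
    print(migration_process_command("DISKPOOL", 4))
    print(backup_stgpool_command("TAPEPOOL", "COPYPOOL", 2))
```

The strings would still have to be passed to an administrative session by
whatever means your site uses.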

>     * ITSM will evaluate which node has the most data in the storage
>       pool at that time.
>     * It then begins to move all of that node's data on a per file space
>       basis, i.e. each file space will get its own process.

I don't think this is necessarily correct.  Each filespace does not get
its own (server) process, unless that is how it works for collocation by
filespace.  With Collocate set to yes, this certainly cannot work that
way, as it would be contrary to the collocation setting on the storage
pool.  Each node would be dedicated to a server process.

>     * If there are more processes available than there are file spaces
>       for the node with the most data then it will start to move the
>       node with the second most data, again a file space at a time.
>     * It will move all the node's data before it re-evaluates its
>       threshold values.  Therefore, once all the node's data is moved it
>       looks at the threshold and if it needs to continue migrating then
>       it selects the node with the most data and starts to move it.
>     * This continues until the low threshold is achieved.
>     * Therefore, it is possible for you to adjust the values and have
>       migration continue for some time, even to the point of emptying the
>       pool beyond the low threshold value!  Or it may stop right away
>       depending on the situation.
>
> Reclamation of Primary Pools (note copy pools are discussed below)
>
>     * When a reclamation value is adjusted down to trigger reclamation a
>       list of tapes to be reclaimed is generated (visible in activity
>       log) and these are the tapes to be reclaimed.
>     * ITSM will start processing the tapes on the list.  Each tape in
>       this reclamation will be a separate process.
>     * After completing a tape ITSM will re-evaluate the reclamation
>       threshold and decide whether to continue.
>     * Therefore, for primary storage pools you may in fact see the
>       reclamation process end without completing all the tapes on the
>       list, but it will not end (unless you cancel it) until the current
>       tape has finished reclaiming.
>
> Reclamation of Copy Pools
>
>     * The process for copy pools is similar, with one big difference:
>       it processes all the tapes at one time!
>     * When reclaiming copy pool tapes that are off site it uses a
>       primary copy of the data that is local.  In order to improve the
>       reclamation process ITSM will mount the primary tape and move all
>       the NECESSARY files off that primary tape once it is mounted even
>       if it means it is in fact reclaiming files from 2 or more
>       different tapes.  By doing this, there is no longer the concept of
>       reclaiming a single copy pool tape; it is in fact reclaiming all
>       eligible tapes at once!
>     * Therefore, when you adjust the reclamation threshold upwards it
>       will not stop the reclamation of copy pool tapes.  The only way
>       that process ends is for it to finish all the tapes or for you to
>       cancel it.
>
> Again I will reiterate, I have not seen the code for any of these
> processes, but I have through observation and through printed material
> deduced this as being the methodology that ITSM is using.  I am open to
> others' observations if they are different than mine.  I wish that IBM
> would just clearly state what the algorithm truly is.
>
> BTW, there are many techniques that have been talked about on this list
> as ways of scripting around some of these difficulties.
>
> Good Luck and I hope this helped clear things up.
>
> --
> Regards,
> Mark D. Rodriguez
> President MDR Consulting, Inc.
>
> ===============================================================================
> MDR Consulting
> The very best in Technical Training and Consulting.
> IBM Advanced Business Partner
> SAIR Linux and GNU Authorized Center for Education
> IBM Certified Advanced Technical Expert, CATE
> AIX Support and Performance Tuning, RS6000 SP, TSM/ADSM and Linux
> Red Hat Certified Engineer, RHCE
> ===============================================================================
>
>
>
> Roger Deschner wrote:
>
> >That's not right. I just proved it, inadvertently.
> >
> >I wanted to deliberately empty out a Disk Storage Pool. I had set
> >lowmig=0 highmig=0 migprocess=4 and it had dutifully mounted 4 tapes and
> >started emptying things out in a hurry.
> >
> >Then 10:00 came, and a daily schedule I had set months ago to shut down
> >migration at that time, happened, and set it to highmig=75 lowmig=25.
> >(Classic self-inflicted foot-shooting.) The effect was to cancel all of
> >the migration processes, even though the storage pool was still 10%
> >full. These process cancelations took effect when each process reached
> >the end of the current file, so they didn't all happen at once although
> >it was fairly quick. Fortunately all the tapes were still mounted so I
> >got them started again quickly. But what I proved (again) was that
> >Migration will stop as soon as you change the thresholds, not when the
> >original lowmig has been reached.
> >
> >If you are seeing something different, I wonder if your client nodes
> >are backing up very large individual files?
> >
> >Reclamation is different from migration. While it does not work on a
> >pre-built list, you are correct that it continues until the end of the
> >volume it is currently reclaiming. Once a reclamation process finishes
> >work on a volume, then the server examines the thresholds all over again
> >to see if it wants to reclaim another tape. So if the reclamation
> >threshold for that storage pool has been changed, it will take effect at
> >that time - at the end of the volume currently being processed, not
> >after all volumes in some list have been processed. As a result, if you
> >are doing this on some schedule, and you need the tape drives at a
> >particular time for something else such as database backup, you can
> >either change the reclamation threshold several hours in advance, or
> >develop an OS script to query the process, parse the result, and cancel
> >it by process number. Even then, the reclamation process cancelation
> >will not take effect until the end of the file currently being
> >processed.
> >
> >The strategy I use to make sure I get the tape drives for database
> >backup at the scheduled time, is to run migration between reclamation
> >and DB backup, since migration can stop itself and free the drives
> >pretty quickly, in contrast to reclamation.
> >
> >Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
> >======I have not lost my mind -- it is backed up on tape somewhere.=====
> >
> >
> >
> >
> >
> >>Date:      Jan 04, 09:45
> >>From:      Wheelock, Michael D <nobody at nowhere.com>
> >>
> >>Hi,
> >>
> >>When you start reclamation it begins a process and decides which tapes
> >>it is going to reclaim based on the value given.  These tapes can be
> >>seen in the activity log of the server.  It will continue to reclaim
> >>until it has reclaimed all of the tapes in the list or you cancel the
> >>reclamation process.
> >>
> >>I have seen similar behavior in the migration process as well.  If you
> >>set a storage pool to migrate (i.e. by setting high=0 and low=0) then it
> >>will empty it out. If you subsequently reset the values (and the storage
> >>pool is now above the low value and below the high value) it will
> >>continue to migrate all of the data out.
> >>
> >>The migration_stop is there to set up the storage pools for the next
> >>day's backup.  The reclamation stop doesn't seem to be terribly useful.
> >>
> >>I set my storage pools to a decent value all of the time (something like
> >>rec=90).  Then on weekends (usually Friday afternoon) I set it to
> >>something deeper (currently rec=75 especially for the storage pools on
> >>the remote server).
> >>
> >>Michael Wheelock
> >>Integris Health
> >>
> >>
> >>
> >>-----Original Message-----
> >>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Tyree, David
> >>Sent: Tuesday, January 04, 2005 8:18 AM
> >>To: ADSM-L AT VM.MARIST DOT EDU
> >>Subject: strange reclaimation behavor
> >>
> >>I have something strange going on with my reclamations. I have the
> >>system issue a command in the afternoons to start reclaiming tapes by
> >>dropping the levels to 60%. I then have it issue another command around
> >>midnight to raise the levels back to 100%.
> >>
> >>When I get here in the morning I will usually find that reclaiming is
> >>still going on. I have verified that the stop_reclaim scripts are
> >>actually running by looking at the actlog for the time period involved.
> >>I end up doing a cancel process to get it to stop.
> >>
> >>I could understand it still running if it was in the middle of a
> >>50-60 gig file but it's only moving small (several meg) files. It's had
> >>about eight hours to stop on its own. I would have thought it would have
> >>found an opportunity to stop after eight hours.
> >>
> >>The contents of all of the scripts involved are correct. I have one
> >>starting the tapepool and another to start the copypool and another
> >>couple to raise the level back up. I alternate different pools each
> >>night.
> >>
> >>           Any ideas here?
> >>
> >>I'm running 5.2.2 on a Win2k server with 4 LTO2 drives in a PV136T library.
> >>
> >>
> >>
> >>
> >>
> >>David Tyree
> >>Enterprise Backup Administrator
> >>South Georgia Medical Center
> >>229.333.1155
> >>
> >>
> >>
> >
> >
> >
>
>
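
For anyone who wants to try the script approach Roger mentions (query the
processes, parse the result, cancel by process number), here is a rough
sketch of the parsing half only.  It assumes you have already captured the
output of "query process" in comma-delimited form, one process per line,
laid out as number, description, status.  That layout, the sample text, and
the function names are my assumptions; check them against what your server
actually prints before relying on this.

```python
import csv
import io


def reclamation_process_numbers(query_output):
    """Return the process numbers whose description mentions reclamation."""
    numbers = []
    for row in csv.reader(io.StringIO(query_output)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        number, description = row[0], row[1]
        if "reclamation" in description.lower():
            # Process numbers may carry thousands separators, e.g. "1,234".
            digits = number.replace(",", "").strip()
            if digits.isdigit():
                numbers.append(int(digits))
    return numbers


def cancel_commands(query_output):
    """Build one 'cancel process N' command per reclamation process found."""
    return [f"cancel process {n}" for n in reclamation_process_numbers(query_output)]


# Made-up sample in the assumed layout: number, description, status.
SAMPLE = '''12,Migration,"Disk Storage Pool DISKPOOL, Moved Files: 100"
14,Space Reclamation,"Volume ABC123 (storage pool TAPEPOOL), Moved Files: 42"
'''

if __name__ == "__main__":
    for command in cancel_commands(SAMPLE):
        print(command)
```

As Roger notes, even a cancel issued this way will not take effect until the
end of the file currently being processed.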

Steve Roder
University at Buffalo
(spr AT buffalo DOT edu | (716)645-3564)
