ADSM-L

Re: Fundamental Migration Design Flaw

2003-10-03 16:17:30
Subject: Re: Fundamental Migration Design Flaw
From: David Longo <David.Longo AT HEALTH-FIRST DOT ORG>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 3 Oct 2003 16:08:32 -0400
That's the way it works.  The solution you mentioned, getting bigger disks,
is the best one.  Tweaking the migration thresholds should work until you
get more disk space.  Some of us adjust these levels throughout the day
to help with this.

The problem is that you can't send a file to tape while it is still in the
process of being written to disk.  Also, the size of these files can be
"incorrect" until they are completed.  This is certainly true with TDP
objects; they tend to "overallocate" until the file transfer is complete.

I suspect this is also a problem because the file "in process" is using
part of the Recovery Log until it is finished.  I don't know the details
of the internals, but this is probably one of the problems.

You could also have a tape drive tied up for a considerable time waiting
for this file to complete.

Again, bigger disk pools are the simplest and best solution.



David B. Longo
System Administrator
Health First, Inc.
3300 Fiske Blvd.
Rockledge, FL 32955-4305
PH      321.434.5536
Pager  321.634.8230
Fax:    321.434.5509
david.longo AT health-first DOT org


>>> rogerd AT UIC DOT EDU 10/03/03 02:59PM >>>
I am having problems on my ITSM V5.1 server with a disk storage pool
completely filling up. When that happens, the server attempts to mount a
tape for each client backup session, there are of course not enough tape
drives, so everything comes crashing to a halt and a large number of
client nodes don't get backed up.

What was amazing when this first happened, was that there was no
migration process running, despite HIGHMIG=75. The server had made no
attempt to protect itself. I started tracking the answers to Q STGPOOL,
and I reread the doc three more times just to be sure, and I think I
have found the problem - open files. That is, files which are in the
process of being transmitted across the net from client nodes to the
server. File sizes are ballooning these days. 1 GB individual files are
commonplace.

This disk storage pool has caching turned OFF.

 Storage      Device       Estimated     Pct     Pct   High   Low   Next Stora-
 Pool Name    Class Name    Capacity    Util    Migr    Mig   Mig   ge Pool
                                (MB)                    Pct   Pct
 -----------  ----------  ----------   -----   -----   ----   ---   -----------
 DESKTOPDIS-  DISK          42,000.0   100.0    57.4     75    25   DESKTOPTAP-
  KPOOL                                                              EPOOL

At that time, incredibly, migration was not running, and plenty of tape
drives were free. It could have saved the day. The server looks at the
Pct Migr number to tell when to start and stop migration. THIS IS
WORKING EXACTLY HOW IT IS DESIGNED AND DOCUMENTED. And it is also very
wrong.

To verify what I was seeing, I restarted the server, and Pct Util
dropped to 60%. Yup, it's open files.
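
To make the numbers above concrete, here is a minimal Python sketch of the
behavior as I understand it from the Q STGPOOL output. It is a simplified
model, not the server's actual code: the function names are made up for
illustration, and it assumes that the gap between Pct Util and Pct Migr is
all open (in-flight) files, which is my reading of the situation with
caching turned off.

```python
def migration_should_start(pct_migr: float, highmig: float) -> bool:
    """Simplified model: the trigger compares Pct Migr, not Pct Util, to HIGHMIG."""
    return pct_migr >= highmig


def open_file_mb(capacity_mb: float, pct_util: float, pct_migr: float) -> float:
    """Space occupied but not migratable, assumed here to be open files."""
    return capacity_mb * (pct_util - pct_migr) / 100.0


# Values taken from the Q STGPOOL output above.
capacity_mb = 42000.0
pct_util = 100.0
pct_migr = 57.4
highmig = 75.0

starts = migration_should_start(pct_migr, highmig)
pinned = open_file_mb(capacity_mb, pct_util, pct_migr)

print("migration starts:", starts)
print("MB held by open files (assumed):", round(pinned, 1))
```

Under these assumptions the pool is 100% full while the trigger still sees
only 57.4%, so migration never starts even though HIGHMIG=75.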

1. I know I need a lot more disks! Hardware arrives at its own pace,
dictated by budgets, Purchasing Departments, how long it takes me to
bolt it into the rack, how long dsmfmt takes (too long), etc. More disks
should be online by sometime next week, I hope.

2. I have already adjusted the settings to limit the number of sessions
and spread out the scheduled backups for the entire night, from 5PM to
8AM. I cannot spread it any further.

3. To deal with this, I am lowering the migration threshold until it no
longer fills up. During tonight's backup window, it will be set at 15%,
even though that is a bit extreme. Basically, it will migrate any closed
file almost as soon as it is closed. That's no way to run a railroad. Is
there any other workaround, or could this perhaps be fixed? Migration
algorithms should work to prevent fill-ups like the ones I am experiencing,
but they don't, so they are broken.
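
For anyone picking a threshold the same way, here is a rough Python sketch
of the arithmetic behind lowering HIGHMIG. It is only an illustration under
stated assumptions: the function name and the headroom figure are
hypothetical, and it assumes the share of the pool held by open files is
about what the Q STGPOOL output showed when the pool was full (Pct Util
minus Pct Migr). On a real server that share changes from minute to minute.

```python
def max_useful_highmig(open_files_pct: float, headroom_pct: float) -> float:
    """Highest HIGHMIG that still triggers before the pool is full.

    Simplified model: the pool is full when Pct Migr + open-file share
    reaches 100, so migration must start at least headroom_pct earlier.
    The result is clamped at 0 because a threshold cannot be negative.
    """
    return max(0.0, 100.0 - open_files_pct - headroom_pct)


# Open-file share observed when the pool was full: Pct Util - Pct Migr.
open_files_pct = 100.0 - 57.4

# Hypothetical safety margin so migration has time to free space.
headroom_pct = 10.0

threshold = max_useful_highmig(open_files_pct, headroom_pct)
print("HIGHMIG should be at most about", round(threshold, 1))
```

With the observed numbers, any HIGHMIG above roughly 57 could never fire
before the pool filled, which is why the threshold has to come down so far
on nights with many large open files.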

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu 
============ "In theory, theory and practice are the same, =============
========= but in practice, theory and practice are different." =========

