ADSM-L

Subject: Re: Using FILE instead of DISK devclass to avoid disk under-utilization
From: Thomas Denier <Thomas.Denier AT JEFFERSONHOSPITAL DOT ORG>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 26 Oct 2006 10:15:26 -0400
-----Daniel Clark wrote: -----

>On 10/25/06, Thomas Denier wrote:
>> The first problem is that things get really ugly if the storage
>> pools collectively get larger than the shared disk space.
>
>Ugly in what kind of way? Clients don't just block until one of the
>FILE class devices on disk is migrated to a tape storage pool, and
>thus reset to scratch?

No, the clients keep trying to send data, and transaction failures
occur until some disk space is made available. In addition, when
a write to a file volume fails for lack of disk space, TSM will
allocate another scratch volume, get another write failure, allocate
another scratch volume, and so on until the storage pool involved
reaches its maximum number of scratch volumes. The empty scratch
volumes created this way end up with read-only access, and are not
deleted automatically by migration. I ended up writing a script to
find and delete empty, read-only file volumes.
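For what it's worth, a minimal sketch of such a cleanup script, assuming
dsmadmc is available; the admin ID, password, and device class name
(FILECLASS) below are placeholders, not our actual setup:

```shell
#!/bin/sh
# Sketch: delete empty, read-only FILE volumes found by a dsmadmc query.
# Credentials and the FILECLASS device class name are placeholders.

# Turn a list of volume names (one per line) into DELETE VOLUME commands.
emit_deletes() {
  while read -r vol; do
    [ -n "$vol" ] || continue
    printf 'delete volume %s\n' "$vol"
  done
}

# In real use, feed it the output of a dsmadmc query, e.g.:
#   dsmadmc -id=admin -password=secret -dataonly=yes \
#     "select volume_name from volumes where devclass_name='FILECLASS' \
#        and access='READONLY' and pct_utilized=0" |
#   emit_deletes |
#   while read -r cmd; do dsmadmc -id=admin -password=secret "$cmd"; done
```

The query-then-delete split makes it easy to run the select by hand first
and eyeball the list before anything is actually removed.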

>> The second problem is that file volumes are treated like tapes in
>> many respects. In particular, only one session or process can have
>> a particular file volume open at a given time. We occasionally
>> have a client session that slows to a crawl, keeping a file volume
>> open for hours. This can cause backup stgpool and migration
>> processes to hang when they want to read the volume involved.
>
>I was thinking of creating a decently large number of FILE devclass
>devices on disk - say 100GB each over 4TB = 40 devices. I would
>think the chances of all of them being locked by hanging clients
>would be pretty small.

Our file volumes are only one gigabyte, and we sometimes have several
hundred of them. We have never come close to having all of the volumes
locked by hanging clients. As far as I can tell, migration and storage
pool backup processes select input volumes without regard to open
status. Once a process selects a volume that is being used by a client
session, the process hangs until the session relinquishes the
volume. It wouldn't matter if there were a thousand other volumes not
in use at the time.