This is going to sound like a question with an obvious answer, but the subject
matter is HSM...
Is there any way to reliably determine how much space is actually, truly,
really available for use in an HSM-managed file system?
I lost half a day's work today cleaning up after the loss of an HSM
client system (and all its timesharing users), caused by an HSM file system
claiming to be full when nothing external indicates that it
is. The file system involved is an archival file system which overnight receives aged
files from around a large RS/6000 AIX system. The 'mv' of one such file hung,
and thereafter the system clogged to the point that its load average went from its
usual value of 2 to over 250, and it had to be rebooted, at the expense of all
the timesharing users on that system. (This has happened before,
with another HSM file system.)
I took a look at the file system, and all indications (as shown below) are
fine: plenty of free space (some 16MB), lots of unused inodes; and yet
attempting to copy in a file of 100KB would either hang or result in "No space
left in file system", with the partially copied file containing exactly 32768
bytes. The ADSM environment surrounding this file system is fine: other
archival file systems sharing the same staging disk are operating as they
should, and all HSM dsm* processes are active. A dsmreconcile does not help.
------------------
df -v /archive/files
Filesystem      Total KB   used   free  %used  iused  ifree  %iused  Mounted on
/dev/lv-hsmfs      81920  65912  16008    80%  15691   4789     76%  /archive/files
/archive/files     81920  65912  16008    80%  15691   4789     76%  /archive/files
dsmdf /archive/files
ADSTAR Distributed Storage Manager
space management Interface - Version 2, Release 1, Level 0.4
(C) Copyright IBM Corporation, 1990, 1996, All Rights Reserved.
FSM             FS       Mgrtd  Pmgrtd  Mgrtd  Pmgrtd  Unused   Free
Filesystem      State       KB      KB  Files   Files  Inodes     KB
/archive/files  a      1906292       0  12638       0    4789  16012
dsmmigfs query /archive/files
ADSTAR Distributed Storage Manager
space management Interface - Version 2, Release 1, Level 0.4
(C) Copyright IBM Corporation, 1990, 1996, All Rights Reserved.
File System     High     Low      Premig   Age     Size    Quota      Stub File
Name            Thrshld  Thrshld  Percent  Factor  Factor             Size
/archive/files  90       80       -        1       1       999999999  4095
------------------
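For what it's worth, the figures in that output can be cross-checked with a
few lines of Python. This is only a sketch: the numbers are copied from the
df and dsmmigfs output, the variable names are mine, and it assumes the High
Thrshld value is a percentage of file system capacity.

```python
# Figures copied from the "df -v" and "dsmmigfs query" output.
total_kb = 81920
used_kb = 65912
free_kb = 16008
iused = 15691
ifree = 4789
high_threshold_pct = 90   # "High Thrshld" (assumed to be percent of capacity)
file_kb = 100             # size of the file that would not copy in

# Percentages as df would derive them, before rounding.
used_pct = 100.0 * used_kb / total_kb
inode_used_pct = 100.0 * iused / (iused + ifree)

# Where utilization would stand if the 100KB file had been accepted.
used_after_copy_pct = 100.0 * (used_kb + file_kb) / total_kb

print(f"used + free == total:    {used_kb + free_kb == total_kb}")
print(f"space used:              {used_pct:.1f}%")
print(f"inodes used:             {inode_used_pct:.1f}%")
print(f"space used after copy:   {used_after_copy_pct:.1f}%")
print(f"below high threshold:    {used_after_copy_pct < high_threshold_pct}")
```

By these numbers the copy should have left the file system well under its
high threshold, which is exactly why the "no space" result is so baffling.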
So though both AIX and HSM say when asked that there is plenty of space
available to accept a file of this size, they say there is no space when you
go to use it. This makes no sense, and certainly thwarts any attempt to
administer such file systems, as in trying to determine when it is necessary
to extend such a file system. This is one of the reasons that customers are
so frustrated with HSM. (Another is waiting over a year for IBM to provide an
HSM version for our AIX 4.2 systems.)
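Lacking a trustworthy number, about the only check I can think of is to try
an actual write and compare the result with what the file system reports.
The following is a rough sketch of that idea, assuming Python is available on
the client; the function names and the "spaceprobe." file prefix are my own
inventions, not anything supplied by ADSM or AIX. Note that if the write hangs
the way my 'mv' did, the probe will hang as well, so it would need to be run
under some kind of watchdog.

```python
import os
import tempfile


def reported_free_bytes(directory):
    """Free space as the file system reports it (what df would show)."""
    st = os.statvfs(directory)
    return st.f_bavail * st.f_frsize


def probe_write(directory, size, chunk=32768):
    """Try to write `size` bytes to a scratch file in `directory`.

    Returns (bytes_written, error), where error is None on success or the
    OSError (for example ENOSPC) that stopped the write. The scratch file
    is always removed afterwards.
    """
    written = 0
    fd, path = tempfile.mkstemp(prefix="spaceprobe.", dir=directory)
    try:
        try:
            while written < size:
                block = b"\0" * min(chunk, size - written)
                written += os.write(fd, block)
            os.fsync(fd)  # force the data out so allocation failures surface
        except OSError as exc:
            return written, exc
        return written, None
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    target = "."  # e.g. /archive/files
    want = 100 * 1024
    print("reported free bytes:", reported_free_bytes(target))
    got, err = probe_write(target, want)
    print("bytes written:", got, "of", want, "error:", err)
```

A probe like this only tells you what happens at the moment it runs; it is no
substitute for the file system reporting its usable space honestly.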
Neither the HSM manual nor the "Using ADSM HSM" redbook contains any
information about administration issues such as this. If anyone can provide
any insights, I'd be happy to hear them - before another of our cluster
systems falls victim to this.
thanks, Richard Sims Boston University OIT