ADSM-L

Re: Help! AIX HSM problem

2000-03-20 17:50:17
From: John Valdes <j-valdes AT UCHICAGO DOT EDU>
Date: Mon, 20 Mar 2000 16:50:17 -0600
On Fri, Mar 17, 2000 at 10:42:12AM -0500, Richard Sims wrote:
> >Please check /etc/security/limits or use smitty users. You probably want
> >to set fsize to -1 and fsize_hard to -1. This is in AIX not ADSM or HSM.
>
> The /etc/security/limits fsize value would affect the creation or extension
> of all files operated upon by the designated user; but John indicated that
> "I can create files bigger than 4MB on other filesystems w/o problems",
> so it doesn't look like a limits problem.

That's right; it's not a limits problem.
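
For anyone who wants to rule out the limits theory on their own system, here
is a minimal Python sketch that reads the file size limit of the current
process. It assumes Python with the standard resource module is available on
the box; the helper names are made up for illustration. Note that getrlimit
reports bytes, and that hitting an fsize limit normally produces an error
rather than a hang, which is another reason it doesn't match what I'm seeing.

```python
import resource

def fsize_limit_bytes():
    """Return (soft, hard) RLIMIT_FSIZE in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)

    def to_none(value):
        return None if value == resource.RLIM_INFINITY else value

    return to_none(soft), to_none(hard)

def limit_explains_stall(stall_bytes, soft_limit):
    """True if a write that stops at stall_bytes could be hitting the limit."""
    return soft_limit is not None and soft_limit <= stall_bytes

if __name__ == "__main__":
    soft, hard = fsize_limit_bytes()
    print("soft fsize limit:", "unlimited" if soft is None else soft)
    print("hard fsize limit:", "unlimited" if hard is None else hard)
    # 4MB is where the cp stalls in my case
    print("limit explains 4MB stall:", limit_explains_stall(4 * 1024 * 1024, soft))
```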

In another email, Richard Sims continued:
>        The most baffling thing about HSM, for its users and
> administrator, is that the space remaining on the file system
> as per 'df' and 'dsmdf' has NOTHING to do with how much more
> data can be copied into the file system.  We often see our modest
> HSM file systems some 85% full, saying they have like 80MB free,
> and still we get File System Full when trying to copy in a file
> substantially smaller than that.  This seems to be a manifestation
> of the FSM device driver having true control over what happens,

Yup.  The FSM driver basically wedges itself between the
kernel/processes and the lower level disk drivers.  When the
filesystem is full, rather than returning a full condition to the
kernel/process, it blocks the process and tries to clear off
space.  Once enough space is available, the process can then
continue.  Likewise, when the kernel/process tries to read a migrated
file, the driver blocks the process while it restores the data from
the ADSM server.
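
To make the blocking behavior concrete, here is a toy model in Python. It is
not how the FSM driver is implemented (I have no access to its source); the
BlockingStore class and its methods are hypothetical and only mimic the
behavior described above: a write that doesn't fit waits until space is freed
instead of failing with a full condition.

```python
import threading

class BlockingStore:
    """Toy model: a write that does not fit blocks instead of failing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self._cond = threading.Condition()

    def write(self, nbytes, timeout=None):
        """Block until nbytes fit; return False only if the timeout expires."""
        with self._cond:
            fits = self._cond.wait_for(
                lambda: self.used + nbytes <= self.capacity, timeout
            )
            if fits:
                self.used += nbytes
            return fits

    def free(self, nbytes):
        """Release space (stands in for migration or a deleted file)."""
        with self._cond:
            self.used = max(0, self.used - nbytes)
            self._cond.notify_all()

if __name__ == "__main__":
    store = BlockingStore(capacity=10)
    store.write(8)
    # Free space one second from now; the write below waits for it.
    threading.Timer(1.0, store.free, args=(8,)).start()
    print("write completed:", store.write(4))
```

From the writer's point of view, the second write simply takes a long time,
which is what a hung cp looks like from the outside.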

> and it thinking perhaps that it has to save space for the .Spaceman
> db.

Actually, I would LOVE it if it did this (reserved space for the
.SpaceMan files), but it doesn't seem to, as in the past, we've
managed to fill the system to the point where dsmreconcile doesn't
have enough space to write out its candidates file or create temp
files when migrating files, which would cause problems for HSM.

That said, I have some more observations on this problem.  I've now
determined that this problem shows up only when the filesystem gets to
around 72% full.  Once it's at this point, if I try to "cp" a file to
this filesystem, the cp hangs once it's copied 4MB.  However, if I
leave the cp process running and delete another file off the
filesystem, the cp succeeds.  It's just as if the filesystem were
really 100% full, and the FSM driver is blocking the cp until space is
freed.  If the filesystem is below 72% full, then I have no problems
whatsoever.
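
If anyone wants to reproduce this without cp, a small probe along the
following lines shows how far a write gets before it stalls. This is only a
sketch, assuming Python is available on the system; the function names and
the 1MB chunk size are my own choices for illustration. It writes the file
one chunk at a time, syncs each chunk, and logs the elapsed time and how full
the filesystem is, so the last line printed marks where the write blocked.

```python
import os
import sys
import time

CHUNK = 1024 * 1024  # write 1MB at a time

def percent_full(path):
    """Percentage of blocks in use on the filesystem holding path."""
    st = os.statvfs(path)
    if st.f_blocks == 0:
        return 0.0
    return 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks

def probe_write(target, total_mb, log=sys.stderr):
    """Write total_mb chunks to target, logging after each; return bytes written."""
    dirpath = os.path.dirname(os.path.abspath(target))
    buf = b"\0" * CHUNK
    written = 0
    start = time.time()
    with open(target, "wb") as f:
        for _ in range(total_mb):
            f.write(buf)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before logging
            written += CHUNK
            log.write("%6.1fs  %4d MB written  fs %.1f%% full\n"
                      % (time.time() - start, written // CHUNK,
                         percent_full(dirpath)))
            log.flush()
    return written

if __name__ == "__main__":
    # usage: probe.py <target file> <size in MB>
    probe_write(sys.argv[1], int(sys.argv[2]))
```

If the pattern I described holds, the output on the affected filesystem
should stop after the 4 MB line and resume once another file is deleted.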

I'm still not 100% convinced it's HSM/the FSM driver and not the
underlying filesystem/raid/disk drivers at fault, but the scale has
now tipped towards HSM.

I'm still open to suggestions on how to fix this... :)

John

-------------------------------------------------------------------------
John Valdes                        Department of Astronomy & Astrophysics
j-valdes AT uchicago DOT edu                               University of Chicago