ADSM-L

Re: DSMFMT takes forever ( 15 hours). 100 gb

2003-10-07 11:57:17
From: Roger Deschner <rogerd AT UIC DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 7 Oct 2003 10:55:31 -0500
(reopening an old thread from a couple of weeks ago that got nowhere,
because I just shot myself in the foot again with this one...)

dsmfmt can take forever, and can almost totally lock up your whole
system, if the extent you are dsmfmt-ing shares any physical volume with
any OS paging space. I use only JBOD disks, but I assume that this also
means that if your RAID box has any paging space in it, you cannot
dsmfmt in any part of that RAID box, because a RAID box can place data
on any of its physical disks and Murphy's law prevails here.

Paging and dsmfmt must be completely and totally physically separate, at
the volume level. (Sharing the same SCSI or SSA channel appears to be
only a minor nuisance, compared to sharing the same volume.) If
you must reboot in order to remove paging temporarily from a certain
disk, then do that - the reboot will be faster and less intrusive than
a dsmfmt that hangs the system.
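
A quick pre-flight check along these lines might have saved my foot. This
is only a sketch in Python: it assumes the usual column layout of AIX
"lsps -a" output (Page Space, Physical Volume, Volume Group, Size, %Used,
Active, ...), the sample text and hdisk names are made up for
illustration, and the function names are hypothetical. You still have to
work out for yourself which physical volumes your dsmfmt target lives on.

```python
# Illustrative `lsps -a` output: column layout assumed, values made up.
SAMPLE_LSPS = """\
Page Space      Physical Volume   Volume Group    Size %Used Active  Auto  Type
paging00        hdisk3            datavg        2048MB     4   yes   yes    lv
hd6             hdisk0            rootvg        1024MB    12   yes   yes    lv
"""


def paging_volumes(lsps_output):
    """Return the set of physical volumes that hold an active paging space."""
    volumes = set()
    for line in lsps_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or unexpected line
        physical_volume, active = fields[1], fields[5]
        if active == "yes":
            volumes.add(physical_volume)
    return volumes


def conflicts(target_volumes, lsps_output):
    """Physical volumes that would carry both dsmfmt output and paging."""
    return sorted(set(target_volumes) & paging_volumes(lsps_output))


# Hypothetical dsmfmt target spread over hdisk3 and hdisk4.
print(conflicts(["hdisk3", "hdisk4"], SAMPLE_LSPS))
```

Anything in the printed list is a disk to stay away from (or to remove
paging from) before starting dsmfmt.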

It appears that dsmfmt is a memory pig - probably constructing huge
blocks (of what? The word "Fish" over and over?) in memory to write out
at once, so it can run faster. It places a very heavy load on the paging
subsystem. The OS could handle this, except that those huge, long I/O
operations of the formatting itself interfere with its access to its
paging space. So it's a classic "deadly embrace" and nothing goes
anywhere. I had to reboot because I could not even get the system's
attention long enough to run a Unix ps command to find the process ID
of dsmfmt so I could kill it. I've got the bullet holes in my foot to
prove it.

IBM: I'm not asking for a solution here, as "solving" this would likely
make dsmfmt run slower. Documentation of this restriction would be nice,
though. It helps to be on AIX 5L where you can drain a paging space
without a reboot (BEFORE you start dsmfmt, of course), and that should
be mentioned in any description of this restriction.
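
For what it's worth, the drain step can be sketched as below. This
assumes the AIX 5L swapoff command, which takes the paging device path
(for example /dev/paging00); the helper name and the disk layout are
hypothetical. The sketch only prints the commands (a dry run), and it
does not check that enough paging space remains elsewhere, which you
must verify yourself.

```python
def drain_commands(paging_spaces, volume):
    """Build swapoff commands for the paging spaces that live on `volume`.

    paging_spaces: iterable of (paging_space_name, physical_volume) pairs.
    """
    return [["swapoff", "/dev/" + name]
            for name, physical_volume in paging_spaces
            if physical_volume == volume]


# Hypothetical layout: hd6 on hdisk0, paging00 on hdisk3.
spaces = [("hd6", "hdisk0"), ("paging00", "hdisk3")]

for command in drain_commands(spaces, "hdisk3"):
    print(" ".join(command))  # dry run: print the command, do not execute
```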

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
== You are making progress when every mistake you make is a new one. ===
