ADSM-L

Re: DSMFMT takes forever ( 15 hours). 100 gb

2003-10-07 20:12:16
Subject: Re: DSMFMT takes forever ( 15 hours). 100 gb
From: Paul Ripke <stix AT STIX.HOMEUNIX DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 8 Oct 2003 10:01:21 +1000
On Wednesday, Oct 8, 2003, at 01:55 Australia/Sydney, Roger Deschner
wrote:

<snip>

It appears that dsmfmt is a memory pig - probably constructing huge
blocks (of what? The word "Fish" over and over?) in memory to write out
at once, so it can run faster. It places a very heavy load on the paging
subsystem. The OS could handle this, except that those huge, long I/O
operations of the formatting itself interfere with its access to its
paging space. So it's a classic "deadly embrace" and nothing goes
anywhere. I had to reboot because I could not even get the system's
attention long enough to enter a Unix ps command to find out the number
of the dsmfmt process so I could kill it. I've got the bullet holes in
my foot to prove it.

IBM: I'm not asking for a solution here, as "solving" this would likely
make dsmfmt run slower. Documentation of this restriction would be nice,
though. It helps to be on AIX5L where you can drain a paging space
without a reboot (BEFORE you start dsmfmt, of course), and that should
be mentioned in any description of this restriction.

What you're seeing is due to normal file I/O under AIX going through
the virtual memory subsystem. With certain vmtune settings, creating
large files can trigger exactly the scenario you describe. It's not
specific to dsmfmt - I've seen the same problem with 'dd' and Oracle.
You'll also see the 'lrud' kernel thread spinning.
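
To illustrate the point about buffered file I/O: a program that creates a
large file through the file cache can at least bound how much unwritten
(dirty) data piles up, by writing fixed-size blocks and calling fsync every
few blocks. The following is only a sketch in Python under that assumption.
It is not how dsmfmt works internally, and the names format_file, block_size
and sync_every are made up for illustration. Note that fsync limits dirty
data only; the written pages may still occupy the file cache.

```python
import os


def format_file(path, total_bytes, block_size=1024 * 1024, sync_every=8):
    """Fill `path` with `total_bytes` of zeros, flushing to disk periodically.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    block = bytes(block_size)  # one reusable block of zeros
    written = 0
    blocks_since_sync = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while written < total_bytes:
            # The last chunk may be shorter than a full block.
            chunk = block[: min(block_size, total_bytes - written)]
            written += os.write(fd, chunk)
            blocks_since_sync += 1
            if blocks_since_sync >= sync_every:
                os.fsync(fd)  # push dirty pages out before writing more
                blocks_since_sync = 0
        os.fsync(fd)  # final flush for any remaining dirty data
    finally:
        os.close(fd)
    return written
```

A smaller sync_every keeps less dirty data in memory at the cost of more
frequent flushes; suitable values would have to be found by measurement.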

We've managed to fix our immediate problem by remounting the Oracle
database filesystems with the 'dio' option. Since this bypasses the
AIX VM file cache, system responsiveness has improved dramatically,
paging space utilisation has dropped, and lrud rarely appears at the
top of the busy process list. Oracle operation also seems slightly
better.
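
The 'dio' mount option applies direct I/O to a whole filesystem. On
platforms that provide the O_DIRECT open flag (Linux does; I have not
verified this sketch on AIX), an application can ask for similar cache
bypassing on a single file. The Python sketch below assumes that flag and
falls back to ordinary buffered writes where the platform or filesystem
rejects it. The function names are hypothetical, and the block size is
assumed to be a multiple of the device's alignment requirement.

```python
import mmap
import os


def _write_blocks(path, flags, buf, total_bytes):
    """Open `path` with `flags` and write `buf` repeatedly until done."""
    block_size = len(buf)
    written = 0
    fd = os.open(path, flags, 0o600)
    try:
        while written < total_bytes:
            n = os.write(fd, buf)
            if n != block_size:
                # A short write would break direct I/O alignment; give up.
                raise OSError("short write")
            written += n
    finally:
        os.close(fd)
    return written


def write_bypassing_cache(path, total_bytes, block_size=1024 * 1024):
    """Write zeros to `path`, using O_DIRECT when it is available.

    Returns (bytes_written, used_direct_io).
    """
    if total_bytes % block_size:
        raise ValueError("total_bytes must be a multiple of block_size")
    direct_flag = getattr(os, "O_DIRECT", 0)  # 0 where the flag is missing
    base_flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # Anonymous mmap memory is page-aligned and zero-filled, which is what
    # direct I/O generally requires of its buffers.
    buf = mmap.mmap(-1, block_size)
    try:
        if direct_flag:
            try:
                n = _write_blocks(path, base_flags | direct_flag, buf, total_bytes)
                return n, True
            except OSError:
                pass  # direct I/O failed here; retry with buffered I/O
        return _write_blocks(path, base_flags, buf, total_bytes), False
    finally:
        buf.close()
```

The second element of the return value tells the caller whether direct I/O
was actually used, since some filesystems refuse it.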

There's absolutely no point to the OS caching file data for TSM db, log
or stg pool volumes - or Oracle DB files or DB2 or... With TSM, my
recommendation is to use raw devices - this is what we do with our
servers. It can make a huge difference.

The 'dio' option first appeared (I believe) in a maintenance level of
AIX 5.1. However, I think it's only mentioned in the doco for AIX 5.2.

Cheers,
--
Paul Ripke
Unix/OpenVMS/TSM/DBA
I love deadlines. I like the whooshing sound they make as they fly by.
-- Douglas Adams
