I forgot to say this is all Fibrechannel based luns on this NetApp head. The
partner head handles CIFS.
The aggregate is less than a third full: 9 TB available, 2.6 TB used.
It consists of 22 x 600 GB HDDs built as 2 x (9d+2p) RAID groups.
Aggregate 'aggrfcp'
    Total space   WAFL reserve   Snap reserve   Usable space   BSR NVLOG   A-SIS   Smtape
  10321542144KB   1032154212KB            0KB   9289387932KB         0KB     0KB      0KB

<snip - vol info removed>

Aggregate            Allocated            Used           Avail
Total space       2655868604KB    2616289680KB    6519196976KB
Snap reserve               0KB             0KB             0KB
WAFL reserve      1032154212KB     114641872KB     917512340KB
All volumes (63 of them) hold only LUNs and are THIN.
All volumes have 32 snaps.
All volumes are snapmirrored to a 2nd datacenter.
All volumes are snapvaulted to another local NetApp/nSeries system.
Both lpars use VIO based virtual Fibrechannel adapters.
I'm going to test sequential I/O to another vendor's storage system to rule out
(or point to) AIX/VIO as the problem.
From: Steiner, Jeffrey [mailto:Jeffrey.Steiner AT netapp DOT com]
Sent: Thursday, June 30, 2016 7:05 AM
To: Sebastian Goetze <spgoetze AT gmail DOT com>; Rhodes, Richard L. <rrhodes
AT firstenergycorp DOT com>; toasters AT teaparty DOT net
Subject: RE: OnTap read block size?
In theory, if read_realloc was off and the aggregate was close to 100% full you
could get this kind of IO pattern. I doubt that's happening, but I can't rule
it out.
I did a test with an all-Flash system where I pretty much puréed an aggregate.
In a healthy environment, everything should be nicely allocated and a
sequential read operation should result in huge read chains, like 64x4K blocks
read as a unit. I took an aggregate and filled it up to 100% and then ran about
72 hours of random overwrites. The end result was an array where nothing was
contiguous. All the 8K blocks were distributed randomly across all the disks.
The read chains during sequential IO's were just 2. That would destroy
performance on a system with spinning disk, but surprisingly it had no impact
on my all-Flash system. Not a whit. That's part of why there is no
read_realloc on AFF systems at this time. It doesn't do anything useful.
I had to deliberately misconfigure the system to make that happen, though. I
wouldn't expect a real-world environment to get into that situation.
From: Sebastian Goetze [mailto:spgoetze AT gmail DOT com]
Sent: Thursday, June 30, 2016 12:34 PM
To: Steiner, Jeffrey <Jeffrey.Steiner AT netapp DOT com>; Rhodes, Richard L.
<rrhodes AT firstenergycorp DOT com>; toasters AT teaparty DOT net
Subject: Re: OnTap read block size?
Hi Rick,
in addition to what Jeff said:
What's going on with the GREADs? Is there a RAID-rebuild in progress?
That column should be 0 under normal circumstances, and having this load in
parallel with your DB load completely messes up the performance picture IMHO...
Oh, and the 'read_realloc' option on a volume with a "random write/sequential
read" load often leads to nice performance improvements over time, dynamically
optimizing the DB layout on disk and keeping the volume/file 'defragmented'.
Sebastian
On 6/30/2016 7:33 AM, Steiner, Jeffrey wrote:
NFS behavior depends on the OS. For example, on Linux if the application tries
to do a 1MB read and you have rsize set to 65536, what happens is the OS
issues 8 parallel 64KB requests. The ONTAP system will pick up on what's
happening and start doing read-ahead.
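As a rough sketch of that decomposition (this is not the actual Linux NFS client code, and the real request count and parallelism depend on kernel version and RPC slot-table settings, so treat the helper below as purely illustrative):

```python
# Illustrative only: decompose one large application read into
# rsize-capped wire requests, which the NFS client can issue in parallel.

def split_read(offset: int, length: int, rsize: int = 65536):
    """Return the (offset, length) wire requests covering one
    application read, each capped at rsize bytes."""
    requests = []
    end = offset + length
    while offset < end:
        n = min(rsize, end - offset)
        requests.append((offset, n))
        offset += n
    return requests

# A 1 MiB application read with rsize=65536 decomposes into 64 KiB requests:
print(len(split_read(0, 1024 * 1024)))  # 16
```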
You are indeed showing 16KB IO requests here. The read chain is about 4, which
means 4 times 4K blocks.
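The arithmetic behind that statement, as a tiny sketch (the statit chain value counts 4K WAFL blocks per disk read):

```python
# Average user-read size implied by a statit read chain length,
# given ONTAP's 4 KiB WAFL block size.

WAFL_BLOCK_KB = 4

def avg_read_kb(chain: float) -> float:
    return chain * WAFL_BLOCK_KB

print(avg_read_kb(4.0))  # 16.0 -> matches the 16KB requests observed here
```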
Are you certain that you don't just have a database with a 16KB block size and
you're doing 16KB random reads? If this was sequential IO, the read chain
should be a lot larger. I can't think of a realistic scenario where AIX would
break a sequential IO operation into a series of 16KB reads by itself.
Here's a theory - is someone misreading Oracle IO stats? If you see activity
that is primarily db_file_sequential_read, then everything is doing exactly
what it's supposed to do because db_file_sequential_read is random IO.
Depending on who you ask, it's either a random read of an index sequence or a
sequence of random IO operations. Either way, it's random IO, so if you see a
database doing db_file_sequential_read and it has a 16KB block size, that would
explain this.
Sequential IO is performed as either direct_path_read or db_file_scattered_read.
Yes, that means random is sequential and sequential is scattered.
Everyone confused yet? Specifically, db_file_scattered_read is a large-block
sequential IO operation that is loaded into scattered memory buffers.
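That naming scheme can be summarized in a small lookup (the labels follow the explanation above; note that Oracle's actual wait-event names contain spaces, e.g. 'db file sequential read', while the underscores here match the thread's spelling):

```python
# Oracle wait-event naming vs. the actual IO pattern, per the
# explanation above. Real event names contain spaces; underscores
# here match the thread's spelling.

IO_PATTERN = {
    "db_file_sequential_read": "random",      # single-block reads, e.g. index lookups
    "db_file_scattered_read":  "sequential",  # multiblock read into scattered buffers
    "direct_path_read":        "sequential",  # large reads bypassing the buffer cache
}

print(IO_PATTERN["db_file_sequential_read"])  # random
```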
I can't tell you how many times this has caused confusion for DBAs who are
certain their IO pattern is random when it's actually sequential, or who think
it's sequential when it's actually random.
Once you have the AWR we'll have a better idea what's happening. It's not just
the IO sizes I'd be looking for, it's the associated latencies and some of the
configuration files. If there's no explanation there, we'll have to look at the
AIX configuration.
From: toasters-bounces AT teaparty DOT net [mailto:toasters-bounces AT teaparty
DOT net] On Behalf Of Rhodes, Richard L.
Sent: Wednesday, June 29, 2016 9:07 PM
To: toasters AT teaparty DOT net
Subject: RE: OnTap read block size?
I've asked a dba to look at your questions/comments.
I'm looking at a blog post
http://recoverymonkey.org/2014/09/18/when-competitors-try-too-hard-and-miss-the-point-part-two/
It discusses how to read a STATIT for sequential I/O size. I have a statit
listing . . .
disk       ut%   xfers   ureads--chain-usecs   writes--chain-usecs   cpreads-chain-usecs   greads--chain-usecs   gwrites-chain-usecs
/aggrfcp/plex0/rg0:
0b.01.0    54   107.88    0.00  2.11  2211      1.00 34.35   214      1.91 13.48   276      104.97 64.00   188      0.00  ....     .
0b.01.1    55   107.96    0.00  2.11  1684      1.13 30.56   216      1.86 12.90   347      104.97 64.00   192      0.00  ....     .
0b.01.10   56   111.70    4.14  4.76  1852      0.98 29.22   258      1.61  6.35   750      104.97 64.00   195      0.00  ....     .
0b.01.2    56   110.67    4.07  4.72  1814      0.65 43.40   192      0.98  9.70   565      104.97 64.00   200      0.00  ....     .
0b.01.3    56   110.75    4.16  4.72  1856      0.66 43.15   199      0.97 10.01   517      104.97 64.00   201      0.00  ....     .
0b.01.4    57   110.85    4.23  4.71  1751      0.65 42.99   194      1.00  9.96   517      104.97 64.00   206      0.00  ....     .
0b.01.5    57   110.62    4.06  4.97  1770      0.65 43.42   194      0.94 10.15   522      104.97 64.00   210      0.00  ....     .
0b.01.6    57   110.63    4.05  4.82  1764      0.65 43.55   197      0.96  9.83   562      104.97 64.00   210      0.00  ....     .
0b.01.7    57   110.73    4.12  4.61  1853      0.66 43.27   196      0.98  9.13   603      104.97 64.00   217      0.00  ....     .
0b.01.8    57   110.74    4.16  4.72  1844      0.65 43.54   197      0.95  9.18   583      104.97 64.00   218      0.00  ....     .
0b.01.9    57   110.75    4.16  4.76  1819      0.65 43.06   207      0.97  9.13   560      104.97 64.00   223      0.00  ....     .
This looks like it's doing sequential reads in 4k I/Os.
I have multiple of these listings and they are all the same.
rick
From: Steiner, Jeffrey [mailto:Jeffrey.Steiner AT netapp DOT com]
Sent: Wednesday, June 29, 2016 11:33 AM
To: Rhodes, Richard L. <rrhodes AT firstenergycorp DOT com>; toasters AT
teaparty DOT net
Subject: RE: OnTap read block size?
Is this NFS or FC?
By default, Oracle does sequential reads in 1M chunks. If they have a 16k block
size on the database, it should be reading in units of 64, not 128. Also, just
because Oracle tries to read 1MB chunks doesn't mean the database can do that.
They really shouldn't be using cio as a mount option either. Any remotely
current version of Oracle will mount the datafiles with concurrent IO so long
as they have filesystemio_options=setall, which is also what they should have.
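A quick sanity check of the block arithmetic above (the helper is hypothetical; in Oracle the target multiblock read size is db_block_size times db_file_multiblock_read_count, a real initialization parameter):

```python
# Hypothetical helper: how many database blocks fit into one
# multiblock (sequential) read of a given size.

def blocks_per_read(read_bytes: int, db_block_size: int) -> int:
    return read_bytes // db_block_size

# A 1MB read with a 16k block size is 64 blocks, not 128:
print(blocks_per_read(1024 * 1024, 16 * 1024))  # 64
# 128 x 16k would be a 2MB read, which matches the "2m" figure quoted earlier:
print(128 * 16 * 1024 // (1024 * 1024))  # 2
```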
If you can send me a sample report from 'awrrpt.sql' of no more than one hour
elapsed time from a period where they are unhappy with performance, I will take
a look at what's going on. I can say with 100% certainty that if they really
are doing multiblock reads with 16K units the problem isn't ONTAP. I suppose it
could be a 16K block size on a badly fragmented jfs2 filesystem, but I really
doubt it. I think something is being misinterpreted.
From: toasters-bounces AT teaparty DOT net [mailto:toasters-bounces AT teaparty
DOT net] On Behalf Of Rhodes, Richard L.
Sent: Wednesday, June 29, 2016 4:36 PM
To: toasters AT teaparty DOT net
Subject: OnTap read block size?
OnTap 8.1.2p1
Our DBAs are complaining that our nSeries (N3220/FAS2240) is reading really
slowly because it only returns small 16k blocks. The DBAs are saying the
Oracle multi-block read-ahead should be reading 128 x 16k blocks = 2m per read,
but it only seems to be reading/returning 16k at a time.
On an AIX filesystem mounted CIO, if I run
"dd if=/dev/zero of=z bs=1m count=9999"
I see writes of 500k.
In the same filesystem mounted CIO, if I read an existing db file
"dd if=<dbfile> of=/dev/null bs=1m"
I see reads of up to 30k.
Q) Is there a limit in OnTap on read size?
Thanks
Rick
________________________________
The information contained in this message is intended only for the personal and
confidential use of the recipient(s) named above. If the reader of this message
is not the intended recipient or an agent responsible for delivering it to the
intended recipient, you are hereby notified that you have received this
document in error and that any review, dissemination, distribution, or copying
of this message is strictly prohibited. If you have received this communication
in error, please notify us immediately, and delete the original message.
_______________________________________________
Toasters mailing list
Toasters AT teaparty DOT net
http://www.teaparty.net/mailman/listinfo/toasters