Subject: Re: [ADSM-L] OnTap read block size?
From: "Rhodes, Richard L." <rrhodes AT FIRSTENERGYCORP DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 1 Jul 2016 14:46:22 +0000
I forgot to say this is all Fibre Channel based LUNs on this NetApp head.  The
partner head handles CIFS.

The aggregate is less than a third full:  9TB available, 2.6TB used.
It comprises 22 x 600GB HDDs built as 2 x (9d+2p).

  Aggregate 'aggrfcp'

    Total space    WAFL reserve    Snap reserve    Usable space       BSR NVLOG           A-SIS          Smtape
  10321542144KB    1032154212KB             0KB    9289387932KB             0KB             0KB             0KB

  <snip - vol info removed>

  Aggregate                       Allocated            Used           Avail
  Total space                  2655868604KB    2616289680KB    6519196976KB
  Snap reserve                          0KB             0KB             0KB
  WAFL reserve                 1032154212KB     114641872KB     917512340KB
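
As a sanity check on those numbers, here's a back-of-the-envelope calculation (a sketch; it just assumes the standard 10% WAFL reserve and compares against the df -A figures above, where a few KB of difference is rounding):

  awk 'BEGIN {
      total = 10321542144                # KB, the "Total space" figure above
      wafl  = total * 0.10               # standard 10% WAFL reserve
      printf "WAFL reserve: %.0f KB (listed: 1032154212 KB)\n", wafl
      printf "Usable:       %.0f KB (listed: 9289387932 KB)\n", total - wafl
  }'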


All volumes (63 of them) hold only LUNs and are THIN.
All volumes have 32 snaps.
All volumes are snapmirrored to a 2nd datacenter.
All volumes are snapvaulted to another local NetApp/nSeries system.

Both LPARs use VIO-based virtual Fibre Channel adapters.
I'm going to test sequential I/O to another vendor's storage system to rule out
(or point to) AIX/VIO as the problem.
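
For example (hypothetical mount point; the idea is just a timed large-block sequential read against the other array, same as the dd tests further down this thread):

  time dd if=/othervendor/testfile of=/dev/null bs=1m count=4096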






From: Steiner, Jeffrey [mailto:Jeffrey.Steiner AT netapp DOT com]
Sent: Thursday, June 30, 2016 7:05 AM
To: Sebastian Goetze <spgoetze AT gmail DOT com>; Rhodes, Richard L. <rrhodes 
AT firstenergycorp DOT com>; toasters AT teaparty DOT net
Subject: RE: OnTap read block size?

In theory, if read_realloc were off and the aggregate were close to 100% full,
you could get this kind of IO pattern. I doubt that's happening, but I can't
rule it out.

I did a test with an all-Flash system where I pretty much puréed an aggregate.
In a healthy environment, everything should be nicely allocated and a
sequential read operation should result in huge read chains, like 64 x 4K blocks
read as a unit. I took an aggregate, filled it up to 100%, and then ran about
72 hours of random overwrites. The end result was an array where nothing was
contiguous. All the 8K blocks were distributed randomly across all the disks.
The read chains during sequential IOs were just 2. That would destroy
performance on a system with spinning disk, but surprisingly it had no impact
on my all-Flash system. Not a whit. That's part of why there is no
read_realloc on AFF systems at this time. It doesn't do anything useful.

I had to deliberately misconfigure the system to make that happen, though. I 
wouldn't expect a real-world environment to get into that situation.
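
To put rough numbers on the spinning-disk point, here's a toy model (assumed figures: one ~5ms seek+rotation penalty per chain, 4KB WAFL blocks; an illustration, not a measurement):

  awk 'BEGIN {
      ms_per_chain = 5                       # assumed seek+rotation cost per chain
      for (chain = 2; chain <= 64; chain *= 2) {
          kb = chain * 4                     # a chain is N contiguous 4KB blocks
          printf "chain %2d -> ~%5.1f MB/s per spindle\n", chain, (kb * 1000 / ms_per_chain) / 1024
      }
  }'

With a chain of 2 that's under 2 MB/s per spindle; at 64 it's around 50 MB/s, which is why the puréed aggregate would have been hopeless on rotating media.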

From: Sebastian Goetze [mailto:spgoetze AT gmail DOT com]
Sent: Thursday, June 30, 2016 12:34 PM
To: Steiner, Jeffrey <Jeffrey.Steiner AT netapp DOT com>; Rhodes, Richard L.
<rrhodes AT firstenergycorp DOT com>; toasters AT teaparty DOT net
Subject: Re: OnTap read block size?


Hi Rick,



in addition to what Jeff said:

What's going on with the GREADs? Is there a RAID rebuild in progress?

That column should be 0 under normal circumstances, and having this load
running in parallel with your DB load completely messes up the performance
picture IMHO...



Oh, and the 'read_realloc' option on a volume with a "random write/sequential 
read" load often leads to nice performance improvements over time, dynamically 
optimizing the DB layout on disk and keeping the volume/file 'defragmented'.





Sebastian

On 6/30/2016 7:33 AM, Steiner, Jeffrey wrote:
NFS behavior depends on the OS. For example, on Linux, if the application tries
to do a 1MB read and you have an rsize set to 65536, what happens is the OS
issues 8 parallel 64KB requests. The ONTAP system will pick up on what's
happening and start doing read-ahead.

You are indeed showing 16KB IO requests here. The read chain is about 4, which 
means 4 times 4K blocks.

Are you certain that you don't just have a database with a 16KB block size and 
you're doing 16KB random reads? If this was sequential IO, the read chain 
should be a lot larger. I can't think of a realistic scenario where AIX would 
break a sequential IO operation into a series of 16KB reads by itself.

Here's a theory - is someone misreading Oracle IO stats? If you see activity
that is primarily db_file_sequential_read, then everything is doing exactly
what it's supposed to do, because db_file_sequential_read is random IO.
Depending on who you ask, it's either random reads of an index sequence or a
sequence of random IO operations. Either way, it's random IO, so if you see a
database doing db_file_sequential_read and it has a 16KB block size, that would
explain this.

Sequential IO is performed as either direct_path_read or db_file_scattered_read.
Yes, that means random is sequential and sequential is scattered.
Everyone confused yet? Specifically, db_file_scattered_read is a large-block
sequential IO operation that is loaded into scattered memory buffers.

I can't tell you how many times this has caused confusion for DBAs who are
certain their IO pattern is random when it's actually sequential, or who think
it's sequential when it's actually random.
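
If you want to check the mix directly instead of arguing about names, something like this works (a sketch only; assumes sqlplus on the database host and a SYSDBA login):

  echo "select event, total_waits, time_waited
        from v\$system_event
        where event in ('db file sequential read',
                        'db file scattered read',
                        'direct path read')
        order by time_waited desc;" | sqlplus -s / as sysdba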

Once you have the AWR we'll have a better idea what's happening. It's not just 
the IO sizes I'd be looking for, it's the associated latencies and some of the 
configuration files. If there's no explanation there, we'll have to look at the 
AIX configuration.

From: toasters-bounces AT teaparty DOT net [mailto:toasters-bounces AT teaparty
DOT net] On Behalf Of Rhodes, Richard L.
Sent: Wednesday, June 29, 2016 9:07 PM
To: toasters AT teaparty DOT net
Subject: RE: OnTap read block size?

I've asked a dba to look at your questions/comments.

I'm looking at a blog post 
http://recoverymonkey.org/2014/09/18/when-competitors-try-too-hard-and-miss-the-point-part-two/

It discusses how to read a STATIT for sequential I/O size.  I have a statit 
listing . . .


disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggrfcp/plex0/rg0:
0b.01.0           54 107.88    0.00   2.11  2211   1.00  34.35   214   1.91  13.48   276 104.97  64.00   188   0.00   ....     .
0b.01.1           55 107.96    0.00   2.11  1684   1.13  30.56   216   1.86  12.90   347 104.97  64.00   192   0.00   ....     .
0b.01.10          56 111.70    4.14   4.76  1852   0.98  29.22   258   1.61   6.35   750 104.97  64.00   195   0.00   ....     .
0b.01.2           56 110.67    4.07   4.72  1814   0.65  43.40   192   0.98   9.70   565 104.97  64.00   200   0.00   ....     .
0b.01.3           56 110.75    4.16   4.72  1856   0.66  43.15   199   0.97  10.01   517 104.97  64.00   201   0.00   ....     .
0b.01.4           57 110.85    4.23   4.71  1751   0.65  42.99   194   1.00   9.96   517 104.97  64.00   206   0.00   ....     .
0b.01.5           57 110.62    4.06   4.97  1770   0.65  43.42   194   0.94  10.15   522 104.97  64.00   210   0.00   ....     .
0b.01.6           57 110.63    4.05   4.82  1764   0.65  43.55   197   0.96   9.83   562 104.97  64.00   210   0.00   ....     .
0b.01.7           57 110.73    4.12   4.61  1853   0.66  43.27   196   0.98   9.13   603 104.97  64.00   217   0.00   ....     .
0b.01.8           57 110.74    4.16   4.72  1844   0.65  43.54   197   0.95   9.18   583 104.97  64.00   218   0.00   ....     .
0b.01.9           57 110.75    4.16   4.76  1819   0.65  43.06   207   0.97   9.13   560 104.97  64.00   223   0.00   ....     .

This looks like it's doing sequential reads in short chains of 4K blocks.
I have multiple of these listings and they are all the same.
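
The chain column can also be averaged mechanically, something like this (a sketch; in the layout above, field 5 is the ureads chain and one WAFL block is 4K):

  awk '/^0b\./ { n++; sum += $5 }
       END { printf "avg ureads chain: %.2f (~%.0f KB per read)\n", sum/n, sum/n*4 }' statit.out

Across the rows above that works out to roughly 4.3 blocks, i.e. about 17KB per user read.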


rick







From: Steiner, Jeffrey [mailto:Jeffrey.Steiner AT netapp DOT com]
Sent: Wednesday, June 29, 2016 11:33 AM
To: Rhodes, Richard L. <rrhodes AT firstenergycorp DOT com>; toasters AT
teaparty DOT net
Subject: RE: OnTap read block size?

Is this NFS or FC?

By default, Oracle does sequential reads in 1M chunks. If they have a 16k block 
size on the database, it should be reading in units of 64, not 128. Also, just 
because Oracle tries to read 1MB chunks doesn't mean the database can do that.

They really shouldn't be using cio as a mount option either. Any remotely 
current version of Oracle will mount the datafiles with concurrent IO so long 
as they have filesystemio_options=setall, which is also what they should have.
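
For what it's worth, both settings are easy to confirm (a sketch, assuming sqlplus access; with a 16k block size, a 1MB multiblock read corresponds to db_file_multiblock_read_count = 64):

  echo "show parameter filesystemio_options
        show parameter db_file_multiblock_read_count
        show parameter db_block_size" | sqlplus -s / as sysdba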

If you can send me a sample report from 'awrrpt.sql' covering no more than one
hour of elapsed time, from a period where they are unhappy with performance, I
will take a look at what's going on. I can say with 100% certainty that if they
really are doing multiblock reads in 16K units, the problem isn't ONTAP. I
suppose it could be a 16K block size on a badly fragmented jfs2 filesystem, but
I really doubt it. I think something is being misinterpreted.

From: toasters-bounces AT teaparty DOT net [mailto:toasters-bounces AT teaparty
DOT net] On Behalf Of Rhodes, Richard L.
Sent: Wednesday, June 29, 2016 4:36 PM
To: toasters AT teaparty DOT net
Subject: OnTap read block size?

OnTap 8.1.2p1

Our DBAs are complaining that our nSeries (N3220/FAS2240) is reading really
slowly because it only returns small 16k blocks.  The DBAs are saying the
Oracle multi-block read-ahead should be reading 128 x 16k blocks = 2m per read,
but it only seems to be reading/returning 16k at a time.

On an AIX filesystem mounted CIO, if I run
    "dd if=/dev/zero of=z bs=1m count=9999"
I see writes of 500k.

In the same filesystem mounted CIO, if I read an existing db file
  "dd if=<dbfile> of=/dev/null bs=1m"
I see reads of up to 30k.
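
While a dd like that runs, the average transfer size the host is actually issuing can be watched from iostat (a sketch; hdisk4 is a placeholder for the LUN's hdisk, and on AIX Kbps divided by tps gives KB per transfer):

  iostat -d hdisk4 5 | awk '$1 ~ /^hdisk/ && $4 > 0 { printf "%s: ~%.0f KB per IO\n", $1, $3/$4 }'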


Q) Is there a limit in OnTap on read size?


Thanks

Rick






_______________________________________________

Toasters mailing list

Toasters AT teaparty DOT net

http://www.teaparty.net/mailman/listinfo/toasters



-----------------------------------------
The information contained in this message is intended only for the personal and 
confidential use of the recipient(s) named above. If the reader of this message 
is not the intended recipient or an agent responsible for delivering it to the 
intended recipient, you are hereby notified that you have received this 
document in error and that any review, dissemination, distribution, or copying 
of this message is strictly prohibited. If you have received this communication 
in error, please notify us immediately, and delete the original message.