Sorry for not getting a reply in sooner; I'm way behind on mailing lists...
I see one flaw in your script -- to wit, you're writing
twice as much total data in your first loop as your second.
Your first part does three writes of 256MB in parallel
for a total of 768MB. So 768MB / 32sec == 24MB/sec.
Your second does one 3-stripe RAIT write of 256MB, which writes
256MB of data plus 128MB of checksum blocks, for a total of 384MB.
So 384MB / 16sec == 24MB/sec.
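To make the bookkeeping explicit, here's a small sh sketch that redoes the arithmetic above. It assumes 256MB per stream, a 3-tape RAIT set (2 data tapes plus 1 parity tape), and the 32sec/16sec timings from your runs; the variable names are just for illustration.

```shell
#!/bin/sh
# Total MB actually written by each half of the test, and the resulting rate.
# Assumptions: 256MB per stream, 3-tape RAIT (2 data + 1 parity).
per_stream_mb=256
ntapes=3

# Parallel half: three independent 256MB dd streams.
parallel_mb=$((3 * per_stream_mb))

# RAIT half: the data is split across (ntapes - 1) data tapes, and the
# parity tape carries one data tape's worth of checksum blocks.
parity_mb=$((per_stream_mb / (ntapes - 1)))
rait_mb=$((per_stream_mb + parity_mb))

echo "parallel: ${parallel_mb}MB in 32s = $((parallel_mb / 32))MB/s"
echo "rait:     ${rait_mb}MB in 16s = $((rait_mb / 16))MB/s"
```

Both lines come out at 24MB/s, which is the point: the drives are doing the same amount of work per second in both halves.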
Now, how the 7-second run happened, I couldn't tell you, except
that the physical blocksize is divided by the number of
data tapes (i.e., total tapes minus one); so maybe you get a performance
kick at a physical blocksize of 32k (and again at 2048k?), which you miss in your
other loop, 'cause your blocksizes are twice as big there.
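A quick sh sketch of that division, assuming the same 3-tape set. The list of logical blocksizes is made up for illustration and isn't taken from your script. It shows that a 64k logical write turns into 32k physical writes, and a 4096k one into 2048k.

```shell
#!/bin/sh
# Physical per-tape blocksize for a RAIT write: the logical blocksize
# divided by the number of data tapes (total tapes minus one).
ntapes=3

phys_blksz() {
    echo $(( $1 / (ntapes - 1) ))
}

# Hypothetical sweep of logical blocksizes, for illustration only.
for blksz in 16384 32768 65536 131072 4194304; do
    echo "logical $blksz -> physical $(phys_blksz "$blksz") per tape"
done
```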
If you want to compare apples to apples, add a:
blksz=$(($blksz / 2))
just before your parallel dd's in the first half of your script;
then you're really writing the same blocksizes, and the same amount
of total data. I'd be interested to see if you get the same
speedup at 65536 -> 32k physical writes in your first loop
that way.
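Here's a rough sketch of what the first half would look like with that change. I haven't seen the exact layout of your loop, so this is a guess. It assumes each dd uses a fixed count with bs=$blksz, and the device names and count are hypothetical. It only prints the dd commands and doesn't run them, so you can see that halving blksz gives 3 x 128MB = 384MB total, the same as the RAIT run.

```shell
#!/bin/sh
# Hypothetical reconstruction of the parallel half of the test; the real
# script's count, device names and variable names may differ.
count=4096                # assumed fixed block count per dd
blksz=65536               # logical blocksize under test (64k * 4096 = 256MB)

# The suggested change: halve blksz before the parallel dd's so each drive
# sees the same physical blocksize as a 3-tape RAIT write of 64k.
blksz=$(($blksz / 2))

total=0
for dev in /dev/tape0 /dev/tape1 /dev/tape2; do   # hypothetical devices
    # Dry run: print the command rather than executing it.
    echo "dd if=/dev/zero of=$dev bs=$blksz count=$count &"
    total=$((total + blksz * count))
done
echo "wait"
echo "total: $((total / 1048576))MB"
```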
-----
Marc Mengel <mengel AT fnal DOT gov>
test.sh
Description: Bourne shell script