Subject: Re: [Bacula-users] Quantum Scalar i500 slow write speed
From: Henry Yen <henry AT AegisInfoSys DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 5 Aug 2010 15:56:18 -0400
On Thu, Aug 05, 2010 at 17:17:39 +0200, Christian Gaul wrote:
> On 05.08.2010 16:57, Henry Yen wrote:

First, I welcome this discussion, however arcane (as long as the
List permits it, of course) -- I am happy to discover if I'm wrong
in my thinking.  That said, I'm not (yet) convinced.

This part in particular I stand by, as a response to the notion of
using *either* /dev/random *or* /dev/urandom:

> > Again, on Linux, you generally can't use /dev/random at all -- it will
> > block after reading just a few dozen bytes.  /dev/urandom won't block,
> > but your suggestion of creating a large file from it is very sensible.

For this part, however, I don't agree with your assertion, for two reasons:

> > /dev/urandom seems to measure about 3MB/sec or thereabouts, so creating
> > a large "uncompressible" file could be done sort of like:
> >
> >    dd if=/dev/urandom of=tempchunk count=1048576
> >    cat tempchunk tempchunk tempchunk tempchunk tempchunk tempchunk > bigfile
> >   
> cat-ting random data a couple of times to make one big random file won't
> really work, unless the size of the chunks is way bigger than the
> "library" size of the compression algorithm.

Reason 1: the example I gave yields a file size for "tempchunk" of 512MB,
not 1MB as in your counter-example.  I agree that (at least nowadays)
catting 1MB chunks into a 6MB file is likely (although not assured)
to lead to a greatly reduced size under later compression, but I disagree
that catting 512MB chunks into a 3GB file is likely to be compressible
by any general-purpose compressor.
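
To spell out the arithmetic: dd defaults to 512-byte blocks, so count=1048576
writes 1048576 * 512 = 536870912 bytes, i.e. 512MB per chunk.  A rough sketch
of the same test with the sizes made explicit (file names follow the quoted
example):

    dd if=/dev/urandom of=tempchunk bs=512 count=1048576    # 512MB of random data
    cat tempchunk tempchunk tempchunk tempchunk tempchunk tempchunk > bigfile    # 6 x 512MB = 3GB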

On a 32-bit machine with 3GB of RAM, I created a 512MB "chunk" file with the
above "dd" command and a 3GB "bigchunk" by catting six copies of it, then ran
gzip/bzip2/lzma on each, all with the "-9" flag, resulting in:

    536870912 chunk
    536957165 chunk.gz
    539244413 chunk.bz2
    544157933 chunk.lzma
   3221225472 bigchunk
   3221746982 bigchunk.gz
   3235476896 bigchunk.bz2
   3265163180 bigchunk.lzma
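
(A sketch of the sort of invocations implied above; the "-c" redirection is an
 assumption on my part, used here simply so the originals are kept around:)

    gzip  -9 -c chunk > chunk.gz
    bzip2 -9 -c chunk > chunk.bz2
    lzma  -9 -c chunk > chunk.lzma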

> your example will probably lead to a 5:1 compression ratio :-)
> so this will more than likely not be a really good test.

> Also, afaik tar
> has an "optimization" when outputting to /dev/null, better output to
> /dev/zero instead if using tar to check possible speeds.

(Yes, although there is considerable disagreement over this (mis)feature;
 my take is that the consensus is "probably bad, definitely under-documented
 (although the behavior is at least described in the "info" documentation),
 but too late to change now".)
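
(A quick sketch of that approach; the directory name below is only a
 placeholder:

    tar -cf /dev/zero /path/to/testdata

 GNU tar avoids reading file contents when the archive is /dev/null, so
 pointing it at /dev/zero forces a real read and gives an honest
 throughput figure.)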

> P.S.: Checked to make sure.. depends on the compression algorithm of course:
> 
> $ dd if=/dev/urandom of=chunk bs=1M count=1
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB) copied, 0.154357 s, 6.8 MB/s
> 
> $ gzip chunk
> $ ls -al chunk*
> -rw-r--r-- 1 christian christian 1048576  5. Aug 17:06 chunk
> -rw-r--r-- 1 christian christian 1048764  5. Aug 17:06 chunk.gz
> 
> $ cat chunk chunk chunk chunk chunk chunk > bigchunk
> $ gzip bigchunk
> $ ls -al bigchunk*
> -rw-r--r-- 1 christian christian 6291456  5. Aug 17:07 bigchunk
> -rw-r--r-- 1 christian christian 6292442  5. Aug 17:07 bigchunk.gz
> 
> $ lzma bigchunk
> $ ls -al bigchunk*
> -rw-r--r-- 1 christian christian 6291456  5. Aug 17:07 bigchunk
> -rw-r--r-- 1 christian christian 1063718  5. Aug 17:07 bigchunk.lzma

Reason 2: Although compression on a general-purpose machine will certainly
keep getting faster and better at detecting duplication inside larger and
larger chunks, I daresay that the same ability in hardware compression is
unlikely to increase dramatically.  For instance, the lzma run on that 3GB
file shown above took about 30 minutes.  By contrast, at a physical write
speed of 27MB/sec, that same 3GB would take only about 2 minutes to actually
write.  Even at a 6:1 compression ratio (the ceiling for this example, since
the file is just six identical 512MB chunks), it would still take far more
than twice as long just to analyze the data to achieve that compression as
to write the uncompressed stream.  Put another way, I don't see tape drives
(currently in the several-hundred-gigabyte range) increasing their
compression buffer sizes or CPU capability to analyze more than a couple
dozen megabytes, at most, anytime in the near future.  That is why I think
that a "test" file for write-throughput testing that is a few GB long, built
from chunks of a few hundred MB each (larger chunks take ever longer to
create), is quite sufficient.
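
To tie this back to the original question, a rough sketch of timing such a
file against the drive itself; the device name /dev/nst0 and the 256k block
size are assumptions on my part, not values taken from this thread:

    time dd if=bigfile of=/dev/nst0 bs=256k    # raw write-throughput test

At the 27MB/sec physical write speed mentioned above, the ~3GB file should
finish in roughly 3072MB / 27MB/sec, or about 114 seconds, which is where
the "about 2 minutes" figure comes from.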

-- 
Henry Yen                                       Aegis Information Systems, Inc.
Senior Systems Programmer                       Hicksville, New York
