Subject: Re: [Bacula-users] Quantum Scalar i500 slow write speed
From: Paul Mather <paul AT gromit.dlib.vt DOT edu>
To: Henry Yen <henry AT AegisInfoSys DOT com>
Date: Mon, 9 Aug 2010 10:53:53 -0400
On Aug 9, 2010, at 2:55 AM, Henry Yen wrote:

> On Fri, Aug 06, 2010 at 10:48:10AM +0200, Christian Gaul wrote:
>> Even when catting to /dev/dsp I use /dev/urandom.  Blocking on
>> /dev/random happens much too quickly, and when do you really need that
>> much randomness?
> 
> I get about 40 bytes on a small server before blocking.

On Linux, /dev/random will block whenever there is insufficient entropy in the
pool.  /dev/urandom, by contrast, does not block; it reuses the entropy in the
pool, so it produces lower quality random data in much greater quantity than
/dev/random.  For a compressibility test, the pseudorandom output of
/dev/urandom is perfectly fine; /dev/random is better reserved for things like
generating cryptographic keys.
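
For a quick compressibility test, something along these lines is usually all
that is needed (the 512MB "tempchunk" size just mirrors the example discussed
below, and entropy_avail is simply the kernel's view of the pool Henry is
running out of):

    # 512MB of pseudorandom, effectively incompressible test data
    dd if=/dev/urandom of=tempchunk bs=1M count=512

    # how much entropy the kernel currently has available (Linux)
    cat /proc/sys/kernel/random/entropy_avail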

> 
>>> Reason 1: the example I gave yields a file size for "tempchunk" of 512MB,
>>> not 1MB, as given in your counter-example.  I agree that (at least
>>> nowadays) catting 1MB chunks into a 6MB chunk is likely (although not
>>> assured) to lead to greatly reduced size during later compression, but I
>>> disagree that catting 512MB chunks into a 3GB chunk is likely to be
>>> compressible by any general-purpose compressor.
>> 
>> Which is what I meant by "way bigger than the library size of the
>> algorithm".  Mostly my "information" was about pitfalls to look out for
>> when testing the speed of your equipment; if you went ahead and catted
>> 3000 x 1MB chunks, I believe the hardware compression would make something
>> highly compressed out of it.
>> My guess is it would work for most chunks around half as large as the
>> buffer size of the drive (totally guessing).
> 
> I think that tape drive manufacturers don't build large buffer/CPU
> capacity into their drives yet.  I finally did a test on an SDLT2 (160GB)
> drive; admittedly, it's fairly old as tape drives go, but tape technology
> appears to be rather slower than disk technology, at least as far
> as raw capacity is concerned.  I created two files from /dev/urandom;
> one was 1GB, the other a mere 10K.  I then created two identically-sized
> files by repeating each of these two chunks (4 copies of the first and
> approx. 400k copies of the second).  Writing them to the SDLT2 drive using
> a 60k blocksize, with compression on, yielded uncanny results: the writable
> capacity before hitting EOT was within 0.01%, and the elapsed time was
> within 0.02%.

As I posted here recently, even modern LTO tape drives use only a 1 KB (1024
byte) history buffer for their sliding-window compression algorithm.  So any
repeated random chunk larger than 1 KB will be incompressible to an LTO tape
drive.
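
If you want to see that limit on your own drive, a rough sketch might look
like the following (this assumes a Linux st driver, a no-rewind device at
/dev/nst0, and the mt-st flavour of mt; adjust device names and sizes to your
own setup):

    # one random pattern inside the 1 KB history window, one well outside it
    dd if=/dev/urandom of=small_chunk bs=512 count=1
    dd if=/dev/urandom of=big_chunk bs=4096 count=1

    # repeat each pattern out to roughly 256MB of test data
    for i in $(seq 1 2048); do cat small_chunk; done > small_1m
    for i in $(seq 1 256);  do cat big_chunk;   done > big_1m
    for i in $(seq 1 256);  do cat small_1m;    done > small_repeat
    for i in $(seq 1 256);  do cat big_1m;      done > big_repeat

    # enable drive compression and time each write at a 64k blocksize
    mt -f /dev/nst0 compression 1
    time dd if=small_repeat of=/dev/nst0 bs=64k
    mt -f /dev/nst0 rewind
    time dd if=big_repeat of=/dev/nst0 bs=64k

The repeated 512-byte pattern should write at something close to the drive's
compressed throughput, while the repeated 4 KB pattern should behave just
like the pure random data in your test.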

> I see there's a reason to almost completely ignore the so-called "compressed
> capacity" claims by tape drive manufacturers...

By definition, random data are not compressible.  It's my understanding that 
the "compressed capacity" of tapes is based explicitly on an expected 2:1 
compression ratio for source data (and this is usually cited somewhere in the 
small print).  That is a reasonable estimate for text.  Other data may compress 
better or worse.  Already-compressed or encrypted data will be incompressible 
to the tape drive.  In other words, "compressed capacity" is heavily dependent 
on your source data.
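
If you want a rough feel for how your particular data will fare before it
ever hits the drive, running a software compressor over a sample is a
reasonable proxy (gzip is not the drive's algorithm, and the syslog path
below is only an example, but the trend carries over):

    # plain text: typically 2:1 or better
    wc -c < /var/log/syslog
    gzip -c /var/log/syslog | wc -c

    # random or already-compressed data: 16MB in, roughly 16MB out
    dd if=/dev/urandom bs=1M count=16 | gzip -c | wc -c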

Cheers,

Paul.