On Aug 9, 2010, at 2:55 AM, Henry Yen wrote:
> On Fri, Aug 06, 2010 at 10:48:10AM +0200, Christian Gaul wrote:
>> Even when catting to /dev/dsp i use /dev/urandom.. Blocking on
>> /dev/random happens much too quickly.. and when do you really need that
>> much randomness.
>
> I get about 40 bytes on a small server before blocking.
On Linux, /dev/random blocks when there is insufficient entropy in the pool.
Unlike /dev/random, /dev/urandom never blocks; instead it reuses entropy
already in the pool. Thus /dev/random produces higher-quality random data, but
in far smaller quantity, than /dev/urandom. For the purposes of
compressibility tests, the pseudorandom data from /dev/urandom is perfectly
fine. The /dev/random device is better reserved for, e.g., generating
cryptographic keys.
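To a compressor, the pseudorandom output of /dev/urandom is indistinguishable
from "true" random data: both are essentially incompressible. A minimal Python
sketch (os.urandom stands in for reading /dev/urandom, and zlib stands in for
whatever compressor you are testing):

```python
import os
import zlib

# Pseudorandom data, equivalent to reading 1 MiB from /dev/urandom on Linux.
data = os.urandom(1024 * 1024)

# Random data should be essentially incompressible: the "compressed"
# output comes out at least as large as the input.
compressed = zlib.compress(data, 9)
ratio = len(compressed) / len(data)
print(f"compression ratio: {ratio:.3f}")  # expect ~1.0
```

The same holds for any general-purpose compressor, which is exactly why
urandom data makes a good worst case for tape-drive throughput tests.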
>
>>> Reason 1: the example I gave yields a file size for "tempchunk" of 512MB,
>>> not 1MB, as given in your counter-example. I agree that (at least
>>> now-a-days)
>>> catting 1MB chunks into a 6MB chunk is likely (although not assured)
>>> to lead to greatly reduced size during later compression, but I disagree
>>> that catting 512MB chunks into a 3GB chunk is likely to be compressible
>>> by any general-purpose compressor.
>>
>> Which is what i meant with "way bigger than the library size of the
>> algorithm". Mostly my "Information" was pitfalls to look out for when
>> testing the speed of your equipment, if you went ahead and cat-ted 3000
>> x 1MB, i believe the hardware compression would make something highly
>> compressed out of it.
>> My guess is it would work for most chunks around half as large as the
>> buffer size of the drive (totally guessing).
>
> I think that the tape drive manufacturers don't make large buffer/CPU
> capacity in their drives yet. I finally did a test on an SDLT2 (160GB)
> drive; admittedly, it's fairly old as tape drives go, but tape technology
> appears to be rather a bit slower than disk technology, at least as far
> as raw capacity is concerned. I created two files from /dev/urandom;
> one was 1GB, the other a mere 10K. I then created two identically-sized
> files corresponding to each of these two chunks (4 of the first and approx.
> 400k of the second). Writing them to the SDLT2 drive using 60k blocksize,
> with compression on, yielded uncanny results: the writable capacity before
> hitting EOT was within 0.01%, and the elapsed time was within 0.02%.
As I posted here recently, even modern LTO tape drives use only a 1 KB (1024
byte) history buffer for their sliding-window compression algorithm. So any
repeated random chunk larger than 1 KB will be incompressible to an LTO tape
drive.
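The effect of such a small history window is easy to demonstrate with zlib,
whose DEFLATE compressor also uses an LZ77 sliding window: wbits=10 gives a
1 KiB window, roughly analogous in size to the LTO buffer described above.
(This is only an illustration of the window-size effect, not the drives'
actual compression algorithm.)

```python
import os
import zlib

def compress_with_1k_window(data: bytes) -> bytes:
    # wbits=10 -> 2**10 = 1024-byte LZ77 history window
    co = zlib.compressobj(9, zlib.DEFLATED, 10)
    return co.compress(data) + co.flush()

# A repeated 2 KiB random chunk: each repeat lies just beyond the
# 1 KiB window, so the compressor never sees a usable match.
big_chunk = os.urandom(2048) * 512      # 1 MiB total
# A repeated 512-byte random chunk: repeats fall inside the window.
small_chunk = os.urandom(512) * 2048    # 1 MiB total

big_ratio = len(compress_with_1k_window(big_chunk)) / len(big_chunk)
small_ratio = len(compress_with_1k_window(small_chunk)) / len(small_chunk)
print(f"2 KiB chunks: {big_ratio:.2f}  512 B chunks: {small_ratio:.2f}")
```

The repeated-2-KiB file stays near ratio 1.0 while the repeated-512-byte file
collapses, which matches Christian's guess that chunks around half the drive's
buffer size are the ones that compress well.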
> I see there's a reason to almost completely ignore the so-called "compressed
> capacity" claims by tape drive manufacturers...
By definition, random data are not compressible. It's my understanding that
the "compressed capacity" of tapes is based explicitly on an expected 2:1
compression ratio for source data (and this is usually cited somewhere in the
small print). That is a reasonable estimate for text. Other data may compress
better or worse. Already-compressed or encrypted data will be incompressible
to the tape drive. In other words, "compressed capacity" is heavily dependent
on your source data.
Cheers,
Paul.
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users