Re: [Bacula-users] Bacula Compression - other then GZIP

2010-02-10 12:57:40
Subject: Re: [Bacula-users] Bacula Compression - other then GZIP
From: Phil Stracchino <alaric AT metrocast DOT net>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 10 Feb 2010 12:54:57 -0500
On 02/10/10 10:36, Sean M Clark wrote:
> xz/lzma is another consideration.  At moderate compression levels, lzma
> seems to be about the same or slightly faster than bzip2 with a little
> better compression.  At lower compression levels it seems like it's
> about as fast as gzip while compressing noticeably farther - at least
> in the small amount of testing I've done so far with the "xz"
> implementation of lzma compression.


I was going to mention xz myself.  I just completed some rather more
extensive tests.

I'm using three example test files here.  The first, a 590MB ISO of
Windows XP Pro SP3, contains a large amount of already-compressed data,
and can be expected to compress poorly.  The second, an 8.5MB stripped
ELF 32-bit LSB executable, can probably be expected to compress
moderately well.  The third, an ebook resaved in text format, is about
1.5MB of English ASCII text and should compress very well.  I'm
compressing each with gzip default options, gzip -9, bzip2, xz default
options, and xz -7.  (The xz man page notes that, because of their time
and memory usage, compression settings above 7 are not recommended
unless absolute maximum compression is necessary.)
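
A minimal sketch of how the same comparison could be scripted rather than
typed run by run.  The function names compressed_size and bench are made up
for illustration; only the compressor invocations come from the runs in this
message.  Timing is left out; each call can be wrapped in the shell's time
as in the transcripts.

```shell
# compressed_size FILE CMD [ARGS...]
# Feed FILE to the compressor on stdin and print its output size in bytes.
compressed_size() {
    file=$1
    shift
    "$@" < "$file" | wc -c | tr -d ' '
}

# bench FILE
# Run each compressor configuration used in this message against FILE.
bench() {
    for cmd in "gzip -c" "gzip -9 -c" "bzip2 -c" "xz -c" "xz -7 -c"; do
        printf '%-12s ' "$cmd"
        # $cmd is left unquoted on purpose so it splits into command and options
        compressed_size "$1" $cmd
    done
}
```

Calling bench winxp.iso would print one line per compressor with the output
size in bytes, assuming gzip, bzip2 and xz are all installed.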

First, the WinXP ISO (whitespace adjusted for clarity):

babylon5:alaric:~:10 $ ls -l winxp.iso
-rw-r----- 1 alaric users 617754624 Feb 10 10:24 winxp.iso

babylon5:alaric:~:11 $ time gzip -c < winxp.iso | dd bs=64K >/dev/null
0+35022 records in
0+35022 records out
573799160 bytes (574 MB) copied, 78.782 s, 7.3 MB/s
real    1m18.935s
user    0m53.804s
sys     0m4.357s
compression: 7.12%
compression/time: 0.0901

babylon5:alaric:~:12 $ time gzip -9 -c < winxp.iso | dd bs=64K >/dev/null
0+35013 records in
0+35013 records out
573652786 bytes (574 MB) copied, 111.185 s, 5.2 MB/s
real    1m51.207s
user    1m11.860s
sys     0m4.905s
compression: 7.14%
compression/time: 0.0643

babylon5:alaric:~:13 $ time bzip2 -c < winxp.iso | dd bs=64K >/dev/null
0+140444 records in
0+140444 records out
575258513 bytes (575 MB) copied, 808.258 s, 712 kB/s
real    13m28.370s
user    10m11.257s
sys     0m6.221s
compression: 6.88%
compression/time: 0.0085

babylon5:alaric:~:14 $ time xz -c < winxp.iso | dd bs=64K >/dev/null
0+69111 records in
0+69111 records out
566328660 bytes (566 MB) copied, 1395.3 s, 406 kB/s
real    23m15.341s
user    17m39.189s
sys     0m9.664s
compression: 8.32%
compression/time: 0.0060

babylon5:alaric:~:15 $ time xz -7 -c < winxp.iso | dd bs=64K >/dev/null
0+69040 records in
0+69040 records out
565609576 bytes (566 MB) copied, 1512.2 s, 374 kB/s
real    25m12.247s
user    19m7.363s
sys     0m10.943s
compression: 8.45%
compression/time: 0.0055

With this poorly compressible data, both gzip and gzip -9 yield better
compression than bzip2, with roughly an order of magnitude higher
throughput and lower CPU usage.  The best compression on this file, by a
hair, is achieved by xz -7, with default xz only about 0.1% behind but
taking 8% less time.  The worst compression of 6.88% is bzip2, but it
takes around half the time xz takes to do it, resulting in an actual
compression/time score roughly 50% better than xz.  gzip achieves about
1.3% less compression than xz -7 and about 0.25% better than bzip2, but
does it 7 to 10 times faster than bzip2 and 12 to 20 times faster than
xz.  The best compression per unit time score is achieved by default
gzip.  The worst, xz -7, is an order of magnitude worse than gzip -9 in
compression/time and achieves only about 1.3% additional compression.
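
The compression and compression/time figures under each run are not defined
in this message.  They appear consistent with compression being the
percentage of the original size removed, and compression/time being that
percentage divided by the "real" seconds.  A sketch under that assumption
(the function names are illustrative, and small differences from the quoted
figures come down to rounding):

```shell
# Percentage of the original size removed by compression.
compression_pct() {
    awk -v orig="$1" -v comp="$2" \
        'BEGIN { printf "%.3f\n", (1 - comp / orig) * 100 }'
}

# Compression percentage achieved per second of wall-clock time.
compression_score() {
    awk -v pct="$1" -v secs="$2" 'BEGIN { printf "%.3f\n", pct / secs }'
}

# Text file with default gzip: 1501227 bytes in, 568436 bytes out, 0.256 s real
compression_pct 1501227 568436
compression_score 62.135 0.256
```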


Next, the ELF executable.

babylon5:alaric:~:21 $ ls -l mplayer
-rwxr-x--- 1 alaric users 8485168 Feb 10 12:04 mplayer

babylon5:alaric:~:22 $ time gzip -c < mplayer | dd bs=64K >/dev/null
0+230 records in
0+230 records out
3752190 bytes (3.8 MB) copied, 1.26176 s, 3.0 MB/s
real    0m1.266s
user    0m1.032s
sys     0m0.055s
compression: 55.8%
compression/time: 44.075

babylon5:alaric:~:23 $ time gzip -9 -c < mplayer | dd bs=64K >/dev/null
0+228 records in
0+228 records out
3734027 bytes (3.7 MB) copied, 2.76918 s, 1.3 MB/s
real    0m2.779s
user    0m2.119s
sys     0m0.054s
compression: 56%
compression/time: 20.173

babylon5:alaric:~:24 $ time bzip2 -c < mplayer | dd bs=64K >/dev/null
0+880 records in
0+880 records out
3603587 bytes (3.6 MB) copied, 6.41314 s, 562 kB/s
real    0m6.426s
user    0m5.128s
sys     0m0.050s
compression: 57.5%
compression/time: 8.948

babylon5:alaric:~:25 $ time xz -c < mplayer | dd bs=64K >/dev/null
0+362 records in
0+362 records out
2964084 bytes (3.0 MB) copied, 21.0693 s, 141 kB/s
real    0m21.098s
user    0m15.434s
sys     0m0.316s
compression: 65%
compression/time: 3.081

babylon5:alaric:~:26 $ time xz -7 -c < mplayer | dd bs=64K >/dev/null
0+362 records in
0+362 records out
2964084 bytes (3.0 MB) copied, 19.8819 s, 149 kB/s
real    0m19.913s
user    0m15.347s
sys     0m0.301s
compression: 65%
compression/time: 3.264

This is not all that dissimilar a picture.  Interestingly, here, default
xz and xz -7 achieve identical compression, but xz -7 accomplishes it
slightly over a second faster.  Both lead bzip2 by about 7.5% in
compression, but take around three times as long.  gzip -9 achieves only
0.2% better compression than default gzip, but takes more than twice as
long to do it.  Even gzip -9 is still more than twice as fast as bzip2
and more than seven times faster than xz, and trails it by only 9% in
compression.  Vanilla gzip, only a fraction behind gzip -9 in
compression, is five times faster than bzip2, and has about 13.5 times
the compression/time score of the best compressor, xz -7.


Finally, the text file:

babylon5:alaric:~:31 $ ls -l 1634-The_Baltic_War.txt
-rw-rw---- 1 alaric users 1501227 Feb 10 12:23 1634-The_Baltic_War.txt

babylon5:alaric:~:32 $ time gzip -c < 1634-The_Baltic_War.txt | dd bs=64K >/dev/null
0+35 records in
0+35 records out
568436 bytes (568 kB) copied, 0.248751 s, 2.3 MB/s
real    0m0.256s
user    0m0.217s
sys     0m0.006s
compression: 62.135%
compression/time: 242.695

babylon5:alaric:~:33 $ time gzip -9 -c < 1634-The_Baltic_War.txt | dd bs=64K >/dev/null
0+35 records in
0+35 records out
566204 bytes (566 kB) copied, 0.311892 s, 1.8 MB/s
real    0m0.321s
user    0m0.269s
sys     0m0.009s
compression: 62.284%
compression/time: 194.018

babylon5:alaric:~:34 $ time bzip2 -c < 1634-The_Baltic_War.txt | dd bs=64K >/dev/null
0+101 records in
0+101 records out
412638 bytes (413 kB) copied, 1.12327 s, 367 kB/s
real    0m1.130s
user    0m0.949s
sys     0m0.023s
compression: 72.513%
compression/time: 64.168

babylon5:alaric:~:35 $ time xz -c < 1634-The_Baltic_War.txt | dd bs=64K >/dev/null
0+55 records in
0+55 records out
444852 bytes (445 kB) copied, 4.89832 s, 90.8 kB/s
real    0m4.917s
user    0m3.809s
sys     0m0.069s
compression: 70.367%
compression/time: 14.311

babylon5:alaric:~:36 $ time xz -7 -c < 1634-The_Baltic_War.txt | dd bs=64K >/dev/null
0+55 records in
0+55 records out
444852 bytes (445 kB) copied, 4.79776 s, 92.7 kB/s
real    0m4.815s
user    0m3.854s
sys     0m0.095s
compression: 70.367%
compression/time: 14.614

Now this gets interesting.  Default xz and xz -7 still achieve identical
compression, and again, xz -7 is actually fractionally faster than
default xz.  However, on this data, bzip2 beats both in compression by
just over 2%, and does it more than four times faster, for a
compression/time score about 4.4 times better than xz -7.  This is the
first time any of the three has achieved a *significantly* smaller
output than gzip; the bzip2 output file is roughly 28% smaller than
gzip's.  However, even gzip -9 is still over three times faster than
bzip2, and default gzip is four times faster, with a compression/time
score 17 times better than xz.
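
The comparisons above mix two kinds of percentage: gaps in percentage
points ("beats both in compression by just over 2%") and relative output
size ("roughly 28% smaller").  A sketch showing both for bzip2 against
default gzip on the text file; the helper names are illustrative only.

```shell
# Relative reduction in output size: how much smaller NEW is than BASE, in percent.
smaller_by() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (1 - new / base) * 100 }'
}

# Gap between two compression percentages, in percentage points.
points_gap() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a - b }'
}

smaller_by 568436 412638    # gzip output bytes, then bzip2 output bytes
points_gap 72.513 62.135    # bzip2 compression %, then gzip compression %
```

The first call gives about 27.4 (the "roughly 28% smaller" figure) and the
second about 10.4 percentage points.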


So, the overall conclusion:  Yes, you can achieve some savings in space
by using bzip2 or even xz for your compression, if you can afford the
additional CPU and memory utilization.  But it will probably not be a
significant space saving on most data, and it will come at a horrendous
cost in terms of CPU usage and actual throughput.  For on-the-fly
compression of a high-volume stream of mixed data, you're actually
probably best off staying with plain-vanilla gzip, unless you're I/O
bound on your backup writes already *AND* you have a massive compute
server to do your compression - and even then, you may not gain much
unless you use one of the multi-threaded versions of bzip2 or xz that
can make use of multiple CPU cores for a single compression task.
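
As a rough illustration of the multi-threaded option: pigz (parallel gzip)
and pbzip2 (parallel bzip2) are separate tools that may or may not be
installed, and newer xz releases accept -T0 to use all cores.  None of these
were part of the tests above.  The helper name first_available is made up
for this sketch, which simply picks a parallel compressor when one is on the
PATH and falls back to the plain one otherwise.

```shell
# first_available CMD...
# Print the first command name found on PATH; return non-zero if none is found.
first_available() {
    for c in "$@"; do
        if command -v "$c" >/dev/null 2>&1; then
            printf '%s\n' "$c"
            return 0
        fi
    done
    return 1
}

# Example use (not executed here):
#   gz=$(first_available pigz gzip)      # parallel gzip if present
#   bz=$(first_available pbzip2 bzip2)   # parallel bzip2 if present
#   "$gz" -c < winxp.iso | dd bs=64K >/dev/null
```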


-- 
  Phil Stracchino, CDK#2     DoD#299792458     ICBM: 43.5607, -71.355
  alaric AT caerllewys DOT net   alaric AT metrocast DOT net   phil AT co.ordinate DOT org
         Renaissance Man, Unix ronin, Perl hacker, Free Stater
                 It's not the years, it's the mileage.
