Amanda-Users

Re: Problem with compression?

2003-02-24 15:29:37
Subject: Re: Problem with compression?
From: Joshua Baker-LePain <jlb17 AT duke DOT edu>
To: John Oliver <joliver AT john-oliver DOT net>
Date: Mon, 24 Feb 2003 14:06:43 -0500 (EST)
On Mon, 24 Feb 2003 at 9:57am, John Oliver wrote

> > You will get exactly 20GB on the tape, after Amanda compression.
> 
> The tape is 20GB native, 40GB compressed.  If amanda is only capable of
> compressing by 0%, then I would submit that its compression algorithms
> either *really* suck, or simply don't work.  Since I really doubt that,
> I would further submit that maybe amanda *isn't* compressing, after all.
> If you say it is, then I would appreciate an explanation of how
> "compressing" 20GB of data to just fit on a 20GB tape is a useful
> feature.

You're getting hardware and software compression, and pre- and 
post-compression values, mixed up.

This is what amanda does (assuming you are using 'compress client best' 
(or 'compress client fast') in the dumptype defined in amanda.conf).  As the data is read 
off the disk with tar or dump, it is passed through gzip, then goes over 
the network to the tape server.  There it is either put on the holding 
disk or goes straight to tape.
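A minimal sketch of that client-side step, using Python's `tarfile` with gzip level 9 as a stand-in for tar piped through `gzip --best` (an assumption; amanda runs the real programs on the client). The helper `tar_sizes` and the sample file are hypothetical; the point is only that the image sent over the network is smaller than the raw archive.

```python
import io
import tarfile


def tar_sizes(files):
    """Archive {name: bytes} in memory twice: as a plain tar and as a
    gzip-compressed tar (level 9, like 'gzip --best').

    Returns (raw_tar_bytes, gzipped_tar_bytes).
    """

    def build(mode, **kwargs):
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode=mode, **kwargs) as tar:
            for name, data in files.items():
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        # Closing the TarFile flushes the gzip trailer but leaves buf open.
        return len(buf.getvalue())

    return build("w"), build("w:gz", compresslevel=9)


if __name__ == "__main__":
    # Hypothetical, very repetitive sample; real filesystems compress less.
    sample = {"var/log/access.log": b"GET /index.html HTTP/1.0 200\n" * 5000}
    raw, gz = tar_sizes(sample)
    print(f"plain tar: {raw} bytes, tar | gzip: {gz} bytes")
```

How much smaller depends entirely on the data; text and logs shrink a lot, already-compressed files (images, archives) hardly at all.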

In either case, the image of the filesystem the tape server gets is 
*smaller* than the data on the disk on the client.  It has been 
compressed.  In the amanda email reports you get, amanda reports the 
original (pre-compression) size, the compressed (output) size, and the compression ratio.  
An excerpt from one of my dumps last night:

DUMP SUMMARY:
                                      DUMPER STATS                TAPER STATS
HOSTNAME DISK          L   ORIG-KB   OUT-KB COMP% MMM:SS    KB/s MMM:SS    KB/s
------------------------ --------------------------------------- --------------
.
.
$CLIENT  /data/bjf     0   2963470  1259776  42.5  28:38   733.5   5:22  3909.5

See?  That filesystem is about 3GB on disk, and it compressed down to 
about 1.2GB -- better than 2X compression.
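The COMP% column can be recomputed from the ORIG-KB and OUT-KB columns. A small sketch, assuming whitespace-separated columns in the order shown in the header and no spaces in host or disk names; `comp_stats` is a hypothetical helper, and COMP% is taken to be the output size as a percentage of the original, which is what the numbers in the excerpt above give.

```python
def comp_stats(row):
    """Recompute compression figures from one DUMP SUMMARY row.

    Assumes whitespace-separated columns in the order
    HOSTNAME DISK L ORIG-KB OUT-KB COMP% ..., with no spaces in names.
    """
    fields = row.split()
    orig_kb = int(fields[3])
    out_kb = int(fields[4])
    reported_pct = float(fields[5])
    pct = 100.0 * out_kb / orig_kb  # output size as a percentage of the original
    factor = orig_kb / out_kb  # how many times smaller the image is ("2X")
    return pct, reported_pct, factor


if __name__ == "__main__":
    row = "$CLIENT  /data/bjf     0   2963470  1259776  42.5  28:38   733.5   5:22  3909.5"
    pct, reported, factor = comp_stats(row)
    print(f"COMP% {pct:.1f} (report says {reported}), about {factor:.2f}x smaller")
```

For the /data/bjf row this rounds to the reported 42.5%, a factor of roughly 2.35.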

Now, as I mentioned, the tape server then puts the *compressed* image on 
tape.  Since software-compressed data will *expand* when put through a 
tape drive's hardware compression routine, you must turn that OFF.  
Therefore, your tapes are used at their "native" capacity -- 20GB in your 
case.  Amanda can "only" put 20GB on the tapes.  But that 20GB is data 
that has *already* been compressed, in software, before it goes onto the 
tape.  The original data on disk was larger.  Again, from my backups last
night:

STATISTICS:
                          Total       Full      Daily
                        --------   --------   --------
Estimate Time (hrs:min)    0:02
Run Time (hrs:min)         3:02
Dump Time (hrs:min)        4:03       3:55       0:08
Output Size (meg)        4779.6     4597.7      182.0
Original Size (meg)      9332.7     8866.3      466.4
Avg Compressed Size (%)    51.2       51.9       39.0   (level:#disks ...)
Filesystems Dumped           49         32         17   (1:17)
Avg Dump Rate (k/s)       335.5      333.3      401.8

Tape Time (hrs:min)        1:30       1:26       0:04
Tape Size (meg)          4781.9     4599.2      182.8
Tape Used (%)              75.9       73.0        2.9   (level:#disks ...)
Filesystems Taped            49         32         17   (1:17)
Avg Tp Write Rate (k/s)   904.4      909.6      791.3

The original size of the night's backups on disk was 9.3GB -- bigger than 
the 7GB native tapes I'm using.  After compression, it was only about 
4.8GB, and so fit on one tape.
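The same arithmetic answers the original 20GB question, and a quick experiment shows why the drive's own compression adds nothing once the data is gzipped. A sketch under stated assumptions: "20GB native" is taken as 20000 MB, zlib's deflate (the algorithm inside gzip) stands in for the drive's hardware compressor even though real drives use their own algorithms, and the random-word text is hypothetical sample data. The helper names are illustrative only.

```python
import random
import zlib


def avg_comp_pct(output_mb, original_mb):
    """Compressed output as a percentage of the original size."""
    return 100.0 * output_mb / original_mb


def original_that_fits(tape_native_mb, comp_pct):
    """Pre-compression data that fits when the drive's own compression is
    off, so the tape holds only its native capacity of already-compressed
    images."""
    return tape_native_mb / (comp_pct / 100.0)


def recompress_sizes(data, level=9):
    """Deflate data once, then deflate that output again; return both sizes."""
    once = zlib.compress(data, level)
    twice = zlib.compress(once, level)
    return len(once), len(twice)


if __name__ == "__main__":
    # Totals from the STATISTICS block above (MB).
    pct = avg_comp_pct(4779.6, 9332.7)
    # 20GB native tape from the original question, taken as 20000 MB.
    fits = original_that_fits(20000, pct)
    print(f"avg compressed size {pct:.1f}%, ~{fits / 1000:.0f}GB of disk data per 20GB tape")

    # Hypothetical text data standing in for a filesystem image.
    rng = random.Random(0)
    words = ["amanda", "tape", "dump", "level", "gzip", "holding", "disk", "client"]
    text = " ".join(rng.choice(words) for _ in range(50000)).encode()
    once, twice = recompress_sizes(text)
    print(f"original {len(text)}, compressed once {once}, compressed twice {twice}")
```

At the 51.2% average from these backups, a 20GB native tape holds roughly 39GB of on-disk data, while the second compression pass leaves the already-compressed data about the same size or slightly larger.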

Is that clearer?

Going back to your initial problem, the question you have to be asking 
yourself is: why was the level 1 backup of that filesystem over 9GB, even 
after compression?  Did a lot of files get their timestamps updated?  
What happened?  You are *not* having a problem with compression, you're 
having an issue with much larger than expected level 1 backups of that 
filesystem.
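One way to start answering that is to measure how much data on the client has recent timestamps. A minimal sketch, assuming the level 1 picks up files whose modification time (or inode change time, which chmod, chown and touch also bump) is newer than the previous dump; dump and GNU tar apply their own exact rules. The `changed_since` helper and the 24-hour cutoff are hypothetical.

```python
import os
import stat
import time


def changed_since(root, cutoff, check_ctime=True):
    """Count regular files under root whose mtime (and, optionally, ctime)
    is newer than cutoff (seconds since the epoch).

    Returns (file_count, total_bytes). A rough approximation of what an
    incremental backup would pick up, not amanda's actual selection logic.
    """
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, devices, sockets, etc.
            newest = max(st.st_mtime, st.st_ctime) if check_ctime else st.st_mtime
            if newest > cutoff:
                count += 1
                total += st.st_size
    return count, total


if __name__ == "__main__":
    # Hypothetical: treat the last 24 hours as "since the last dump".
    n, size = changed_since("/data/bjf", time.time() - 24 * 3600)
    print(f"{n} files, {size / 2**20:.1f} MiB changed")
```

If the total comes out near the size of the whole filesystem, something (a recursive chown, a restore, a mass touch) rewrote the timestamps, and the big level 1 is expected.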

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University






