Subject: Re: using disk instead of tape
From: Phil Howard <phil-amanda-users AT ipal DOT net>
To: amanda-users AT amanda DOT org
Date: Tue, 5 Sep 2006 16:57:50 -0500
On Tue, Sep 05, 2006 at 10:21:36AM -0400, Gene Heskett wrote:

| And any perceived time-saved advantages are lost by a factor of 20 or so 
| when software compression is in use.  Normal backups here are written at 
| 20-50 megabytes/second, but 'compress client best' on a 500 MHz K6 will be 
| slowed to about 50k/second or less for the compression phase.  Once the 
| compression is done, and it's in the holding disk, then the actual write is 
| at 20-50 megs/second.  In no way is the speed of the disk more than a 
| very minor factor in the amount of time to do the backup here.
| 
| I personally fail to see the point of trying to bypass the filesystem as 
| being a speed bottleneck; it's only a percent or three of the total time 
| doing the backups here.  Estimates and compression are the two places to 
| look at when configuring for speed.  If the storage capacity is there, 
| leave the compression out.  However, I selectively use it here on some 
| dle's, particularly those that will compress to less than 10% of the 
| original size, and that does take time when /usr/src on either machine is 
| several gigabytes.
| 
| YMMV, I have maybe 50 gigs at any one time, whereas some may have a 
| terabyte or more, but that's my take on how relatively pointless (and 
| crippling to the basic premise of amanda) the proposed changes would be at 
| the end of the day.

I wouldn't be using compression.  I've found that when speed matters,
compression only gets in the way, big time.  At the pace disks are getting
bigger and bigger, compression becomes almost moot.  And most of my files
are already compressed.  One project I am considering this for would have
a few terabytes of files already compressed in MPEG and/or DV format.  So
I'd never use compression; the costs far outweigh the tiny advantage.

One big problem with a filesystem is the system itself.  It tries to cache
the data blocks, and the system actually slows down because it steals pages
from other processes to do so.  Writing such a massive amount of data at
one time is a big load on the system, which makes every process suffer.
Writing to a raw device is different.  In BSD, a specific raw device node
exists to bypass the caching.  In Linux, the O_DIRECT flag can be used when
opening the device to achieve the same thing.  Writes then go directly to
the disk, using relatively little RAM and reducing the CPU load, too.

-- 
-----------------------------------------------------------------------------
| Phil Howard KA9WGN       | http://linuxhomepage.com/      http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/   http://ka9wgn.ham.org/ |
-----------------------------------------------------------------------------