Subject: Re: suggestion for a disk-to-disk backup server
From: Jon LaBadie <jon AT jgcomp DOT com>
To: amanda-users AT amanda DOT org
Date: Mon, 25 Jun 2007 15:57:29 -0400
On Mon, Jun 25, 2007 at 11:10:21AM -0700, Rudy Setiawan wrote:
> Ah, sorry, I pressed the send button accidentally.
> Thank you all for the input.
> 
> With the number of hosts that I have, what is the best
> configuration?  Should I keep a separate file for each host, or can
> I dump them all into one huge file?  Will it make a difference in
> the speed of backup, restoration, or cataloging?
> 
> Is there any other system/Amanda configuration you can suggest to
> improve the speed of backup and restoration?
> 
> Thank you.

Amanda's unit of backup is the DLE (DiskList Entry): either a file
system or a directory tree on a single host.  So each backup client
host may have one or more DLEs.
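
For illustration, a disklist splitting one client into several DLEs
might look like this (hostnames, paths, and dumptype names are made
up; the dumptypes would be defined in your amanda.conf):

    # disklist: one DLE per line -- host, directory or device, dumptype
    fileserver  /home       comp-user-tar
    fileserver  /var/mail   comp-user-tar
    webhost     /var/www    comp-user-tar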

Considerations:

How much data do all your DLEs represent?  You gave numbers of twenty
clients and one hundred GB/client.  But is that disk space or actual
data that will need to be backed up?

How frequently do you want a full backup of each DLE?  This is called
the dumpcycle and can differ between DLEs, but most often the same
dumpcycle is used for all of them.  A common dumpcycle is 1 week.
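
In amanda.conf that is just the dumpcycle directive, e.g.:

    dumpcycle 1 week    # full backup of every DLE at least once a week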

How often will you perform backups within each dumpcycle?  This is
known as the runs per cycle.  Amanda does not do all the full backups
on the same day.  Each run (of the program amdump) will, on average,
do full backups of 1/(runs per cycle) of the total data.  The rest of
the DLEs will receive incremental backups of just the changed data.
We have no idea whether your data is relatively static or highly
volatile.
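
Continuing the amanda.conf sketch, daily runs over a one-week cycle
would be:

    dumpcycle    1 week
    runspercycle 7       # amdump runs 7 times per cycle (daily); the
                         # planner spreads the fulls across those runs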

Once you estimate those values you can get some idea of the magnitude
of your amdump runs.  For example, suppose you do have 20 x 100GB of
total data, it changes by, say, 2GB/day per host, and you choose a
dumpcycle of 1 week with amdump run every day.  An average amdump run
will then be full backups of about 300GB (20 x 100GB / 7) plus
incrementals of 20 x 2GB, for a total of about 340GB per run (day).

Next consider how long you wish to retain your backup data.  For my
SOHO I keep a little more than a month's worth on-line.  If that is
good for you, you will be storing about 32 days x 340GB/day or 11 TB
of data.
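
For a disk-to-disk setup that retention maps to virtual tapes.  A
sketch, assuming the chg-disk vtape changer (paths are made up; check
the vtape documentation for your Amanda version):

    # amanda.conf -- roughly a month of daily runs held on disk
    tpchanger "chg-disk"
    tapedev   "file:/backup/vtapes/daily"
    tapecycle 35 tapes   # a few more slots than the 32 runs retained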

Most of that may be compressible.  Data varies in its compressibility
from 0% (already compressed or encrypted data) to as much as 90%.
If your data compresses by 40%, you will need 6-7 TB of disk storage
for one month's worth of backups.

How is your network?  340GB over a 100Mb/sec network is about 7.5
hours even at the never-achieved 100% efficiency; at realistic
throughput you are looking at well over 10 hours.  Sounds like you
had better be on a gigabit network.  Multiple servers may or may not
help here; they help most when each server and its clients sit on an
isolated subnet or switch.

Who is going to do the compression, the server or the clients?
Compression is CPU-intensive; the server (3GHz is otherwise more
than "needed") may be slowed down.  Having the clients do the
compression may help, and it reduces the amount of network traffic
since compressed data is sent over the wire.  But if the clients
are providing a service, that service's performance may be
impacted.
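
The choice is made per dumptype in amanda.conf; for example (the
dumptype name here is made up):

    define dumptype comp-client-tar {
        program "GNUTAR"
        compress client fast    # gzip on the client, less network load
        # compress server fast  # alternative: burn server CPU instead
    }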

Sounds to me like you should set up a test config with just
2 or 3 clients.  See how that goes and expand from there to
more clients and/or more servers.

jl
-- 
Jon H. LaBadie                  jon AT jgcomp DOT com
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)