Amanda-Users

Re: Backup plan and big filesystems

2006-08-18 13:57:33
Subject: Re: Backup plan and big filesystems
From: Jon LaBadie <jon AT jgcomp DOT com>
To: Amanda List <amanda-users AT amanda DOT org>
Date: Fri, 18 Aug 2006 10:35:30 -0400
On Fri, Aug 18, 2006 at 02:06:48PM +0100, Fabio Corazza wrote:
> Sorry for the length of the email, but it's been a bit difficult to
> explain everything... :-)
> 

Followed by a long-winded answer without apologies.

> 
> Hi there,
>  I'm wondering which is the best backup technique that I can implement
> on a Linux-only servers production environment.
> 
> I was wondering initially, also under suggestion of some colleagues, to
> stick with a cron-scripted solution, since they had the impression that
> Amanda would just over complicate everything.

Once installed, configured and running, amanda needs little admin.
Mostly changing tapes and monitoring the daily emails.

You can find evidence on this list of people who have had almost
zero trouble setting up their initial amanda installation and
others who have had considerable teething pains.

> 
> Anyway this would require initial engineering time, like writing cron
> scripts that would manage dump and tar utilities together with mtx to
> manage all the stuff. At this point in time, I'd rather go with a ready-to-go
> solution like Amanda, even if initially we still would need some time to
> spend on the configuration. Also, this will give us the fancy of
> detailed email reports, that are always a good thing.

Yes, the reports are a bonus.  Also consider the detailed activity
logs and debugging files with more info than you really want, but
which are there when you do need them.  And indexes of your tapes
and archive contents.

Would your scripts allow you to browse your backups for a specific
file(s) to recover, or would you have to manually look for them and
hope you get the right tape and tape file and command line syntax
and typo-less command line entry?

Could you do recovery from the backup clients as well as from the
amanda server?

Will your scripts make sure you don't overwrite a tape before it
is appropriate to do so?

Will your cron scripts automatically handle the occasional hiccup?
Like a host or network being down at the time of backup or tape
problems?  Will they automatically adjust incremental levels based
on the amount of data change?

There are benefits, but ...

> 
> This is what we have to backup: 4 (four) production servers (2x
> application & 2x database) full filesystems dump (/, /var, etc), and
> some GFS filesystem stored on an iSCSI SAN, mounted as read-only on the
> admin server, where Amanda server will be running.
> 
> We will use tape as backup medium, with a Dell PowerVault 124T attached
> to the SCSI controller of the admin server. This tape library will be
> capable of handling a total size of 6.4TB native uncompressed data with
> 2 magazines of 8 slots each using LTO-3 Ultrium 400GB cartridges.
> 
...

> Other than this, I'm trying to figure out which is the best backup plan.
> I'd go with a weekly full backup for everything (servers local
> filesystems & GFS volumes) and an incremental daily. We will use 2
> different tape sets, to export physically tape cartridges from the data
> center every week for data safety.
> 
> What I'm a bit confused about is how to manage the tape cycles during
> the week, and how to handle backup of big size filesystems (in our case
> GFS). Actually the filesystem is of around 1TB, but I'm guessing if it
> would be better to split it in different mountpoints, each of 400GB.
> This would probably give some benefits for the backup, since this is the
> exact size of our cartridges.
> 
> If I'm not wrong, this is what it should be done:
> 
    [ a traditional backup scheme description snipped ]

While you can do traditional backup scheduling (monster full dumps
one day and incrementals the others) amanda is best used the way
it was designed, by allowing amanda to dynamically schedule your
dumps and spread the fulls and incrementals over the entire week.

Using any traditional scheme will always have you struggling with
amanda to "do it your way" rather than amanda's.  It can be done,
and you get the other nice benefits, but it isn't pretty.

The backup unit of amanda is called a disklist entry (DLE), named
after the configuration file (disklist) in which it is defined.  A DLE
may be an entire filesystem or one or more directory trees.  You
provide amanda a framework from which to do its scheduling -- how
frequently will you be doing dumps, how regularly do you need a full
dump, up to how many tapes can it use for a single dump, how many
tapes total are in the rotation.  These and many other parameters
can be specified globally, and many of them can also be set
individually for each DLE (if you want, some may get full dumps
monthly, others weekly, others daily, some only full dumps,
skipping the incrementals, some only incrementals, full dumps only
on demand, some software compression, others not, etc.).

Once set up, for each run of amdump, amanda plans its activities
for that run: which DLEs will get a full dump, which an incremental,
and at what level.  The aim of this planning is to give you a
consistent size of dump each run.

For some, a negative aspect of dynamic scheduling is off-site storage.
Rather than take one dump's worth of tapes from the monster dump of
a traditional scheduling scheme, you need to off-site a dump cycle's
worth of tapes (say a week's worth).  For this type of need a common
solution is to have a separate configuration that only runs occasionally
and does full dumps of everything.


> 
> The questions are:
> 
> - How many cartridges do I need?

How many tapes do you use per run, how often do you run a dump,
how many cycles of dumps do you wish to have available?
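
Those three answers multiply together.  A minimal sketch of the
arithmetic, assuming a hypothetical helper name and example numbers
of my own choosing (1 tape per run, 7 runs per cycle, 4 cycles kept);
none of this is amanda code:

```python
def tapes_needed(tapes_per_run, runs_per_cycle, cycles_kept, spares=0):
    """Rough tape count: tapes per run x runs per cycle x cycles kept.

    'spares' is a hypothetical allowance for bad or off-site tapes.
    """
    return tapes_per_run * runs_per_cycle * cycles_kept + spares


# Example numbers are assumptions, not a recommendation.
print(tapes_needed(tapes_per_run=1, runs_per_cycle=7, cycles_kept=4))
```

Plug in your own answers to the three questions to get a first
estimate, then add a few spares.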

> - How cartridges should be rotated during the week?

Amanda reuses the tapes in the same order in which it originally
saw them.  It doesn't do a "this is Monday's tape but today is
Tuesday".  It uses them in a rotation.
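
As a toy model of that rotation (not amanda's actual logic, which
also has to cope with new, missing, or skipped tapes), the tape
labels below are hypothetical:

```python
from collections import deque


def rotation(labels, runs):
    """Return the tape used on each run, cycling in first-seen order."""
    order = deque(labels)
    used = []
    for _ in range(runs):
        tape = order.popleft()   # oldest tape in the rotation
        used.append(tape)
        order.append(tape)       # goes to the back after being written
    return used


# Hypothetical labels; the day of the week plays no part.
print(rotation(["DAILY-01", "DAILY-02", "DAILY-03"], 5))
```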


> - Is it better to split the GFS filesystem into multiple 400GB volumes
> to improve the ease of backup?

If you go with amanda you definitely should do that GFS filesystem
as multiple DLEs.  If I got your data size correct, you have about
1 TB of GFS data plus 0.4 TB of misc OS data, about 1.4TB total.

With amanda's dynamic scheduling, and say daily runs with a dump
cycle of 1 week (7 runs/week, full dump of everything each week),
a perfect balance would have 1.4TB/7 or 200GB/day of full dumps.
Add to that some incremental data, and it is likely that some DLEs
will get more than one full dump during the week.  But it sounds like
things would fit onto a single LTO-3 tape, even without considering
compression.  Your daily dumps would probably be about 300GB.
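
The same arithmetic spelled out, as a sketch only.  The sizes are the
estimates from this thread; the variable names are made up for the
illustration, and the 300GB daily figure is a guess, not a
measurement:

```python
GFS_GB = 1000          # about 1 TB of GFS data
OS_GB = 400            # about 0.4 TB of misc OS data
RUNS_PER_CYCLE = 7     # daily runs, 1 week dump cycle
TAPE_GB = 400          # LTO-3 native capacity
DAILY_GUESS_GB = 300   # fulls + incrementals + repeated fulls (a guess)

total_gb = GFS_GB + OS_GB
full_per_run_gb = total_gb / RUNS_PER_CYCLE
fits_on_one_tape = DAILY_GUESS_GB <= TAPE_GB

print(total_gb, full_per_run_gb, fits_on_one_tape)
```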

Amanda schedules best when it has a goodly number of DLEs to balance.
It is hard to balance 4 DLEs of 1TB, 100GB, 100GB, 100GB.  :)
You probably have about 15-20 file systems among your 4 or 5 clients
(will the server also be a client?) and if you set up 8-10 DLEs from
your 1TB GFS system, you should be golden.
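
To see why the lumpy case balances badly, here is a small sketch that
spreads full dumps over the runs of a cycle with a simple greedy rule
(biggest DLE first, onto the lightest run so far).  This is a
stand-in I picked for illustration, not amanda's planner, and the
function name and sizes are assumptions:

```python
import heapq


def spread_fulls(dle_sizes_gb, runs_per_cycle):
    """Greedy spread: largest DLE first, onto the currently lightest run."""
    heap = [(0, run) for run in range(runs_per_cycle)]
    heapq.heapify(heap)
    loads = [0] * runs_per_cycle
    for size in sorted(dle_sizes_gb, reverse=True):
        load, run = heapq.heappop(heap)      # lightest run so far
        loads[run] = load + size
        heapq.heappush(heap, (loads[run], run))
    return loads


# One 1TB DLE plus three 100GB DLEs, versus the 1TB split into ten.
lumpy = spread_fulls([1000, 100, 100, 100], 7)
split = spread_fulls([100] * 10 + [100, 100, 100], 7)
print(max(lumpy), max(split))
```

With the single 1TB DLE, one run always has to carry the whole
terabyte no matter how the rest is arranged; once it is split, the
heaviest run is a fraction of that.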

> 
> Sorry for being stressful.. I had hoped to be more specific...
> 

Prescription for stress:
5 amanda installations, bed rest, and call us in the morning.

jl
-- 
Jon H. LaBadie                  jon AT jgcomp DOT com
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)