Subject: Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
From: Chris Hoogendyk <hoogendyk AT bio.umass DOT edu>
To: amanda-users AT amanda DOT org
Date: Thu, 13 Aug 2009 13:28:19 -0400


Rory Campbell-Lange wrote:
> On 13/08/09, Charles Curley (charlescurley AT charlescurley DOT com) wrote:
>> On Thu, 13 Aug 2009 01:08:03 -0400
>> Jon LaBadie <jon AT jgcomp DOT com> wrote:
>>> On Wed, Aug 12, 2009 at 06:17:17PM -0400, rorycl wrote:
>>
>> So maybe you should provide a complete OS distribution, including the
>> backup software.  Like a customized version of one of the "live CD"
>> releases of Linux.  But wait, will that distribution's included
>> device drivers work on the devices that will exist in 12 years?  Will
>> that era's computers still have CD drives?  Will they be bootable?
>> Oh, folks 12 years hence ought to be able to dig out 12-year-old
>> computers to run their 12-year-old distributions on.

> Many thanks for this note, Charles, and thanks to Chris, Charles and
> Jon for their other notes with commentary on using Amanda to provide a
> long-term archive format. The point about being able to use standard
> Unix tools to retrieve information is well made, as is the point that
> the current machines and architectures (and CDs!) may not be around in
> 12 years' time. Thanks very much for those observations.
>
> I'd like to return to the other part of my question if I may:
>
> The backup tape format is to be LTO-4, and we currently have a
> second-hand Dell PowerVault 124T 16-tape autoloader to work with.
> Backups of a pool may be taken off a Linux LVM (or, hopefully soon, a
> BTRFS) snapshot, ensuring that the source data does not change during
> the backup process. We also have the option of pre-preparing backup or
> compressed images if this is advisable.
>
> I'd be grateful to learn specifically whether the approach I have set
> out seems feasible. Also:
>
> - is a snapshot volume or secondary holding pool advisable?
> - is compression / deduplication possible?
> - after scanning through the wiki I can't see any references to what I
>   think of as a backup job "catalogue". How does one know what files
>   were part of a particular backup job?
>
> Thanks for any further advice.

One further comment on the nature of long-term archives (and then on to your specific questions):

I used to work in the Systems Office of the University Library. I handled backups there and had close contact with a group of librarians who were into digital content, archives, and special collections. Among other things, we kicked around ideas about how to archive digital collections, the life expectancies and failure rates of various types of CDs (generally terrible in reality), etc.

When librarians talk about archives, they don't just talk decades. They expect things to last a hundred years and more. In that light, they have concluded that the solution is akin to the Japanese monks caring for Bonsai. There are records of Bonsai trees that have been cared for for hundreds of years. So, think of sysadmins as monks caring for data. The archive librarians' solution is RAID 6 with hot spares, mirrored to another location. The sysadmins maintain and update hardware and software and transfer data when necessary.

Although I haven't kept up with that area, they were also developing cooperative distributed archive software as open source. The idea was that different libraries join the cooperative, run the software, and end up with multiple copies of their digital collections distributed geographically among the other libraries. If your library burns down, you rebuild, set up the software, and bring your collection back. Sort of a cloud library, if you will.

So, as technology changes, you need to review the state of your archives frequently, keep an eye on compatibility bottlenecks, and transfer data to newer media when it becomes necessary. I have a faculty member who ran his own backups on AIT2 for years. His drives are fairly old now. I periodically urge him to read them back in to a disk archive and allow me to put them on AIT5. He's too busy. Ah, well. It's his data.

--------------------------

As for your specific questions:

You should be able to do LVM snapshots. I use fssnap on Solaris 9 and 10, and, scanning through the archives, here are just a couple of references I found to people using LVM snapshots with Amanda:
http://wiki.zmanda.com/index.php/FAQ:Which_backup_program_for_filesystems_is_better%3F
http://archives.zmanda.com/amanda-archives/viewtopic.php?t=2711&sid=f1535cf0b0782bf2b99aebc033e91c9c
http://archives.zmanda.com/amanda-archives/viewtopic.php?p=9823&sid=8e54f6a0b4ab2cd58bd02e048c299844

In the past that sort of thing had always been done with a wrapper script (described toward the end of http://wiki.zmanda.com/index.php/Backup_client). Paul Bijens refers to a script that he uses in one of the above links. With the latest releases of Amanda, there is a new API that could make it even easier to implement.
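As a rough illustration, such a wrapper can be sketched like this (the volume group, LV name, mount point, and the 1G snapshot size are all hypothetical; size the copy-on-write space to the change rate you expect during the dump):

```shell
#!/bin/sh
# Sketch of an LVM snapshot wrapper around a backup run.
# Assumes a logical volume /dev/vg0/data; all names are hypothetical.
set -e

VG=vg0
LV=data
SNAP=${LV}-snap
MNT=/mnt/${SNAP}

# Create a copy-on-write snapshot of the live volume.
lvcreate --snapshot --size 1G --name "$SNAP" "/dev/$VG/$LV"

mkdir -p "$MNT"
mount -o ro "/dev/$VG/$SNAP" "$MNT"

# ... point the dump (e.g. GNU tar, as driven by Amanda) at $MNT here ...

# Tear down the snapshot once the dump is done.
umount "$MNT"
lvremove -f "/dev/$VG/$SNAP"
```

The scripts referenced in the links above follow the same create/mount/dump/remove pattern, with error handling appropriate to their sites.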

Typically, we set up Amanda with holding disk space.
See the section of the sample amanda.conf partway down regarding holding disks -- http://wiki.zmanda.com/images/f/f6/Amanda.conf.example.txt. Also, see http://wiki.zmanda.com/index.php/FAQ:Should_I_use_a_holdingdisk_when_the_final_destination_of_the_backup_is_a_virtual_tape%3F. A holding disk allows parallel dumps. Dumps then go from holding disk to tape while other dumps continue. I have a couple of 300G Ultra320 SCSI disks on a separate SCSI bus from the tape drive. You have to juggle whatever your hardware setup is to support the throughput you need. For one of my departments, my Amanda server has only a 100Mb network interface. For another department, my Amanda server has 4 GigE network interfaces and a multihomed SAS interconnect.
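For reference, a holding-disk definition in amanda.conf looks roughly like this (the path and sizes here are hypothetical; see the sample config linked above for the full set of options):

```
holdingdisk hd1 {
    comment "main holding disk"
    directory "/dumps/amanda"   # must exist and be writable by the amanda user
    use 280 Gb                  # how much of this disk Amanda may use
    chunksize 1 Gb              # split large dumps into chunks of this size
}
```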

Compression can be done either on the client, on the server, or on the tape drive. Obviously, if you use software compression, you want to turn off the tape drive compression. I use server side compression, because I have a dedicated Amanda server that can handle it. By not using the tape drive compression, Amanda has more complete information on data size and tape usage for its planning. If your server is more constrained than your clients, you could use client compression. This is specified in your dumptypes in your amanda.conf.
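In dumptype terms, the choice looks something like this (the dumptype names are made up for illustration; the `compress` keywords are standard amanda.conf syntax):

```
define dumptype comp-server {
    comment "server-side software compression"
    program "GNUTAR"
    compress server fast      # or "server best"; turn OFF drive compression
}

define dumptype comp-client {
    comment "client-side software compression"
    program "GNUTAR"
    compress client fast      # use when the server is the bottleneck
}
```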

Deduplication is not available with Amanda. However, some people stage different kinds of tools and use Amanda for the final staging and management of tapes and archives. So, in some situations, BackupPC could be used to do deduplication from, say, desktop clients to a server archive which is then backed up by Amanda. That could start complicating your 12 year recovery scenario and what happens when software is not available or doesn't run.

Amanda uses the term "index" rather than "catalog" -- see http://wiki.zmanda.com/index.php/Amanda_Index.
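In practice you browse that index with amrecover; a session to see what files were part of a particular run looks roughly like this (the config name, host, and paths are hypothetical):

```shell
# Run as root on the client whose dumps you want to inspect.
amrecover MyConfig            # hypothetical Amanda config name
# amrecover> sethost client.example.com
# amrecover> setdisk /home
# amrecover> setdate 2009-08-01   # pick the backup run to browse
# amrecover> ls                   # list files captured in that dump
# amrecover> add projects/        # mark files for restore
# amrecover> extract
```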

Note that if you are putting tapes into a long term archive with no intent of recycling them in subsequent backups, you can use amadmin to mark them as no-reuse. I periodically (typically at the end of semesters) do a force full, mark the tapes as no-reuse, and then pull them out of my tapecycle and put them in storage.
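The commands for that cycle are roughly as follows (the config name, host, disk, and tape label are hypothetical):

```shell
# Force the next run of this DLE to be a level 0 (full) dump.
amadmin MyConfig force client.example.com /home

# ...run amdump, then retire the tapes that were just written:
amadmin MyConfig no-reuse MyConfig-042   # exclude this tape from the tapecycle
```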

HTH


--
---------------

Chris Hoogendyk

-
  O__  ---- Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogendyk AT bio.umass DOT edu>

---------------
Erdös 4


