Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
2009-08-13 17:06:33
Rory Campbell-Lange wrote:
On 13/08/09, Charles Curley (charlescurley AT charlescurley DOT com) wrote:
On Thu, 13 Aug 2009 01:08:03 -0400
Jon LaBadie <jon AT jgcomp DOT com> wrote:
On Wed, Aug 12, 2009 at 06:17:17PM -0400, rorycl wrote:
So maybe you should provide a complete OS distribution, including the
backup software. Like a customized version of one of the "live CD"
releases of Linux. But wait, will that distribution's included
device drivers work on the devices that will exist in 12 years? Will
that era's computers still have CD drives? Will they be bootable?
Oh, folks 12 years hence ought to be able to dig out 12 year old
computers to run their 12 year old distributions on.
Many thanks for this note, Charles, and to Chris, Charles and Jon for
their comments about using Amanda to provide a long-term archive
format. The point about being able to use standard Unix tools to
retrieve information is well made, as is the point that the current
machines and architectures (and CDs!) may not be around in 12 years'
time. Thanks very much for those observations.
I'd like to return the other part of my question if I may:
The backup tape format is to be LTO-4 and we have a second-hand Dell
PowerVault 124T 16 tape autoloader to work with currently. Backup from
a pool may be taken off a Linux LVM (or hopefully soon a BTRFS)
snapshot ensuring that the source data does not change during the
backup process. We have the possibility of pre-preparing backup or
compressed images if this is advisable.
I'd be grateful to learn specifically if the approach I have set out
seems feasible. Also:
- is the snapshot volume or secondary holding pool advisable?
- is compression / deduplication possible?
- after scanning through the wiki I can't see any references to what
I think of as a backup job "catalogue". How does one know what
files were part of a particular backup job?
Thanks for any further advice.
One further comment on the nature of long term archives (and then on to
your specific questions):
I used to work in the Systems Office of the University Library. I
handled backups there, and had close contact with a group of librarians
who were into digital content, archives and special collections. Among
other things, we kicked around ideas about how to archive digital
collections, the life expectancies and failure rates of various types of
CDs (generally terrible in reality), etc. When librarians talk about
archives, they don't just talk decades. They expect things to last a
hundred years and more. In that light, they have concluded that the
solution is akin to the Japanese monks caring for Bonsai. There are
records of Bonsai trees that have been cared for for hundreds of years.
So, think of sysadmins as monks caring for data. The archive librarians'
solution is RAID 6 with hot spares, mirrored to another location. The
sysadmins maintain and update hardware and software and transfer data
when necessary. Although I haven't kept up with that area, they were
developing cooperative distributed-archive software as open source.
The idea was that different libraries join the cooperative, run the
software, and they end up with multiple copies of their digital
collections distributed geographically among other libraries. If your
library burns down, you rebuild, set up the software, and bring your
collection back. Sort of a cloud library, if you will.
So, as technology changes, you need to review the state of your
archives regularly, keeping an eye on compatibility bottlenecks and
transferring data to newer media when it becomes necessary. I have a
faculty member who ran his own backups on AIT2 for years. His drives are
fairly old now. I periodically urge him to read them back in to a disk
archive and allow me to put them on AIT5. He's too busy. Ah, well. It's
his data.
--------------------------
As for your specific questions:
You should be able to do LVM snapshots. I use fssnap on Solaris 9 and
10, and scanning through, here are just a couple of references I find to
people using LVM snapshots with Amanda:
http://wiki.zmanda.com/index.php/FAQ:Which_backup_program_for_filesystems_is_better%3F
http://archives.zmanda.com/amanda-archives/viewtopic.php?t=2711&sid=f1535cf0b0782bf2b99aebc033e91c9c
http://archives.zmanda.com/amanda-archives/viewtopic.php?p=9823&sid=8e54f6a0b4ab2cd58bd02e048c299844
In the past that sort of thing had always been done with a wrapper
script (described toward the end of
http://wiki.zmanda.com/index.php/Backup_client). Paul Bijens refers to a
script that he uses in one of the above links. With the latest releases
of Amanda, there is a new API that could make it even easier to implement.
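A minimal sketch of such a wrapper, assuming an LVM volume group vg00
with a logical volume named data (the names, snapshot size and mount
point are all assumptions for illustration, not from any of the linked
scripts):

```shell
#!/bin/sh
# Hypothetical pre/post wrapper: snapshot the LV, mount it read-only,
# let the dump run against the mount point, then tear the snapshot down.
VG=vg00                 # volume group (assumption)
LV=data                 # logical volume to back up (assumption)
SNAP="${LV}-amsnap"
MNT="/mnt/${SNAP}"

# Copy-on-write snapshot; 2G of change-tracking space is a guess --
# size it for the writes you expect to land during the dump.
lvcreate --snapshot --size 2G --name "$SNAP" "/dev/$VG/$LV" || exit 1
mkdir -p "$MNT"
if ! mount -o ro "/dev/$VG/$SNAP" "$MNT"; then
    lvremove -f "/dev/$VG/$SNAP"
    exit 1
fi

# ... the actual dump of "$MNT" (e.g. GNU tar driven by Amanda) goes here ...

umount "$MNT"
lvremove -f "/dev/$VG/$SNAP"
```

Because the snapshot is read-only and short-lived, the data Amanda sees
stays consistent for the whole dump no matter what is being written to
the live volume.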
Typically, we set up Amanda with holding disk space.
See the section of the sample amanda.conf partway down regarding holding
disks -- http://wiki.zmanda.com/images/f/f6/Amanda.conf.example.txt.
Also, see
http://wiki.zmanda.com/index.php/FAQ:Should_I_use_a_holdingdisk_when_the_final_destination_of_the_backup_is_a_virtual_tape%3F.
A holding disk allows parallel dumps. Dumps then go from holding disk to
tape while other dumps continue. I have a couple of 300G Ultra320 SCSI
disks on a separate SCSI bus from the tape drive. You have to juggle
whatever your hardware setup is to support the throughput you need. For
one of my departments, my Amanda server has only a 100Mb network
interface. For another department, my Amanda server has 4 GigE network
interfaces and a multihomed SAS interconnect.
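For reference, a holding disk is just a block in amanda.conf; the path
and sizes below are assumptions you would tune to your hardware:

```
holdingdisk hd1 {
    directory "/amanda/holding"  # dedicated spool area (path is an assumption)
    use 250 Gb                   # cap Amanda's use, leaving headroom on the disk
    chunksize 1 Gb               # split large dumps into chunks on the holding disk
}
```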
Compression can be done on the client, on the server, or on the tape
drive. Obviously, if you use software compression, you want to turn
off the tape drive compression. I use server side compression, because I
have a dedicated Amanda server that can handle it. By not using the tape
drive compression, Amanda has more complete information on data size and
tape usage for its planning. If your server is more constrained than
your clients, you could use client compression. This is specified in
your dumptypes in your amanda.conf.
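As a sketch, a server-compressing dumptype might look like this (the
dumptype name is made up, and "global" assumes a base dumptype as in
the sample amanda.conf linked above):

```
define dumptype comp-server {
    global
    program "GNUTAR"
    compress server fast   # gzip on the server; "compress client fast" shifts the load to clients
}
```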
Deduplication is not available with Amanda. However, some people combine
other tools with Amanda and use Amanda only for the final staging and
management of tapes and archives. So, in some situations, BackupPC could
be used to do deduplication from, say, desktop clients to a server
archive which is then backed up by Amanda. That could start complicating
your 12 year recovery scenario and what happens when software is not
available or doesn't run.
Amanda uses the term "index" rather than "catalog" -- see
http://wiki.zmanda.com/index.php/Amanda_Index.
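In practice you query those indexes with amadmin and amrecover (the
config name "daily" and the host/disk names below are assumptions, and
indexing must be enabled in the dumptype):

```shell
amadmin daily find myhost /home   # which tapes hold which dumps of myhost:/home
amadmin daily info myhost /home   # dump history for that disklist entry
amrecover daily                   # browse the indexes and extract files interactively
```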
Note that if you are putting tapes into a long term archive with no
intent of recycling them in subsequent backups, you can use amadmin to
mark them as no-reuse. I periodically (typically at the end of
semesters) do a force full, mark the tapes as no-reuse, and then pull
them out of my tapecycle and put them in storage.
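That end-of-semester routine maps onto a couple of amadmin calls; the
config name "daily", the host/disk, and the tape labels here are
assumptions:

```shell
amadmin daily force myhost /home          # request a full (level 0) dump on the next run
amadmin daily no-reuse DAILY-12 DAILY-13  # keep these labels out of the tape cycle
amadmin daily tape                        # confirm which tape Amanda expects next
```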
HTH
--
---------------
Chris Hoogendyk
-
O__ ---- Systems Administrator
c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogendyk AT bio.umass DOT edu>
---------------
Erdös 4