Re: [Amanda-users] Advice needed on Linux backup strategy to LTO-4 tape
2009-08-13 17:06:33
Rory Campbell-Lange wrote:
On 13/08/09, Charles Curley (charlescurley AT charlescurley DOT com) wrote:
On Thu, 13 Aug 2009 01:08:03 -0400
Jon LaBadie <jon AT jgcomp DOT com> wrote:
On Wed, Aug 12, 2009 at 06:17:17PM -0400, rorycl wrote:
So maybe you should provide a complete OS distribution, including the
backup software. Like a customized version of one of the "live CD"
releases of Linux. But wait, will that distribution's included
device drivers work on the devices that will exist in 12 years? Will
that era's computers still have CD drives? Will they be bootable?
Oh, folks 12 years hence ought to be able to dig out 12 year old
computers to run their 12 year old distributions on.
Many thanks for this note, Charles, and to Chris, Charles and Jon for
their comments about using Amanda to provide a long-term archive
format. The point about being able to use standard Unix tools to
retrieve information is well made, as is the point that the current
machines and architectures (and CDs!) may not be around in 12 years'
time. Thanks very much for those observations.
I'd like to return the other part of my question if I may:
The backup tape format is to be LTO-4 and we have a second-hand Dell
PowerVault 124T 16 tape autoloader to work with currently. Backup from
a pool may be taken off a Linux LVM (or hopefully soon a BTRFS)
snapshot ensuring that the source data does not change during the
backup process. We have the possibility of pre-preparing backup or
compressed images if this is advisable.
I'd be grateful to learn specifically if the approach I have set out
seems feasible. Also:
- is the snapshot volume or secondary holding pool advisable?
- is compression / deduplication possible?
- after scanning through the wiki I can't see any references to what
I think of as a backup job "catalogue". How does one know what
files were part of a particular backup job?
Thanks for any further advice.
One further comment on the nature of long term archives (and then on to
your specific questions):
I used to work in the Systems Office of the University Library. I
handled backups there, and had close contact with a group of librarians
who were into digital content, archives and special collections. Among
other things, we kicked around ideas about how to archive digital
collections, the life expectancies and failure rates of various types of
CDs (generally terrible in reality), etc. When librarians talk about
archives, they don't just talk decades. They expect things to last a
hundred years and more. In that light, they have concluded that the
solution is akin to the Japanese monks caring for Bonsai. There are
records of Bonsai trees that have been cared for for hundreds of years.
So, think of sysadmins as monks caring for data. The archive librarians'
solution is RAID 6 with hot spares, mirrored to another location. The
sysadmins maintain and update hardware and software and transfer data
when necessary. Although I haven't kept up with that area, they were
developing cooperative distributed-archive software as open source.
The idea was that different libraries join the cooperative, run the
software, and they end up with multiple copies of their digital
collections distributed geographically among other libraries. If your
library burns down, you rebuild, set up the software, and bring your
collection back. Sort of a cloud library, if you will.
So, as technology changes, you need to review the state of your
archives regularly, keeping an eye on compatibility bottlenecks and
transferring data to newer media when it becomes necessary. I have a
faculty member who ran his own backups on AIT2 for years. His drives are
fairly old now. I periodically urge him to read them back in to a disk
archive and allow me to put them on AIT5. He's too busy. Ah, well. It's
his data.
--------------------------
As for your specific questions:
You should be able to do LVM snapshots. I use fssnap on Solaris 9 and
10, and scanning through, here are just a couple of references I find to
people using LVM snapshots with Amanda:
http://wiki.zmanda.com/index.php/FAQ:Which_backup_program_for_filesystems_is_better%3F
http://archives.zmanda.com/amanda-archives/viewtopic.php?t=2711&sid=f1535cf0b0782bf2b99aebc033e91c9c
http://archives.zmanda.com/amanda-archives/viewtopic.php?p=9823&sid=8e54f6a0b4ab2cd58bd02e048c299844
In the past that sort of thing had always been done with a wrapper
script (described toward the end of
http://wiki.zmanda.com/index.php/Backup_client). Paul Bijens refers to a
script that he uses in one of the above links. With the latest releases
of Amanda, there is a new API that could make it even easier to implement.
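A minimal sketch of such a wrapper, assuming an LVM volume group vg00
with a logical volume named data (the names, snapshot size and mount
point are all assumptions for illustration, not from any of the linked
scripts):

```shell
#!/bin/sh
# Hypothetical pre/post wrapper: snapshot the LV, mount it read-only,
# let the dump run against the mount point, then tear the snapshot down.
VG=vg00                 # volume group (assumption)
LV=data                 # logical volume to back up (assumption)
SNAP="${LV}-amsnap"
MNT="/mnt/${SNAP}"

# Copy-on-write snapshot; 2G of change-tracking space is a guess --
# size it for the writes you expect to land during the dump.
lvcreate --snapshot --size 2G --name "$SNAP" "/dev/$VG/$LV" || exit 1
mkdir -p "$MNT"
if ! mount -o ro "/dev/$VG/$SNAP" "$MNT"; then
    lvremove -f "/dev/$VG/$SNAP"
    exit 1
fi

# ... the actual dump of "$MNT" (e.g. GNU tar driven by Amanda) goes here ...

umount "$MNT"
lvremove -f "/dev/$VG/$SNAP"
```

Because the snapshot is read-only and short-lived, the data Amanda sees
stays consistent for the whole dump no matter what is being written to
the live volume.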
Typically, we set up Amanda with holding disk space.
See the section of the sample amanda.conf partway down regarding holding
disks -- http://wiki.zmanda.com/images/f/f6/Amanda.conf.example.txt.
Also, see
http://wiki.zmanda.com/index.php/FAQ:Should_I_use_a_holdingdisk_when_the_final_destination_of_the_backup_is_a_virtual_tape%3F.
A holding disk allows parallel dumps. Dumps then go from holding disk to
tape while other dumps continue. I have a couple of 300G Ultra320 SCSI
disks on a separate SCSI bus from the tape drive. You have to juggle
whatever your hardware setup is to support the throughput you need. For
one of my departments, my Amanda server has only a 100Mb network
interface. For another department, my Amanda server has 4 GigE network
interfaces and a multihomed SAS interconnect.
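For reference, a holding disk is just a block in amanda.conf; the path
and sizes below are assumptions you would tune to your hardware:

```
holdingdisk hd1 {
    directory "/amanda/holding"  # dedicated spool area (path is an assumption)
    use 250 Gb                   # cap Amanda's use, leaving headroom on the disk
    chunksize 1 Gb               # split large dumps into chunks on the holding disk
}
```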
Compression can be done on the client, on the server, or on the tape
drive. Obviously, if you use software compression, you want to turn
off the tape drive compression. I use server side compression, because I
have a dedicated Amanda server that can handle it. By not using the tape
drive compression, Amanda has more complete information on data size and
tape usage for its planning. If your server is more constrained than
your clients, you could use client compression. This is specified in
your dumptypes in your amanda.conf.
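As a sketch, a server-compressing dumptype might look like this (the
dumptype name is made up, and "global" assumes a base dumptype as in
the sample amanda.conf linked above):

```
define dumptype comp-server {
    global
    program "GNUTAR"
    compress server fast   # gzip on the server; "compress client fast" shifts the load to clients
}
```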
Deduplication is not available with Amanda. However, some people combine
other tools with Amanda and use Amanda only for the final staging and
management of tapes and archives. So, in some situations, BackupPC could
be used to do deduplication from, say, desktop clients to a server
archive which is then backed up by Amanda. That could start complicating
your 12 year recovery scenario and what happens when software is not
available or doesn't run.
Amanda uses the term "index" rather than "catalog" -- see
http://wiki.zmanda.com/index.php/Amanda_Index.
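In practice you query those indexes with amadmin and amrecover (the
config name "daily" and the host/disk names below are assumptions, and
indexing must be enabled in the dumptype):

```shell
amadmin daily find myhost /home   # which tapes hold which dumps of myhost:/home
amadmin daily info myhost /home   # dump history for that disklist entry
amrecover daily                   # browse the indexes and extract files interactively
```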
Note that if you are putting tapes into a long term archive with no
intent of recycling them in subsequent backups, you can use amadmin to
mark them as no-reuse. I periodically (typically at the end of
semesters) do a force full, mark the tapes as no-reuse, and then pull
them out of my tapecycle and put them in storage.
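That end-of-semester routine maps onto a couple of amadmin calls; the
config name "daily", the host/disk, and the tape labels here are
assumptions:

```shell
amadmin daily force myhost /home          # request a full (level 0) dump on the next run
amadmin daily no-reuse DAILY-12 DAILY-13  # keep these labels out of the tape cycle
amadmin daily tape                        # confirm which tape Amanda expects next
```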
HTH
--
---------------
Chris Hoogendyk
-
O__ ---- Systems Administrator
c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogendyk AT bio.umass DOT edu>
---------------
Erdös 4