Re: [Bacula-users] [Bacula-devel] RFC: backing up hundreds of TB

2009-11-28 08:58:39
Subject: Re: [Bacula-users] [Bacula-devel] RFC: backing up hundreds of TB
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Sat, 28 Nov 2009 14:43:09 +0100

Hi,

27.11.2009 13:23, Ralf Gross wrote:
> [crosspost to -users and -devel list]
> 
> Hi,
> 
> we are happily using bacula since a few years and already backing up
> some dozens of TB (large video files) to tape.
> 
> In the next 2-3 years the amount of data will be growing to 300+ TB.
> We are looking for some very pricy solutions for the primary storage
> at the moment (NetApp etc). But we (I) are also looking if it is
> possible to go on with the way we store the data right now. Which is
> just some large raid arrays and backup2tape.

Good luck... while I agree that SAN/NAS appliances tend to look 
expensive, they've got their advantages when your space has to grow to 
really big sizes. Having only one setup to manage, even when several 
physical disk arrays work together, is one of these advantages.

Also, if you're running a number of big RAID arrays, reliable vendor 
support is surely beneficial.

> I've no problem with the primary storage and 10 very big raid boxes
> (high availability is not needed).  What frightens me is backing up all
> the data. Most files will be written once and maybe never be
> accessed again.

Yearly full backups, and then only incrementals (using accurate backup 
mode) should be a usable approach. Depending on how often you expect 
to need recovery, you may want your boss to spend some money on a 
bigger tape library :-) to make sure most data can be restored without 
manually loading tapes.

> But the data need to be online and there is a
> requirement for backup and the ability to restore deleted files
> (retention time can be different, going from 6 months to a couple of
> years).

A six-month retention time, with data actually being pruned, would 
probably be easier to manage with full backups more often than once a 
year.
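
To illustrate why the full backup interval matters for pruning, here is 
a rough sketch. It assumes that every incremental depends on the most 
recent full, so a full can only be dropped once the last incremental 
based on it has expired. The function name and the example intervals 
are made up for illustration and are not Bacula settings.

```python
def months_full_must_be_kept(full_interval_months, retention_months):
    """Months a full backup stays needed before it can be pruned.

    Assumption: the last incremental that depends on a full is taken
    just before the next full, and must itself be kept for the whole
    retention time.
    """
    return full_interval_months + retention_months


# Compare a yearly full with more frequent fulls, all at 6 months retention.
for interval in (12, 6, 3):
    keep = months_full_must_be_kept(interval, retention_months=6)
    print(f"full every {interval:2d} months: keep each full about {keep} months")
```

With a yearly full and six months of retention, each full stays pinned 
for about 18 months; with quarterly fulls that drops to about 9.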

I think you should start by defining how long you want to keep your 
data, and how to run full backups when those jobs will surely take 
longer than your regular backup windows: either split the jobs into 
smaller parts, or make sure you can run backups over a non-production 
network. You should also measure the impact of backups on other file 
system accesses.
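
A back-of-the-envelope sketch of the backup window problem follows. 
Only the 300 TB figure comes from this thread; the sustained 
throughput of 100 MB/s per stream and the 8 hour window are assumptions 
chosen for illustration, so plug in your own measurements.

```python
import math


def full_backup_hours(data_tb, throughput_mb_s, streams=1):
    """Hours needed to move data_tb terabytes.

    throughput_mb_s is the sustained rate per stream; decimal units are
    used (1 TB = 1,000,000 MB).
    """
    total_mb = data_tb * 1_000_000
    seconds = total_mb / (throughput_mb_s * streams)
    return seconds / 3600


def jobs_needed(data_tb, throughput_mb_s, window_hours):
    """Smallest number of equal-sized jobs that each fit into one window."""
    per_window_tb = throughput_mb_s * window_hours * 3600 / 1_000_000
    return math.ceil(data_tb / per_window_tb)


# Assumed figures: 100 MB/s sustained per stream, 8 hour nightly window.
print(f"one stream:   {full_backup_hours(300, 100):.0f} hours")
print(f"four streams: {full_backup_hours(300, 100, streams=4):.0f} hours")
print(f"jobs that fit an 8 h window: {jobs_needed(300, 100, 8)}")
```

Under these assumptions a single stream needs roughly 35 days for one 
full backup, which is why splitting the jobs or running several streams 
in parallel over a separate network is worth planning for early.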

> 
> The snapshot feature of some commercial products is a very nice
> feature for taking backups and it's a huge benefit that only the
> deltas are stored.

You can build on the snapshot capability of SAN filers with Bacula. 
You'll still get normal file backups, but that's an advantage IMO... 
the most useful aspect of those snapshots is that you get a consistent 
state of the file system, and don't affect production access more than 
necessary.

> Backup2tape means that with the classic Full/Diff/Incr setup we'll
> need many tapes. Even if the data on the primary storage won't change. 

Sure - a backup is reliable if you've got at least two copies of your 
files, so for 300 TB, you'll need some tapes. But tapes are probably 
cheaper than the required disk capacity for a NetApp filer :-)
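
For a feeling of what "some tapes" means, here is a small estimate. 
The 300 TB and the two copies come from the text above; the cartridge 
size is an assumption (LTO-4 at 800 GB native capacity, without 
counting on compression, since video files rarely compress well).

```python
def tapes_needed(data_tb, copies, capacity_gb):
    """Whole cartridges needed to hold `copies` copies of data_tb terabytes.

    Integer arithmetic with decimal units (1 TB = 1000 GB); assumes every
    tape is filled completely, so treat the result as a lower bound.
    """
    total_gb = data_tb * 1000 * copies
    return (total_gb + capacity_gb - 1) // capacity_gb  # ceiling division


# Assumed cartridge size: LTO-4, 800 GB native.
print(tapes_needed(300, copies=2, capacity_gb=800))
```

That comes to 750 cartridges for two full copies, before counting any 
incrementals or tapes that are only partly filled.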

> Backup2disk has the disadvantage that a corrupt filesystem (been
> there, seen that...) can ruin TB's of backed up data. And we will need
> file storage that is much bigger than the primary storage (keeping x
> versions of a file....).

Yup. Tapes get cheaper at that volume, especially since they don't 
need power when stored.

> 
> Anyone else here with the same problem? Anyone (maybe Kern or Eric)
> here that can tell if one of the upcoming new bacula features (dedup?)
> could help to solve the problem with the massive amount of tapes
> needed and the growing time windows and bandwidth for backups?

Well, I'm not Kern or Eric - you've got Kern's feedback - but 
deduplication using a deduping virtual tape library (dVTL I'll call 
it) might be one way to go. Unfortunately, these things are, as far as 
I know, slow compared to real tape drives. And more expensive than a 
big RAID array for use with Bacula.

> Any thought?

I'd build something using LUN snapshots to minimize (not eliminate!) 
performance impact on production use of your file systems, a dedicated 
backup network to keep the production network from getting slow during 
backups, and a big tape library. For fast backups and recovery, you 
might add a big extra RAID array for Bacula use and implement 
disk-to-disk-to-tape backups using Bacula's copy or migration feature.

Data would end up on tapes for relatively cheap and energy-efficient 
storage. You'll need a nice big tape library, or have to employ a tape 
operator for a few hours a day ;-)

Cheers,

Arno

> 
> Thanks, Ralf

-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de