2009-11-28 10:01:47
Subject: Re: [Bacula-users] [Bacula-devel] RFC: backing up hundreds of TB
From: Ralf Gross <Ralf-Lists AT ralfgross DOT de>
To: bacula-users AT lists.sourceforge DOT net, bacula-devel AT lists.sourceforge DOT net
Date: Sat, 28 Nov 2009 15:57:56 +0100
Arno Lehmann wrote:
> 27.11.2009 13:23, Ralf Gross wrote:
> > [crosspost to -users and -devel list]
> > 
> > Hi,
> > 
> > we have been happily using Bacula for a few years and are already
> > backing up some dozens of TB (large video files) to tape.
> > 
> > In the next 2-3 years the amount of data will grow to 300+ TB.
> > We are looking at some very pricey solutions for the primary storage
> > at the moment (NetApp etc.). But we (I) are also looking at whether it
> > is possible to continue with the way we store the data right now, which
> > is just some large RAID arrays and backup2tape.
> 
> Good luck... while I agree that SAN/NAS appliances tend to look 
> expensive, they've got their advantages when your space has to grow to 
> really big sizes. Managing only one setup, even when several physical 
> disk arrays work together, is one of these advantages.

I fully agree. But this comes at a price that is 5-10 times higher
than a setup with simple RAID arrays and a large changer. In the end
I'll present 2 or 3 concepts and others will decide how valuable the
data is.

 
> Also, if you're running a number of big RAID arrays, reliable vendor 
> support is surely beneficial.
> 
> > I have no problem with the primary storage being 10 very big RAID boxes
> > (high availability is not needed). What frightens me is backing up all
> > the data. Most files will be written once and may never be
> > accessed again.
> 
> Yearly full backups, and then only incrementals (using accurate backup 
> mode) should be a usable approach. Depending on how often you expect 
> to need recovery, you may want your boss to spend some money on a 
> bigger tape library :-) to make sure most data can be restored without 
> manually loading tapes.

A big 500-slot library is already part of the plan.
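
To put rough numbers on the yearly-full idea, here is a minimal sketch that compares how much data goes to tape per year with one full per year versus a full every month. The function name and the daily change rate are assumptions for illustration only; the thread gives no change rate, and differentials and data growth are ignored.

```python
def tb_written_per_year(data_tb, fulls_per_year, daily_change_tb, days=365):
    """TB written to tape in one year: every full plus one incremental per day.

    Ignores differentials and data growth. daily_change_tb is an assumed
    average amount of new or changed data per day.
    """
    return fulls_per_year * data_tb + days * daily_change_tb


if __name__ == "__main__":
    size_tb = 300          # target size from the thread
    change_tb = 0.5        # assumed daily change, not a measured value
    for fulls in (1, 12):
        total = tb_written_per_year(size_tb, fulls, change_tb)
        print(f"{fulls:2d} full(s) per year: {total} TB written")
```

With mostly write-once data the full backups dominate, so the number of fulls per year is the parameter that matters most for tape consumption.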

 
> > But the data needs to be online and there is a
> > requirement for backup and the ability to restore deleted files
> > (retention times can differ, ranging from 6 months to a couple of
> > years).
> 
> 6 months retention time and actually pruning data would probably be 
> easier with full backups more often than once a year.
> 
> I think you should start by defining how long you want to keep your 
> data, how to do full backups when those jobs will surely run longer 
> than your regular backup windows (either splitting the jobs into 
> smaller parts, or making sure you can run backups over a 
> non-production network; measuring impact of backups on other file 
> system accesses).

Some of the data will stay on the filer for only a couple of months,
some for a couple of years. The filer(s) won't be that busy, and there
is a dedicated LAN for backups. 
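
With a dedicated backup LAN, the tape drives are likely to set the pace of a full backup. A back-of-the-envelope sketch of how long a full would run, assuming 120 MB/s sustained per drive (roughly LTO-4 native speed; the thread names no drive type) and ideal streaming with no tape changes or small-file overhead. The function name and defaults are illustrative assumptions.

```python
def full_backup_days(data_tb, drives=4, mb_per_s_per_drive=120.0):
    """Days needed to write data_tb TB with `drives` tape drives in parallel.

    Assumes ideal sustained streaming on every drive; decimal units,
    1 TB = 1_000_000 MB. Throughput per drive is an assumption.
    """
    total_mb = data_tb * 1_000_000
    seconds = total_mb / (drives * mb_per_s_per_drive)
    return seconds / 86_400


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} drive(s): {full_backup_days(300, drives=n):.1f} days")
```

Even under these optimistic assumptions a full of 300 TB runs for days, which is why splitting the full into several smaller jobs comes up earlier in the thread.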


> > The snapshot feature of some commercial products is a very nice
> > feature for taking backups and it's a huge benefit that only the
> > deltas are stored.
> 
> You can build on snapshot capability of SAN filers with Bacula. You'll 
> still get normal file backups, but that's an advantage IMO... the most 
> useful aspect of those snapshots is that you get a consistent state of 
> the file system, and don't affect production access more than necessary.
> 
> > Backup2tape means that with the classic Full/Diff/Incr setup we'll
> > need many tapes, even if the data on the primary storage doesn't change. 
> 
> Sure - a backup is reliable if you've got at least two copies of your 
> files, so for 300 TB, you'll need some tapes. But tapes are probably 
> cheaper than the required disk capacity for a NetApp filer :-)

Compared to a NetApp filer, tapes are definitely cheaper. Compared to
cheaper RAID arrays, it might be a bit different. 
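
For a rough feel of the tape count behind "two copies of 300 TB", here is a small sketch. It assumes LTO-4 cartridges at 800 GB native capacity and tapes filled to 90% on average; both figures are assumptions, since the thread names no tape generation, and compression is ignored because video files rarely compress well.

```python
import math


def tapes_needed(data_tb, copies=2, tape_capacity_gb=800, fill_factor=0.9):
    """Number of tapes needed to hold `copies` copies of data_tb TB.

    tape_capacity_gb and fill_factor are assumptions (LTO-4 native
    capacity, tapes filled to 90%); decimal units, 1 TB = 1000 GB.
    """
    usable_gb = tape_capacity_gb * fill_factor
    return math.ceil(data_tb * 1000 * copies / usable_gb)


if __name__ == "__main__":
    for copies in (1, 2):
        n = tapes_needed(300, copies=copies)
        print(f"{copies} copy/copies: {n} tapes, fits in 500 slots: {n <= 500}")
```

Under these assumptions one copy fits in a 500-slot library and two copies do not, so the second copy would have to live outside the library or on higher-capacity media.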

 
> > Backup2disk has the disadvantage that a corrupt filesystem (been
> > there, seen that...) can ruin TBs of backed up data. And we will need
> > file storage that is much bigger than the primary storage (keeping x
> > versions of a file....).
> 
> Yup. Tapes get cheaper at that volume, especially since they don't 
> need power when stored.
> 
> > 
> > Anyone else here with the same problem? Anyone (maybe Kern or Eric)
> > here that can tell if one of the upcoming new bacula features (dedup?)
> > could help to solve the problem with the massive amount of tapes
> > needed and the growing time windows and bandwidth for backups?
> 
> Well, I'm not Kern or Eric - you've got Kern's feedback - but 
> deduplication using a deduping virtual tape library (dVTL I'll call 
> it) might be one way to go. Unfortunately, these things are, as far as 
> I know, slow compared to real tape drives. And more expensive than a 
> big RAID array for use with Bacula.

Yup, I had a look at some VTLs. They are expensive and don't come
with the capacity we need. 

 
> > Any thoughts?
> 
> I'd build something using LUN snapshots to minimize (not eliminate!) 
> performance impact on production use of your file systems, a dedicated 
> backup network to keep the production network from getting slow during 
> backups, and a big tape library. For fast backups and recovery, you 
> might add a big extra RAID array for Bacula use and implement disk to 
> disk to tape backups using Bacula's copy or migration feature.
> 
> Data would end up on tapes for relatively cheap and energy-efficient 
> storage. You'll need some nice big tape library, or employ a tape 
> operator for a few hours a day ;-)

Thanks for your thoughts,
Ralf 
