Subject: Re: [Bacula-users] [Bacula-devel] RFC: backing up hundreds of TB
From: Robert LeBlanc <robert AT leblancnet DOT us>
To: Ralf Gross <Ralf-Lists AT ralfgross DOT de>
Date: Sat, 28 Nov 2009 10:05:28 -0700
On Sat, Nov 28, 2009 at 7:57 AM, Ralf Gross <Ralf-Lists AT ralfgross DOT de> wrote:
Arno Lehmann wrote:
> 27.11.2009 13:23, Ralf Gross wrote:
> > [crosspost to -users and -devel list]
> >
> > Hi,
> >
> > we have been happily using Bacula for a few years and are already backing
> > up some dozens of TB (large video files) to tape.
> >
> > In the next 2-3 years the amount of data will grow to 300+ TB.
> > We are looking at some very pricey solutions for the primary storage
> > at the moment (NetApp etc.). But we (I) are also looking into whether
> > it is possible to go on with the way we store the data right now,
> > which is just some large RAID arrays and backup to tape.
>
> Good luck... while I agree that SAN/NAS appliances tend to look
> expensive, they've got their advantages when your space has to grow to
> really big sizes. Managing only one setup, even when several physical
> disk arrays work together, is one of these advantages.

I fully agree. But this comes with a price that is 5-10 times higher
than a setup with simple RAID arrays and a large changer. In the end
I'll present 2 or 3 concepts and others will decide how valuable the
data is.


> Also, if you're running a number of big RAID arrays, reliable vendor
> support is surely beneficial.
>
> > I've no problem with the primary storage and 10 very big RAID boxes
> > (high availability is not needed). What frightens me is backing up all
> > the data. Most files will be written once and maybe never accessed
> > again.
>
> Yearly full backups, and then only incrementals (using accurate backup
> mode) should be a usable approach. Depending on how often you expect
> to need recovery, you may want your boss to spend some money on a
> bigger tape library :-) to make sure most data can be restored without
> manually loading tapes.

A big 500-slot library is already part of the idea.
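
For what it's worth, here is a very rough (and untested) sketch of how the
yearly-full-plus-accurate-incrementals idea might look in bacula-dir.conf;
all names, pools and date specifications below are just placeholders:

  Schedule {
    Name = "YearlyFullCycle"
    # one Full per year, written to an archive pool
    Run = Level=Full Pool=ArchiveFull jan 1 at 23:05
    # accurate incrementals the rest of the time
    Run = Level=Incremental Pool=ArchiveIncr daily at 23:05
  }

  Job {
    Name = "BigFilerBackup"
    Type = Backup
    Client = filer-fd
    FileSet = "FilerData"
    Schedule = "YearlyFullCycle"
    Storage = "BigChanger"
    Pool = "ArchiveFull"
    Accurate = yes    # incrementals also track deleted/renamed files
    Messages = Standard
  }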


> > But the data needs to be online, and there is a
> > requirement for backups and the ability to restore deleted files
> > (retention times can differ, ranging from 6 months to a couple of
> > years).
>
> A 6-month retention time with actual pruning of data would probably be
> easier with full backups more often than once a year.
>
> I think you should start by defining how long you want to keep your
> data, and how to do full backups when those jobs will surely run longer
> than your regular backup windows (either splitting the jobs into
> smaller parts, or making sure you can run backups over a
> non-production network; and measuring the impact of backups on other
> file system accesses).

Some of the data will only be on the filer for a couple of months,
some for a couple of years. The filer(s) won't be that busy, and there
is a dedicated LAN for backups.
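
One (untested) way to handle the different retention times could be
separate pools with different Volume Retention, plus File/Job Retention on
the Client set at least as long as the longest pool, so deleted files stay
browsable for restore; the names below are only placeholders:

  Pool {
    Name = "Tape-6Months"
    Pool Type = Backup
    Volume Retention = 6 months   # volumes become prunable/recyclable after 6 months
    AutoPrune = yes
    Recycle = yes
  }

  Pool {
    Name = "Tape-5Years"
    Pool Type = Backup
    Volume Retention = 5 years
    AutoPrune = yes
    Recycle = yes
  }

  Client {
    Name = filer-fd
    Address = filer.example.org
    Catalog = MyCatalog
    Password = "notarealpassword"
    # keep catalog records as long as the longest-retained volumes,
    # otherwise old jobs can no longer be browsed for restores
    File Retention = 5 years
    Job Retention = 5 years
    AutoPrune = yes
  }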


> > The snapshot feature of some commercial products is a very nice
> > feature for taking backups, and it's a huge benefit that only the
> > deltas are stored.
>
> You can build on snapshot capability of SAN filers with Bacula. You'll
> still get normal file backups, but that's an advantage IMO... the most
> useful aspect of those snapshots is that you get a consistent state of
> the file system, and don't affect production access more than necessary.
>
> > Backup-to-tape means that with the classic Full/Diff/Incr setup we'll
> > need many tapes, even if the data on the primary storage doesn't change.
>
> Sure - a backup is reliable if you've got at least two copies of your
> files, so for 300 TB, you'll need some tapes. But tapes are probably
> cheaper than the required disk capacity for a NetApp filer :-)

Compared to a NetApp filer, tapes are definitely cheaper. With cheaper
RAID arrays it might be a bit different.
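
On the snapshot point above: one common approach with Bacula is to create
the snapshot from a client-side run script and point the FileSet at the
snapshot mount, so the tape still gets ordinary file backups from a
consistent image. A rough sketch (the script paths are purely hypothetical):

  Job {
    Name = "FilerSnapshotBackup"
    Type = Backup
    Client = filer-fd
    FileSet = "FilerSnapshot"
    Schedule = "YearlyFullCycle"
    Storage = "BigChanger"
    Pool = "Tape-5Years"
    Accurate = yes
    # hypothetical helper scripts that create/remove the snapshot on the client
    ClientRunBeforeJob = "/usr/local/sbin/snap-create.sh"
    ClientRunAfterJob = "/usr/local/sbin/snap-remove.sh"
  }

  FileSet {
    Name = "FilerSnapshot"
    Include {
      Options { signature = MD5 }
      # read from the mounted snapshot, not the live file system
      File = /mnt/snap/video
    }
  }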

We are in a similar situation where we may be expanding our storage much faster than we ever anticipated. We recently attended a small vendor show and I was impressed by the F5 APX product. Our data would be mostly read/written for a short while and then sit for a long time, and this data would be intermixed. We will also be presenting this data over CIFS. F5 APX seems to be a great fit for the following reasons:

1. We can put policies on the data and tier it transparently: when data has not been accessed for 90 days, move it to tier 2 disk, and after another predefined time move it to tier 3 (a dedup box or something).

2. It should cut backup times significantly. We would do GFS on tier 1 (only files that have recently changed), then do monthlies on tier 2 right after the policy moves data there, and then something like one backup every 6 months on tier 3. With tier 3 being on a dedup box, we may even be able to write backups to the dedup box and get backups for free.

We have a Neo8000 for archival and are going to try to wait until LTO5 comes out before upgrading its drives. I was looking for a transparent tiering solution; I didn't realize how much it could reduce backup time as well. The only thing I'm not sure F5 can handle is Shadow Copy. I'm not sure if there is anything else out there that does the same thing as F5, but we will be looking into it before we purchase. That may give you an idea that you didn't have previously.
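
(In Bacula terms, the per-tier frequencies above would probably just be
separate jobs with their own FileSets and Schedules; a rough, untested
sketch with placeholder names:)

  # Tier 1: recently changed data, classic GFS-style weekly cycle
  Job {
    Name = "Tier1-GFS"
    JobDefs = "DefaultJob"
    FileSet = "Tier1Data"
    Schedule = "WeeklyCycle"    # e.g. Full 1st sun, Diff 2nd-5th sun, Incr mon-sat
  }

  # Tier 2: data the policy has aged past ~90 days, full once a month
  Job {
    Name = "Tier2-Monthly"
    JobDefs = "DefaultJob"
    FileSet = "Tier2Data"
    Schedule = "MonthlyFull"
  }

  # Tier 3: archive/dedup tier, full every 6 months
  Job {
    Name = "Tier3-HalfYearly"
    JobDefs = "DefaultJob"
    FileSet = "Tier3Data"
    Schedule = "HalfYearlyFull"
  }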

Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University
 

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users