Bacula-users

Subject: Re: [Bacula-users] Some Operational Questions: Backing up lots of stuff
From: Arno Lehmann <al AT its-lehmann DOT de>
To: "bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Wed, 19 Aug 2009 21:35:26 +0200
Hi,

14.08.2009 23:16, K. M. Peterson wrote:
> Hi everyone,
> 
> I have some strategy questions.

Actually, I think you had a bunch of very interesting things to tell 
us ;-)

> We've been using Bacula for about 18 months; backing up ~3.5TB/week to 
> DLT-S4 (Quantum SuperLoader).  We are still on 2.2.8, but will be 
> upgrading to 3.x this fall.

Good move.

> Thanks to everyone on this list, and the Bacula team for an excellent 
> product. 
> 
> We have a Network Appliance filer (previously produced under their 
> "StoreVault" division), with both CIFS and NFS exports.  Backing up NFS 
> mounts on our backup server is kind of slow - 3-5MB/sec.

Hmm... this seems *really* slow... does your file set contain many 
small files?

>  Not having the 
> hardware to put in to mount and back up the CIFS shares,

Any Linux/Unix machine should be good for that - you just need Samba 
installed, and I don't see why the shares shouldn't be available 
CIFS-mounted. In other words, your backup server itself would probably 
be able to mount the CIFS shares. Can you tell us what the problem at 
your site is?
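
For example (share name, mount point and credentials are placeholders; 
you need the cifs kernel module and mount.cifs installed):

mount -t cifs //filer/projects /mnt/filer/projects \
    -o username=backupuser,password=secret,ro

Bacula can then back up /mnt/filer/projects like any local directory.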

> I found that we 
> can get ~20MB/sec by constructing a pipe on the server using rsh and the 
> NetApp dump command.   Of course, all we get into Bacula is a dump file, 
> so we need to set up a similar arrangement with the restore command to 
> restore things, but it's fine.  I want to start backing up the 
> NFS-native trees this way.

Somehow I believe that, if Bacula natively supported NDMP, the 
procedure would look similar to what you do :-)

(Which IMO proves that plugins are overrated - much of what a plugin 
is expected to do can be done with a few lines of shell script.)
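
To illustrate, an untested sketch of such a setup - the FIFO is read 
directly by the FD (volume name, FIFO path and filer name are just 
examples). First the FileSet:

FileSet {
  Name = "netapp-vol0"
  Include {
    Options {
      signature = MD5
      # read the data arriving in the FIFO instead of just
      # storing the FIFO entry itself
      readfifo = yes
    }
    File = /var/bacula/netapp-vol0.fifo
  }
}

And a RunBeforeJob script that starts the dump in the background, so 
it feeds the FIFO while Bacula reads from the other end:

#!/bin/sh
mkfifo /var/bacula/netapp-vol0.fifo 2>/dev/null
rsh filer dump 0f - /vol/vol0 > /var/bacula/netapp-vol0.fifo &

A restore works the same way in reverse: restore the FIFO "file" and 
have a script feed the stream to the filer's restore command.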

> However, backing up the whole thing as one backup job is problematic.  
> It takes a long time, it's opaque, and it's the 600 lb (272kg) gorilla 
> in the backup workflow.  And a restore is going to be even more painful 
> from a single backup job of the root of the device.

One of the big problems with dump-based backups, IMO... I'm sure 
others disagree here.

> I should point out that I have scripts currently to run through a list 
> of CIFS shares, set up the rsh jobs and pipes, and generate a report of 
> what got backed up and when and how.  It's still one job, though, even 
> though each share is a separate "file" in Bacula.  It's a problem 
> because these jobs create snapshots when they are submitted, and so 
> there are snapshots sitting around for the entirety of the job, and I'm 
> never sure whether they are going to be cleaned up properly if the job 
> gets canceled.  And if it does get canceled, I have to re-run everything 
> again.  Painful.
> 
> This isn't the real question, though I'd love it if someone has 
> something I haven't thought of.

Well, just two suggestions: Use FIFOs to pass data from the client to 
Bacula, and initiate the snapshot only right before you start reading 
the actual volume. That way the snapshots exist only as long as they 
are actually needed and don't sit around wasting space.
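
Roughly like this, assuming ONTAP's snap commands (untested; volume 
and snapshot names are placeholders):

# RunBeforeJob: create the snapshot just before the data is
# read, then dump the snapshot into the FIFO
rsh filer snap create vol0 bacula_tmp
rsh filer dump 0f - /vol/vol0/.snapshot/bacula_tmp \
    > /var/bacula/netapp-vol0.fifo &

# Run After Job / Run After Failed Job: drop the snapshot
# again, whether the backup worked or not
rsh filer snap delete vol0 bacula_tmp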

>  The real question is a more general 
> one: I need to figure out a way to dynamically create jobs.  I really 
> want one job per filesystem - but what's the consensus of the best way 
> to do this?  Should I just write job definitions and fileset definitions 
> to a file that's included in my director's config, then issue a reload?

Yup, that's today's way to do things. There is no API to create jobs 
dynamically.
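
I.e., generate the Job and FileSet resources into a separate file, 
pull that file into the Director's configuration with an @-include, 
and reload (paths are examples):

# In bacula-dir.conf:
@/etc/bacula/jobs.d/generated-jobs.conf

# After your script has (re)written that file:
echo reload | bconsole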

> Is there an API that I've missed?  Is there something in 3.x that is 
> going to make this better?

A first step towards your goal is being developed now - an API to 
create file sets by browsing a client. In the end, I would be 
surprised if writing that file set to the configuration, and dynamic 
job creation, were not added to that.

>  I want something that is as transparent as 
> possible, and that can be set up so that when a new share/export gets 
> created on the thing the backups get updated.  I can run something in 
> cron, or RunBeforeJob, but it just seems wrong.   (By the way, it would 
> be cool to have a plugin that would take the output from 'tar' or 
> 'dump', and feed it to Bacula as if it were coming from an FD so Bacula 
> would store the files and metadata... but I digress.)

Well, your digression is not too far off, I think. You can do the 
above - not with dump, which isn't portable enough for that purpose, 
but with tar:
- Untar to a temporary directory
- Back up that directory
- Skip the leading paths, so you don't capture 
/tmp/bactmp/clientXY/etc/services but only /etc/services (see the 
sketch below)
- Modify the catalog information so that this backup looks like it was 
done on the proper client and at the time the original tar was created.
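
The path stripping can be done right in the FileSet - strippath drops 
a number of leading path elements (untested; the temporary path is an 
example):

FileSet {
  Name = "tar-import"
  Include {
    Options {
      signature = MD5
      # /tmp/bactmp/clientXY/etc/services -> /etc/services
      strippath = 3
    }
    File = /tmp/bactmp/clientXY
  }
}

The catalog part is the ugly bit - that means updating the Job and 
File records with SQL directly, which I wouldn't call supported.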

For the original problem, I would probably simply handle it as a 
management issue: Just instruct your admins to tell the backup admin 
to add a job whenever they create new volumes.

If you really do this very often, a shell script which creates the 
necessary job resource with a default fileset and schedule, and adds 
run scripts to create a snapshot, mount it, and destroy it after the 
backup, would be worth the effort.

If you can get a list of the shares, a script which synchronizes this 
list with the jobs in Bacula wouldn't be too hard to create, I think.
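
An untested sketch of such a sync script - the smbclient enumeration 
and the resource template are placeholders for whatever fits your site:

#!/bin/sh
# Rebuild the generated job file from the filer's current shares
OUT=/etc/bacula/jobs.d/generated-jobs.conf
: > "$OUT"
smbclient -L //filer -U backupuser%secret -g 2>/dev/null |
awk -F'|' '$1 == "Disk" { print $2 }' |
while read share; do
cat >> "$OUT" <<EOF
Job {
  Name = "filer-$share"
  JobDefs = "DefaultJob"
  FileSet = "filer-$share"
}
FileSet {
  Name = "filer-$share"
  Include {
    Options {
      signature = MD5
    }
    File = /mnt/filer/$share
  }
}
EOF
done
echo reload | bconsole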

> I know I can dynamically define a fileset.  But, again, what I need is a 
> more granular way to break down a large /job/.  I can figure out how to 
> kludge it - and I've shown the current NetApp backup system to a few 
> people who've considered I should get some therapy - but I'm at the 
> point where I think I need to ask for directions.

Without going into much more detail I really can't suggest a solution, 
I fear. The problem sounds interesting, though. I'm sure that I would 
like to work on it :-) (And you can get my work through Bacula 
Systems, too...)

> We also have a few Windows servers that are in a different hemisphere.  
> I have the same kind of problem: I'd love to just backup "C:", but find 
> we just can't keep the (session) up for long enough to get through it.  
> I know that we're going to have Accurate backups, which I presume might 
> allow us to restart a job, but again over a long Internet link this is 
> going to be problematic.
> 
> So, the question here is: is there a better way to plan for a likely 
> inability to back up a large-ish filesystem in one job without resorting 
> to having to enumerate all of the n level directories and break the task 
> up into multiple jobs?

Not now... but, again, the solution is, as far as I know, in the 
developers' queue: continue stopped jobs.

>  I started writing a script to scan a filesystem 
> and emit the necessary directories to break the backup into a certain 
> number of pieces, but of course we only control the top level, and users 
> are going to want to add things that we'll need to back up incrementally.
> 
> Or, again, is there something I'm missing?

I found OpenVPN to be a good solution - often, a broken network 
connection resulted in OpenVPN re-creating its tunnel transparently to 
the applications that used that tunnel. In other words, you only see a 
moment with a really slow connection and some dropped packets, but TCP 
handles that quite well.
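
The relevant OpenVPN directives are something like this (values are 
examples):

# send a ping every 10 seconds; restart the tunnel after 60
# seconds of silence
keepalive 10 60
# keep key material and the tun device across restarts, so the
# applications never see the interface disappear
persist-key
persist-tun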

> I'm happy to discuss things off-list if that would be easier.  Many thanks!

Not necessarily easier... it makes collecting input from several 
people more difficult. But feel free to contact 
sales AT baculasystems DOT com to ask for a quote ;-)

Cheers,

Arno

> _KMP
> 
> 
-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de

