Subject: Re: [Bacula-users] Some Operational Questions: Backing up lots of stuff
From: "K. M. Peterson" <kmp.lists+bacula-users AT gmail DOT com>
To: Arno Lehmann <al AT its-lehmann DOT de>
Date: Thu, 20 Aug 2009 09:17:26 -0400
Hi Arno,

Thanks for your response.  I'll try to trim things down a bit so as not to clog those Internet pipes...

On Wed, Aug 19, 2009 at 15:35, Arno Lehmann <al AT its-lehmann DOT de> wrote:
Hi,

14.08.2009 23:16, K. M. Peterson wrote:
> Hi everyone,
>
> ...

> We have a Network Appliance filer (previously produced under their
> "StoreVault" division), with both CIFS and NFS exports.  Backing up NFS
> mounts on our backup server is kind of slow - 3-5 MB/sec.

Hmm... this seems *really* slow... does your file set contain many
small files?

Well, yes, probably.  Like many environments, we don't really know for sure, but much of our work revolves around data collection, and software engineers aren't always interested in understanding that 2 million files in a filesystem aren't conducive to operational efficiency.  That's just life, unfortunately.
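For what it's worth, a quick way to get a feel for it from the backup
server - the mount point here is just a placeholder:

# Count the files on one export, and how many of them are small
find /mnt/storevault/projects -type f | wc -l
find /mnt/storevault/projects -type f -size -64k | wc -l

With millions of small files it's usually the per-file overhead, not raw
throughput, that caps things at 3-5 MB/sec.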


>  Not having the
> hardware to put in to mount and back up the CIFS shares,

Any Linux/Unix machine should be good for that - you just need Samba
installed, and I don't see why the shares shouldn't be available
CIFS-mounted. In other words, your backup server itself would probably
be able to mount the CIFS shares. Can you tell us what the problem is
at your site?
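A minimal sketch of what that might look like on the backup server - the
filer hostname, share name, mount point and credentials file are all
invented for illustration:

# Mount one of the filer's CIFS shares read-only on the backup server
mkdir -p /mnt/storevault/projects
mount -t cifs //storevault/projects /mnt/storevault/projects \
    -o credentials=/etc/bacula/cifs-credentials,ro

A FileSet pointing at /mnt/storevault/projects would then let the backup
server's own bacula-fd read the share directly.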

Yes.  I am, perhaps simply out of ignorance, concerned about whether Bacula accessing CIFS shares captures all available filesystem metadata (permissions/ownership, and all of the other myriad NTFS bits).  We are primarily concerned with data protection in the event of large-scale (hate to use the term "catastrophic") failure.  We have very few requests for recovery, so the process in place seems most efficient should we have to bring an entire filesystem back.

But if this turns out to be a moot point, I'd happily run some tests to see how efficient we can make the process.


...

> However, backing up the whole thing as one backup job is problematic.
> It takes a long time, it's opaque, and it's the 600 lb (272kg) gorilla
> in the backup workflow.  And a restore is going to be even more painful
> from a single backup job of the root of the device.

One of the big problems with dump-based backups, IMO... I'm sure
others disagree here.

Well, again, I'm not convinced it won't be easier to recover an entire filesystem from a dump file... and I would rather reap the speed benefit on the backup end and pay the penalty on the restore side.


> I should point out that I have scripts currently to run through a list
> of CIFS shares, set up the rsh jobs and pipes, and generate a report of
> what got backed up and when and how.  It's still one job, though, even
> though each share is a separate "file" in Bacula.  It's a problem
> because these jobs create snapshots when they are submitted, and so
> there are snapshots sitting around for the entirety of the job, and I'm
> never sure whether they are going to be cleaned up properly if the job
> gets canceled.  And if it does get canceled, I have to re-run everything
> again.  Painful.
>
> This isn't the real question, though I'd love it if someone has
> something I haven't thought of.

Well, just two suggestions: Use FIFOs to pass data from the client to
Bacula, and initiate the snapshot only just before you start reading the
actual volume. That way the snapshots don't sit around wasting disk
space for the whole run.
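Per filesystem, roughly something like this as a RunBeforeJob script -
the filer name, volume names and ONTAP command syntax are placeholders
from memory, not tested:

#!/bin/sh
# RunBeforeJob sketch: create the snapshot only now, just before the FD
# starts reading, then feed it into the FIFO from the background.
FIFO=/var/bacula/fifo/projects
mkfifo -m 600 "$FIFO" 2>/dev/null || true
rsh filer snap create vol_projects bacula_$$
rsh filer dump 0f - /vol/vol_projects/.snapshot/bacula_$$ > "$FIFO" &

The matching FileSet would list /var/bacula/fifo/projects with
"readfifo = yes" in its Options so the stream rather than the FIFO node
gets backed up, and a RunAfterJob would delete the snapshot again.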

Sorry, I should have been clearer: that's what I'm doing.  The issue is that it's easy to initiate a snapshot if each filesystem is a separate job, but my attempts to script something that detects when Bacula actually wants to start reading a FIFO, and only then initiates the snapshot and data stream, have come to naught.  As far as /proc and anything else I can "see" is concerned, the FIFO isn't "open" until both sides of the pipe are set up: I can't figure out how to detect that Bacula has opened one side unless the other side is already open (and has data queued).  Again, it looks like the best way is to generate jobs on a filesystem-by-filesystem basis, but...


>  The real question is a more general
> one: I need to figure out a way to dynamically create jobs.  I really
> want one job per filesystem - but what's the consensus of the best way
> to do this?  Should I just write job definitions and fileset definitions
> to a file that's included in my director's config, then issue a reload?

Yup, that's today's way to do things. There is no API to create jobs
dynamically.
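In other words, roughly this - paths and resource names are invented,
and the generated file is pulled into bacula-dir.conf with an "@"
include:

# bacula-dir.conf contains the line:  @/etc/bacula/conf.d/netapp-jobs.conf
CONF=/etc/bacula/conf.d/netapp-jobs.conf
cat > "$CONF" <<'EOF'
Job {
  Name = "NetApp-projects"
  JobDefs = "DefaultJob"
  FileSet = "NetApp-projects"
}
FileSet {
  Name = "NetApp-projects"
  Include {
    Options { signature = MD5 }
    File = /mnt/storevault/projects
  }
}
EOF
echo reload | bconsole

The "reload" makes the Director re-read its configuration without a
restart.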

I was hoping that there was a way to do this with a command in bconsole or something...
 

> Is there an API that I've missed?  Is there something in 3.x that is
> going to make this better?

A first step towards your goal is being developed now - an API to create
file sets by browsing a client. In the end, I would be surprised if
writing that file set to the configuration, and dynamic job creation,
were not added on top of that.

>  I want something that is as transparent as
> possible, and that can be set up so that when a new share/export gets
> created on the thing the backups get updated.  I can run something in
> cron, or RunBeforeJob, but it just seems wrong.   (By the way, it would
> be cool to have a plugin that would take the output from 'tar' or
> 'dump', and feed it to Bacula as if it were coming from an FD so Bacula
> would store the files and metadata... but I digress.)

Well, your digression is not too far off, I think. You can do the
above - not with dump, which isn't portable enough for that purpose,
but with tar:
- Untar to a temporary directory.
- Back up that directory.
- Skip the leading paths, so you wouldn't capture
/tmp/bactmp/clientXY/etc/services but only /etc/services.
- Modify the catalog information so that this backup looks like it was
done on the proper client and at the time the original tar was created.
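Very roughly, and only to sketch the idea - the staging path comes from
the lines above, the tar archive and job name are invented, and both the
"Strip Path" FileSet option and the catalog rewrite should be checked
against the manual for your version rather than taken from here:

STAGE=/tmp/bactmp/clientXY
mkdir -p "$STAGE"
tar -C "$STAGE" -xpf /backups/clientXY.tar   # hypothetical tar archive
# A FileSet pointing at $STAGE, with a Strip Path count high enough to
# drop the /tmp/bactmp/clientXY prefix, would record /etc/services
# rather than the staged path.
echo "run job=StagedClientXY yes" | bconsole
# Making the catalog say the backup belongs to the original client and
# time would mean SQL against the catalog database - left out here.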

Oh boy, that's a bit much even for me.  Sorry :-) 

I say "dump" rather than "tar" because the NetApp backup application is based on [Solaris] dump.  I guess what I was asking was whether it might be feasible to have a pipe to take a tar/dump datastream as input and output a bacula-fd datastream out (or two datastreams - one for filedata and the other for attribute data).

For the original problem, I would probably simply handle it as a
management issue: just instruct your admins to tell the backup admin
to add a job whenever they create new volumes.

If you really do this very often, a shell script would be worth the
effort - one which creates the necessary job resource with a default
fileset and schedule, and adds a run script to create a snapshot, mount
it, and destroy it after the backup.

If you can get a list of the shares, a script which synchronizes this
list with the jobs in Bacula wouldn't be too hard to create, I think.
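Something along these lines, say - the "cifs shares" output parsing and
the generate_netapp_job helper are placeholders; the idea is just to
diff the filer's share list against what has already been generated:

CONFDIR=/etc/bacula/conf.d
OUT=$CONFDIR/netapp-jobs.conf    # included from bacula-dir.conf via @
: > "$OUT.new"
# Ask the filer for its share list (output format will need real parsing)
rsh filer cifs shares | awk 'NR>2 {print $1}' | while read share; do
    generate_netapp_job "$share" >> "$OUT.new"  # hypothetical: emits a Job/FileSet pair
done
# Only bother the Director if something actually changed
if ! cmp -s "$OUT" "$OUT.new"; then
    mv "$OUT.new" "$OUT"
    echo reload | bconsole
else
    rm -f "$OUT.new"
fi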

You're right, and that seems the best way to go.  I like, as we say here, to have a belt and suspenders.  Again, the issue was whether there was a more reasonable way to create jobs dynamically.  Given there really isn't right now, I'll go with the writing-config-files-and-reload process.

> I know I can dynamically define a fileset.  But, again, what I need is a
> more granular way to break down a large /job/.  I can figure out how to
> kludge it - and I've shown the current NetApp backup system to a few
> people who've suggested I should get some therapy - but I'm at the
> point where I think I need to ask for directions.

Without going much more into detail I really can't suggest a solution,
I fear. The problem sounds interesting, though. I'm sure that I would
like to work on it :-) (And you can get my work through Bacula
Systems, too...)

I'd love to... but I don't think I necessarily have the budget right now....

> So, the question here is: is there a better way to plan for a likely
> inability to back up a large-ish filesystem in one job without resorting
> to having to enumerate all of the n level directories and break the task
> up into multiple jobs?

Not now... but, again, the solution is, as far as I know, in the
developers' queue: continue stopped jobs.

Let me ask a question then: if I have a job that dynamically gets a list of files for its fileset, that job (and its file list) is "snapshotted" when it's submitted, isn't it?  Does it make sense to have a script submit the same job repeatedly, but each time with a different (dynamically generated) list of files in its fileset?  So I might have 20 jobs queued with the same "name", otherwise differing only in the files part of the fileset (this clearly would only work in this fashion for full backups, but that's kind of the problem right now...).  Or is there something rather patently wrong with that...
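What I have in mind is roughly the following - the chunk names, job name
and the split into top-level directories are all made up, and whether
overriding the fileset on the run command behaves well for queued jobs
is exactly the question:

# Generate one FileSet per top-level directory of the filer mount
OUT=/etc/bacula/conf.d/filer-chunks.conf   # pulled into bacula-dir.conf via @
: > "$OUT"
i=0
for chunk in /mnt/filer/*; do
    i=$((i + 1))
    cat >> "$OUT" <<EOF
FileSet {
  Name = "Filer-chunk-$i"
  Include {
    Options { signature = MD5 }
    File = "$chunk"
  }
}
EOF
done
echo reload | bconsole
# Queue one run of the same job per chunk, overriding only the FileSet
n=1
while [ "$n" -le "$i" ]; do
    echo "run job=FilerFull fileset=Filer-chunk-$n level=Full yes" | bconsole
    n=$((n + 1))
done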

Again, I feel the need to work within the architecture of Bacula as much as possible.  Don't like being way out there at the edge of the caravan, especially as it's always moving (as with all technology projects).

I found OpenVPN to be a good solution - often, a broken network
connection resulted in OpenVPN re-creating its tunnel transparently to
the applications that used that tunnel. In other words, you only see a
moment with a really slow connection and some dropped packets, but TCP
handles that quite well.


Ah, another area of our architecture that I am not in a position to change in order to make the backups work... but, it does occur to me that perhaps I could look at some timeout values to preserve the jobs being run.  Thanks for that hint.

And everything else...

_KMP