Bacula-users

Re: [Bacula-users] Include Dir Containing?

2011-03-03 11:23:08
Subject: Re: [Bacula-users] Include Dir Containing?
From: Bob Hetzel <beh AT case DOT edu>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 03 Mar 2011 11:20:18 -0500
> From: Christian Manal <moenoel AT informatik.uni-bremen DOT de>
> Subject:      
> To: bacula-users AT lists.sourceforge DOT net
> Message-ID: <4D6CB79B.3070001 AT informatik.uni-bremen DOT de>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On 26.02.2011 04:52, Dan Langille wrote:
>> > On 2/25/2011 5:49 AM, Christian Manal wrote:
>>> >> On 22.02.2011 14:06, Christian Manal wrote:
>>>> >>> On 22.02.2011 13:45, Marc Schiffbauer wrote:
>>>>> >>>> * Christian Manal wrote on 22.02.11 at 12:43:
>>>>>> >>>>> On 22.02.2011 12:26, Phil Stracchino wrote:
>>>>>>> >>>>>> On 02/22/11 06:07, Christian Manal wrote:
>>>>>>>> >>>>>>> That's right, but not what I need. I want to include the directory
>>>>>>>> >>>>>>> containing the specific file, not the file itself as shown in
>>>>>>>> >>>>>>> the examples.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> To give an example: By default, nothing under
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>     /export/home
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> will be backed up. Now a user "foo" creates a file ".backmeup" in his
>>>>>>>> >>>>>>> home directory or a subdirectory of it. For example
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>     /export/home/foo/important/.backmeup
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> The next backup should then include the directory
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>     /export/home/foo/important
>>>>>>> >>>>>>
>>>>>>> >>>>>> There is not, to my knowledge, any built-in functionality to do this at
>>>>>>> >>>>>> this time.  You'd have to use a script to generate the fileset on demand.
>>>>>>> >>>>>>
>>>>>> >>>>>
>>>>>> >>>>> I feared as much. Thanks for the replies anyway.
>>>>> >>>>
>>>>> >>>> Christian,
>>>>> >>>>
>>>>> >>>> before Bacula had the "Exclude Dir Containing" feature I used
>>>>> >>>> something very similar to your example, like this:
>>>>> >>>>
>>>>> >>>> File = "\\|sh -c 'for D in /home; do find $D -xdev -name .BACULA_NO_BACKUP \
>>>>> >>>>          -type f -printf \"%h\\n\"; done | tee /root/bacula_excluded_dirs.log'"
>>>>> >>>>
>>>>> >>>> That worked very well over years.
>>>>> >>>>
>>>>> >>>> So I think something like that for include should work well for you.
>>>>> >>>>
>>>>> >>>> File = "\\|sh -c 'for D in /home; do find $D -xdev -name .BACULA_BACKUP \
>>>>> >>>>          -type f -printf \"%h\\n\"; done | tee /root/bacula_included_dirs.log'"
>>>>> >>>>
>>>> >>>
>>>> >>> Thanks. That saves me the work of putting a working script together
>>>> >>> myself :-)
>>> >>
>>> >> Hi again,
>>> >>
>>> >> this turned out to be not viable for my setup. Running that find command
>>> >> in a shell on my fileserver takes about 9 hours for a ZFS dataset of
>>> >> about 620 GiB with far more than 10 million files. And having a 9 hour
>>> >> overhead in my backups just to create the fileset isn't acceptable.
>>> >>
>>> >> So I'm thinking about solving this another way, by letting the users
>>> >> create their own filesets by putting relative paths into a dotfile in
>>> >> the root of their homedir.
>>> >>
>>> >> To give an example: User "foo" puts a file '.backuprc' in his homedir
>>> >> containing the following:
>>> >>
>>> >>     important/stuff
>>> >>     other/important/stuff
>>> >>     mostly/unimportant/stuff/importantfile.txt
>>> >>     mostly/unimportant/stuff/importantfile2.txt
>>> >>
>>> >> which would be collected by something like this:
>>> >> (a first test run with '-name .bashrc' took only about 10 minutes)
>>> >>
>>> >>     find /export/home -maxdepth 2 -type f -name .backuprc \
>>> >>       -exec sh -c '/path/to/sanitize-paths.pl "$1" < "$1"' sh {} \;
>>> >>
>>> >> where 'sanitize-paths.pl' filters empty lines and comments, appends the
>>> >> relative path to the absolute path of the homedir, makes sure the users
>>> >> don't pull any shenanigans with this (like putting '../../../' as a
>>> >> path) and also informs them when they have invalid lines in their list.
>>> >>
>>> >> The output would look like this:
>>> >>
>>> >>     /export/home/foo/important/stuff
>>> >>     /export/home/foo/other/important/stuff
>>> >>     /export/home/foo/mostly/unimportant/stuff/importantfile.txt
>>> >>     /export/home/foo/mostly/unimportant/stuff/importantfile2.txt
>>> >>
>>> >>
>>> >> But now I'm concerned about the potential size of the fileset. I have
>>> >> over 3000 homedirs in that filesystem, which could result in ten
>>> >> thousand lines or more. Will Bacula handle that, or do I have to
>>> >> expect performance issues or even crashes?
>> >
>> > Try it.  Find out.  Sorry, but I really don't know.  It is easily tested
>> > though.  Without involving your users.  Create a test case.
>> >
> I just did that. I made a test setup in VirtualBox, created 3000 "home
> directories" and filled them randomly with some dirs and files (around
> 140,000 files total). The script-generated fileset had about 18,000
> lines. I noticed no real problems during my test runs.
>
>
> Regards,
> Christian Manal
>
>
>

Here's one issue you're going to run into if you do it by "including" 
rather than excluding.

First, either way you're going to have to use "ignore fileset changes = 
yes".  If you don't, you'll find that you get a full backup every 
single time.  Usually that's not what is wanted.

Now suppose you do a full backup on March 1.  On March 10, a user 
decides that some old files that weren't being backed up before should be 
backed up.  The files are all timestamped prior to March 1.  The end result 
is that they won't get backed up until the next full backup.  If you do full 
backups frequently you might not care much about it, but you should be aware 
of it.
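
To make the March 1 / March 10 scenario concrete, here is a toy model in 
Python.  It is only a sketch: it assumes an incremental selects a file purely 
by comparing its modification time against the previous backup, which is a 
simplification (as far as I know, Bacula also looks at ctime unless told 
otherwise).  The file names and the function selected_for_incremental are 
made up for illustration.

```python
from datetime import datetime

def selected_for_incremental(mtime, last_backup):
    """Simplified rule (assumption): pick up a file only if it was
    modified after the previous backup."""
    return mtime > last_backup

last_full = datetime(2011, 3, 1)

# Hypothetical files that a user adds to the include list on March 10.
newly_included = {
    "/export/home/foo/important/old-report.txt": datetime(2011, 1, 15),
    "/export/home/foo/important/new-notes.txt": datetime(2011, 3, 9),
}

# Files that are now in the fileset but still skipped by the incremental.
skipped = sorted(
    path for path, mtime in newly_included.items()
    if not selected_for_incremental(mtime, last_full)
)
print(skipped)
```

Under this model the January file is in the fileset but is not picked up 
until the next full backup, while the file touched on March 9 is.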

If you go about it the other way, backing up everything by default and 
skipping only what somebody has given at least a little thought to, you're 
going to be less likely to skip important stuff.

Either way you do this though, by excluding lots of stuff dynamically, or 
including lots of stuff dynamically, you're going to find at some point in 
the future that you missed something and data will be lost forever due to 
user error.  In many environments, that's not worth the risk.  YMMV.
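
If you do go with the '.backuprc' lists quoted above, the sanitizing step 
could look roughly like the following.  This is a minimal sketch in Python 
rather than Perl, not the actual sanitize-paths.pl; the function name 
sanitize and the sample entries are made up.  It only does a lexical check, 
so it assumes you deal with symlinks pointing outside the homedir some 
other way.

```python
import posixpath

def sanitize(home, lines):
    """Return (accepted_absolute_paths, rejected_entries) for one user's list."""
    home = posixpath.normpath(home)
    accepted, rejected = [], []
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        candidate = posixpath.normpath(posixpath.join(home, entry))
        # Reject absolute entries and anything that normalizes to a
        # location outside the homedir (e.g. '../../../').
        if entry.startswith("/") or not candidate.startswith(home + "/"):
            rejected.append(entry)
        else:
            accepted.append(candidate)
    return accepted, rejected

accepted, rejected = sanitize("/export/home/foo", [
    "important/stuff",
    "",
    "# a comment",
    "../../../etc/passwd",
    "/etc/shadow",
    "mostly/unimportant/stuff/importantfile.txt",
])
print("\n".join(accepted))
```

The rejected list is what you would mail back to the user so they know 
which lines were ignored.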
