[Bacula-users] seeking advice re. splitting up large backups -- dynamic filesets to prevent duplicate jobs and reduce backup time

2011-10-12 17:56:04
From: mark.bergman AT uphs.upenn DOT edu
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 12 Oct 2011 17:53:41 -0400
In an effort to work around the fact that Bacula kills long-running
jobs, I'm about to partition my backups into smaller sets. For example,
instead of backing up:

        /home

I would like to back up the contents of /home as separate jobs. For example:

        /home/[0-9]*    
        /home/[A-G]*    
        /home/[H-M]*
        /home/[N-Q]*
        /home/[R-U]*
        /home/[V-Z]*
        /home/[a-g]*    
        /home/[h-m]*
        /home/[n-q]*
        /home/[r-u]*
        /home/[v-z]*
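
If these bracket patterns are read as shell globs, any name that starts with something other than a letter or digit (a leading dot or underscore, say) falls outside every set. The sketch below is one way to list such names ahead of time. The helper name uncovered_names is hypothetical, and range matching such as [A-G] can vary with the locale, so treat the result as a hint rather than a guarantee.

```shell
#!/bin/bash
# Sketch: report names that none of the bracket patterns would match.
# The pattern list mirrors the character ranges listed above;
# uncovered_names is a hypothetical helper name.
patterns=('[0-9]*' '[A-G]*' '[H-M]*' '[N-Q]*' '[R-U]*' '[V-Z]*'
          '[a-g]*' '[h-m]*' '[n-q]*' '[r-u]*' '[v-z]*')

uncovered_names()
{
        # Reads one name per line on stdin, prints those matching no pattern.
        local name pat matched
        while IFS= read -r name ; do
                matched=no
                for pat in "${patterns[@]}" ; do
                        # Unquoted $pat on the right of == is treated as a glob.
                        if [[ $name == $pat ]] ; then
                                matched=yes
                                break
                        fi
                done
                if [ "$matched" = no ] ; then
                        echo "$name"
                fi
        done
        return 0
}

# Example: check the entries of /home (ls -A includes dot-files).
# ls -A /home | uncovered_names
```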

I'm looking for advice on how to prevent multiple jobs with different names
that access the same client from running simultaneously, for example,
to prevent an incremental of job "home0-9" from running at the same time as
a full of job "homeA-G".

The only method I can think of is to use a dynamic fileset in the
director to generate the different filesets, so that there's only a
single named job that backs up a different set of files on each full
backup. This way the "Allow Duplicate Jobs" setting can be effective.

For example:

----------------- bacula-dir.conf -------------------------------
    FileSet
    {
          Name = "home"
    
          Include
          {
                Options
                {
                      onefs = yes
                      signature = MD5
                      sparse = yes
                
                      Exclude = yes
                        wildDir = "/lost+found/*"
                 }
                File = "|/usr/local/sbin/generate_fileset -d Friday -p /home"
          }
    }
-----------------------------------------------------------------


Where the "generate_fileset" command would be something like this:

-------------------------------------------------------
#! /bin/bash
# Full weekday name, e.g. "Wednesday" (the spelling depends on the locale)
today=$(date "+%A")
subsets=(01A-Ga-g 23H-Mh-m 45N-Qn-q 67R-Ur-u 89V-Zv-z)
numsubsets=${#subsets[@]}

usage()
{
        echo "Missing required argument."
        echo
        echo "Usage:"
        echo "        $0 -d Dayofweek -p /path/to/backup"
        echo "Example:"
        echo "        $0 -d Wednesday -p /export/home"
        echo 
        echo ""
        echo "The above example would return the fileset (shell glob pattern) to determine what"
        echo "files to back up from \"/export/home\"."
        echo ""
        echo "If the current day is Wednesday (when a full backup is run), the fileset"
        echo "would be a glob pattern for a subset of all files, for example:"
        echo "  /export/home/[${subsets[2]}]*"
        echo ""
        echo "The selected subset will change each week. If the current"
        echo "day is not Wednesday, the fileset will be \"/export/home\""
        echo "to do an incremental backup of all files in \"/export/home\"."
        echo ""
        echo "The possible subsets are:"
        echo "  ${subsets[@]}" | sed -e "s/ /   /g"

        exit 1
}

if [ $# != 4 ] ; then
        usage
fi

while [ $# -gt 0 ] 
do
        case $1 in
                -d)
                        runfull=$2
                        shift 2
                        ;;

                -p)
                        path=$2
                        shift 2
                        ;;

                *)
                        usage
                        ;;      
        esac
done


if [ -z "$runfull" ] || [ -z "$path" ] ; then
        usage
fi

if [ "$runfull" != "$today" ] ; then
        # We are NOT running a full backup. Return the whole path as the
        # fileset to back up, since incrementals should cover everything
        # in the filesystem.
        echo "${path}"
else
        # we are running a full backup.
        #       
        # Use the value of the current week-number (within the year,
        # range 00..53) to choose which subset of directories to back up.
        # Force base 10: bash would otherwise reject "08" and "09" as
        # invalid octal numbers.
        weeknum=$(date "+%U")
        index=$((10#$weeknum % numsubsets))
        echo "${path}/[${subsets[$index]}]*"
fi
-------------------------------------------------------
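
To make the rotation easier to see, here is a minimal sketch of just the subset selection, assuming the same subsets array as in the script. The function name pick_subset is hypothetical and exists only for illustration. It takes a week number as printed by `date +%U` and returns the subset that week would select.

```shell
#!/bin/bash
# Sketch: show which subset a given week number would select.
# The subsets array is copied from generate_fileset; pick_subset is a
# hypothetical helper name, not part of the original script.
subsets=(01A-Ga-g 23H-Mh-m 45N-Qn-q 67R-Ur-u 89V-Zv-z)

pick_subset()
{
        # $1 is a week number as printed by `date +%U` (00..53).
        # 10# forces base 10 so that "08" and "09" are not parsed as octal.
        local weeknum=$1
        local index=$((10#$weeknum % ${#subsets[@]}))
        echo "${subsets[$index]}"
}

# Print the rotation for the first ten weeks of a year.
for week in 00 01 02 03 04 05 06 07 08 09 ; do
        echo "week $week -> [$(pick_subset "$week")]*"
done
```

Because the index comes from the week number within the year, the sequence is not continuous across a year boundary: week 52 selects index 2 and the following week 00 selects index 0, so some subsets may wait longer than five weeks for a full backup around New Year.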


My questions are:

        Does this seem overly complex?

        How does Bacula handle it if the fileset returned from
        "generate_fileset" (e.g., "[34R-Uru]*") doesn't match any files
        or directories?

        Any other suggestions?
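
On the second question, one way to find out in advance whether a pattern matches anything is to test it in the script before printing it. This is only a sketch: pattern_matches is a hypothetical helper name, and it checks the filesystem of the host where the script runs, which may not be the client being backed up. It says nothing about what Bacula itself does with an unmatched entry.

```shell
#!/bin/bash
# Sketch: test whether a glob pattern matches anything on this host.
# pattern_matches is a hypothetical helper name. compgen -G is a bash
# builtin that prints the pathnames a glob expands to and returns a
# non-zero status if there are none.
pattern_matches()
{
        compgen -G "$1" > /dev/null
}

# Example use inside generate_fileset (the path is illustrative):
# pattern="/export/home/[67R-Ur-u]*"
# if pattern_matches "$pattern" ; then
#         echo "$pattern"
# else
#         echo "no match for $pattern" >&2
# fi
```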

Environment:
        bacula 5.0.2
        2x LTO-4
        200GB spool file size limit

Thanks,

Mark
