[Bacula-users] seeking feedback on dynamic fileset scheme

2014-12-22 16:48:46
Subject: [Bacula-users] seeking feedback on dynamic fileset scheme
From: mark.bergman AT uphs.upenn DOT edu
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 22 Dec 2014 16:45:01 -0500
I'm looking for feedback on a scheme for dynamic filesets.

We back up to LTO4, on a schedule where each fileset gets a Full every
2 months, a Differential weekly, and Incrementals nightly. I'm willing
to change the Differential backup to monthly (alternating with Fulls).

-The Problem-
As our data volume has grown--now about 45TB being backed up--managing
filesets has become more complex. We've gone through a typical
progression, from backing up "everything", to backing up high-level logical
groupings (e.g., all "home dirs" and all "projects") as separate filesets,
then to smaller sets (e.g., "home dirs beginning A-F", "projects beginning
[G-Kg-i0-4]", etc.).

The current filesets are very unbalanced (over 12TB in some). I am
reluctant to manually create individual filesets for each group of
directories alphabetically (e.g., "project dirs starting with A", "home
dirs starting with B", "source code dirs starting with C", "collaborator
dirs starting with D", etc.) because this would result in ~100 filesets,
and they'd still be very unbalanced in terms of backup volume, due to the
uneven distribution of project and user names.

-Proposed Solution-
The new scheme I'm considering would use a dynamic fileset, generated each
night, to define 3 backup jobs: Full, Differential, and Incremental.

A program would select all directories to be backed up (/home/*,
/projects/*, /src/*, /collab/*) and determine the backup level.

For each directory to be backed up, the path to the directory is hashed
to a number within the range 1-56. The choice of 56 corresponds to double
the number of days in February, and allows us to alternate Fulls &
Differentials each month for a given directory.
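A minimal sketch of the hashing step, in Python. The function name
dir_slot and the use of SHA-256 are assumptions for illustration only;
any hash that is stable between runs would do. Python's built-in hash()
is avoided because it is randomized per process for strings.

```python
import hashlib

SLOTS = 56  # double the 28 days of February, as described above


def dir_slot(path: str, slots: int = SLOTS) -> int:
    """Map a directory path to a stable number in the range 1..slots."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Interpret the digest as a big integer and fold it into 1..slots.
    return int.from_bytes(digest, "big") % slots + 1


if __name__ == "__main__":
    for p in ("/projects/Bird", "/home/Byrd"):
        print(p, dir_slot(p))
```

The same path always lands in the same slot, so a directory keeps its
Full/Differential day until it is renamed.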

For each directory, the backup level is determined by:

        Full backup if:
                (Current month is Odd) and (Directory hash <= 28)
                                AND
                Day of the month == Directory hash

                    OR

                (Current month is Even) and (Directory hash > 28)
                                AND
                Day of the month == (Directory hash - 28)

        Differential backup if:
                (Current month is Odd) and (Directory hash > 28)
                                AND
                Day of the month == (Directory hash - 28)

                    OR

                (Current month is Even) and (Directory hash <= 28)
                                AND
                Day of the month == Directory hash

        Incremental backup if:
                Day of the month != Directory hash
                                AND
                Day of the month != (Directory hash - 28)
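The rules above can be sketched as a single function. This is only an
illustration: the name backup_level is made up, and I'm assuming that
anything which doesn't qualify for a Full or Differential (including
days 29-31) falls through to Incremental.

```python
import datetime


def backup_level(slot: int, day: datetime.date) -> str:
    """Return "Full", "Differential" or "Incremental" for a directory hash."""
    if not 1 <= slot <= 56:
        raise ValueError("slot must be in the range 1-56")

    low_half = slot <= 28
    # Hashes 1-28 trigger on that day of the month; 29-56 on (hash - 28).
    trigger_day = slot if low_half else slot - 28

    if day.day != trigger_day:
        # Also covers days 29-31, which never match a trigger day.
        return "Incremental"

    odd_month = day.month % 2 == 1
    # Odd month + low half, or even month + high half, gets the Full.
    return "Full" if odd_month == low_half else "Differential"


if __name__ == "__main__":
    print(backup_level(7, datetime.date(2015, 1, 7)))
    print(backup_level(44, datetime.date(2015, 1, 16)))
```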


For example, if the directory "/projects/Bird" had a hash value of 7,
it would get:

                Full backups:
                        Jan 7, Mar 7, May 7, Jul 7, Sep 7, Nov 7
                Differential backups:
                        Feb 7, Apr 7, Jun 7, Aug 7, Oct 7, Dec 7
                Incremental backups on all other nights


For example, if the directory "/home/Byrd" had a hash value of 44,
it would get:

                Full backups:
                        Feb 16, Apr 16, Jun 16, Aug 16, Oct 16, Dec 16
                Differential backups:
                        Jan 16, Mar 16, May 16, Jul 16, Sep 16, Nov 16
                Incremental backups on all other nights

Full & Differential backups would not be started on days 29-31 of
any month.
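As a rough sketch of the nightly generator, the directories could be
sorted into three lists, one per job. All names here (dir_slot,
backup_level, nightly_lists) are hypothetical, SHA-256 is an arbitrary
choice of stable hash, and how the lists would be fed to Bacula is left
open.

```python
import datetime
import hashlib


def dir_slot(path):
    """Stable hash of a path into the range 1-56 (SHA-256 is an assumption)."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 56 + 1


def backup_level(slot, day):
    """Apply the Full/Differential/Incremental rules for one directory."""
    trigger_day = slot if slot <= 28 else slot - 28
    if day.day != trigger_day:
        return "Incremental"  # includes days 29-31
    odd_month = day.month % 2 == 1
    return "Full" if odd_month == (slot <= 28) else "Differential"


def nightly_lists(paths, day):
    """Split the directories into one list per backup level for this date."""
    lists = {"Full": [], "Differential": [], "Incremental": []}
    for path in sorted(paths):
        lists[backup_level(dir_slot(path), day)].append(path)
    return lists


if __name__ == "__main__":
    # Made-up directory names, standing in for /home/*, /projects/*, etc.
    dirs = ["/home/Byrd", "/projects/Bird", "/src/example", "/collab/example"]
    for level, members in nightly_lists(dirs, datetime.date.today()).items():
        print(level, members)
```

Each directory appears in exactly one of the three lists on any given
night, so the three jobs together still cover everything.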

This should make each nightly backup volume smaller, by running more
Full jobs per month, each of a smaller size.

The downsides to this scheme that I see are increased complexity and
the greater uncertainty about when a particular directory got a Full or
Differential backup.

What do you think of this scheme?

Thanks,

Mark
-- 
Mark Bergman                                           voice: 215-662-7310
mark.bergman AT uphs.upenn DOT edu                       fax: 215-614-0266
http://www.cbica.upenn.edu/                
IT Technical Director, Center for Biomedical Image Computing and Analytics
Department of Radiology                         University of Pennsylvania
          PGP Key: http://www.cbica.upenn.edu/sbia/bergman 
