Subject: Re: [Bacula-users] Starting again with my bacula config...
From: Steven Haigh <netwiz AT crc.id DOT au>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 09 Jun 2014 09:41:15 +1000
On 09/06/14 01:55, Phil Stracchino wrote:
> On 06/08/14 06:09, Steven Haigh wrote:
>> I do believe this is one of the biggest shortcomings of Bacula...
>> The fact it is job based vs file based removes a lot of
>> flexibility.
> 
>> If I understand things properly, a VirtualFull will: 1) require
>> all volumes as stated below; 2) require enough space to write the
>> entire backup out again; and 3) be unable to keep a copy of a file
>> forever if it is never changed.
> 
>> Instead, after the purge date, the file is deleted and
>> retransferred - unless it is done by a VirtualFull - which still
>> has the problems of #1 and #2 above.
> 
> I'm not sure I understand your objections here.  Given arbitrarily
> large disk space, you can set your job and file retention to fifty
> years if you so choose.  Bacula will keep your initial full backup as
> long as you tell it to.  But you can't get by on just an initial full
> backup and ten years worth of daily incrementals.  Doing a restore
> would require searching thousands of jobs to build the file tree.
> 
> Name me one backup solution which does NOT have to re-transfer a file
> if you delete the original backup.
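
(For reference, the retention knobs Phil is talking about live in the
Client resource of bacula-dir.conf - roughly like this, from memory,
so treat the values as illustrative:)

        Client {
          Name = myhost-fd
          Address = myhost.example.com
          Password = "notreal"
          Catalog = MyCatalog
          AutoPrune = yes
          File Retention = 50 years
          Job Retention = 50 years
        }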

I think I misled you with the terminology here. Coming from the TSM
world, everything is file based. The database tracks individual files
in volumes that can come from any node (client). If a file is still
present on the client, it is never deleted from the backup. If the
file is updated, the old copy is marked as a historical version and
kept for X revisions or Y days, depending on config.

As this is done *per file*, files never 'expire' as such unless they
are deleted on the client and fall past the deleted-file retention.
TSM can do this efficiently because it tracks everything *per file*
and not per job.
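
(For the curious, that policy lives in TSM's backup copy group - the
command is along these lines from memory, with macro-style line
continuation and the values being pure examples:)

        define copygroup MYDOMAIN MYSET MYCLASS standard type=backup -
            verexists=5 verdeleted=1 retextra=60 retonly=90

That keeps 5 versions while the file exists and 1 after it's deleted,
expires the extra versions after 60 days, and keeps the last remaining
version for 90 days after deletion.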

This meant that with a few volumes (I had 40Gb always online, plus 3 x
1Tb eSATA drives) you could do incrementals forever and still be
guaranteed a consistent result on restore - as long as you had all the
volumes.

To handle volumes that hold a ton of deleted files but only a few
current ones, TSM does a reclamation: it moves the remaining files to
another storage pool and then migrates them back onto a more recent
volume (in the case of tapes), or just carries on with random access
forever in the case of File volumes.
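
(The trigger for that is the reclamation threshold on the sequential
pool - something like the below, value purely illustrative:)

        update stgpool ESATAPOOL reclaim=60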

My setup with TSM was that clients transferred all their incremental
data to a 40Gb 'temporary' pool, if you like - then when that started
to get full, it was migrated to eSATA storage - again, all at the file
level, not the job level. The eSATA drives could then be taken offline
again and not be needed until either another migration or a restore
required them.
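
(The pool chaining and migration thresholds were along these lines -
pool names made up, values illustrative:)

        define stgpool TEMPPOOL disk nextstgpool=ESATAPOOL highmig=90 lowmig=70

When the temporary pool fills past the high threshold, migration
drains it down to the low one.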

> VirtualFull jobs allow you to keep replacing that original Full backup
> without having to re-transfer all of the files from the client again.
> Of *course* you have to re-copy them from the old job; but then the
> old job can be purged.  Yes, you need the space for both to create the
> VirtualFull; but you don't have to keep both around after it's completed.

Yeah, this is more or less what I understood. Not a simple task - but
workable. Either way, I think I'll have to rethink how I do things.
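
(For anyone following along at home, from bconsole that's just:

        run job=MyJob level=VirtualFull yes

with a 'Next Pool' set on the source Pool so the consolidated job has
somewhere to land - job name made up, obviously.)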

> If you want to never have to do any job maintenance, never re-copy a
> file that hasn't changed, never use any extra disk space, etc, etc,
> perhaps you should be just using rsync?  Of course, de-duplicating and
> creating hard links would become your problem, and then you have to be
> careful not to update all copies of a hard-linked file when one
> original updates...

I'm having a bit of a tinker with just that now... I've attached the
script as it currently stands, for interest. The script runs on a VM
and stores all data in /backups/$HOST/[0-7]/ on a compressed btrfs
volume.
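
For those who don't want to open the attachment: the core of it is the
usual rotate-plus-rsync --link-dest dance. This is a simplified sketch
rather than the script itself - host handling, excludes and paths are
examples only:

        #!/bin/bash
        # Rotate /backups/$HOST/[0-7] and pull a fresh copy into 0,
        # hard-linking anything unchanged since the previous run.
        HOST="$1"
        DEST="/backups/$HOST"

        # Drop the oldest copy and shuffle the rest down one slot.
        rm -rf "$DEST/7"
        for i in 6 5 4 3 2 1 0; do
            [ -d "$DEST/$i" ] && mv "$DEST/$i" "$DEST/$((i+1))"
        done

        # Unchanged files become hard links into yesterday's tree, so
        # only the '0' tree costs full disk space.
        mkdir -p "$DEST/0"
        rsync -a --delete --link-dest="$DEST/1" \
            --exclude=/proc --exclude=/sys --exclude=/dev \
            "root@$HOST:/" "$DEST/0/"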

I think I can then also use the eSATA drives to rsync from this host
and keep all 7 days of revisions of the system there too. You could
easily extend the number of days with very little extra disk space
required - as everything is hard linked, only the '0' folder holds all
the files; 1-7 only hold the 'changes':
        # du -hs *
        7.3G    0
        91M     1

The advantage, though, is that EVERY directory can be used as a
'restore point' to give an exact copy of the system at that point in
time. In some cases you could probably do this weekly instead and keep
several months' worth of changes for a small increase in space
requirements.
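
(A restore is then just a copy back out of whichever numbered
directory you want - paths hypothetical:)

        rsync -a /backups/myhost/3/etc/ root@myhost:/etc/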

That being said - *my* requirements for backups are rather minimal. The
entire setup on my part is less than 200Gb when stored this way.

-- 
Steven Haigh

Email: netwiz AT crc.id DOT au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

Attachment: rsync-backup
Description: Text document

