[Bacula-users] option: detect file change by hash value only

2008-09-11 05:10:38
Subject: [Bacula-users] option: detect file change by hash value only
From: Alex Ehrlich <Alex.Ehrlich AT mail DOT ee>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>, bacula-devel <bacula-devel AT lists.sourceforge DOT net>
Date: Thu, 11 Sep 2008 12:10:18 +0300
Hello,

Does anybody know whether it is possible to make Bacula detect file
modification (and hence the need for a backup) only by the file hash
value, ignoring the file modification datetime?

The business problem is MS Outlook. This program has an option to move
"old" mails to a separate storage file (usually archive.pst; you can
create many of them with different names), which would be good for
archiving. *But* every time you open the archive file for *viewing*,
Outlook changes the file modification datetime (and if you make the file
read-only, Outlook will not open it at all).
Such "Outlook archive" files are often 1-10 GB in size and end up in the
daily backups without actually having changed. Adjusting Bacula seems a
more promising approach than adjusting MS ;-/. I currently have only
400 GB for backup data, and 10% of it has already been used up in about
2 weeks with just two clients actively using Outlook.

If no such option exists and it is not an easy one to add, then I will
have to make a pretty specific backup plan for Outlook data...

A side question on how Bacula calculates file hash values. Below is
information on one archive.pst file, including the hash value recorded
by each job (type: F = Full, D = Differential, I = Incremental).
.
jobid   type    end time                hash            
154     D       2008-08-31 03:18:07     KQfc/jt2/uvSth4TQsWyHOqr/2o
148     I       2008-08-30 03:18:08     KQfc/jt2/uvSth4TQsWyHOqr/2o
141     I       2008-08-29 03:18:25     vtWZTmFzsv3nzYEx+p4KEtzYVX0
134     I       2008-08-28 03:18:17     Qj1F4i+W3M6cHFapKmoiUFBGhiY
127     I       2008-08-27 03:18:14     l68AAavutPtKrddMWZlGrnR+3bs
120     I       2008-08-26 03:16:11     99cj5sVS95HfotnlkiwdskvThOY
106     D       2008-08-24 03:18:21     /m2GsdZ80c2XQOZjpu2ku1jmESw
  99    I       2008-08-23 03:18:18     /m2GsdZ80c2XQOZjpu2ku1jmESw
  91    I       2008-08-22 03:18:59     ueo5zj7jDnuTTQAceSLWWRMrvT8
  67    I       2008-08-19 03:18:25     tzlDSsAVXZ27F0LeC46ddz3GZjo
  46    D       2008-08-17 03:18:02     Le5jXuX5v2sTVW5oKwNmUXkpYvE
  38    I       2008-08-16 03:17:58     Le5jXuX5v2sTVW5oKwNmUXkpYvE
  29    I       2008-08-15 03:11:20     qsA4Mg8QICjs8CCVzTmBjyuZElk
   4    F       2008-08-12 18:30:13     lprr+TlhFseh3iSGsVVjj0lyD14
.
You can see that every time a Differential backup is run, the hash value
is the same as for the previous day's Incremental one. However, for each
Incremental backup the hash value is different. I cannot readily accept
that this is just a coincidence and that archive.pst actually changed
each time. So is there any input to the hash function other than the file
contents, and if so, is there actually a good reason for it?
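
To make the request concrete, here is a minimal Python sketch of the
behaviour being asked for: a digest computed from the file contents only,
so that a changed modification time alone never triggers a backup. This
is not Bacula code. The function names (content_digest, needs_backup,
demo) are hypothetical, and SHA-1 is an assumption based on the 27
character hash values listed above, which match the length of a 20-byte
digest in unpadded Base64. The encoding Bacula uses in its catalog may
differ from the standard Base64 used here.

```python
import base64
import hashlib
import os
import tempfile


def content_digest(path, chunk_size=1 << 20):
    """Return SHA-1 of the file contents only, as unpadded standard Base64.

    Assumption: SHA-1 and standard Base64 are used for illustration; the
    digest stored by a real backup tool may be encoded differently.
    """
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return base64.b64encode(sha1.digest()).decode("ascii").rstrip("=")


def needs_backup(path, previous_digest):
    """True only when the contents differ from the previously recorded digest."""
    return content_digest(path) != previous_digest


def demo():
    """Show that touching a file does not change its content digest."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "archive.pst")
        with open(path, "wb") as f:
            f.write(b"old mail")
        recorded = content_digest(path)

        # Simulate a viewer that only updates the timestamps: same bytes.
        os.utime(path, (0, 0))
        touched = needs_backup(path, recorded)

        # Now really change the contents.
        with open(path, "ab") as f:
            f.write(b" plus one new message")
        modified = needs_backup(path, recorded)
        return touched, modified
```

The obvious cost of this approach is that every candidate file has to be
read in full on the client to compute the digest, even when nothing is
sent to storage afterwards.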

Alex





