[Bacula-users] Bacula tape format vs. rsync on deduplicated file systems

2010-05-27 17:20:34
Subject: [Bacula-users] Bacula tape format vs. rsync on deduplicated file systems
From: Robert LeBlanc <robert AT leblancnet DOT us>
To: "bacula-users (anglais)" <bacula-users AT lists.sourceforge DOT net>, bacula-devel <bacula-devel AT lists.sourceforge DOT net>, lessfs AT googlegroups DOT com
Date: Thu, 27 May 2010 15:18:12 -0600
Spurred by the discussion last month on the Bacula mailing list about needing a new archive format when storing Bacula data on disks, I decided to do a little test.

The test set-up:
* One lightly used file system with ~30GB of mostly unchanging data: a good mix of documents, executables, images, videos, etc.
* Snapshot the file system using LVM, then use rsync and Bacula to back up the data.
* The deduplication file system of choice was lessfs on EXT4 since it is available to anyone.
* Three different block sizes for lessfs (16K, 32K and 64K) to see how much difference the block size makes.
* Bacula archive size was set to 10G so that one backup would span multiple volumes, which is very common in our environment.
* Test the final results with our DataDomain box

I took six snapshots of the original file system over the course of about 2.5 weeks. For each snapshot, I rsynced a copy to a non-deduplicating file system, then rsynced that copy to each of the three lessfs file systems (one for each block size) into a folder named by the date. Because each rsync went into its own dated folder, rsync created a full new copy of the data every time instead of just syncing the changes after the first run. Bacula likewise did a full backup of each file system snapshot to a non-deduplicating file system, and those volumes were rsynced to the three lessfs file systems without any folder structure, so that only one copy of each volume would exist on each lessfs file system. At the conclusion of the test, I dumped the final raw rsync and Bacula data onto our DataDomain box as a comparison.

backup.png

This chart shows that with the rsync method, the compression ratio grew in an almost linear fashion, while the Bacula data stayed close to 1x compression. My suspicion is that since the Bacula tape format regularly inserts job information into the stream and lessfs uses a fixed block size, lessfs is not able to find many duplicate blocks in the Bacula stream. Although Data Domain's variable block size feature allows it much better compression of Bacula data, rsync still achieved almost 2x greater compression than Bacula.
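The suspicion about fixed block sizes can be illustrated with a small sketch. It is a toy model, not a reproduction of the test: the 4096-byte block size, the 1000-byte record size and the `JOB%08d` header are all hypothetical stand-ins (this is not the real Bacula volume format and not how lessfs is implemented), and `dedup_ratio` and `wrap_in_stream` are names invented for the example. It only shows why per-job information sprinkled through a stream can leave a fixed-block deduplicator with nothing to match.

```python
import hashlib
import random

BLOCK = 4096  # hypothetical fixed block size (the test above used 16K-64K)


def dedup_ratio(files, block_size=BLOCK):
    """Blocks written divided by unique blocks kept, using fixed-size blocks."""
    total = 0
    unique = set()
    for data in files:
        for i in range(0, len(data), block_size):
            total += 1
            unique.add(hashlib.sha256(data[i:i + block_size]).digest())
    return total / len(unique)


def wrap_in_stream(payload, job_id, record_size=1000):
    """Toy archive stream: a small per-job header in front of every record.

    Assumption for illustration only; not the real Bacula tape format.
    """
    out = bytearray()
    for i in range(0, len(payload), record_size):
        out += b"JOB%08d" % job_id          # 11-byte header that differs per job
        out += payload[i:i + record_size]
    return bytes(out)


# The same unchanged "file system" content, backed up twice.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(BLOCK * 64))

# rsync-style: two plain copies of the data, block-aligned and identical.
plain_ratio = dedup_ratio([payload, payload])

# Stream-style: the same data wrapped by two different jobs.
stream_ratio = dedup_ratio([wrap_in_stream(payload, 1),
                            wrap_in_stream(payload, 2)])

print("plain copies :", plain_ratio)
print("job streams  :", stream_ratio)
```

In this toy model every fixed block of the wrapped stream contains at least one job-specific header, so no block from the second job matches a block from the first, while the plain copies match block for block.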

In conclusion, lessfs is a great file system, and it could benefit from variable block sizes, if they can be added, for both regular data and Bacula data. Bacula could also benefit greatly on lessfs from offering a storage format closer to a native file system layout, and would see a good benefit on DataDomain as well.
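As a rough illustration of why variable block sizes might help, the sketch below cuts chunks wherever a hash of the last few bytes hits a chosen pattern, so that boundaries follow the content instead of fixed offsets. Everything in it is an assumption made for the example: the 16-byte window, the ~64-byte average chunk, the use of CRC32 as the boundary hash, and the `JOB%08d` header are hypothetical, and this is not how Data Domain or lessfs actually work. Real implementations also enforce minimum and maximum chunk sizes, which are left out here for brevity.

```python
import hashlib
import random
import zlib

WINDOW = 16   # bytes examined when deciding on a boundary (assumed)
MASK = 0x3F   # cut when the low 6 hash bits are zero: ~64-byte average (assumed)


def chunks(data):
    """Yield variable-size chunks whose boundaries depend only on nearby content."""
    start = 0
    for i in range(WINDOW, len(data)):
        if zlib.crc32(data[i - WINDOW:i]) & MASK == 0:
            yield data[start:i]
            start = i
    yield data[start:]


def dedup_ratio(files):
    """Bytes written divided by bytes kept after dropping repeated chunks."""
    total = 0
    stored = {}
    for data in files:
        for chunk in chunks(data):
            total += len(chunk)
            stored[hashlib.sha256(chunk).digest()] = len(chunk)
    return total / sum(stored.values())


def wrap_in_stream(payload, job_id, record_size=1000):
    """Toy archive stream with a per-job header before every record.

    Assumption for illustration only; not the real Bacula tape format.
    """
    out = bytearray()
    for i in range(0, len(payload), record_size):
        out += b"JOB%08d" % job_id
        out += payload[i:i + record_size]
    return bytes(out)


# The same unchanged content wrapped by two different jobs.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(4096 * 64))

variable_ratio = dedup_ratio([wrap_in_stream(payload, 1),
                              wrap_in_stream(payload, 2)])
print("job streams, variable chunks:", round(variable_ratio, 2))
```

Only the chunks that touch a job-specific header differ between the two streams here, so most of the data between headers is matched, which is consistent with the better (but still not rsync-level) result seen on the variable-block system.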

Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University