I set it up in a small test environment
and started backing up three clients, two Windows and one Linux
system, exclusively into a deduplication pool.
The first pass wasn't that impressive,
maybe a 5% to 10% de-dup ratio, and it took a bit longer than just
streaming the same data to tape or disk.
The second pass had a 90% deduplication
ratio, mostly because only a week had passed since the first full
backup and not every file had changed.
After three weeks, I had about 300GB of
data deduplicated down to about 90GB of disk space. The total kbytes
reported by the catalog came to 300GB, while df -k showed 90GB used.
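To put that in plainer terms, here's the rough arithmetic I'm using
(just a quick sketch with the numbers above; the function is my own,
not anything the product reports):

    def dedup_stats(logical_gb: float, physical_gb: float):
        # Space savings (%) and de-dup ratio from the catalog's logical
        # size vs. the physical size the pool occupies on disk.
        savings = (1 - physical_gb / logical_gb) * 100
        ratio = logical_gb / physical_gb
        return savings, ratio

    savings, ratio = dedup_stats(logical_gb=300, physical_gb=90)
    print(f"savings: {savings:.0f}%, ratio: {ratio:.1f}:1")
    # -> savings: 70%, ratio: 3.3:1

So roughly a 70% space saving, or about a 3.3:1 ratio.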
The data content of the three systems is
typical for a user workstation: email, photos, and miscellaneous files.
De-dup let me keep many versions of those same files in the backups
without actually having many copies of each file spinning on disk.
And then the disk holding the de-duplicated
data developed a bunch of bad sectors and I lost it all. Once I
rebuild it, I'll check out the DR process for protecting your de-dup
database and files.
I also want to test client-side
de-duplication to see if that helps stream data compared to media
server de-duplication alone. The media/master server is a quad core
with 8GB of RAM, and the clients are a desktop and a laptop. After
seeing it run and observing the space savings it generates, I think it
is a very creative way to solve some (not all) problems. It is
definitely not a "set it and forget it" technology. You still need to
monitor its utilization just as you would monitor basic disk or tape
usage.
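For me that monitoring can be as simple as the sketch below: a
cron-able check of the filesystem holding the pool (the mount point
and threshold are placeholders of mine, not product defaults), since
the pool fills up like any other disk target.

    import shutil

    POOL_PATH = "/dedup_pool"   # placeholder mount point for the pool
    THRESHOLD = 0.80            # warn at 80% full

    usage = shutil.disk_usage(POOL_PATH)
    pct = usage.used / usage.total
    if pct >= THRESHOLD:
        print(f"WARNING: dedup pool at {pct:.0%} "
              f"of {usage.total / 2**30:.0f} GiB")
    else:
        print(f"dedup pool at {pct:.0%}, ok")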
-Jon