Subject: Re: [BackupPC-users] experiences with very large pools?
From: dan <dandenson AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Fri, 19 Feb 2010 08:34:34 -0700
You would need to move up to 15K RPM drives to handle an array that large, and the cost climbs steeply as the array grows.

As Les said, look at a ZFS array with block-level dedup. I have a 3TB setup right now and have been running backups against a Unix server and 2 Linux servers in my main office here to see how the dedup works:

opensolaris:~$ zpool list
NAME      SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
rpool      74G  5.77G  68.2G     7%  1.00x  ONLINE  -
storage  3.06T  1.04T  2.02T    66%  19.03x  ONLINE  -

This is just rsync 3 pulling data over to a directory, /storage/host1, which is a ZFS fileset off the storage pool (one fileset per host).

My script is very simple at this point:

# snapshot the previous run, then pull the current state over
zfs snapshot storage/host1@`date +%Y.%m.%d-%H.%M`
rsync -aHXA --exclude-from=/etc/backups/host1excludes.conf host1:/ /storage/host1
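If you have more than a couple of hosts, the same two commands loop easily. A rough sketch of the same idea (the host list, fileset layout, and exclude-file names here are my own assumptions):

#!/bin/sh
# hypothetical wrapper around the snapshot + rsync pair above, one fileset per host
HOSTS="host1 host2 host3"
STAMP=`date +%Y.%m.%d-%H.%M`
for h in $HOSTS; do
    # keep the previous run as a snapshot before rsync overwrites it
    zfs snapshot storage/$h@$STAMP
    rsync -aHXA --exclude-from=/etc/backups/${h}excludes.conf $h:/ /storage/$h \
        || echo "backup of $h failed" >&2
done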

To build the pool and fileset:
format          # lists all available disks
zpool status    # shows which disks are already in pools
zpool create storage mirror disk1 disk2 disk3 etc etc spare disk11 cache disk12 log disk13
# cache is a high-RPM disk or SSD, basically a massive buffer for IO caching
# log is the intent log; it doesn't need much space, but IO matters, so use a high-RPM disk or a small SSD
# cache and log are optional and mainly improve performance when the storage drives are slower, like my 7200 RPM SATA drives
zfs create -o dedup=on -o compression=on storage/host1    # dedup=verify instead of dedup=on adds a byte-compare on hash matches
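Once the fileset exists you can confirm the settings and watch the ratios as backups come in; a quick check, assuming the pool and fileset names above:

zfs get dedup,compression,compressratio storage/host1
zpool get dedupratio storage
zpool list storage    # the DEDUP column here is the pool-wide ratio from my listing above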

Dedup is very, very good for writes BUT requires a big CPU. Don't re-purpose your old P3 for this.
Compression will actually help your write performance, assuming you have a fast CPU: it reduces the IO load, and ZFS re-orders writes on the fly.
Dedup is all in-line, so it reduces IO load for anything with common blocks. It is also block-level, not file-level, so a large file with slight changes still gets deduped.

Dedup+compression really needs a fast dual-core or quad-core CPU.
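If you want to see what the dedup table is actually doing, zdb can dump a DDT summary, and it can simulate dedup on a pool that doesn't have it enabled yet (run as root; pool name assumed from above):

zdb -DD storage    # DDT histogram and stats for a pool that already has dedup on
zdb -S storage     # simulate dedup on existing data to estimate the ratio before enabling it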

If you look at my zpool list above, you can see the dedup ratio at 19x and usage at 1.04T, which effectively means I'm getting roughly 19TB of backed-up data into about 1TB of space. My servers have relatively few files that change, and the large files get appended to, so I really only store the changes.

Snapshots are almost instant and can be browsed at /storage/host1/.zfs/snapshot/; they are labeled by the @`date ...` part, so I get a folder per date. These are read-only snapshots and can be shared via Samba or NFS. To list them:

opensolaris:/storage/host1/.zfs/snapshot# zfs list -t snapshot
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT/opensolaris@install          270M      -  3.26G  -
storage/host1@<snapshot date>

zfs set sharesmb=on storage/host1
-or-
zfs set sharenfs=on storage/host1
(the share property goes on the fileset; the snapshots then show up under .zfs/snapshot inside the share)
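On the NFS side a client then just mounts it as usual (hostname and mountpoint are examples):

# on a Linux client; "opensolaris" is the backup server from the prompt above
mount -t nfs opensolaris:/storage/host1 /mnt/host1-backup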


If you don't want to go pure OpenSolaris, then look at Nexenta. It is a functional hybrid of OpenSolaris and Debian/Ubuntu with ZFS, and it has dedup. It does not currently share via iSCSI, so keep that in mind. I believe it also uses a full Samba package for SMB shares, while OpenSolaris can use the native CIFS server, which is faster than Samba.

OpenSolaris can also join Active Directory, but you need to extend your AD schema. If you do, you can give a privileged user UID and GID mappings in AD and then access the windows1/C$ shares. I would create a backup user and use Restricted Groups in Group Policy to make it a local administrator on the machines (but not a domain admin). You would probably want to figure out how to take a VSS snapshot and rsync that over instead of the live filesystem, because you will hit tons of file locks if you don't.

good luck




On Fri, Feb 19, 2010 at 6:51 AM, Les Mikesell <lesmikesell AT gmail DOT com> wrote:
Ralf Gross wrote:
>
> I think I've to look for a different solution, I just can't imagine a
> pool with > 10 TB.

Backuppc's usual scaling issues are with the number of files/links more than
total size, so the problems may be different when you work with huge files.  I
thought someone had posted here about using nfs with a common archive and
several servers running the backups but I've forgotten the details about how he
avoided conflicts and managed it.  Maybe this would be the place to look at
opensolaris with zfs's new block-level de-dup and a simpler rsync copy.
