Subject: Re: [BackupPC-users] experiences with very large pools?
From: Ralf Gross <Ralf-Lists AT ralfgross DOT de>
To: backuppc-users AT lists.sourceforge DOT net
Date: Fri, 19 Feb 2010 17:28:55 +0100
dan wrote:
> you would need to move up to 15K RPM drives for a very large array, and
> the cost climbs steeply at that scale.
> 
> as Les said, look at a ZFS array with block-level dedup.  I have a 3TB setup
> right now and I have been running backups of a unix server and 2 linux
> servers in my main office here to see how the dedup works
> 
> opensolaris:~$ zpool list
> NAME      SIZE  ALLOC   FREE  CAP  DEDUP   HEALTH  ALTROOT
> rpool      74G  5.77G  68.2G   7%   1.00x  ONLINE  -
> storage  3.06T  1.04T  2.02T  66%  19.03x  ONLINE  -
> 
> this is just rsync 3 pulling data over to a directory such as
> /storage/host1, which is a zfs fileset on the storage pool, one per host.
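> 
> (for illustration, the per-host layout looks something like this; the host
> names are just examples:)
> 
> zfs create storage/host1      # one fileset per host
> zfs create storage/host2
> zfs list -r storage           # lists the pool and its per-host filesets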
> 
> my script is very simple at this point
> 
> zfs snapshot storage/host1@`date +%Y.%m.%d-%H.%M`
> rsync -aHXA --exclude-from=/etc/backups/host1excludes.conf host1:/ /storage/host1
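> 
> (a sketch of how that could be generalized to several hosts; the host names
> and the excludes path are assumptions:)
> 
> #!/bin/sh
> # snapshot the previous backup first, then pull the new data over the top
> for host in host1 host2 host3; do
>     zfs snapshot storage/$host@`date +%Y.%m.%d-%H.%M`
>     rsync -aHXA --exclude-from=/etc/backups/${host}excludes.conf \
>         $host:/ /storage/$host
> done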
> 
> to build the pool and fileset:
> format          # lists all available disks
> zpool status    # shows which disks are already in pools
> zpool create storage mirror disk1 disk2 disk3 ... spare disk11 \
>     cache disk12 log disk13
> # cache is a high-RPM disk or SSD, basically a big buffer for IO caching
> # log is the ZFS intent log; it doesn't need much space, but IO matters,
> #   so use a high-RPM disk or a small SSD
> # cache and log are optional and mainly improve performance when the data
> #   disks are slow, like my 7200RPM SATA drives
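> 
> (cache and log devices can also be attached to an existing pool later;
> the disk names here are placeholders:)
> 
> zpool add storage cache disk12   # attach an L2ARC read-cache device
> zpool add storage log disk13     # attach a separate intent-log device
> zpool iostat -v storage          # per-device IO, including cache and log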
> zfs create -o dedup=on -o compression=on storage/host1
> # (use dedup=verify instead of dedup=on to byte-compare blocks before
> # deduping them)
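> 
> (you can confirm the settings afterwards with something like:)
> 
> zfs get dedup,compression storage/host1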
> 
> dedup is very, very good for writes BUT requires a big CPU.  don't
> re-purpose your old P3 for this.
> compression is actually going to help your write performance, assuming you
> have a fast CPU.  it reduces the IO load, and zfs will re-order writes on
> the fly.
> dedup is all in-line, so it reduces IO load for anything with common
> blocks.  it is also block-level, not file-level, so a large file with
> slight changes will still get deduped.
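> 
> (if you want to poke at the dedup table itself, zdb can summarize it,
> though the output format varies by release:)
> 
> zdb -DD storage    # histogram of dedup table entries for the pool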
> 
> dedup+compression really needs a fast dual core or quad core.
> 
> if you look at my zpool list above you can see my dedup ratio at 19x and
> usage at 1.04T, which effectively means I'm getting roughly 19TB of data
> into about 1TB of space.  my servers have relatively few files that change,
> and the large files get appended to, so I really only store the changes.
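> 
> (the logical data size is just ALLOC times the DEDUP ratio:)
> 
> zpool list -o name,alloc,dedup storage   # 1.04T x 19.03 = ~19.8T logical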
> 
> snapshots are almost instant and can be browsed under
> /storage/host1/.zfs/snapshot/ ; they are labeled by the @`date xxx` stamp,
> so I get folders named for the dates.  these are read-only snapshots and
> can be shared via samba or nfs.  to list them:
> zfs list -t snapshot
> 
> opensolaris:/storage/host1/.zfs/snapshot# zfs list -t snapshot
> NAME                             USED  AVAIL  REFER  MOUNTPOINT
> rpool/ROOT/opensolaris@install   270M      -  3.26G  -
> storage/host1@<date>
> 
> zfs set sharesmb=on storage/host1@<date>
> -or-
> zfs set sharenfs=on storage/host1@<date>
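> 
> (a linux client could then mount the nfs share along these lines; the
> hostname and mount point are assumptions:)
> 
> mount -t nfs opensolaris:/storage/host1 /mnt/host1-backups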
> 
> 
> if you don't want to go pure opensolaris then look at nexenta.  it is a
> functional opensolaris-debian/ubuntu hybrid with ZFS, and it has dedup.  it
> does not currently share via iscsi, so keep that in mind.  I believe it
> also uses the full samba package for smb shares, while opensolaris can use
> its native CIFS server, which is faster than samba.
> 
> opensolaris can also join Active Directory, though you also need to extend
> your AD schema.  If you do, you can give a privileged user UID and GID
> mappings in AD, and then you can access the windows1/C$ shares.  I would
> create a backup user and add it via restricted groups in GP to be a local
> administrator on the machines (but not a domain admin).  You would probably
> want to figure out how to take a VSS snapshot and rsync that over instead
> of the active filesystem, because you will hit tons of locked files if you
> don't.
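> 
> (a rough sketch of the VSS step on the windows side; the shadow copy
> number N is whatever vssadmin reports, the C:\shadow path is an
> assumption, and vssadmin create shadow is only available on server
> editions:)
> 
> REM create a shadow copy of C: and note the shadow copy device it prints
> vssadmin create shadow /for=C:
> REM expose it at C:\shadow (replace N with the reported copy number)
> mklink /d C:\shadow \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopyN\
> REM then rsync C:\shadow instead of the live filesystem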
> 
> good luck

Thanks for your detailed reply. I'll have a look at nexenta; right now
www.nexenta.org seems to be down.

Ralf
