Re: [Networker] Backups of Large clients 1+ TBs

On 01 Apr 08, at 1509, sunman1 wrote:

What is a standard configuration with NetWorker to backup a system with a large amount of data? We have systems with 600-1000MB filesystems, and MSSQL systems with 2-4TBs of data.

I'm backing up several systems, each with in excess of 20TB of data. I back up around 2TB of incrementals per day as well.

The main things are (a) you can't go faster than your tape drives, (b) you can't go faster than your networking, (c) you can't go faster than your disks and (d) you can't go faster than the CPUs in the client, the NSR server and the storage node.
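As a rough illustration of that point, the sustained rate of a backup is the rate of the slowest stage in the path. The sketch below just takes the minimum; the stage names and MB/s figures are invented for the example and are not measurements from any real site.

```python
# Illustrative only: these stage names and MB/s figures are assumptions,
# not numbers taken from a real installation.
def bottleneck(stages_mb_per_s):
    """Return (name, rate) of the slowest stage in the backup path."""
    name = min(stages_mb_per_s, key=stages_mb_per_s.get)
    return name, stages_mb_per_s[name]

example_path = {
    "client disks": 150.0,
    "client CPU": 200.0,
    "network": 110.0,
    "storage node": 180.0,
    "tape drive": 80.0,
}

slowest, rate = bottleneck(example_path)
print(f"limited by {slowest} at {rate:.0f} MB/s")
```

Speeding up any stage other than the slowest one changes nothing, which is why it is worth measuring each stage separately before buying hardware.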

The incrementals (ie level 1..9 or inc) all go to staging disk. Because an incremental spends more of its time considering what to back up than it does backing up (depending on your ratio of changed to non-changed files), incrementals behave terribly when sent to tape. I run a separate `save' process on each file system, making sure that the parallelism level is at or below the number of CPU threads the client can cope with, and hoping that I have enough disk bandwidth that I don't need to manage the parallelism there.
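One way to picture "a separate save per file system, capped at a parallelism level" is a small worker pool. This is only a sketch: `back_up_all` and `run_save` are hypothetical helper names, and the bare `save <path>` command line is a placeholder assumption (a real site would add its own server, pool and level options, or let the NetWorker scheduler do this).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_save(path):
    """Run one backup command for one file system and return its exit code.

    Assumption: a bare `save <path>` stands in for whatever command line
    a given site actually uses.
    """
    return subprocess.run(["save", path]).returncode

def back_up_all(filesystems, max_parallel, run_one=run_save):
    """Call run_one(path) for every file system, at most max_parallel at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = list(pool.map(run_one, filesystems))
    return dict(zip(filesystems, codes))

# Example call (not executed here, since it would start real backups):
#   back_up_all(["/export/home", "/var", "/data1"], max_parallel=4)
```

The cap is the only tuning knob here; it corresponds to keeping the number of concurrent streams at or below what the client's CPUs can cope with.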

When I come to drop those savesets to tape the effective parallelism is one: ie I can run dozens of incremental streams into a disk staging area without worrying about how slow or fast they are, and then during production hours the following day I can take them to tape with nsrstage or nsrclone at full speed with little resource consumption.
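A back-of-envelope check on whether a day's staged incrementals can be moved to tape within the following day's window might look like this. The 2TB volume and the ~70MB/sec tape rate are the figures quoted in this post; the 10-hour window and the function names are assumptions for illustration.

```python
def drain_hours(staged_tb, tape_mb_per_s):
    """Hours to move staged_tb terabytes (decimal units) to tape at a steady rate."""
    return staged_tb * 1_000_000 / tape_mb_per_s / 3600

def fits_window(staged_tb, tape_mb_per_s, window_hours):
    """True if the staged data can reach tape within the window, single stream."""
    return drain_hours(staged_tb, tape_mb_per_s) <= window_hours

hours = drain_hours(2, 70)
print(f"2TB at 70MB/sec: {hours:.1f} hours")  # about 7.9 hours
print("fits a 10-hour window:", fits_window(2, 70, 10))
```

This ignores tape changes and positioning, so treat the result as a lower bound on the time needed.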

The baselines go straight to tape. I used to do funky things involving changing the parallelism level to a lower level during baselines but now I don't bother: the limiting factor for me is the tapes (LTO3) and the networking (GigE over 20km), and I can pull ~70MB/sec out of the disk array and on to tape at pretty well any parallelism level. The main benefit of parallelism=1 is that recovering a single filesystem will be faster and involve fewer tapes, but that's something you can sort out during a cloning phase if you are concerned about it.
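To see why parallelism=1 helps recovery, here is a deliberately crude model. It assumes the streams written in parallel were equal in size and evenly interleaved on tape, so restoring one file system means reading past all the others; real drives and real savesets will behave better or worse than this. `recovery_estimate` is a hypothetical name, and 400GB is the native (uncompressed) capacity of an LTO-3 cartridge.

```python
import math

LTO3_NATIVE_GB = 400  # native (uncompressed) LTO-3 cartridge capacity

def recovery_estimate(fs_gb, streams, tape_mb_per_s):
    """Return (hours, tapes) to restore one file system of fs_gb gigabytes.

    Simplifying assumption: `streams` equal-sized save streams were
    interleaved evenly, so the drive reads fs_gb * streams of tape.
    """
    tape_read_gb = fs_gb * streams
    hours = tape_read_gb * 1000 / tape_mb_per_s / 3600
    tapes = math.ceil(tape_read_gb / LTO3_NATIVE_GB)
    return hours, tapes

for n in (1, 4):
    hours, tapes = recovery_estimate(1000, n, 70)
    print(f"parallelism={n}: ~{hours:.1f} hours across {tapes} tapes")
```

Under these assumptions a 1TB file system takes roughly four times as long to restore from a four-way interleaved baseline as from a single-stream one, which is the cost a later cloning pass can remove.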

If you have one filesystem per set of tapes you can also parallelise over multiple tape drives during recovery, but these days few disk arrays will be able to write at the speed of multiple streams anyway. Weaving together the contents of multiple arrays on one tape might be a false economy, though.

I found some years ago that having the nsr server and the storage node on distinct systems was a performance benefit, presumably because the database updates were kept away from the task of throwing data to tape. I suspect that's less true today.


ian
