Re: [Networker] Backups of Large clients 1+ TBs

2008-04-03 13:07:09
Subject: Re: [Networker] Backups of Large clients 1+ TBs
From: Ian G Batten <ian.batten AT UK.FUJITSU DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 3 Apr 2008 17:54:19 +0100
On 01 Apr 08, at 1509, sunman1 wrote:
What is a standard configuration with NetWorker to back up a system with a large amount of data? We have systems with 600-1000GB file systems, and MSSQL systems with 2-4TB of data.

I'm backing up several systems, each with in excess of 20TB of data. I also back up around 2TB of incrementals per day.

The main things to bear in mind are that (a) you can't go faster than your tape drives, (b) you can't go faster than your network, (c) you can't go faster than your disks, and (d) you can't go faster than the CPUs in the client, the NSR server and the storage node. Whichever of these is slowest sets your throughput.
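As a back-of-the-envelope sketch, the slowest stage in the data path caps the whole backup. The figures below are assumptions for illustration: 80 MB/s is LTO-3's native rate, 125 MB/s is GigE's theoretical ceiling before protocol overhead, and the disk and CPU numbers are purely hypothetical.

```python
"""Rough backup-throughput estimate: the slowest component wins.

All figures are illustrative assumptions, not measurements.
"""


def effective_rate_mb_s(components: dict[str, float]) -> tuple[str, float]:
    """Return (name, MB/s) of the slowest component in the data path."""
    name = min(components, key=components.get)
    return name, components[name]


def hours_to_move(data_gb: float, rate_mb_s: float) -> float:
    """Hours to move data_gb (decimal GB) at rate_mb_s (decimal MB/s)."""
    return data_gb * 1000 / rate_mb_s / 3600


if __name__ == "__main__":
    path = {
        "tape (LTO-3 native)": 80.0,
        "network (GigE, theoretical)": 125.0,
        "client disks": 150.0,  # hypothetical
        "client CPU": 200.0,    # hypothetical
    }
    name, rate = effective_rate_mb_s(path)
    print(f"bottleneck: {name} at {rate} MB/s")
    print(f"20TB full: {hours_to_move(20_000, rate):.1f} h")
```

Real throughput will sit below the minimum of the nominal figures; the ~70MB/sec reported further down for LTO3 over GigE is a measured number, not a rated one.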

The incrementals (i.e. levels 1-9 or incr) all go to staging disk. Because an incremental spends more of its time working out what to back up than actually backing it up (depending on your ratio of changed to unchanged files), it delivers data in a trickle and behaves terribly when sent straight to tape: the drive can't keep streaming, so it keeps stopping and repositioning. I run a separate `save' process for each file system, making sure that the number of parallel streams is at or below the number of CPU threads the client can cope with, and hoping that I have enough disk bandwidth that I don't need to manage the parallelism there as well.
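A minimal sketch of the one-`save`-per-filesystem approach, with concurrency capped at the client's CPU thread count. It assumes the NetWorker `save` command accepts `-l level` and `-s server` (check the save man page for your release); the filesystem paths, server name and the `runner` hook are hypothetical, and the hook exists only so the plan can be dry-run without NetWorker installed.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def save_commands(filesystems, level="incr", server=None):
    """Build one `save` command line per filesystem.

    The -l (level) and -s (server) options are assumed; verify them
    against the save man page for your NetWorker version.
    """
    cmds = []
    for fs in filesystems:
        cmd = ["save", "-l", level]
        if server:
            cmd += ["-s", server]
        cmd.append(fs)
        cmds.append(cmd)
    return cmds


def _run_save(cmd):
    """Default runner: execute the command and return its exit status."""
    return subprocess.run(cmd, check=False).returncode


def run_all(cmds, max_streams=None, runner=_run_save):
    """Run every command, never more than max_streams at once.

    max_streams defaults to the number of CPU threads on this host,
    mirroring the "parallelism <= CPU threads" rule of thumb.
    """
    if max_streams is None:
        max_streams = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_streams) as pool:
        # map preserves input order, so results line up with cmds
        return list(pool.map(runner, cmds))


if __name__ == "__main__":
    plan = save_commands(["/data1", "/data2", "/data3"], server="nsrserver")
    for c in plan:
        print(" ".join(c))  # dry run: print instead of executing
```

Capping the pool rather than launching every `save` at once keeps the client from thrashing its CPUs; disk bandwidth is left unmanaged, as in the post.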

When I come to take those savesets to tape, the effective parallelism is one. In other words, I can run dozens of incremental streams into a disk staging area without worrying about how slow or fast they are, and then during production hours the following day I can take them to tape with nsrstage or nsrclone at full speed and with little resource consumption.
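The staging argument is mostly arithmetic. This sketch uses the figures from the post (about 2TB of incrementals a day, ~70MB/sec to tape) to estimate how long the daytime drain takes, and contrasts it with sending trickling streams straight to a drive. The per-stream rates and the drive's minimum streaming rate are hypothetical placeholders; real drives differ.

```python
def drain_hours(staged_gb, tape_mb_s, drives=1):
    """Hours to copy staged_gb (decimal GB) to tape at full streaming speed."""
    return staged_gb * 1000 / (tape_mb_s * drives) / 3600


def fits_in_window(staged_gb, tape_mb_s, window_h, drives=1):
    """True if the staged data can be moved to tape within window_h hours."""
    return drain_hours(staged_gb, tape_mb_s, drives) <= window_h


def tape_can_stream(stream_rates_mb_s, min_stream_mb_s):
    """True if the combined rate of the incoming streams keeps a drive streaming.

    min_stream_mb_s is drive-specific; any value used here is hypothetical.
    """
    return sum(stream_rates_mb_s) >= min_stream_mb_s


if __name__ == "__main__":
    # ~2TB/day of incrementals drained at ~70MB/sec, as in the post
    print(f"{drain_hours(2000, 70):.1f} h to drain one day's incrementals")
    # 24 slow incremental streams at 0.5 MB/s each (hypothetical)
    print(tape_can_stream([0.5] * 24, min_stream_mb_s=30))
```

Twenty-four streams at 0.5 MB/s add up to only 12 MB/s, which in this model would leave a drive starved; staged to disk, the same data goes to tape in one fast sequential pass.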

The baselines (fulls) go straight to tape. I used to do funky things such as lowering the parallelism level during baselines, but now I don't bother: the limiting factors for me are the tapes (LTO3) and the network (GigE over 20km), and I can pull ~70MB/sec out of the disk array and onto tape at pretty much any parallelism level. The main benefit of parallelism=1 is that recovering a single filesystem will be faster and involve fewer tapes, because its data isn't interleaved with other savesets, but that's something you can sort out during a cloning phase if you are concerned about it.
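Why parallelism=1 helps recovery can be shown with a deliberately simplified model: treat the tape as a sequence of blocks, lay savesets out round-robin as interleaved sessions would, and count how many blocks a sequential reader passes to recover one saveset. Real NetWorker multiplexing interleaves variable-sized chunks, and the saveset names and sizes here are hypothetical, so treat the numbers as illustrative only.

```python
from itertools import zip_longest


def interleave(savesets):
    """Round-robin layout: one block from each saveset in turn (parallelism > 1)."""
    lanes = [[name] * blocks for name, blocks in savesets.items()]
    return [b for row in zip_longest(*lanes) for b in row if b is not None]


def sequential(savesets):
    """One saveset after another (parallelism = 1)."""
    return [name for name, blocks in savesets.items() for _ in range(blocks)]


def blocks_read_to_restore(tape, target):
    """Blocks a sequential reader passes between the first and last block of target."""
    positions = [i for i, b in enumerate(tape) if b == target]
    return positions[-1] - positions[0] + 1


if __name__ == "__main__":
    sets = {"fs_a": 100, "fs_b": 100, "fs_c": 100, "fs_d": 100}  # hypothetical sizes
    print("interleaved:", blocks_read_to_restore(interleave(sets), "fs_a"))
    print("sequential: ", blocks_read_to_restore(sequential(sets), "fs_a"))
```

With four equal savesets, restoring one from the interleaved layout means reading 397 blocks instead of 100, which is the cost that a later clone with parallelism=1 would remove.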

If you keep one filesystem per set of tapes, you can also parallelise recovery over multiple tape drives, but these days few disk arrays will be able to absorb writes at the speed of multiple tape streams anyway. Weaving together the contents of multiple arrays on one tape might be a false economy, though.

I found some years ago that having the NSR server and the storage node on separate systems was a performance benefit, presumably because it kept the database updates away from the task of throwing data to tape. I suspect that's less true today.

ian
