Re: [Networker] Long NDMP Backups

2008-02-28 18:35:38
Subject: Re: [Networker] Long NDMP Backups
From: Matthew Huff <mhuff AT OX DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 28 Feb 2008 18:28:23 -0500
I'd recommend you give it a try. I think the real-world numbers will be 
different than what you think they will be. We are running 5-6 savesets in 
parallel and they don't take up that much CPU. It could be that the jumbo 
frames and the reduced interrupts they generate help, but I don't think it's 
as bad as you think. Of course, since you are running SATA, that could be a 
difference, but I wouldn't expect it to show up in the CPU stats.

We have a 920c cluster running 6.5x, so we aren't state of the art as far 
as hardware/software goes.

________________________________

From: Yaron Zabary [mailto:yaron AT aristo.tau.ac DOT il]
Sent: Thu 2/28/2008 6:25 PM
To: Matthew Huff
Cc: EMC NetWorker discussion
Subject: Re: [Networker] Long NDMP Backups



Matthew Huff wrote:
> I'm glad you are sure, because we are doing it right now and it's working 
> well. Of course, we worked with Netapp engineering and this was their 
> suggestion. They strongly suggest not having any saveset over 400MB, 
> especially since DAR restores can be extremely slow with very large savesets.
>
> Obviously every filer is different. 10 may be too many, 5 may be just right, 
> however doing everything in serial with just one saveset is going to be a 
> major bottleneck. Tuning your network by using jumbo frames and making sure 
> that the TCP sliding window is tuned is very important.
>
> BTW, I've been doing NDMP backups with Netapp since before Legato supported 
> it, so I've got a bit of experience. We are currently backing up around 4TB. 
> We are using about 6 savesets per filer. The full backups are taking around 6 
> hours, except for one saveset that is taking around 11 (we are currently 
> migrating data around to break up the saveset). Once the migration is done, 
> we should be back to around 6 hours. BTW, we have been forced to kick the 
> backup off earlier than normal on a Friday (due to major power work being 
> done over a weekend) and even with all the backups running, the system never 
> had any issues even during the trading day.

    It took me some time to reply because our statistics were broken for
a few days. Anyhow, in the attached PNGs, you can see the impact of a
level 9 DSA backup on our 3050. CPU utilization was 5-10% with traffic
of less than 100Mbps (a third of an LTO-2), due to the server's CPU
being a bottleneck. The backup started at 2:56 and ended at 5:26. The
volume has ~2TB of data and 2.5M files on a SATA shelf. This saveset had
132GB of data.

   Now, if extrapolation works in this case, I would speculate that
pushing data into a single LTO-2 at full speed (no hardware compression)
would need just over 20% of the NetApp's CPU. If I were to run just five
such streams in parallel, that would probably use the entire CPU. This
means that the filer would be very unresponsive for all other purposes.
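
The extrapolation above can be written out as a short calculation. This is only a rough sketch: it assumes that CPU cost scales linearly with throughput, and the 320 Mbps figure for a full-speed LTO-2 stream (about 40 MB/s native) is an assumption, not one of the measured numbers. The function names are made up for illustration.

```python
# Rough linear extrapolation of filer CPU cost per backup stream.
# Measured inputs come from the message above; everything else is an assumption.

LTO2_NATIVE_MBPS = 320.0  # assumption: LTO-2 native rate of ~40 MB/s, i.e. ~320 Mbit/s


def cpu_percent_at(target_mbps, observed_cpu_percent, observed_mbps):
    """Scale an observed CPU percentage linearly to a target throughput."""
    return observed_cpu_percent * target_mbps / observed_mbps


def streams_to_saturate(cpu_percent_per_stream):
    """Number of full-speed streams that would consume 100% of the CPU."""
    return 100.0 / cpu_percent_per_stream


if __name__ == "__main__":
    # The observed range was 5-10% CPU at (just under) 100 Mbps.
    for observed_cpu in (5.0, 10.0):
        per_stream = cpu_percent_at(LTO2_NATIVE_MBPS, observed_cpu, 100.0)
        print(
            f"{observed_cpu:.0f}% at 100 Mbps -> {per_stream:.0f}% per LTO-2 stream, "
            f"{streams_to_saturate(per_stream):.1f} streams to saturate"
        )
```

With the 5-10% range measured at 100Mbps, this gives 16-32% CPU per full-speed stream, which brackets the "just over 20%" guess, and roughly three to six such streams to saturate the CPU.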

>
>
>
> ----
> Matthew Huff       | One Manhattanville Rd
> OTA Management LLC | Purchase, NY 10577
> www.otaotr.com     | Phone: 914-460-4039
> aim: matthewbhuff  | Fax:   914-460-4139
>
> -----Original Message-----
> From: Yaron Zabary [mailto:yaron AT aristo.tau.ac DOT il]
> Sent: Tuesday, February 26, 2008 4:10 PM
> To: EMC NetWorker discussion; Matthew Huff
> Subject: Re: [Networker] Long NDMP Backups
>
> Matthew Huff wrote:
>> The main advantage is that it runs in parallel rather than in serial. For 
>> example, let's say your /vol/vol0 was 1TB, and had 10 qtrees each with 100GB 
>> in it. You could increase the client parallelism in Legato to 10, and when 
>> you started the backup with a saveset of:
>> 
>> /vol/vol0/dir_a
>> /vol/vol0/dir_b
>> /vol/vol0/dir_c
>> /vol/vol0/dir_d
>> /vol/vol0/dir_e
>> /vol/vol0/dir_f
>> /vol/vol0/dir_g
>> /vol/vol0/dir_h
>> /vol/vol0/dir_i
>> /vol/vol0/dir_j
>> 
>> You would get 10 parallel backups each taking around 1/10 of what the volume 
>> backup would take. If you had the I/O and tape drive capacity, you would be 
>> reducing your backup time by 90%. Of course, that's an ideal situation.
>>
>
>    I am quite sure that this is a great way of killing your filer. Our
> 3050 can push at LTO-3 (~70MB/s) speed while consuming many CPU cycles
> (our CPU graphs are broken, so I cannot provide real numbers, but 20%
> seems about right). Considering this, running too many NDMP backups at
> once will make the filer unresponsive (assuming that it does any useful
> work, this might be unacceptable). It would not even get things to work
> any faster because if the filer is at 100% CPU utilization, it will
> become your bottleneck (it could even get you worse performance, as you
> will most likely have contention on your aggregate, volume or RAID group).
>
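
The two positions in the quoted exchange (the ideal 90% reduction versus the filer CPU becoming the bottleneck) can be compared with a toy model. It is only a sketch under stated assumptions: savesets are equal in size, every stream runs at the same rate, CPU cost per stream is fixed, and contention on the aggregate, volume or RAID group is ignored. The function names are hypothetical, and the example figures (1TB volume, 70MB/s and 20% CPU per stream) are borrowed loosely from numbers mentioned in the thread.

```python
# Toy model: backup time for N equal savesets run in parallel,
# with an optional cap from filer CPU. Not a NetWorker or NDMP API.


def serial_hours(total_mb, stream_mb_per_s):
    """Hours to back up the whole volume as a single stream."""
    return total_mb / stream_mb_per_s / 3600.0


def parallel_hours(total_mb, savesets, stream_mb_per_s, cpu_percent_per_stream=0.0):
    """Hours to back up the volume split into equal savesets run in parallel.

    If cpu_percent_per_stream is positive, the number of streams that can run
    at full speed is capped at 100 / cpu_percent_per_stream (assumption:
    throughput stops scaling once the filer CPU is fully used).
    """
    effective_streams = float(savesets)
    if cpu_percent_per_stream > 0:
        effective_streams = min(effective_streams, 100.0 / cpu_percent_per_stream)
    return total_mb / (effective_streams * stream_mb_per_s) / 3600.0


if __name__ == "__main__":
    total_mb = 1_000_000.0  # 1TB volume, decimal units
    rate = 70.0             # MB/s per stream (illustrative)

    base = serial_hours(total_mb, rate)
    ideal = parallel_hours(total_mb, 10, rate)
    capped = parallel_hours(total_mb, 10, rate, cpu_percent_per_stream=20.0)

    print(f"serial:            {base:.2f} h")
    print(f"10 streams, ideal: {ideal:.2f} h ({100 * (1 - ideal / base):.0f}% less)")
    print(f"10 streams, 20% CPU each: {capped:.2f} h ({100 * (1 - capped / base):.0f}% less)")
```

In this model ten savesets give the ideal 90% reduction only while CPU is free. At 20% CPU per stream the filer saturates at five streams and the reduction drops to 80%, before any contention is counted.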


--

-- Yaron.
