Subject: Re: [Networker] Long NDMP Backups
From: Troy Kutil <Troy.Kutil AT MILWAUKEETOOL DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 5 Mar 2008 13:19:28 -0600
I just talked with NetApp today about this. You need to specify each path
for each qtree.
By the way, "All" does not work in the latest version of NetWorker.
Hopefully they get that fixed, or you will be typing a lot of savesets.
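
For illustration, an explicit per-qtree saveset list for an NDMP client
ends up looking something like this (volume and qtree names are made up):

/vol/vol1/users
/vol/vol1/projects
/vol/vol2/archive
/vol/vol2/scratch

That is, one entry per qtree path instead of the single "All" keyword.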

Troy Kutil
Phone 262-783-8289
Fax 262-373-5689
Milwaukee Electric Tool



"Yakimowicz, Brian" <BYakimowicz AT COLLEGEBOARD DOT ORG> 
Sent by: EMC NetWorker discussion <NETWORKER AT LISTSERV.TEMPLE DOT EDU>
03/05/2008 01:14 PM
Please respond to
EMC NetWorker discussion <NETWORKER AT LISTSERV.TEMPLE DOT EDU>; Please respond 
to
"Yakimowicz, Brian" <BYakimowicz AT COLLEGEBOARD DOT ORG>


To
NETWORKER AT LISTSERV.TEMPLE DOT EDU
cc

Subject
Re: [Networker] Long NDMP Backups






When you say 5-6 savesets in parallel, do you mean you specifically name
the qtree as the saveset? That seems like it would be ideal if you have
a handful of volumes/qtrees, but we are around 150 volumes and 400
qtrees per filer. That number can change by +/- 10 on any given day, so
logistically we have found it too difficult to name each qtree as a
saveset. That being said, we are forced to use "all", so our drives sit
idle for hours while the filesystem is crawled. Does anyone know a way
around this?


Thanks,
brian 
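
One possible way around maintaining that list by hand (a hypothetical
sketch, not something proposed in the thread) is to generate the
per-qtree saveset list from the filer itself and regenerate it whenever
qtrees are added or removed. The hostname, the ssh access, and the
"qtree status" output format assumed below are all illustrative:

#!/usr/bin/env python3
# Hypothetical sketch: build an explicit per-qtree saveset list from the
# filer instead of maintaining it by hand. Assumes passwordless ssh to a
# 7-mode filer whose "qtree status" output has "volume qtree style ..."
# columns; adjust the parsing for your ONTAP version.
import subprocess

FILER = "filer1.example.com"  # hypothetical hostname


def list_qtree_paths(filer):
    """Return /vol/<volume>/<qtree> paths parsed from 'qtree status'."""
    out = subprocess.check_output(["ssh", filer, "qtree", "status"])
    paths = []
    for line in out.decode().splitlines():
        fields = line.split()
        if len(fields) < 2 or line.startswith(("Volume", "----")):
            continue  # skip blank, header and separator lines
        volume, second = fields[0], fields[1]
        # Volume-root rows have no qtree name, so the second column is the
        # security style; skip those and keep only real qtrees.
        if second in ("unix", "ntfs", "mixed"):
            continue
        paths.append("/vol/%s/%s" % (volume, second))
    return paths


if __name__ == "__main__":
    # One saveset path per line, ready to paste into the client's
    # saveset list (or to feed whatever updates the client resource).
    for path in list_qtree_paths(FILER):
        print(path)

Run from cron, a script along these lines would keep the explicit list
current even when the qtree count drifts by a handful per day.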

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of Matthew Huff
Sent: Thursday, February 28, 2008 6:28 PM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] Long NDMP Backups

I'd recommend you give it a try. I think the real-world numbers will be
different from what you think they will be. We're running 5-6 savesets in
parallel and they don't take up that much CPU. It could be that the jumbo
frames and the reduced interrupts they generate help, but I don't think
it's as bad as you think. Of course, since you are running SATA, that
could make a difference, but I wouldn't expect it to show up in the CPU
stats.
 
We have a clustered 920c running 6.5.x, so we aren't state of the art
as far as hardware/software goes.

________________________________

From: Yaron Zabary [mailto:yaron AT aristo.tau.ac DOT il]
Sent: Thu 2/28/2008 6:25 PM
To: Matthew Huff
Cc: EMC NetWorker discussion
Subject: Re: [Networker] Long NDMP Backups



Matthew Huff wrote:
> I'm glad you are sure, because we are doing it right now and it's
> working well. Of course, we worked with NetApp engineering and this was
> their suggestion. They strongly suggest not having any saveset over
> 400MB, especially since DAR restores can be extremely slow with very
> large savesets.
>
> Obviously every filer is different. 10 may be too much, 5 may be just
> right; however, doing everything in serial with just one saveset is
> going to be a major bottleneck. Tuning your network by using jumbo
> frames and making sure that the TCP sliding window is tuned is very
> important.
>
> BTW, I've been doing NDMP backups with NetApp since before Legato
> supported it, so I've got a bit of experience. We are currently backing
> up around 4TB. We are using about 6 savesets per filer. The full backups
> are taking around 6 hours, except for one saveset that is taking around
> 11 (we are currently migrating data around to break up that saveset).
> Once the migration is done, we should be back to around 6 hours. BTW, we
> have been forced to kick the backup off earlier than normal on a Friday
> (due to major power work being done over a weekend), and even with all
> the backups running, the system never had any issues, even during the
> trading day.
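
As a quick sanity check on the jumbo-frame and TCP-window advice quoted
above, something like the following could be run on the backup server (a
minimal sketch; the Linux host and the "eth0" interface name are both
assumptions):

# Minimal sketch: report the settings relevant to the network-tuning
# advice quoted above. Assumes a Linux backup server and interface "eth0".
def read(path):
    with open(path) as f:
        return f.read().strip()

mtu = int(read("/sys/class/net/eth0/mtu"))
rmem_max = int(read("/proc/sys/net/core/rmem_max"))
window_scaling = read("/proc/sys/net/ipv4/tcp_window_scaling")

print("eth0 MTU: %d (expect ~9000 with jumbo frames)" % mtu)
print("net.core.rmem_max: %s bytes" % rmem_max)
print("net.ipv4.tcp_window_scaling: %s (1 = window scaling on)" % window_scaling)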

    It took me some time to reply because our statistics were broken for
a few days. Anyhow, in the attached PNGs you can see the impact of a
level 9 DSA backup on our 3050. CPU utilization was 5-10% with traffic
of less than 100Mbps (a third of an LTO-2), due to the server's CPU
being the bottleneck. The backup started at 2:56 and ended at 5:26. The
volume has ~2TB of data and 2.5M files on a SATA shelf. This saveset had
132GB of data.

   Now, if extrapolation works in this case, I would speculate that
pushing data into a single LTO-2 at full speed (no hardware compression)
would need just over 20% of the NetApp's CPU. If I were to run just five
such streams in parallel, that would probably use the entire CPU, which
means that the filer would be very unresponsive for all other purposes.
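
Spelling that extrapolation out (rough arithmetic only; the 100Mbps and
5-10% CPU figures are the ones reported above, and LTO-2 native speed is
assumed to be about 40MB/s):

# Back-of-the-envelope version of the extrapolation above.
observed_rate = 100 / 8.0    # ~100Mbps observed, i.e. ~12.5MB/s
observed_cpu = 0.075         # midpoint of the 5-10% CPU utilization seen
lto2_native = 40.0           # assumed LTO-2 native speed in MB/s

scale = lto2_native / observed_rate         # ~3.2x the observed rate
cpu_per_stream = observed_cpu * scale       # ~24% of the filer CPU
streams_to_saturate = 1.0 / cpu_per_stream  # a little over 4 streams

print("CPU per full-speed LTO-2 stream: %.0f%%" % (cpu_per_stream * 100))
print("Streams needed to saturate the filer: %.1f" % streams_to_saturate)

That lands at roughly 24% of the CPU per full-speed stream and a little
over four streams to saturate the filer, in line with the estimate above.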

>
>
>
> ----
> Matthew Huff       | One Manhattanville Rd
> OTA Management LLC | Purchase, NY 10577
> www.otaotr.com     | Phone: 914-460-4039
> aim: matthewbhuff  | Fax:   914-460-4139
>
> -----Original Message-----
> From: Yaron Zabary [mailto:yaron AT aristo.tau.ac DOT il]
> Sent: Tuesday, February 26, 2008 4:10 PM
> To: EMC NetWorker discussion; Matthew Huff
> Subject: Re: [Networker] Long NDMP Backups
>
> Matthew Huff wrote:
>> The main advantage is that it runs in parallel rather than in serial.
>> For example, let's say your /vol/vol0 was 1TB and had 10 qtrees, each
>> with 100GB in it. You could increase the client parallelism in Legato
>> to 10, and when you started the backup with a saveset of:
>>
>> /vol/vol0/dir_a
>> /vol/vol0/dir_b
>> /vol/vol0/dir_c
>> /vol/vol0/dir_d
>> /vol/vol0/dir_e
>> /vol/vol0/dir_f
>> /vol/vol0/dir_g
>> /vol/vol0/dir_h
>> /vol/vol0/dir_i
>> /vol/vol0/dir_j
>>
>> you would get 10 parallel backups, each taking around 1/10 of what the
>> volume backup would take. If you had the I/O and tape drive capacity,
>> you would be reducing your backup time by 90%. Of course, that's an
>> ideal situation.
>>
>
>    I am quite sure that this is a great way of killing your filer. Our
> 3050 can push at LTO-3 (~70MB/s) speed while consuming many CPU cycles
> (our CPU graphs are broken, so I cannot provide real numbers, but 20%
> seems about right). Considering this, running too many NDMP backups at
> once will make the filer unresponsive (assuming that it does any
> useful work, this might be unacceptable). It would not even get things
> to work any faster, because if the filer is at 100% CPU utilization, it
> will become your bottleneck (it could even get you worse performance,
> as you will most likely have contention on your aggregate, volume or
> RAID group).
>


--

-- Yaron.



To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or via RSS at
http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
