Subject: Re: [Networker] Maximum number of save sessions?
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 19 Mar 2007 14:46:04 -0400
Dave Mussulman wrote:
> On Fri, Mar 16, 2007 at 09:06:31PM -0400, George Sinclair wrote:
>> I'm still testing out these LTO3 drives, and I've found - and it's
>> probably no surprise - that in order to push the drives to a
>> reasonable performance level (even 50-60 MB/s), I have to increase
>> the target sessions to about 12. This wasn't the case before with
>> the older LTO1s, where we typically used 4-5 target sessions, and
>> we're getting decent performance from our SDLT-600 drives, running
>> on the other snode, at 5 sessions each for a total of 20. But we're
>> backing up directly over gigabit ethernet, so we don't have a front
>> end VTL, or some such thing, where the network can be taken out of
>> the equation - at least not yet.

> That's consistent with my deployment of LTO3.  I'm currently running
> with two LTO3 drives and have parallelism set to 16 to get "decent"
> utilization speed on the drives.  Whether one or both drives are
> running, under load we're getting near the 100 MB/s I would expect
> from a single-gigE connection to the server.  That was after tuning
> parallelism down and cranking it back up again to find a sweet spot.


When you say parallelism of 16, do you mean the server parallelism, or are you referring to the actual number of target sessions you have set per drive? You have 2 drives; what do you have the target sessions set to on each?
I found that once I hit 16 sessions, the LTO-3 drive could easily top 80 MB/s, but at 12, it averaged anywhere from 50-60 MB/s, sometimes hitting upwards of 73 MB/s. Of course, it screams when backups are run from the host itself (99 MB/s or better), but with gigabit ethernet, I'm probably lucky to get 95 MB/s coming into the snode, period. The SCSI HBAs can handle the load (they're dual channel, 320 MB/s each, total = 640 MB/s), and the snode host can handle it, but I can't push data fast enough to make the drives really burn unless I up the target sessions. If I do that, though, then I would quickly exceed the 32-session limit when carried out over 4 drives. That's a bummer. I know upping the target sessions will increase recovery time, and I'm not sure if the faster read speed on LTO-3 drives would compensate.
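
To put rough numbers on the session math (a back-of-envelope sketch, not a benchmark - the ~5 MB/s per-session figure is just my observed 80 MB/s divided by 16 sessions, and your rates will differ):

    # Sessions needed to keep N LTO-3 drives streaming, vs. the
    # 32-session ceiling discussed in this thread. Assumed numbers:
    # 80 MB/s native drive speed, ~5 MB/s per incoming session.
    import math

    SESSION_LIMIT = 32
    DRIVE_MB_S = 80
    PER_SESSION_MB_S = 80 / 16   # ~5 MB/s, from 16 sessions ~= 80 MB/s

    per_drive = math.ceil(DRIVE_MB_S / PER_SESSION_MB_S)   # 16
    for drives in (1, 2, 4, 8):
        total = drives * per_drive
        status = "exceeds" if total > SESSION_LIMIT else "fits under"
        print(f"{drives} drive(s): {total} sessions, {status} the "
              f"{SESSION_LIMIT}-session cap")

By that estimate, two drives already need the entire 32-session cap, and four need 64, which is exactly where I get stuck.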

>> I'd like to add 4 more drives to the library for a total of 8,
>> mostly to allow cloning operations that might run in parallel with
>> the backups. I thought having more drives would help out. Right now,
>> with 4 drives, all the drives are typically in use once the backups
>> are running, so cloning other tapes would have to be done during the
>> day, which is fine, but it could overlap into the evening and affect
>> backups, or vice versa, since various drives that might otherwise be
>> available would then be in use. But even with only 4 drives, it
>> seems I would be limited to 8 sessions each, for a total of 32?

> The total number of target sessions for all of your drives can exceed
> the maximum parallelism set in the server config; the catch being that I
> think Networker will only allow the server max parallelism.  You could
> configure a scenario where some drives never get used, under load,
> because a single drive is taking most of the allowed sessions.  (I'm
> trying to recall if I've ever seen this enforced - I know that I've
> streamed to three devices, each configured for 16 target sessions but I
> didn't count to see if it was artificially limiting it to 32.  I'm
> running 7.2.1 on the server.)

Yeah, I could crank the target sessions for any one drive well beyond the server parallelism - it allows that - but can the total number of allowed "running" target sessions on the storage node exceed 32 if your server will actually allow a parallelism of, say, 96? For example, if one snode is running 20 sessions, and the server has no attached libraries, then could the other snode run 50 save sets?
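
If the cap is enforced the way you describe, here's the toy model I have in mind for how a server-wide parallelism limit could starve a drive (hypothetical numbers and a first-come-first-served fill - NetWorker's real scheduler is presumably smarter about spreading sessions):

    # Toy model only: per-device target sessions sum past the server
    # parallelism cap, and a greedy fill leaves the last drive idle.
    # This is an illustration, not NetWorker's actual algorithm.
    SERVER_PARALLELISM = 32
    TARGET_SESSIONS = {"drive1": 16, "drive2": 16, "drive3": 16}

    remaining = SERVER_PARALLELISM
    for drive, target in TARGET_SESSIONS.items():
        granted = min(target, remaining)
        remaining -= granted
        print(f"{drive}: {granted} of {target} target sessions in use")
    # drive1: 16, drive2: 16, drive3: 0 -- the third drive never runs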
> I know that even with a "high" level of parallelism to the drive,
> LTO3 restore speeds aren't too bad (compared to DLT).  That's an easy
> thing to test in your environment and see if your configuration meets
> your restore objectives.  I'm comfortable with where I have it
> configured.

So would recovering data from an LTO-3 tape with multiplexing set to, say, 12 be about as fast as recovering from an LTO-1 with multiplexing set to 4, maybe 5? Do the LTO-3 drives read data enough faster than the LTO-1 drives to make up for the increased target sessions?
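
My back-of-envelope model assumes a single-save-set restore has to stream past all the interleaved sessions on the tape, so the effective rate is roughly the native read speed divided by the multiplex factor (the ~15 MB/s LTO-1 and ~80 MB/s LTO-3 native speeds are nominal figures, and locate/seek time is ignored):

    # Crude single-save-set restore estimate: the drive reads the whole
    # interleaved region but only 1/N of it belongs to the save set
    # being recovered. Nominal native speeds; treat results as ceilings.
    def effective_restore_mb_s(native_mb_s, multiplex):
        return native_mb_s / multiplex

    print(f"LTO-1, 4-way:  {effective_restore_mb_s(15, 4):.1f} MB/s")   # ~3.8
    print(f"LTO-1, 5-way:  {effective_restore_mb_s(15, 5):.1f} MB/s")   # ~3.0
    print(f"LTO-3, 12-way: {effective_restore_mb_s(80, 12):.1f} MB/s")  # ~6.7

If that model is anywhere near right, LTO-3 at 12-way still comes out ahead of LTO-1 at 4- or 5-way, but testing it here, as you suggest, is the only real answer.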

> Looking at the larger picture, at FastEthernet speeds it's going to
> take 10+ sessions at 10 or 11 MB/s to max out an LTO3 drive.  Given
> that you can probably only get those speeds on a full, you're going to
> need more than 10 sessions, and of course, overall you're limited by
> the server's networking speeds.  That won't scale by adding more
> drives -- in fact, it gets worse.  It sounds like you've figured all
> this out, but are waiting for the obvious point: you should be backing
> up to server-local disk first, and staging to tape.  That's the design
> that best ensures data is available as quickly as LTO3 can consume it,
> and that slow clients don't impact your tape environment, either by
> backing up slowly and shoe-shining the drive (although that's supposed
> to be better in LTO than DLT) or by monopolizing a tape mount for a
> single slow session that could be used for something else.  If you
> want to add $36k (4 x $9k, the approximate cost of an HP drive the
> last time I was quoted) to improve your backup environment, you're
> better off putting some sort of staging disk or VTL in the middle.
> It's a win-win design.

> Of course, you'll need to look into EMC licensing nickel-and-diming
> that out (for either advanced disk storage or more library options),
> which will eat into the cost.  Knowing I was switching to a D2D2T
> environment is one of the reasons we're migrating to TSM -- those were
> features we got "for free", and the data migration management through
> TSM is easier.  Just to show how different products do different
> things: TSM demands smart D2D2T because it doesn't do multiplexing.
> (Yet it seems many TSM deployments have large libraries -- hundreds of
> slots, tens of drives -- which are probably useful for other aspects
> of TSM.)

> Dave
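
Putting Dave's Fast Ethernet arithmetic into numbers (a sketch using assumed nominal rates: ~11 MB/s per Fast Ethernet client on a full backup, 80 MB/s LTO-3 native, ~100 MB/s of usable gigabit into the server):

    # Why adding drives doesn't scale: the server's own network feed
    # is the real ceiling. All rates are assumed nominal figures.
    import math

    LTO3_MB_S = 80          # assumed native LTO-3 speed
    CLIENT_MB_S = 11        # assumed Fast Ethernet client on a full
    SERVER_GIGE_MB_S = 100  # assumed usable gigE into the server

    print(math.ceil(LTO3_MB_S / CLIENT_MB_S), "full-speed clients per drive")  # 8
    print(SERVER_GIGE_MB_S // LTO3_MB_S, "drive(s) the server pipe can feed")  # 1

Incrementals run far slower than 11 MB/s, hence the 10+ sessions, and one gigabit pipe can keep barely a single LTO-3 streaming no matter how many drives hang off the library - which is the whole argument for landing backups on disk first and letting staging feed tape at local speeds.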

--
George Sinclair - NOAA/NESDIS/National Oceanographic Data Center
SSMC3 4th Floor Rm 4145       | Voice: (301) 713-3284 x210
1315 East West Highway        | Fax:   (301) 713-3301
Silver Spring, MD 20910-3282  | Web Site:  http://www.nodc.noaa.gov/
- Any opinions expressed in this message are NOT those of the US Govt. -
To sign off this list, send email to listserv AT listserv.temple DOT edu and type 
"signoff networker" in the body of the email. Please write to networker-request 
AT listserv.temple DOT edu if you have any problems with this list. You can access the 
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
