Re: [Networker] Question on drive target sessions?

Subject: Re: [Networker] Question on drive target sessions?
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 27 Nov 2006 16:37:14 -0500
Thanks to all who responded.

So it would seem, therefore, that my observations are not particular to my environment alone. Misery loves company, I guess. At least you guys managed to express it better than I did. Just to be clear, when I noticed this, I had only one appendable tape, with the other marked read-only. One session was writing and the others were pending; there was no sane reason it couldn't start sending some of those to the same tape. When I changed the other tape to appendable, all those pending sessions woke up from their dream state and started writing, I think even before it loaded the second one. But it's frustrating that it took another writable tape to make this happen; otherwise, it might not have happened until the session that was writing had completed. I've seen that before.

As others have commented, it's as if it sees that it wants another tape, allocates a certain amount, and then just sticks with that; maybe after a certain period it re-allocates or refreshes or some such thing. Heck, I can't explain it. Sheesh!

George


Davina Treiber wrote:

That's ludicrous. That's like saying that if the user has configured a slightly strange setting, it must be wrong, so they'll overrule it.

That is NOT the sort of fudged feature we want in NetWorker.


Landwehr, Jerome wrote:

This is something I too have seen, at version 7.3.2.

After months of having a case open, this is the (infuriating) response I finally got:


Here are the details of the target sessions behavior and how it is expected
to work in NetWorker 7.3.x. I hope this explains the changes in target
sessions behavior compared to previous versions of NetWorker, and the
related concern from Jerry.

Background: Prior to the changes made to the target sessions behavior, we
had many complaints from different customers that one or a few devices were
being hammered while other eligible/enabled devices sat idle doing nothing.
This was only an issue when the target sessions setting differed between
devices, say the first device set to 10 and the others set to 1 or 2. In
that scenario, with 8 save sessions coming to the NetWorker storage node,
the first device (selected per the device selection criteria) would host
all 8 sessions while the remaining devices sat idle. In response to a
couple of enhancement requests to address this issue and better utilize the
eligible devices, a fix was implemented: query the respective storage node
and its eligible devices, select the "lowest" target sessions setting among
the devices, and use that number as the target sessions value for all
devices for the backup, for better distribution and load balancing of
incoming save sessions.

Suggestions: While attempting to utilize more devices (where possible) is a
good thing, I would ask Jerry to bring the target sessions values closer
together, to minimize requests for additional volumes and slightly improve
performance. Values ranging from 1 to 10, per the explanation above, can
have a negative impact on resource utilization, and in such a configuration
the value of 10 never takes effect because of the lowest-value rule. So we
recommend either setting the target sessions values in a narrow range like
4-6 (if they really must differ) or making some changes to the
configuration (group, client, etc.) to achieve better resource allocation
and performance. As we have mentioned, escalation LGTpa89210 is open on the
device request issue and the fix is not yet verified. That escalation,
however, is a side effect of other settings, especially target sessions,
and with the suggested changes it will not be a problem.

So rather than telling customers to fix the target sessions to their liking, they 'fixed' the software to find and use the lowest target sessions available on any device for the storage node and ignore the user setting!
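
In other words, if I've understood their explanation, the selection logic
now boils down to something like this (my own Python sketch, not EMC's
code; the device names and the value 8 are made up for illustration):

    # Sketch of the 7.3.x behavior as support described it -- not EMC's code.
    def effective_target_sessions(devices):
        # Every eligible device on the storage node gets the LOWEST configured
        # target sessions value; the per-device settings are ignored.
        lowest = min(d["target_sessions"] for d in devices)
        return {d["name"]: lowest for d in devices}

    # Example: a VTL drive (target sessions 1) alongside an LTO-II drive:
    devices = [
        {"name": "vtl_drive_0", "target_sessions": 1},
        {"name": "ltoII_drive_0", "target_sessions": 8},
    ]
    print(effective_target_sessions(devices))
    # {'vtl_drive_0': 1, 'ltoII_drive_0': 1} -- the LTO-II drive is throttled
    # to one session, so NetWorker requests extra volumes instead.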

This broke my environment, since I have two jukeboxes connected to the storage nodes: a VTL jukebox with target sessions set to 1 for all devices, and an LTO-II jukebox with significantly higher target sessions on each of its devices.

The result is that whenever an LTO-II savegroup kicks off, I get a hundred messages emailed about all the storage nodes wanting more tapes than they need... nice!

Jerry

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On Behalf Of Clark, Patricia
Sent: Friday, November 24, 2006 12:40 PM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] Question on drive target sessions?

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of George Sinclair
Sent: Friday, November 24, 2006 12:30 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] Question on drive target sessions?

This question concerns why NetWorker sometimes fails to start multiple
sessions on a device. I don't see this that often, but here's an
example:

I was running a group with one client. The client has 6 save sets. I start
the group (level full), and there's only one writable volume for the given
pool; the other is set read-only. NetWorker loads the tape but only sends
one save set to it. The rest are pending. This seems odd because target
sessions on the device was set to 5, the client parallelism is 6, and the
group parallelism was set to 0.
Anyway, it just sits there running that silly save set for the longest
time, and the rest are doing nothing. Meanwhile it keeps asking for
another writable tape. Obviously, it wants to kick off the rest, but why
the heck can't it just run some of them on the same device it's writing
the other save set to? There were no other groups running, so there's
plenty of room.

The tape library has 4 drives, and the server parallelism is set to 20
(5 per device). We're running 7.2.2 on a Solaris server. The storage node
is running Linux (NW 7.2.2) and manages the library.
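
For what it's worth, here's my back-of-the-envelope understanding of how
those limits ought to combine for a single mounted, writable device (my
own sketch, nothing official from the docs):

    # Rough sketch (mine, not NetWorker's actual logic) of combining the caps:
    server_parallelism = 20     # server-wide cap
    client_parallelism = 6      # this client's cap
    group_parallelism = 0       # 0 = unlimited
    device_target_sessions = 5  # what the one mounted device should accept

    caps = [server_parallelism, client_parallelism, device_target_sessions]
    if group_parallelism > 0:
        caps.append(group_parallelism)

    print(min(caps))  # -> 5: five of the six save sets should run at once

So by that math, five of the six save sets should have been writing to
that one tape, not one.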

Now, I finally set the other tape (was read-only) to appendable. Within
30 seconds, NetWorker stops requesting a writable tape (issues an event
cleared message) and starts to load the other tape. However, before it
loads it, it sends 4 of the remaining 5 save sets to the device that the
first one was writing to. After mounting the other volume it sends the
6th and last save set to that device.

I suppose it's conceivable that it was just coincidence, and that it took
it that long to figure out what it wanted to back up? I didn't think that
generally mattered when running fulls, even if a lot of files are
involved. Anyway, assuming not, why the heck did it have to wait until
it had another appendable tape before sending more save sets to the
first one, if the first device was only running 1 session?

Does anyone else ever see this behavior? I ran 20 other groups today,
all with various numbers of clients, and I never saw this, so it doesn't
occur most of the time.

Thanks.

George

--
George Sinclair - NOAA/NESDIS/National Oceanographic Data Center
SSMC3 4th Floor Rm 4145       | Voice: (301) 713-3284 x210
1315 East West Highway        | Fax:   (301) 713-3301
Silver Spring, MD 20910-3282  | Web Site:  http://www.nodc.noaa.gov/
- Any opinions expressed in this message are NOT those of the US Govt. -

I've seen this behaviour just recently, v7.3.2 on a Linux server.
Initially there were several save sets processing in parallel to a single
tape, but as they completed, none of the remaining sets waiting to
process kicked off, leaving just one running. I did not provide another
tape, and when the single save set completed, others were launched to the
same tape until the backup was completed. I'd manually kicked off the
group since it had not completed its normal incremental (due to
something else) and was monitoring it at the time. Since I'm running
v7.3.2, I'm expecting odd behaviour. I guess this is something that got
carried forward.


Patti Clark
Unix System Administrator - RHCT
Office of Scientific and Technical Information


--
George Sinclair - NOAA/NESDIS/National Oceanographic Data Center
SSMC3 4th Floor Rm 4145       | Voice: (301) 713-3284 x210
1315 East West Highway        | Fax:   (301) 713-3301
Silver Spring, MD 20910-3282  | Web Site:  http://www.nodc.noaa.gov/
- Any opinions expressed in this message are NOT those of the US Govt. -
To sign off this list, send email to listserv AT listserv.temple DOT edu
and type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems
with this list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or via RSS at
http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER