Subject: Re: [Networker] Question on drive target sessions?
From: "Clark, Patricia" <Clarkp AT OSTI DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 27 Nov 2006 13:14:52 -0500
So, to fix one problem, they created another?  I have a single server/node with 
a tape library with 3 tape drives.  All drives have the same target sessions 
setting, 6 each.  Yet I will have 2 drives sitting idle because I don't have 
another tape available in the pool, and only one save set backing up, while 
many more save sets sit waiting for that one to complete before any more are 
launched to a tape and drive that are more than capable of handling the load?  
What kind of resource management is that?  If it looks like a bug, sounds like 
a bug, and behaves like a bug, it is NOT a feature!


Patti

-----Original Message-----
From: Landwehr, Jerome [mailto:jlandweh AT harris DOT com] 
Sent: Monday, November 27, 2006 12:49 PM
To: EMC NetWorker discussion; Clark, Patricia
Subject: RE: [Networker] Question on drive target sessions?

This is something I, too, have seen at version 7.3.2.

After months of having a case open, this is the (infuriating) response I 
finally got:


Here are the details of the target sessions behavior and how it is expected to 
work in NetWorker 7.3.x. I hope this explains the changes in target sessions 
behavior compared to previous versions of NetWorker, and the related concern 
from Jerry.

Background:  Prior to the changes made to the target sessions behavior, we had 
many complaints from different customers that one or a few devices were being 
hammered while other eligible/enabled devices sat idle. This was only an issue 
when the devices had different target sessions settings, say the first device 
set to 10 and the others set to 1 or 2. In that scenario, when 8 save sessions 
arrived at the NetWorker storage node, the first device (selected as per the 
device selection criteria) hosted all 8 sessions while the remaining devices 
sat idle.

In response to a couple of enhancement requests to address the above issue and 
make better use of the eligible devices, a fix was implemented: query the 
respective storage node and its eligible devices, select the 'lowest' target 
sessions setting amongst those devices, and use that number as the target 
sessions value for all devices during the backup, for better distribution and 
load balancing of the incoming save sessions.
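
In other words, something like the following (a minimal sketch of the two 
policies as described above; the code and names are illustrative, not 
NetWorker's):

    # Illustrative sketch only -- not actual NetWorker code.
    def assign_pre_73(sessions, targets):
        # Old behavior: fill each device up to its own target sessions
        # value, in device-selection order.
        load = [0] * len(targets)
        for _ in range(sessions):
            for i, t in enumerate(targets):
                if load[i] < t:
                    load[i] += 1
                    break
        return load

    def assign_73x(sessions, targets):
        # 7.3.x behavior: clamp every device to the lowest target sessions
        # value among the eligible devices, then fill as before. Sessions
        # that don't fit wait (and can trigger requests for more volumes).
        return assign_pre_73(sessions, [min(targets)] * len(targets))

    print(assign_pre_73(8, [10, 1, 2]))  # [8, 0, 0] -- first device hammered
    print(assign_73x(8, [10, 1, 2]))     # [1, 1, 1] -- 5 sessions left waiting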

Suggestions:  While attempting to utilize more devices (where possible) is a 
good thing, I would however ask Jerry to bring the target sessions values 
closer together, to minimize the requests for additional volumes and slightly 
improve performance. Values ranging from 1 to 10, as explained above, can have 
a negative impact on resource utilization, and in this configuration the value 
of 10 never takes effect because the lowest number wins. So I recommend either 
setting the target sessions values in a narrow range such as 4-6 (if they 
really must differ) or making some changes to the configuration (group, 
client, etc.) to achieve better resource allocation and performance.
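
For example, reusing assign_73x from the sketch above (values illustrative):

    print(assign_73x(8, [1, 2, 10]))  # [1, 1, 1] -- clamp is 1; 5 sessions wait
    print(assign_73x(8, [4, 5, 6]))   # [4, 4, 0] -- clamp is 4; all 8 placed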

As we have mentioned, escalation LGTpa89210 is open on the device request 
issue, and the fix is not yet verified. That escalation is, however, a side 
effect of other settings, especially target sessions, and with the suggested 
changes in place it should not be a problem.



So rather than telling customers to set the target sessions to their liking, 
they 'fixed' the software to find the lowest target sessions value on any 
device on the storage node, use it for every device, and ignore the user's 
settings!

This broke my environment, since I have two jukeboxes connected to the storage 
nodes: a VTL jukebox with target sessions set to 1 on all devices, and an 
LTOII jukebox with significantly higher target sessions on each of its devices.

The result is that whenever an LTOII savegroup kicks off, I get a hundred 
messages emailed about all the storage nodes wanting more tapes than they 
need... nice!
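
To put rough numbers on it (my target sessions values here are illustrative, 
and I'm assuming the clamp spans all eligible devices on the storage node, as 
the response above describes):

    # Hypothetical setup: one storage node seeing both jukeboxes.
    vtl_targets = [1, 1, 1, 1]   # VTL drives, target sessions 1
    lto2_targets = [8, 8]        # LTOII drives, higher target sessions
    effective = min(vtl_targets + lto2_targets)   # -> 1 session per device
    # At 1 session per device, N concurrent save sets need N mounted
    # volumes, hence the flood of tape-request messages.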

Jerry

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On 
Behalf Of Clark, Patricia
Sent: Friday, November 24, 2006 12:40 PM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] Question on drive target sessions?

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On 
Behalf Of George Sinclair
Sent: Friday, November 24, 2006 12:30 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] Question on drive target sessions?

This question concerns why NetWorker sometimes fails to start multiple sessions 
on a device. I don't see this that often, but here's an
example:

I was running a group with 1 client. The client has 6 save sets. I started the 
group (level full), and there was only one writable volume for the given pool; 
the other was set read-only. NetWorker loaded the tape but only sent one save 
set to it. The rest were pending. This seems odd because the target sessions 
on the device was set to 5, the client parallelism is 6, and the group 
parallelism was set to 0.
Anyway, it just sat there running that silly save set for the longest time, 
and the rest were doing nothing. Meanwhile, it kept asking for another 
writable tape. Obviously, it wanted to kick off the rest, but why the heck 
couldn't it just run some of them on the same device it was writing the other 
save set to? There were no other groups running, so there was plenty of room.

The tape library has 4 drives, and the server parallelism is set to 20
(5 per device). We're running 7.2.2 on a Solaris server. The storage node runs 
Linux (NW 7.2.2) and manages the library.
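
Back-of-envelope, those settings should allow several sessions at once on the 
one mounted device (a standalone illustration, ignoring any other limits):

    # Expected concurrency on the single mounted device.
    device_target_sessions = 5
    client_parallelism = 6
    server_parallelism = 20
    print(min(device_target_sessions, client_parallelism,
              server_parallelism))   # -> 5, i.e. five of the six save sets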

Now, I finally set the other tape (which was read-only) to appendable. Within 
30 seconds, NetWorker stopped requesting a writable tape (it issued an event 
cleared message) and started to load the other tape.
However, before loading it, it sent 4 of the remaining 5 save sets to the 
device the first one was writing to. After mounting the other volume, it sent 
the 6th and last save set to that device.

I suppose it's conceivable that it was just a coincidence and that it took it 
that long to figure out what it wanted to back up? I didn't think that 
generally mattered when running fulls, even with a lot of files involved? 
Anyway, assuming not, why the heck did it have to wait until it had another 
appendable tape before sending more save sets to the first device if that 
device was only running 1 session?

Does anyone else ever see this behavior? I ran 20 other groups today, all with 
various numbers of clients, and I never saw it, so it doesn't occur most of 
the time.

Thanks.

George

--
George Sinclair - NOAA/NESDIS/National Oceanographic Data Center
SSMC3 4th Floor Rm 4145       | Voice: (301) 713-3284 x210
1315 East West Highway        | Fax:   (301) 713-3301
Silver Spring, MD 20910-3282  | Web Site:  http://www.nodc.noaa.gov/
- Any opinions expressed in this message are NOT those of the US Govt. -

>>>>>>>>>>>>>>>>>>>>>>>>>
I've seen this behaviour just recently, with v7.3.2 on a Linux server.
Initially, several save sets were processing in parallel to a single tape, but 
as they completed, none of the remaining sets waiting to process kicked off, 
eventually leaving just one running. I did not provide another tape, and when 
that single save set completed, the others were launched to the same tape 
until the backup was complete. I had manually kicked off the group, since it 
had not completed its normal incremental (due to something else), and was 
monitoring it at the time. Since I'm running v7.3.2, I'm expecting odd 
behaviour. I guess this is something that got carried forward.


Patti Clark
Unix System Administrator - RHCT
Office of Scientific and Technical Information

To sign off this list, send email to listserv AT listserv.temple DOT edu and 
type "signoff networker" in the body of the email. Please write to 
networker-request AT listserv.temple DOT edu if you have any problems with this 
list. You can access the archives at 
http://listserv.temple.edu/archives/networker.html or via RSS at 
http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
