Networker

Re: [Networker] problems from upgrade from 7.2.2 to 7.4.3

2008-10-23 17:22:47
Subject: Re: [Networker] problems from upgrade from 7.2.2 to 7.4.3
From: Peter Viertel <Peter.Viertel AT MACQUARIE DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 24 Oct 2008 08:19:29 +1100
That might have been my case. 

We had almost all of the symptoms you described on 7.3.4. The case ran along
for months before we looked at the OS. The answer turned out to be a bug in
the Solaris 10 network stack, introduced by a patch around last February.
We patched the OS up to the recommended cluster for August 2008 and, hey presto,
all the problems went away.

We expected it to fix our problems with backups of some Windows clients, but it
also fixed the savegroups hanging at the end, the ACSLS silos getting mixed-up
inventories, NMC losing track of sessions, etc. Basically any process that uses
TCP sockets to talk to other processes was affected, even when they were on the
same host.
 

----- Original Message -----
From: EMC NetWorker discussion <NETWORKER AT LISTSERV.TEMPLE DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU <NETWORKER AT LISTSERV.TEMPLE DOT EDU>
Sent: Fri Oct 24 04:05:11 2008
Subject: Re: [Networker] problems from upgrade from 7.2.2 to 7.4.3

I'm running Solaris 10. Sorry about that.

My tech says he has another case with almost identical issues assigned
to him, and they are also running Solaris 10.

Thanks!

Joel

-----Original Message-----
From: Stan Horwitz [mailto:stan AT temple DOT edu] 
Sent: Thursday, October 23, 2008 12:44 PM
To: EMC NetWorker discussion; Joel Fisher
Subject: Re: [Networker] problems from upgrade from 7.2.2 to 7.4.3

This discussion is interesting, but no one who said they are having problems
has mentioned which OS they are using or anything else about their NetWorker
configuration other than the version number. As a result, it is impossible
to see if there are any commonalities to the problems that have been
discussed in this thread.


> From: Joel Fisher <jfisher AT WFUBMC DOT EDU>
> Reply-To: EMC NetWorker discussion <NETWORKER AT LISTSERV.TEMPLE DOT EDU>,
>  Joel Fisher <jfisher AT WFUBMC DOT EDU>
> Date: Thu, 23 Oct 2008 12:39:13 -0400
> To: <NETWORKER AT LISTSERV.TEMPLE DOT EDU>
> Subject: Re: [Networker] problems from upgrade from 7.2.2 to 7.4.3
> 
> Hey Roberta,
> 
> My tech mentioned the "disable all devices" workaround. Does that
> clear up the problem until the next NetWorker shutdown? Does it clear
> up the adv_file device unmounting issue?
> 
> Thanks!
> 
> Joel
> 
> 
> 
> -----Original Message-----
> From: Roberta Gold [mailto:gold11 AT llnl DOT gov]
> Sent: Thursday, October 23, 2008 12:01 PM
> To: EMC NetWorker discussion; Joel Fisher
> Subject: Re: [Networker] problems from upgrade from 7.2.2 to 7.4.3
> 
> Wow! Finally someone with the same problems as us!!! Especially item
> #3, which EMC has been working on since our upgrade to 7.3 ... 7.3.1 ...
> 7.3.2 ... 7.3.3 ... 7.4.1 ... 7.4.2. Apparently it is related to the
> media index not being ready when NetWorker tries to do mounts. Of course
> the autochanger volumes will retry until successful, but there are no
> such retries for advfile volumes ...
> 
> Our workaround is to disable both jukeboxes and unmount all advfile
> volumes before shutting down NetWorker. After restarting NetWorker we
> wait for "media db open for business" before enabling our devices.
> Pain in the ...
> 
> Anyway, this only works for planned outages. When NetWorker goes down
> without time to do the above, we have to do the workaround after it
> comes up, and stop/restart again!
> 
> I will post case numbers later. I am busy for the next 30 minutes ...
> 
> Oh yes. We also experienced 1, 2, 4, & 5 ...
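The workaround Roberta describes hinges on waiting until the media database reports it is ready before re-enabling devices. A minimal Python sketch of just that wait step follows, assuming the readiness message is written to a plain-text daemon log. The log path, the exact message wording (the thread quotes "media db open for business", so the sketch matches only the substring "open for business"), and the function name wait_for_log_message are all assumptions for illustration; nothing here is a NetWorker API, and enabling the devices is left to your existing procedure.

```python
import os
import time


def wait_for_log_message(path, needle="open for business", timeout=600.0,
                         interval=5.0, start_offset=None):
    """Poll `path` until text containing `needle` is appended, or `timeout` expires.

    Only content written after `start_offset` is examined (default: the file's
    size when the call starts), so old messages from a previous run are ignored.
    Returns True if the message was seen, False on timeout.
    """
    if start_offset is None:
        try:
            start_offset = os.path.getsize(path)
        except FileNotFoundError:
            start_offset = 0  # log not created yet; read it from the start later
    pos = start_offset
    tail = ""  # carry-over so a needle split across two reads is still found
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.path.getsize(path) < pos:
                pos = 0  # log truncated or rotated; rescan from the beginning
            with open(path, "rb") as fh:
                fh.seek(pos)
                chunk = fh.read()
                pos = fh.tell()
            text = tail + chunk.decode("utf-8", errors="replace")
            if needle in text:
                return True
            tail = text[-len(needle):]
        except FileNotFoundError:
            pos = 0  # log missing (e.g. mid-rotation); keep waiting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Start the wait just before restarting NetWorker, e.g. wait_for_log_message("/nsr/logs/daemon.log"), where that path is an assumed default; check where your server actually logs. Reading only from the starting offset is a deliberate choice, so a "ready" line left over from the previous run cannot release the devices too early.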
> 
> 
> 
> 
>> Hey Guys,
>> 
>> Last Thursday I upgraded from 7.2.2 to 7.4.3. It has been less than
>> smooth so far.
>> 
>> It initially seemed to go flawlessly, but Monday morning nsrd crashed and
>> would not stay running. EMC provided a hotfixed nsrd that seems to have
>> resolved that problem, but I have some other, less critical problems
>> that I was wondering if you guys have seen.
>> 
>> 1) Adv_file devices keep randomly unmounting. I've seen people in the
>>    archives having issues with RO devices, but in my case it is any
>>    device, RW or RO. There isn't any message in the log about the
>>    dismount, just a notification when it needs to be mounted.
>> 
>> 2) 'Owner notification' either doesn't work, or its functionality
>>    has changed. My existing scripts don't work with it. For
>>    troubleshooting, I've made a very simple script that basically takes
>>    stdin and writes it to a file. That doesn't work either.
>> 
>> 3) Media that is labeled and was previously working will not mount.
>>    I get a message saying "volume xxxxxx(volid xxxxxxxxxxx) NOT in media
>>    index". But then, after a while, it mounts with no intervention on
>>    my part. This is happening on tapes within a silo and on my adv_file
>>    devices that keep unmounting. It may be related to the first problem.
>> 
>> 4) Many, but not all, savegroups are not finishing, and the jobs that
>>    are left hanging are typically index saves.
>> 
>> 5) nsrjb shows empty slots, which is not normal for an ACSLS
>>    silo. It allows me to allocate the "volumes" in those slots, but the
>>    volumes are not actually in the silo. In previous releases, a volume
>>    could not be allocated to a silo unless it was physically in the
>>    silo. I'm assuming this is a bug, not a design change.
>> 
>> Any assistance would be appreciated.
>> 
>> FYI... I do have a case open with EMC to address them.
>> 
>> Thanks!
>> 
>> Joel
>> 
>> 
>> To sign off this list, send email to listserv AT listserv.temple DOT edu
>> and type "signoff networker" in the body of the email. Please write
>> to networker-request AT listserv.temple DOT edu if you have any problems
>> with this list. You can access the archives at
>> http://listserv.temple.edu/archives/networker.html or
>> via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
> 
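For anyone reproducing the owner-notification test in item 2, a minimal Python sketch of a "read stdin, write it to a file" helper is shown below. It assumes the notification text is delivered on the command's standard input, which is exactly what Joel's test relies on; the function name capture_notification and any output path you pass it are hypothetical. Writing a timestamped header on every run makes it easy to tell whether the script was invoked at all, even when the message body arrives empty.

```python
from datetime import datetime, timezone


def capture_notification(stream, path):
    """Append everything read from `stream` to `path` under a timestamped header.

    Returns the number of characters captured (0 means the script ran but
    received an empty message).
    """
    text = stream.read()
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"--- notification received {stamp} ---\n")
        fh.write(text)
        if not text.endswith("\n"):
            fh.write("\n")  # keep each capture on its own lines
    return len(text)
```

A short wrapper that calls capture_notification(sys.stdin, sys.argv[1]) is enough to use it as the notification's command. If the file never gains a header, the action is not running the script at all; if headers appear with no body, the script runs but receives nothing on stdin.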
> 
> -- 
> Roberta Gold
> Lawrence Livermore National Laboratory
> ICC/HPSD - Security Technologies Group
> gold11 AT llnl DOT gov
> (925) 422-0167
> 

