Re: [Networker] Savesets completed but group still runs

2003-02-04 19:08:09
Subject: Re: [Networker] Savesets completed but group still runs
From: "Christopher T. Beers" <ctbeers AT COE.SYR DOT EDU>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Tue, 4 Feb 2003 19:08:00 -0500
George:

I am starting to see this same problem, but only on one group.  All machines and 
savesets in the group are started via savepnpc, and there is always one
that "gets kill after timeout of 33 minutes of inactivity".  However, if you 
look, the saveset completed long, long ago.

Any help would be appreciated.

Chris

--On Tuesday, February 04, 2003 2:35 PM -0500 George Sinclair <George.Sinclair AT 
noaa DOT gov> wrote:

Hi,

I've seen a problem on our backup server where one or more groups are
reported as still running in the group control window, but their
respective save sessions have long since completed. If you check the
saveset recover window, or the volumes listing, you can see that the
savesets did in fact complete. Also, they are reported as completed in
the messages window, but when you check the group, sure enough, a bunch
of them are still listed in the darn pending window. Of course, you
never receive a savegroup completion notification. I could just stop the
group and restart it, but that seems ridiculous. I'm constantly seeing
this problem. Another thing I notice is that when this is happening, if
I log in to the primary backup server, I see numerous processes running,
one for each saveset, like this:

root  1351  1270  0 12:21:28 ?        0:00 /usr/sbin/nsrexec -T 60000 -c
clientname -a nsrexec -- clientname:/saveset

This is to be expected, but the problem is that the named saveset did
complete. The problem only gets worse as more groups will inevitably
start up later, causing more and more of these processes to run
eternally on the server. Gee, it's no wonder the server just locks up at
some point with all these open processes running. The poor thing can't
breathe! How could it, with a zillion nsrexec processes still uselessly
running? When I came in this morning, the entire server was frozen, as
in the admin GUI would never respond, and just about every group was
still running. There were plenty of tapes. I could log in to the primary
server, and I could see a ton of these processes still running, but I
could NEVER get the admin GUI to come alive. I finally had to shut down
NetWorker and restart it. Ugh!!! Of course, at that point, you can't
then restart any of the groups. I basically had to then restart
everything from scratch, and now I'm again seeing the server become more
and more sluggish as more of these groups claim to still be running. I'm
thinking the frozen or sluggish behavior is due to the fact that it has
all these processes open and it should have closed them. Does anyone
know what causes this and how to fix it?
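
For anyone who wants to watch how many of these processes pile up, here is a
minimal sketch in Python. It only parses `ps -ef` output and counts nsrexec
processes per client; it does not call any NetWorker command, cannot tell
whether the matching saveset really finished, and kills nothing. The function
names are hypothetical, and it assumes the process shows up with a full path
ending in /nsrexec and a "-c clientname" argument, as in the listing above.

```python
import subprocess
from collections import Counter


def find_nsrexec(ps_output):
    """Return (pid, client) tuples for nsrexec lines in `ps -ef` style output."""
    procs = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Assumption: the command appears with a full path, e.g. /usr/sbin/nsrexec.
        cmd_at = next(
            (i for i, f in enumerate(fields) if f.endswith("/nsrexec")), None
        )
        # Assumption: the second column of `ps -ef` is the PID.
        if cmd_at is None or len(fields) < 2 or not fields[1].isdigit():
            continue
        args = fields[cmd_at + 1:]
        # Assumption: the client name follows "-c", as in the listing above.
        client = args[args.index("-c") + 1] if "-c" in args[:-1] else "unknown"
        procs.append((int(fields[1]), client))
    return procs


def count_by_client(procs):
    """Count nsrexec processes per client name."""
    return Counter(client for _, client in procs)


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    for client, n in count_by_client(find_nsrexec(out)).most_common():
        print(f"{client}: {n} nsrexec process(es)")
```

Run from cron every few minutes, the counts would at least show when the
processes start accumulating and for which clients.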

We're running 6.1.1 on our primary server, which is a Sun running Solaris
2.8. We also have a storage node server running 6.1.1 under Red Hat Linux
7.3. Most of our backups are done on the storage node server.

Thanks.

George
George.Sinclair AT noaa DOT gov

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

