Networker

Re: [Networker] cannot open new progress file? -- problem SOLVED!

2006-11-23 14:33:20
Subject: Re: [Networker] cannot open new progress file? -- problem SOLVED!
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 23 Nov 2006 14:32:51 -0500
Thanks. It turns out, the problem was that I had also copied over /nsr/tmp from the old server (Doh!). Not sure why I did that, or what use I possibly thought it would be to me, but after shutting down the new server, removing /nsr/tmp and re-starting, everything is working fine now. Of course, NetWorker recreated a new /nsr/tmp, and it has very different contents than before.

One thing I should note is that I captured the contents of everything in /nsr/tmp before I shut down the server, and there was no old information in there. Everything in there was from today, and it was all from the new server, no legacy data. Regardless, wiping it out and re-starting fixed it. Again, the contents are way different now, and that was before I re-tested running any groups.
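
The capture-then-clear step described above could be sketched as follows. This is only an illustration, not part of the original fix: the function name `set_aside` and the timestamp suffix are made up, and the sketch renames the directory instead of removing it, which is a more cautious variation on what was actually done. It assumes NetWorker has already been shut down; on restart NetWorker recreated /nsr/tmp on its own.

```python
import os
import time


def set_aside(directory, stamp=None):
    """Record a listing of `directory`, then rename it out of the way.

    Returns (listing, new_path). Renaming rather than deleting is a choice
    made for this sketch; the fix in the thread simply removed the directory.
    Assumption: the service that owns the directory is stopped first.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    listing = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full = os.path.join(root, name)
            st = os.lstat(full)
            # Keep relative path, size in bytes and mtime for later comparison.
            listing.append(
                (os.path.relpath(full, directory), st.st_size, int(st.st_mtime))
            )
    listing.sort()
    new_path = f"{directory.rstrip(os.sep)}.{stamp}"
    os.rename(directory, new_path)
    return listing, new_path
```

Calling `set_aside("/nsr/tmp")` with the server stopped would leave the old contents under a renamed directory, so they can still be compared with whatever gets recreated.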

Whew! It was bad enough having to be here today, let alone having to deal with some unexpected problem in the middle of everything.
Guess I can be thankful now.

Cheers!

George

Stuart Whitby wrote:

My guess is that it's the savegrp progress file which is the problem. Try running this using
"truss -o /tmp/savegrp.truss savegrp -v -l full testfull" and searching back for "= E" from there.
This should lead you to some sort of error which will identify which file it's trying to open,
allowing you to check permissions etc. from there.
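
As a rough illustration of the search suggested above, the saved trace can be filtered for failed calls that name a path. This is a sketch under assumptions, not a NetWorker tool: it assumes Solaris truss marks failures as "Err#13 EACCES" and also accepts the Linux strace form "= -1 EACCES"; the name `failed_calls` is made up for this example.

```python
import re
import sys

# Assumed failure markers: Solaris truss prints "Err#<n> <NAME>",
# Linux strace prints "= -1 <NAME> (...)".
FAILED = re.compile(r'(?:Err#\d+\s+|=\s+-1\s+)(E[A-Z0-9]+)')
# Syscall name and first quoted argument, with an optional "pid:" prefix
# as produced when following child processes.
PATH = re.compile(r'^\s*(?:\d+:\s+)?(\w+)\("([^"]*)"')


def failed_calls(lines):
    """Return (syscall, path, errno_name) for each failed call naming a path."""
    hits = []
    for line in lines:
        err = FAILED.search(line)
        call = PATH.search(line)
        if err and call:
            hits.append((call.group(1), call.group(2), err.group(1)))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_trace.py /tmp/savegrp.truss
    with open(sys.argv[1]) as fh:
        for syscall, path, err in failed_calls(fh):
            print(f"{syscall:10s} {err:10s} {path}")
```

The last few failures printed before the abort are the ones to look at first, since they are closest to the point where savegrp gave up.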

Cheers,

Stuart.

________________________________

From: EMC NetWorker discussion on behalf of George Sinclair
Sent: Thu 23-Nov-06 18:00
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] cannot open new progress file?



I'm getting the following error when trying to run any groups (manual or
auto enabled) on our server:

savegrp: error, cannot open new progress file

I also tried adding just a single client (fred) to a group (testfull)
(pool=testfull) and then running that group as:

# savegrp -v -l full testfull
fred:All                                level=full
11/23/06 17:36:36 savegrp: Run up to 20 clients in parallel
11/23/06 17:36:36 savegrp: error, cannot open new progress file
11/23/06 17:36:36 savegrp: Failed to update server, aborting Savegroup.
11/23/06 17:36:36 savegrp: group testfull aborted.
11/23/06 17:36:36 savegrp: no save sets to check if written to verified media
11/23/06 17:36:36 savegrp: error, cannot open new progress file
11/23/06 17:36:36 savegrp: Failed to update server, aborting Savegroup.

Here are the messages from /nsr/logs/messages:
Nov 23 17:52:01 server root: [ID 702911 daemon.notice] NetWorker savegroup: (info) starting  testfull (with 1 client(s))
Nov 23 17:52:01 server root: [ID 702911 daemon.notice] NetWorker savegroup: (alert) testfull aborted, total 1 client(s), 0 Hostname(s) Unresolved, 0 Failed, 1 Succeeded.
Nov 23 17:52:01 server root: [ID 702911 daemon.notice] Start time:   Thu Nov 23 17:52:01 2006
Nov 23 17:52:01 server root: [ID 702911 daemon.notice] End time:     Thu Nov 23 17:52:01 2006
Nov 23 17:52:01 server root: [ID 702911 daemon.notice]
Nov 23 17:52:01 server root: [ID 702911 daemon.notice] --- Never Started Save Sets ---
Nov 23 17:52:01 server root: [ID 702911 daemon.notice]
Nov 23 17:52:01 server root: [ID 702911 daemon.notice] savegrp: fred:All save was never started
Nov 23 17:52:01 server root: [ID 702911 daemon.notice] savegrp: fred:index index was never started
Nov 23 17:52:01 server root: [ID 702911 daemon.notice]

Here are the messages from /nsr/logs/daemon.log:
11/23/06 17:52:01 savegrp: error, cannot open new progress file
11/23/06 17:52:01 savegrp: Failed to update server, aborting Savegroup.
11/23/06 17:52:01 nsrd: savegroup info: starting  testfull (with 1 client(s))
11/23/06 17:52:01 savegrp: group testfull aborted.
11/23/06 17:52:01 savegrp: error, cannot open new progress file
11/23/06 17:52:01 savegrp: Failed to update server, aborting Savegroup.
11/23/06 17:52:01 nsrd: savegroup alert: testfull aborted, total 1 client(s), 0 Hostname(s) Unresolved, 0 Failed, 1 Succeeded.
11/23/06 17:52:01 nsrd: runq: NSR group testfull exited with return code 1.

However, I am able to back up data to this group from the client as:
save -b testfull -l full /path

We're running 7.2.2 on Solaris 2.9 with a Linux storage node (also runs
7.2.2). All clients are at 7.2.2. The current media database and indexes
were transferred over from our old 6.1.1 server (was running Solaris
2.8). The server name is the same. I ran nsrim -X on the old server
before the transfer and on the new server after. The old server is shut
down to avoid any IP conflicts while the new server is up. We've been
doing recent backups on the old server, so the new server was still
considered a "test" server. If I wanted to test it, I would shut down
the old server and then bring up the new one. I know I'd done backups a
while ago without seeing these problems. I went to put the new server
into production today, and then this happens! Sheesh! I should note that
I'm able to load, unload, mount, label and inventory tapes. Recovering
data that was written to tapes labeled on the old server also works. I
am also able to clone data from old tapes and read that data too.

Any ideas?

I'm having a lousy Thanksgiving [sigh ...]. Thanks much.

George

--
George Sinclair - NOAA/NESDIS/National Oceanographic Data Center
SSMC3 4th Floor Rm 4145       | Voice: (301) 713-3284 x210
1315 East West Highway        | Fax:   (301) 713-3301
Silver Spring, MD 20910-3282  | Web Site:  http://www.nodc.noaa.gov/
- Any opinions expressed in this message are NOT those of the US Govt. -

To sign off this list, send email to listserv AT listserv.temple DOT edu and type 
"signoff networker" in the body of the email. Please write to networker-request 
AT listserv.temple DOT edu if you have any problems with this list. You can access the 
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER




