Try running the full with the retries set to zero to get more information about
the "failure".
We have randomly had similar issues across all platforms - backup appears to
complete, then suddenly it restarts, and only on the fulls. Same "cannot
determine status" error(s). Once we set the retries to zero, we got a
different, more descriptive (but unfortunately no more helpful) error message.
That might work for you and at least you can get an idea of what's causing the
restart.
In our case, we're on 7.4.4.4 Solaris 10. We've had this happen on Windows
server clients, Solaris 10 clients, and a Netware client. We've had a case
open with EMC for months on one client - the error is "nsr_end: bad file
number" - no resolution as of yet.
-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of Len Philpot
Sent: Wednesday, April 14, 2010 3:02 PM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] Hanging backup job through a firewall ... ?
------------------------------------------------------
Standard disclaimer:
Yes, I know Networker 7.2 is old (and so is Solaris 8)
and believe me we're trying to move on, but a highly-
interdependent environment is slow to move. Next week,
we move to Solaris 10 and will get to 7.5.x ASAP, but
we have dependent Solaris 8 clients out there that
are holding us back. Plus, we're updating the entire
infrastructure later this year... 'nuff said. :-)
------------------------------------------------------
So, as you probably guessed, we're running Sun badged "EBS" 7.2 on Solaris
8 SPARC server, writing to SDLT 320. Since we moved a specific Windows XP
client behind a Cisco firewall, we've seen strange behavior with one
saveset (D:\) that actually finishes backing up, but the group/job never
completes for full (only) backups. The firewall rules allow two way
communication between the server and client, via TCP and UDP ports
7937-9936.
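
One way to double-check the rule set from the server side is to probe the port range directly. The following is only a rough sketch in Python, not a NetWorker tool: the function names (probe, scan, summarize) are made up for illustration, it tests TCP only (not UDP), and a port with no listener behind it will show as "refused" even when the firewall passes it. A "timeout" result is the one that more likely points at packets being dropped silently.

```python
import socket
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def probe(host, port, timeout=2.0):
    """Classify one TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"      # something is listening and the path is clear
    except ConnectionRefusedError:
        return "refused"       # a reset came back: host (or a rejecting firewall) answered
    except socket.timeout:
        return "timeout"       # no answer at all, often a silent firewall drop
    except OSError:
        return "error"         # name lookup failure, unreachable network, etc.


def scan(host, first_port, last_port, timeout=2.0, workers=64):
    """Probe an inclusive port range in parallel and return {port: status}."""
    ports = list(range(first_port, last_port + 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(lambda p: probe(host, p, timeout), ports))
    return dict(zip(ports, statuses))


def summarize(results):
    """Count how many ports fell into each status."""
    return Counter(results.values())
```

Run from the backup server, something like summarize(scan("client1", 7937, 9936)) would show how many ports in the range answered, were refused, or timed out; running the same scan from the client toward the server checks the other direction.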
Here's what I've seen:

Day       Run        Level    Saveset  Size   Result
Mon-Thu   scheduled  Level 5  C:\      10 MB  Completes normally
Mon-Thu   scheduled  Level 5  D:\      15 GB  Completes normally
Friday    scheduled  Full     C:\      6 GB   Completes normally
Friday    scheduled  Full     D:\      48 GB  Saveset finishes *
Test run  manual     Level 5  D:\      68 MB  Completed normally
Test run  manual     Full     D:\      48 GB  Saveset finished *

* but not the job!
As I watch the group, D:\ finishes :
04/14/10 11:29:37 nsrd: client1:D:\ done saving to pool 'pool1' (001828) 48 GB
...but the index never saves and it just sits there :
Looking on the server, the only two related processes I see are :
root 3346 3338 0 10:13:08 ? 0:00 /usr/sbin/nsr/nsrexec -c client1 -a -- client1:D:\
root 3338 557 0 10:13:05 ? 0:00 /usr/sbin/nsr/savegrp missed
During this time, our firewall guy didn't see anything hitting the
firewall from the client, though.
Finally this appears in the daemon.log and it tries again :
04/14/10 12:21:20 savegrp: client1:D:\ unexpectedly exited.
* client1:D:\ Cannot determine status of the backup process. Use mminfo to determine job status.
04/14/10 12:21:20 savegrp: client1:D:\ will retry 5 more time(s)
04/14/10 12:21:21 nsrd: client1:D:\ saving to pool 'pool1' (001826)
When I finally kill the group, I get the same kind of message :
* client1:D:\ 5 retries attempted
* client1:D:\ Cannot determine status of the backup process. Use mminfo to determine job status.
mminfo seems to indicate the backup is OK; I can indeed browse and recover
from it. But somehow, Networker never seems to know it's actually
finished so that it can back up the index and close the group/job. In fact,
given a scheduled full backup last Friday night, two manual attempted fulls
and other tests since then, we now have 17+ copies of this saveset! It
*appears* to be related to the backup level/size, but it may be a timeout
or other secondary issue. At this point, I have no idea.
Is there something we missed? Another port (range), maybe?
Given this just started happening when the client was physically moved and
firewalled-off, it's pretty hard to ignore the coincidence of it. Help!
:-)
Thanks!
To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER