------------------------------------------------------
Standard disclaimer:
Yes, I know Networker 7.2 is old (and so is Solaris 8)
and believe me we're trying to move on, but a highly-
interdependent environment is slow to move. Next week,
we move to Solaris 10 and will get to 7.5.x ASAP, but
we have dependent Solaris 8 clients out there that
are holding us back. Plus, we're updating the entire
infrastructure later this year... 'nuff said. :-)
------------------------------------------------------
So, as you probably guessed, we're running Sun-badged "EBS" 7.2 on a Solaris
8 SPARC server, writing to SDLT 320. Since we moved a specific Windows XP
client behind a Cisco firewall, we've seen strange behavior with one
saveset (D:\): the saveset itself finishes backing up, but the group/job
never completes. This happens only with full backups. The firewall rules
allow two-way communication between the server and client, via TCP and UDP
ports 7937-9936.
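
In case it helps anyone reproduce this, here is a minimal sketch in Python of how the TCP side of that port range could be spot-checked from either end. The function names are made up for illustration, it says nothing about UDP, and it only tests whether a new connection can be opened, not whether a long-lived one survives.

```python
import socket


def port_range(spec):
    """Expand a 'low-high' string such as '7937-9936' into a list of ints."""
    low, high = (int(part) for part in spec.split("-"))
    return list(range(low, high + 1))


def probe_tcp(host, ports, timeout=2.0):
    """Attempt a TCP connect to each port and classify the outcome."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = "open"
        except ConnectionRefusedError:
            # Packet got through, but nothing is listening on that port.
            results[port] = "refused"
        except socket.timeout:
            # No answer at all, which is what a silent filter tends to look like.
            results[port] = "timeout"
        except OSError:
            results[port] = "error"
    return results
```

Something like probe_tcp("server1", port_range("7937-9936")[:10]) run from the client would be the idea, where "server1" is a placeholder hostname. Most ports in the range will have no listener at any given moment, so "refused" is expected; "timeout" is the result that would point at filtering.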
Here's what I've seen:

Run                  Level    Saveset  Size    Result
Mon-Thu, scheduled   Level 5  C:\      10 MB   Completes normally
                     Level 5  D:\      15 GB   Completes normally
Friday, scheduled    Full     C:\      6 GB    Completes normally
                     Full     D:\      48 GB   Saveset finishes *
Test run, manual     Level 5  D:\      68 MB   Completed normally
Test run, manual     Full     D:\      48 GB   Saveset finished *

* but not the job!
As I watch the group, D:\ finishes:

04/14/10 11:29:37 nsrd: client1:D:\ done saving to pool 'pool1' (001828) 48 GB

...but the index never saves and it just sits there.
Looking on the server, the only two related processes I see are:

root 3346 3338 0 10:13:08 ? 0:00 /usr/sbin/nsr/nsrexec -c client1 -a -- client1:D:\
root 3338 557 0 10:13:05 ? 0:00 /usr/sbin/nsr/savegrp missed

During this time, our firewall guy didn't see anything hitting the
firewall from the client, though.
Finally this appears in the daemon.log and it tries again :
04/14/10 12:21:20 savegrp: client1:D:\ unexpectedly exited.
* client1:D:\ Cannot determine status of the backup process. Use mminfo to determine job status.
04/14/10 12:21:20 savegrp: client1:D:\ will retry 5 more time(s)
04/14/10 12:21:21 nsrd: client1:D:\ saving to pool 'pool1' (001826)
When I finally kill the group, I get the same kind of message :
* client1:D:\ 5 retries attempted
* client1:D:\ Cannot determine status of the backup process. Use mminfo to determine job status.
mminfo seems to indicate the backup is OK; I can indeed browse and recover
from it. But somehow, Networker never seems to learn that the saveset has
actually finished, so it never backs up the index or closes the group/job.
In fact, given a scheduled full backup last Friday night, two manual
attempted fulls and other tests since then, we now have 17+ copies of this
saveset! It *appears* to be related to the backup level/size, but it may be
a timeout or other secondary issue. At this point, I have no idea.
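
To put rough numbers on the timeout idea, here is a small Python sketch that lines up the three timestamps quoted above. The 10:13:05 start is taken from the ps STIME column for savegrp, and I'm assuming it falls on the same day as the log lines; the event labels are just mine.

```python
from datetime import datetime

FMT = "%m/%d/%y %H:%M:%S"

# Timestamps copied from the ps output and daemon.log excerpts in this post.
events = [
    ("savegrp started (ps STIME, date assumed)", "04/14/10 10:13:05"),
    ("nsrd: D:\\ done saving", "04/14/10 11:29:37"),
    ("savegrp: unexpectedly exited", "04/14/10 12:21:20"),
]


def gaps(events):
    """Return (from_label, to_label, timedelta) for each consecutive pair."""
    parsed = [(label, datetime.strptime(ts, FMT)) for label, ts in events]
    return [(a[0], b[0], b[1] - a[1]) for a, b in zip(parsed, parsed[1:])]


for start, end, delta in gaps(events):
    print(f"{delta}  {start} -> {end}")
# prints 1:16:32 for the first gap and 0:51:43 for the second
```

So the full took about 1h16m to write, and savegrp then waited roughly another 52 minutes before giving up. If the firewall drops idle TCP sessions, either window might be long enough to matter, but that is only a guess until someone checks what idle timeout is actually configured.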
Is there something we missed? Another port (range), maybe?
Given this just started happening when the client was physically moved and
firewalled-off, it's pretty hard to ignore the coincidence of it. Help!
:-)
Thanks!
To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER