Re: [Networker] backup of / failing

2006-11-21 12:18:49
Subject: Re: [Networker] backup of / failing
From: "Landwehr, Jerome" <jlandweh AT HARRIS DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Tue, 21 Nov 2006 12:10:21 -0500
there are only two partitions on the OS disk: / and /var (well, /tmp too)

there's no firewall or filtering - they are in the same room on the same network

the -D9 output is one thing that EMC asked for and still no progress - it ends 
thusly:

walk(/usr/openwin/bin/Xprt, Xprt)
save: Walking for /usr/openwin/bin/Xprt
save: lg_lstat(): Calling native lstat().
`/usr/openwin/bin/Xprt' change time Mon Nov 13 10:08:29 2006
save: file_get_stsize(), size = 1701824
uasm -s /usr/openwin/bin/Xprt
save: lg_open(): Calling open().
save: no extended ACL entry for `/usr/openwin/bin/Xprt'
/usr/openwin/bin/Xprt: fid = <0, 4790>
save: Encoding SF_MAGIC3 saverec.
lost connection to server, exiting

I tried a .nsr in /usr to skip that directory and it fails just the same 
someplace else

I'll try that env on the client though

thanks!


-----Original Message-----
From: Jason Kölker [mailto:jason AT koelker DOT net] 
Sent: Tuesday, November 21, 2006 11:37 AM
To: EMC NetWorker discussion; Landwehr, Jerome
Subject: Re: [Networker] backup of / failing

On Tue, 2006-11-21 at 10:04 -0500, Landwehr, Jerome wrote:
> I have a NW 7.2.1 data zone with a Unix master server and a Unix network
> client, both Solaris 9, on the same subnet, both on gigabit copper
> network, no network errors or collisions
>  
> I have never been able to get a successful backup of / (yet /var works
> every time)

I assume that / has more data than /var, correct?

>  
> I've tried everything: deleting the index, backing up from the client,
> deleting the client resource, upgrading to 7.2.2 on the client, putting
> a .nsr file to skip directories, yet every time I get about 14MB and then
> the error "lost connection to server, exiting"
>  

Is there perchance a firewall or some other filtering in between the two
servers?  It sounds like the control connection is getting terminated
since there is no data passing on it.  This can happen with large
volumes when data is being transferred on the data ports and nothing on
the control port.

If you set the NSR_KEEPALIVE_WAIT environment variable in the startup
script on the client to something like 30, the client will send a keepalive
packet on the control port every 30 seconds.
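
As a rough sketch of the suggestion above, the variable could be set and
exported in the client's startup script before the NetWorker daemons are
launched. The script location (often /etc/init.d/networker on Solaris) and
the exact placement are assumptions here; only the variable name and the
value 30 come from the message.

```shell
# Assumed placement: near the top of the client's NetWorker startup script,
# before the line that starts nsrexecd, so the daemon inherits the variable.
NSR_KEEPALIVE_WAIT=30        # seconds between keepalives (value suggested above)
export NSR_KEEPALIVE_WAIT
```

The client daemons would need to be restarted afterwards, since a running
process does not pick up changes to its environment.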

> Of course EMC support 'has never seen this' and has done nothing more
> than the obvious
>  
> Has anyone seen this or have suggestions?

You might also try running a backup of / manually on the client with
"-vvv" and/or "-D9" arguments to `save` to get some better output if the
NSR_KEEPALIVE_WAIT doesn't work.  Usually with "-D9" you can see exactly
what's going on.
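
A manual run along those lines might look like the following sketch. The
server name and log path are placeholders, not values from this thread;
"-vvv" and "-D9" are the flags mentioned above, and "-s" names the NetWorker
server. A POSIX-compatible shell is assumed.

```shell
SERVER=nwserver              # placeholder: the NetWorker server's hostname
LOG=/tmp/save-root.log       # placeholder: where to keep the debug output

# Run only if the save binary is on the PATH (guard added for this sketch).
if command -v save >/dev/null 2>&1; then
    # Back up / by hand with verbose and debug output captured to the log.
    save -s "$SERVER" -vvv -D9 / > "$LOG" 2>&1
    # The last lines show what save was doing when the connection dropped.
    tail -20 "$LOG"
else
    echo "save not found in PATH"
fi
```

Keeping the output in a file makes it easier to compare where successive
runs stop.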

Happy Hacking!

7-11

>  
> thanks in advance
>  
> jerry
> 
> To sign off this list, send email to listserv AT listserv.temple DOT edu and 
> type "signoff networker" in the body of the email. Please write to 
> networker-request AT listserv.temple DOT edu if you have any problems with 
> this list. You can access the archives at 
> http://listserv.temple.edu/archives/networker.html or
> via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
