Re: [Networker] Saveset not completing normally

2007-12-20 18:06:26
Subject: Re: [Networker] Saveset not completing normally
From: Tim Mooney <Tim.Mooney AT NDSU DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 20 Dec 2007 17:02:17 -0600
In regard to: [Networker] Saveset not completing normally, Clark, Patti...:

> 12/19/07 21:10:09 savegrp: command ' save -s lxclyde-gb.osti.gov -g
> "Production Linux FS" -LL -f - -m lxdwtprod-gb.osti.gov -t 1198027808 -l
> incr -q -W 78 -N /var /var' for client lxdwtprod-gb.osti.gov exited with
> return code 1

I would first check the logs on the client, especially
/nsr/logs/daemon.log.  That *might* contain a relevant clue.

I would also check /nsr/cores on the client to see whether there are any
recent core files from any of the NetWorker programs that run on a client.
I'm pretty sure there won't be anything useful there on your client, but
it's worth a look.  You might be surprised at how much you'll find in
that directory on your server.
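
As a rough illustration of the core-file check above, here is a small
Python sketch that lists files under a directory that were modified in the
last few days, newest first.  The function name recent_files and the
seven-day window are assumptions made for this example; only the
/nsr/cores path comes from the advice above.

```python
import time
from pathlib import Path


def recent_files(directory, max_age_days=7, now=None):
    """Return files under `directory` modified in the last `max_age_days` days, newest first."""
    root = Path(directory)
    if not root.is_dir():
        # Nothing to report if the directory does not exist on this host.
        return []
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    hits = []
    for path in root.rglob("*"):
        if path.is_file():
            mtime = path.stat().st_mtime
            if mtime >= cutoff:
                hits.append((mtime, path))
    # Sort on modification time only, most recent first.
    hits.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in hits]


if __name__ == "__main__":
    # /nsr/cores is the directory named above; the 7-day window is arbitrary.
    for path in recent_files("/nsr/cores"):
        print(path)
```

Running it on both the client and the server gives a quick side-by-side
view of which NetWorker programs, if any, have dumped core recently.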

> This saveset started failing over the weekend on incrementals and level
> 5's.  I first tried a manual incremental backup of only that saveset
> and had the same result.  Approx. 4GB would back up and then it would sit
> for 1-2 hours before aborting.  I then changed it to do a full backup,
> which ran to completion, and that evening the normally scheduled
> incremental was successful.  Then last night, the scheduled incremental
> failed again.  Is this familiar to anyone?  Ideas on what to tackle?

The way I've tackled issues like this is with truss (on Linux, the
equivalent is strace).  Attach to the running nsrexecd processes on your
client with strace -p (be sure to use the -f option so that tracing
continues into new processes after fork()).  You'll want the -o option as
well, to save all strace output to a file somewhere (in a large filesystem
with plenty of free space).
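
The attach step can be sketched in Python as follows.  This is only an
illustration: the helper names find_pids and build_strace_command and the
output path are made up for the example, and it assumes pgrep is
available on the client.  The strace options themselves (-f, -o, and one
-p per process) are the ones described above.

```python
import shlex
import subprocess


def find_pids(process_name="nsrexecd"):
    """Return PIDs of running processes whose name matches exactly (pgrep -x)."""
    result = subprocess.run(
        ["pgrep", "-x", process_name],
        capture_output=True,
        text=True,
        check=False,  # pgrep exits non-zero when nothing matches
    )
    return [int(token) for token in result.stdout.split()]


def build_strace_command(pids, output_path):
    """Build an strace command line that attaches to every PID in `pids`."""
    if not pids:
        raise ValueError("no PIDs to attach to")
    # -f follows children created by fork(); -o writes the trace to a file.
    command = ["strace", "-f", "-o", output_path]
    for pid in pids:
        command += ["-p", str(pid)]
    return command


if __name__ == "__main__":
    pids = find_pids()
    if pids:
        # Hypothetical output path: pick a filesystem with plenty of free
        # space, preferably not the one whose backup is failing.
        print(shlex.join(build_strace_command(pids, "/scratch/nsrexecd.strace")))
    else:
        print("no nsrexecd processes found")
```

The script only prints the command rather than running it, so you can
review it first; attaching to another user's processes normally requires
root.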

Once attached to the nsrexecds on the client, start a normal backup
from the server using whatever method you choose (the log will be easier
to read if you back up only the filesystem you're having trouble with).
Your strace log file will record every system call the nsrexecds make,
along with the system calls of all of their children (save processes).

Somewhere in there is the answer to why the backup is failing.  Reading
system call traces does require a relatively good understanding of how
system calls work on UNIX or Linux, but once you get comfortable with
strace or truss, you'll find it's an invaluable tool.
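
One possible first pass over a large trace file is to count the system
calls that returned an error.  The sketch below is a rough aid, not a
substitute for reading the trace: the function name summarize_failures is
invented for this example, and it assumes the log was written with -f and
-o and without timestamp options, so that each line begins with the PID
that made the call.

```python
import re
from collections import Counter

# With -f and -o, a failed call looks like:
#   4242 open("/var/x", O_RDONLY) = -1 ENOENT (No such file or directory)
# Lines marked "<unfinished ...>" or "<... resumed>" are not matched here.
FAILED_CALL = re.compile(
    r"^(?P<pid>\d+)\s+(?P<syscall>\w+)\(.*\)\s+= -1 (?P<errno>E[A-Z0-9]+)"
)


def summarize_failures(lines):
    """Count failed system calls in strace output, keyed by (syscall, errno)."""
    counts = Counter()
    for line in lines:
        match = FAILED_CALL.match(line)
        if match:
            counts[(match.group("syscall"), match.group("errno"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py <strace output file>
    with open(sys.argv[1], errors="replace") as handle:
        for (syscall, errno), count in summarize_failures(handle).most_common(20):
            print(f"{count:8d}  {syscall}  {errno}")
```

Many failed calls are harmless (a program probing for optional files, for
example), so treat the counts as a list of places to start reading.  For a
backup that stalls and then aborts, the last calls the save process made
before the stall are likely to be at least as informative.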

Tim
--
Tim Mooney                                        Tim.Mooney AT ndsu DOT edu
Information Technology Services                   (701) 231-1076 (Voice)
Room 242-J6, IACC Building                        (701) 231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164
