Re: [Networker] No response from client

2004-12-03 12:26:10
Subject: Re: [Networker] No response from client
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 3 Dec 2004 12:28:51 -0500
Well, 'traceroute -i interface client' works fine from/to this host.
However, 'savefs -p' on this client produces nothing! Very interesting.
It works fine on other clients. I checked /etc/fstab, nothing unusual in
there. I stopped and restarted the client software, same problem.
Everything under /nsr/tmp is 0 bytes, so I'm not sure there's anything to
clean out. Here's what /nsr/tmp has in it:

nsr.res.lck
nsrla.res.lck
product.res.lck
sec (empty directory)

This is the same on other clients. I also changed the "Storage nodes"
field for this client from:

storagenode_server_name
primary_server_name

to:

primary_server_name

and then ran the same backups, with the same problem. Clearly, the fact that
this client can't even run 'savefs -p' and get a response in a reasonable
amount of time is the major question here. I suppose it must eventually do
something, however, since backups on this host do eventually run; it just
takes all night. Not sure what to do at this point.
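
One way to tell "hangs forever" from "just very slow" is to put a time limit
on the command. The following is only a minimal sketch: probe_cmd is a
hypothetical helper name, and it assumes the GNU coreutils 'timeout' utility
is installed, which may not be true on an older system such as RedHat 7.3.

```shell
#!/bin/sh
# probe_cmd (hypothetical helper): run a command with a time limit and
# report whether it finished or had to be killed.
# Usage: probe_cmd SECONDS COMMAND [ARGS...]
# Assumes GNU coreutils 'timeout', which exits with status 124 when the
# time limit is reached.
probe_cmd() {
    limit=$1
    shift
    start=$(date +%s)
    timeout "$limit" "$@"          # the command's own output is left visible
    status=$?
    elapsed=$(( $(date +%s) - start ))
    if [ "$status" -eq 124 ]; then
        echo "HUNG after ${limit}s: $*"
    else
        echo "exit $status in ${elapsed}s: $*"
    fi
    return "$status"
}

# Example, to be run on the client (not executed here):
# probe_cmd 60 savefs -p
```

Running the same helper on a client that behaves normally would give a
baseline to compare against.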

George

Robert Maiello wrote:
>
> I think the key here is:
>
> "2. Running a probe against the client from the primary server produces
> nothing:  savegrp -pn -c client group"
>
> When you run a probe against the client it should return something. If it
> is successful sometimes, as you say, shouldn't a probe eventually return
> something?
>
> My guess is either:
>
> a.) It is having some network issue: bad connection, packet loss, routing...
> It may be having trouble getting back to the server, i.e. you can get to
> it but it can't get back to the server. Did you try traceroutes on the
> client out the proper interface to the backup server/storage node that your
> data travels on?
>
> b.) It is a client software issue. Does savefs -p work on the client?
> This is what the probe runs, so the software should function on the client
> in some manner. Has the client software been stopped, /nsr/tmp cleaned out,
> and the software restarted? It is possible the system needs to be rebooted.
>
> Robert Maiello
> Pioneer Data Systems
>
> On Fri, 3 Dec 2004 11:30:12 -0500, George Sinclair
> <George.Sinclair AT NOAA DOT GOV> wrote:
>
> >Hi,
> >
> >We have this Linux client that is agonizingly slow to back up. It *WILL*
> >back up, but it takes hours, even to do an incremental, even against
> >something simple like /tmp. So if I launch a backup against this one
> >client from the GUI, for example, and I walk away, and I come back an
> >hour later, I will see nothing, I kid you not. Absolutely no activity
> >whatsoever. If I check the group control window, the saveset 'All' is still
> >listed as pending. Unbelievable! If I come back another hour later, same
> >thing. When I go home at night, though, and I come in the next morning,
> >it's done!
> >
> >Its file systems are no bigger than any of our other clients. It's
> >running the same version of the OS (RedHat 7.3), and as near as I can
> >tell has the same setup. It's running the same version of the NetWorker
> >client software, too. Pinging the host produces normal timely responses,
> >same as other hosts on the network. Also, pinging machines from that
> >host works normally, too. This machine can access the network with no
> >problems that I can tell, and the user never complains about being unable
> >to reach other hosts, the internet, etc.
> >
> >This problem has existed as long as I can remember, so I don't know when
> >it first started exhibiting this behavior. Here are its horrible
> >symptoms and some of the things I've tried to troubleshoot it:
> >
> >1. Running the following commands on the client produces no output or
> >error messages on the console. They just hang:
> >
> >nwadmin -s server
> >nwrecover -s server
> >recover -s server
> >save -s server -b pool -l i /tmp
> >
> >2. Running a probe against the client from the primary server produces
> >nothing:
> >savegrp -pn -c client group
> >
> >3. Under save sets, changed 'All' to /tmp, placed client in its own
> >group and ran both full and incremental from GUI. Still nothing! Just
> >sits there. I can see how an incremental might take a while if a file
> >system had something like 50 million inodes, but /tmp? Come on!
> >Even a small file system like /var just sits dormant during backup. I've
> >seen huge RAID on other boxes run circles around this machine on a bad
> >day.
> >
> >There are no error messages or strange warnings in the nsr daemon.log or
> >messages log files on the primary server.
> >I used rpm -e to remove the NetWorker client and then re-installed and
> >re-started the software. Still no luck. I moved the network cable (100
> >Mbit) to another port, still no luck. I even tried another network
> >cable, no luck. I shut down the host, and rebooted it. Nothing. I even
> >ran a check against the client index on the primary server as: 'nsrck
> >-L6 client'. No error messages or warnings, and it completed just dandy.
> >Didn't take long either. What the heck is this machine's problem?!
> >It has plenty of memory, plenty of swap space, and it's doing NOTHING
> >while I've been running these tests. It's not like there are 100 users
> >logged in. No one is logged in other than me. I've tried everything but
> >running something like ethereal (sp), but maybe I'm gonna have to start
> >analyzing packets here? I'm not too good with those kinds of tools.
> >
> >Any ideas on what to try?
> >
> >Thanks.
> >
> >George
> >
> >--
> >Note: To sign off this list, send a "signoff networker" command via email
> >to listserv AT listmail.temple DOT edu or visit the list's Web site at
> >http://listmail.temple.edu/archives/networker.html where you can
> >also view and post messages to the list. Questions regarding this list
> >should be sent to stan AT temple DOT edu
> >=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
