[Networker] No response from client

2004-12-03 11:27:31
Subject: [Networker] No response from client
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 3 Dec 2004 11:30:12 -0500
Hi,

We have this Linux client that is agonizingly slow to back up. It *WILL*
back up, but it takes hours, even to do an incremental, even against
something simple like /tmp. So if I launch a backup against this one
client from the GUI, for example, and I walk away, and I come back an
hour later, I will see nothing, I kid you not. Absolutely no activity
whatsoever. If I check the group control window, the saveset 'All' is
still listed as pending. Unbelievable! If I come back another hour later, same
thing. When I go home at night, though, and I come in the next morning,
it's done!

Its file systems are no bigger than any of our other clients. It's
running the same version of the OS (RedHat 7.3), and as near as I can
tell has the same setup. It's running the same version of the NetWorker
client software, too. Pinging the host produces normal timely responses,
same as other hosts on the network. Also, pinging machines from that
host works normally, too. This machine can access the network with no
problems that I can tell, and the user never complains about being unable
to reach other hosts, internet, etc.

This problem has existed as long as I can remember, so I don't know when
it first started exhibiting this behavior. Here are its horrible
symptoms and some of the things I've tried to troubleshoot it:

1. Running the following commands on the client produces no output or
error messages to the console. They just hang:

nwadmin -s server
nwrecover -s server
recover -s server
save -s server -b pool -l i /tmp

2. Running a probe against the client from the primary server produces
nothing:
savegrp -pn -c client group

3. Under save sets, changed 'All' to /tmp, placed client in its own
group and ran both full and incremental from GUI. Still nothing! Just
sits there. I can see that if a file system had like 50 million inodes
that maybe doing an incremental might take a while, but /tmp? Come on!
Even a small file system like /var just sits dormant during backup. I've
seen huge RAID on other boxes run circles around this machine on a bad
day.
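
One way to put numbers on the hangs described in items 1 to 3 is to run
each command under a time limit and record whether it finishes. Below is a
minimal Python sketch, assuming the save invocation from item 1 ('server'
and 'pool' are the same placeholders used above). The helper name
run_with_timeout and the 60-second limit are arbitrary choices made for
illustration, not anything that comes from NetWorker.

```python
import subprocess
import time


def run_with_timeout(cmd, limit_s=60):
    """Run cmd (a list of arguments); return (status, elapsed_seconds).

    status is the exit code, "timeout" if the limit was reached,
    or "not found" if the program is not on PATH.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit_s,  # the child is killed when this expires
        )
        status = proc.returncode
    except subprocess.TimeoutExpired:
        status = "timeout"
    except FileNotFoundError:
        status = "not found"
    return status, time.monotonic() - start


if __name__ == "__main__":
    # Placeholders from the post: replace "server" and "pool" with real names.
    cmd = ["save", "-s", "server", "-b", "pool", "-l", "i", "/tmp"]
    status, elapsed = run_with_timeout(cmd)
    print(f"{' '.join(cmd)} -> {status} after {elapsed:.1f}s")
```

Running the same wrapper on a client that backs up normally would give a
baseline to compare against.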

There are no error messages or strange warnings in the nsr daemon.log or
messages log files on the primary server.
I used rpm -e to remove the NetWorker client and then re-installed and
re-started the software. Still no luck. I moved the network cable (100
Mbit) to another port, still no luck. I even tried another network
cable, no luck. I shut down the host, and rebooted it. Nothing. I even
ran a check against the client index on the primary server as: 'nsrck
-L6 client'. No error messages or warnings, and it completed just dandy.
Didn't take long either. What the heck is this machine's problem?!!!
It has plenty of memory, plenty of swap space, and it's doing NOTHING
when I've been running these tests. It's not like there are 100 users
logged in. No one is logged in other than me. I've tried everything but
running something like Ethereal, but maybe I'm gonna have to start
analyzing packets here? Not too good with those kinds of tools.
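
Short of a packet analyzer, a cheaper check might be to time name
resolution and a plain TCP connect from the client to the server, since a
successful ping does not by itself show how long a reverse lookup or a TCP
connection takes. Below is a minimal Python sketch. The function names
timed and check are hypothetical, 'server' is the placeholder used above,
and port 7938 is only assumed here to be the NetWorker portmapper port, so
it should be verified for the installation.

```python
import socket
import time


def timed(label, func, *args):
    """Call func(*args); return (label, result_or_error_text, seconds)."""
    start = time.monotonic()
    try:
        result = func(*args)
    except OSError as exc:  # also covers socket.gaierror, herror, timeouts
        result = f"error: {exc}"
    return label, result, time.monotonic() - start


def check(host, port, connect_timeout=10):
    """Time a forward lookup, a reverse lookup, and a TCP connect."""
    rows = [timed("forward lookup", socket.gethostbyname, host)]
    addr = rows[0][1]
    if not str(addr).startswith("error"):
        rows.append(timed("reverse lookup", socket.gethostbyaddr, addr))

        def connect():
            with socket.create_connection((addr, port), timeout=connect_timeout):
                return "connected"

        rows.append(timed("tcp connect", connect))
    return rows


if __name__ == "__main__":
    # "server" is the placeholder from the post; 7938 is an assumed port.
    for label, result, secs in check("server", 7938):
        print(f"{label:15s} {secs:7.2f}s  {result}")
```

If one step alone takes many seconds here but not on a healthy client,
that would at least narrow down where to look. It is only one possibility
among several.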

Any ideas on what to try?

Thanks.

George
