Re: [Networker] No response from client

Thanks for the suggestion, Teresa, but it turns out this was a lot
simpler than that. The host's IP address, as listed in its own
/etc/hosts, was not correct, but the one listed in the main DNS table
was. The reason is that this machine was set up to replace an older host
with the same hostname; so, until the work could be completed on the new
host and the files transferred from the old one, a temporary hostname
and IP address were used on the new one. The hostname was later changed
back to the old one, BUT the IP address in /etc/hosts never was. It
still listed the temporary IP! My guess is that when the backup server
resolved the client's name, it used DNS, which had the correct entry,
while the client resolved its own name from /etc/hosts, which did not,
so the two addresses never matched.
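
For anyone who wants to check for the same thing, here is a rough Python
sketch. It is not something from our setup: the function names and the
sample addresses are made up for illustration. It takes the text of a
hosts file plus the address that DNS reports for the name (which you
would look up separately, e.g. with nslookup) and returns the hosts-file
address if the two disagree.

```python
def parse_hosts(text):
    """Map each hostname/alias in hosts-file text to the IP on its line."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first line that mentions a given name
            table.setdefault(name.lower(), ip)
    return table


def stale_entry(hosts_text, hostname, dns_ip):
    """Return the hosts-file IP if it disagrees with dns_ip, else None."""
    local_ip = parse_hosts(hosts_text).get(hostname.lower())
    if local_ip is not None and local_ip != dns_ip:
        return local_ip
    return None


if __name__ == "__main__":
    # Hypothetical values: substitute the client's real name and the
    # address that DNS reports for it.
    sample = "127.0.0.1 localhost\n192.0.2.50 client.example.com client\n"
    print(stale_entry(sample, "client", "192.0.2.10"))
```

To use it on a real client, you would read /etc/hosts into a string and
pass in the client's name and the DNS address by hand.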

The host's /etc/nsswitch.conf file listed 'files nis dns', so the client
consulted /etc/hosts before DNS. After I corrected the IP in /etc/hosts,
the problem went away immediately. No reboot or restart was necessary. I
was also then able to run all the tools like nwadmin, nwrecover,
recover, mminfo, etc. without them hanging. They all hung before the
fix.
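
To see which source a host will try first, something like the following
Python sketch could pull the lookup order out of nsswitch.conf. The
function name is hypothetical, and it assumes the usual 'hosts:' line
format, with bracketed action items such as [NOTFOUND=return] ignored.

```python
import re


def hosts_lookup_order(nsswitch_text):
    """Return the sources on the 'hosts:' line, in the order listed."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if line.lower().startswith("hosts:"):
            rest = line.split(":", 1)[1]
            # Remove bracketed action items, e.g. [NOTFOUND=return]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []


if __name__ == "__main__":
    # Sample text matching the order described above
    print(hosts_lookup_order("hosts:      files nis dns\n"))
```

If 'files' comes back first, a stale /etc/hosts entry will win over a
correct DNS record.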

One thing I've noticed in the past when there's a duplex mismatch is
that backups run, but the byte counts in the GUI window climb much more
slowly than you would expect. In our case, though, there was nothing in
the sessions window, not even the message about the tape forwarding.
Nothing. It would take hours before anything appeared in the sessions
window, and once something finally did appear, the backup would then run
normally.

George

Teresa Biehler wrote:
>
> Are the duplex settings for the client and network port the same?  This
> sounds like a duplex mismatch problem.
>
> -T
>
> -----Original Message-----
> From: Legato NetWorker discussion [mailto:NETWORKER AT LISTMAIL.TEMPLE DOT EDU]
> On Behalf Of George Sinclair
> Sent: Friday, December 03, 2004 11:30 AM
> To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
> Subject: [Networker] No response from client
>
> Hi,
>
> We have this Linux client that is agonizingly slow to back up. It *WILL*
> back up, but it takes hours, even to do an incremental, even against
> something simple like /tmp. So if I launch a backup against this one
> client from the GUI, for example, and I walk away, and I come back an
> hour later, I will see nothing, I kid you not. Absolutely no activity
> whatsoever. If I check the group control window, the saveset 'All' is
> still listed as pending. Unbelievable! If I come back another hour
> later, same thing. When I go home at night, though, and I come in the
> next morning, it's done!
>
> Its file systems are no bigger than any of our other clients. It's
> running the same version of the OS (RedHat 7.3), and as near as I can
> tell has the same setup. It's running the same version of the NetWorker
> client software, too. Pinging the host produces normal, timely responses,
> same as other hosts on the network. Also, pinging machines from that
> host works normally, too. This machine can access the network with no
> problems that I can tell, and the user never complains about being
> unable to reach other hosts, the internet, etc.
>
> This problem has existed as long as I can remember, so I don't know when
> it first started exhibiting this behavior. Here are its horrible
> symptoms and some of the things I've tried to troubleshoot it:
>
> 1. Running the following commands on the client produces no output or
> error messages to the console. They just hang:
>
> nwadmin -s server
> nwrecover -s server
> recover -s server
> save -s server -b pool -l i /tmp
>
> 2. Running a probe against the client from the primary server produces
> nothing:
> savegrp -pn -c client group
>
> 3. Under save sets, changed 'All' to /tmp, placed client in its own
> group and ran both full and incremental from GUI. Still nothing! Just
> sits there. I can see that if a file system had like 50 million inodes
> that maybe doing an incremental might take a while, but /tmp? Come on!
> Even a small file system like /var just sits dormant during backup. I've
> seen huge RAID on other boxes run circles around this machine on a bad
> day.
>
> There are no error messages or strange warnings in the nsr daemon.log or
> messages log files on the primary server.
> I used rpm -e to remove the NetWorker client and then re-installed and
> re-started the software. Still no luck. I moved the network cable (100
> Mbit) to another port, still no luck. I even tried another network
> cable, no luck. I shut down the host, and rebooted it. Nothing. I even
> ran a check against the client index on the primary server as: 'nsrck
> -L6 client'. No error messages or warnings, and it completed just dandy.
> Didn't take long either. What the heck is this machine's problem?!
> It has plenty of memory, plenty of swap space, and it's doing NOTHING
> while I run these tests. It's not like there are 100 users
> logged in. No one is logged in other than me. I've tried everything but
> running something like ethereal (sp), but maybe I'm gonna have to start
> analyzing packets here? Not too good with those kinds of tools.
>
> Any ideas on what to try?
>
> Thanks.
>
> George
>
> --
> Note: To sign off this list, send a "signoff networker" command via email
> to listserv AT listmail.temple DOT edu or visit the list's Web site at
> http://listmail.temple.edu/archives/networker.html where you can
> also view and post messages to the list. Questions regarding this list
> should be sent to stan AT temple DOT edu
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
