Re: [Networker] Savegrp Aborting

2004-04-16 13:59:29
Subject: Re: [Networker] Savegrp Aborting
From: Darren Dunham <ddunham AT TAOS DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 16 Apr 2004 10:59:24 -0700
> The new server is a Sun-Fire v880 running Solaris 9 with the latest
> recommended patch cluster downloaded from Sunsolve. We've attached a
> Qualstar 58132 library containing 3 SAIT-1 tape drives via SCSI.
> We installed Networker 7.1.1 on it.

What holds /nsr?

> I ran the networker nsrd and nsrexecd daemons in debug mode and it
> pointed me to the /nsr/tmp directory.
>
> If I cd to /nsr/tmp/sec/sg/<groupname> and type ls, it lists the files
> like pr000001 and pr00000a. However if I type ls -la it shows:
>
> # ls -la
> pr000001: No such file or directory
> total 6
> 2 drwxr-xr-x   2 root  other  512 Apr 12 18:08 .
> 2 drwxr-xr-x  10 root  other  512 Apr 12 17:10 ..
> 2 -rw-r--r--   1 root  other  348 Apr 12 12:17 pr00000a
>
> If I then:   'touch test'     I get a message:    test: cannot stat
> However if I type 'ls', it shows pr000001, pr00000a, test
> But if I type 'ls -la' it shows:
>
> # ls -la
> pr000001: No such file or directory
> test: No such file or directory
> total 6
> 2 drwxr-xr-x   2 root  other  512 Apr 12 18:08 .
> 2 drwxr-xr-x  10 root  other  512 Apr 12 17:10 ..
>
> If I wait about 10 or 15 minutes and then type ls -la it shows all
> the files including 'test' just fine, showing their permissions and
> everything else.  This would appear to be an OS issue to me but it is
> really weird.

I would agree.

#1 Is /nsr/tmp on a UFS filesystem or on NFS?  Any mount options?
#2 Have you tried unmounting it and running fsck on it?
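
For #1, a rough sketch of one way to see what backs the directory. It
assumes a df that accepts the POSIX -P option; fs_of is a made-up helper
name, not a Solaris or Networker command.

```shell
#!/bin/sh
# fs_of is a hypothetical helper, not a Solaris or Networker command.
# Prints the device (or NFS server:path) and the mount point backing a path.
fs_of() {
    # -P asks for the POSIX one-line-per-filesystem output format, so the
    # second line is the entry for the filesystem that contains "$1".
    df -P "$1" | awk 'NR == 2 { print $1, $NF }'
}

# Example: sh thisscript /nsr/tmp
if [ "$#" -gt 0 ]; then
    fs_of "$1"
fi
```

A first field of the form host:/path would suggest NFS rather than a local
disk.  Running mount with no arguments should then list the options in
effect for that mount point.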

It does appear to be OS related since you get the errors just with
'ls'.
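
If you want to catch the mismatch without eyeballing it, something along
these lines might help.  It is only a sketch: it assumes a POSIX shell
whose test supports -e and -h, and unstatable is a name I made up.

```shell
#!/bin/sh
# unstatable is a hypothetical helper name.
# Prints every name that a directory listing returns but that cannot be
# stat'ed, i.e. the "ls shows it, ls -la says No such file" case.
unstatable() {
    dir=$1
    # A plain 'ls -a' normally only needs to read the directory itself.
    ls -a "$dir" | while IFS= read -r name; do
        case $name in
            .|..) continue ;;
        esac
        # -e follows symlinks and -h tests the link itself, so a dangling
        # symlink is not reported; only entries that fail both tests are.
        if [ ! -e "$dir/$name" ] && [ ! -h "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

# Example: sh thisscript /var/spool/mqueue
if [ "$#" -gt 0 ]; then
    unstatable "$1"
fi
```

On a healthy filesystem this should print nothing.  Running it every
minute or so against an affected directory would show how long the
entries stay in that state.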

> Also, thinking that this might be a problem with Networker 7.1.1,
> I pkgrm it and installed Networker 7.1....and got the same problem.
> Removed 7.1 and installed Networker 6.1.4 and got the same problem.

Seems strange.  I don't know how that would affect the filesystem in the
way you describe.

> So, wondering if you have any ideas on this?  I'm thinking the fact that
> when I 'touch test' and get the cannot stat message and then ls -la shows
> no such file or directory is my main culprit here. But not sure what the
> heck is causing it. BTW...nothing helpful in /var/adm/messages.

One last thing...  is /nsr a mount point?  If so, you might unmount /nsr
and check the permissions of /nsr on the root filesystem (the underlying
directory).  Sometimes that can cause problems (but the error messages
are usually very different from your examples).
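
A small sketch for that check, to be run once /nsr has been unmounted so
that the underlying directory is what gets examined.  dir_mode is a
hypothetical helper name.

```shell
#!/bin/sh
# dir_mode is a hypothetical helper name.
# Prints the permission string of a directory itself, not of its contents.
dir_mode() {
    # -d lists the directory entry rather than what is inside it.
    ls -ld "$1" | awk '{ print $1 }'
}

# Example (after unmounting): sh thisscript /nsr
if [ "$#" -gt 0 ]; then
    dir_mode "$1"
fi
```

The usual thing to look for is a mode that denies search (x) permission
to group or other on the underlying directory.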

> One small clue...I have seen some issues on this box where the same
> message is occurring for files in /var/spool/mqueue. Not sure if it's
> because Networker is trying to email messages or what but there are
> at times some files in /var/spool/mqueue that have the same symptoms
> as those in /nsr/tmp/sec/sg...

Ugh.  My first thought is that this has nothing to do with networker,
but I'm at a loss as to what the real problem is.  I would concentrate
on the filesystem though.


--
Darren Dunham                                           ddunham AT taos DOT com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >

