Networker

Re: [Networker] Too many open files in system?

2007-11-30 17:42:56
Subject: Re: [Networker] Too many open files in system?
From: "McBeen, Ben" <ben.mcbeen AT PLUMASBANK DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 30 Nov 2007 14:34:59 -0800
In case anyone is interested, the problem for us here was hardware. We are still not sure of the exact cause, but the library had a problem initializing its tape drives. When the library attempted to auto-load a tape and label it, the tape would partially mount physically, but then crash, with the tape leader partially wound onto the drive reel. The library would then attempt to restart the drive. At that point the drive wakes up to find a tape partially loaded, freaks out, and immediately shuts down. All of this would be fine, except that the cycle created tremendous noise on the SCSI chain and basically shut down the internal TTL bus used to monitor the drives. The remaining drive that still operated had to deal with CRC errors on the SCSI chain. These errors freak out Networker, which cannot understand the resulting data coming from the drive. We would get back random volume IDs from the drives, which led to conflicts when mounting tapes. Ugly.

We spent the past week with techs on site trying to resolve the issues; the final result was replacing all of our tape drives. It looks like some kind of firmware conflict.

Here is our event cycle within Networker for reference. Hopefully this will help someone else avoid the pain we went through.

First, we get this when auto-labeling a tape:
*********************************************
                cancellation: none;
             completion code: ;
               error message: ;
                 last update: 1195663833;
                    messages: ;
                        name: [email protected];
          operation instance: 900;
            operation source: nsrd jb op;
                    progress: \
"Cannot read the current volume label `Tape label read for volume ? in pool ?, is not recognised by\
 Networker: Too many open files in system'.";
             prompt response: ;
                      prompt: ;
                  start time: 1195663656;
                      status: running
*********************************************

We then get this error, sometimes.
*********************************************
                cancellation: none;
             completion code: ;
               error message: Expected volume `07N089' in slot `11'. The actual volume is `<NULL>'., Expected volume `07N089' in slot `11'. The actual volume is `<NULL>'.;
                 last update: 1195663396;
                    messages: ;
                        name: [email protected];
          operation instance: 898;
            operation source: nsrd jb op;
                    progress: failed;
             prompt response: ;
                      prompt: ;
                  start time: 1195663151;
                      status: failed
*********************************************

We then get this error an indefinite number of times.
*********************************************
                cancellation: none;
             completion code: ;
               error message: \
Duplicate volume name `07N111'. Select a new name or remove the original volume.;
                 last update: 1195662396;
                    messages: ;
                        name: [email protected];
          operation instance: 893;
            operation source: nsrd jb op;
                    progress: failed;
             prompt response: ;
                      prompt: ;
                  start time: 1195662098;
                      status: failed
*********************************************

We then get this error, and the drive gets flagged as offline. In the system event viewer, we can see the CRC errors are also being triggered.
*********************************************
                cancellation: none;
             completion code: ;
               error message: \
"Jukebox:[email protected] access:[email protected] failed:MOVE MEDIUM key:5 status:CHECK CONDITION UNKNOWN, In\
compatible Medium Installed",
"Jukebox:[email protected] access:[email protected] failed:MOVE MEDIUM key:5 status:CHECK CONDITION UNKNOWN, In\
compatible Medium Installed";
                 last update: 1195662410;
                    messages: ;
                        name: [email protected];
          operation instance: 894;
            operation source: nsrd jb op;
                    progress: retryable;
             prompt response: ;
                      prompt: ;
                  start time: 1195662403;
                      status: retryable
*********************************************


At this point, I can restart the cycle by going to the media database and deleting the tape. The library will then remount the tape and try to label it. I always get the first error above, but sometimes the tape is labeled properly and the rest of the error chain is bypassed.
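For anyone comparing these records against their own, here is a minimal Python sketch that parses one record in the "name: value;" layout pasted above and puts several records in start-time order. It assumes that a trailing backslash continues a value on the next line and that the "start time" and "last update" numbers are Unix epoch seconds; both are inferred from the pasted output, not from Networker documentation. The function names are made up for illustration.

```python
import re
from datetime import datetime, timezone


def parse_record(text):
    """Parse one block of 'name: value;' lines into a dict of strings."""
    # Drop separator lines that consist only of asterisks.
    kept = [ln for ln in text.splitlines() if set(ln.strip()) != {"*"}]
    # Assumption: a trailing backslash continues the value on the next line.
    joined = re.sub(r"\\\n", "", "\n".join(kept))
    record = {}
    # Attributes end with ';' at the end of a line; the last one may not.
    for chunk in re.split(r";[ \t]*(?:\n|$)", joined):
        name, sep, value = chunk.strip().partition(":")
        if sep:
            record[name.strip()] = value.strip()
    return record


def epoch_to_utc(value):
    """Interpret a numeric field as Unix epoch seconds (an assumption)."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


def in_start_order(records):
    """Return parsed records sorted by 'start time', oldest first."""
    return sorted(records, key=lambda r: int(r["start time"]))


if __name__ == "__main__":
    sample = r"""
               error message: \
Duplicate volume name `07N111'. Select a new name or remove the original volume.;
                 last update: 1195662396;
          operation instance: 893;
                  start time: 1195662098;
                      status: failed
"""
    rec = parse_record(sample)
    print(rec["operation instance"], epoch_to_utc(rec["start time"]).isoformat())
    print(rec["error message"])
```

Sorting the four records above this way puts them in the order 893, 894, 898, 900, which is not the order they are pasted in.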


---------------------------------------------------------
Ben McBeen
Information Technology Systems Engineer
Plumas Bank
530-283-7305 x7602



On Nov 20, 2007, at 1:00 PM, McBeen, Ben wrote:

Been getting the same problem here. Seems to be some problem with labeling... at least that's what we have found. In some cases we are able to get things moving again by relabeling the tape with a new barcode and trying again.

We are also getting errors at the same time about duplicate media in the tape database, even though all IDs are unique.

Here is another one in the same group...
Failed false operation: "Load"; operation device: "\\.\Tape1"; operation slots: " 2"; write enabled: "Yes"; Expected volume ID `3509667549' for volume `079007'. The actual volume ID is `2850665173'., Expected volume ID `3509667549' for volume `079007'. The actual volume ID is `2850665173'. 85
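
If there are many of these in a log, a small script can list which volumes came back with the wrong ID. This is only a sketch: the pattern is copied from the message text above, and the function name is made up for illustration.

```python
import re

# Pattern copied from the wording of the message quoted above.
MISMATCH = re.compile(
    r"Expected volume ID `(\d+)' for volume `([^']+)'\. "
    r"The actual volume ID is `(\d+)'"
)


def find_id_mismatches(log_text):
    """Return unique (volume, expected_id, actual_id) tuples, first seen first."""
    seen = []
    for expected, volume, actual in MISMATCH.findall(log_text):
        item = (volume, expected, actual)
        # The same message is often repeated on one line, so skip duplicates.
        if item not in seen:
            seen.append(item)
    return seen
```

Running it over the line above gives a single entry for volume 079007, since the message is repeated twice with the same IDs.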

No idea what's going on, but it looks like it's time to call support.

---------------------------------------------------------
Ben McBeen
Information Technology Systems Engineer
Plumas Bank
530-283-7305 x7602

On Nov 20, 2007, at 9:50 AM, Michael Leone wrote:

Here's a weird one. I have NW 7.4 on Win2003 Enterprise server. I'm
labeling some new tapes (via script), and I've started getting this:

Info: Loading volume `-' from slot `3' into device `\\.\Tape2'.
Info: Operation `Verify label' in progress on device `\\.\Tape2'
Info: Cannot read the current volume label `Tape label read for volume ? in pool ?, is not recognised by Networker: Too many open files in system'.
Info: nsrmmgd assumes the volume is unlabeled and will write a new label.
Info: Operation `Label without mount' in progress on device `\\.\Tape2'
Info: Label: `1127153791', pool: `PHAPRD', capacity: `<NULL>'.
Info: Operation `Eject' in progress on device `\\.\Tape2'
Info: Unloading volume `-' from device `\\.\Tape2' to slot 3.
Error: fsr 1 (read): drive status is The tape drive is ready for use
39077:nsrjb: error, Jukebox command terminated with errors.

Knowledgebase searches on Powerlink turn up nothing, nor does a web search
(everything refers to Linux, and this is a Windows server).

(BTW ... love that error message - "drive status is: The tape drive is
ready for use". Informative, that is ... :-))

I can't label using the GUI, after it fails, either. I suppose I could try
stopping all NW services and re-starting ...

--
Michael Leone
Network Administrator, ISM
Philadelphia Housing Authority
2500 Jackson St
Philadelphia, PA 19145
Tel: 215-684-4180
<mailto:michael.leone AT pha.phila DOT gov>

To sign off this list, send email to listserv AT listserv.temple DOT edu and type "signoff networker" in the body of the email. Please write to networker-request AT listserv.temple DOT edu if you have any problems with this list. You can access the archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER



----------------------------
NOTICE: This electronic mail message and any files transmitted
with it are intended exclusively for the individual or entity to
which it is addressed. The message, together with any attachment,
may contain confidential and/or privileged information. Any
unauthorized review, use, printing, saving, copying, disclosure
or distribution is strictly prohibited. If you have received this
message in error, please immediately advise the sender by reply
email and delete all
copies.
