Networker

Re: [Networker] backup of server failed

2003-01-08 15:12:23
Subject: Re: [Networker] backup of server failed
From: Carston Locher <clocher AT LEGATO DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Wed, 8 Jan 2003 12:14:46 -0800
Penny,

It looks like a potentially bad drive.  I would check the operating
system logs (/var/adm/solaris on Solaris, not sure on Tru64 anymore.

You have media verification running which explains the restarts.  If the
media verification fails all running save sessions assigned to that
nsrmmd are restarted.

Thanks,

Carston

On Wed, 2003-01-08 at 11:57, Penny Jaye Deeney wrote:
> On Wed, 8 Jan 2003, Carston Locher wrote:
>
> > Penny,
> >
> > Also you should have specific error messages in /nsr/logs/daemon.log
> > that should help explain what happened.
> >
> > Could you check if there are specific errors and provide them to the
> > mailing list?
> >
>
> [long message]
>
> I spent some time fishing through daemon.log, but I found some interesting
> stuff.
>
> Bootz was backing up to our DLT jukebox:
> ------
> 01/07/03 09:40:38 nsrd: bootz:/tmp saving to pool 'bootz' (ALX377)
> 01/07/03 09:40:39 nsrd: bootz:/usr saving to pool 'bootz' (ALX377)
> 01/07/03 09:40:39 nsrd: bootz:/ saving to pool 'bootz' (ALX377)
> 01/07/03 09:40:39 nsrd: bootz:/nsr saving to pool 'bootz' (ALX377)
> 01/07/03 09:40:42 nsrd: bootz:/tmp done saving to pool 'bootz' (ALX377) 2 KB
> 01/07/03 09:41:26 nsrd: media warning: /dev/nrmt5h writing: I/O error, at 
> file 63 record 6
> 01/07/03 09:41:26 nsrd: media notice: tz89 tape ALX377 on /dev/nrmt5h is full
> 01/07/03 09:41:26 nsrd: media notice: tz89 tape ALX377 used 9771 MB of 40 GB 
> capacity
> 01/07/03 09:54:04 nsrd: media warning: /dev/nrmt5h moving: fsf 1: I/O error
> 01/07/03 10:07:26 nsrd: media warning: /dev/nrmt5h moving: fsf 63: I/O error
> ------
>
> For some reason, it kept trying to fsf 63 every 15 minutes or so:
> ------
> 01/07/03 11:41:41 nsrd: media warning: /dev/nrmt5h moving: fsf 63: I/O error
> 01/07/03 11:41:41 nsrd: media emergency: could not position ALX377 to file 
> 63, record 65
> ------
> 01/07/03 12:23:00 nsrd: media warning: verification of volume "ALX377", volid 
> 2834174209 f
> ailed, can not read record 65 of file 63 on tz89 tape ALX377
> 01/07/03 12:23:00 nsrd: media notice: verification of volume "ALX377", volid 
> 2834174209 fa
> iled, volume is being marked as full.
> 01/07/03 12:23:07 nsrd: media notice: Save set (451340034) bootz:/nsr volume 
> ALX377 on /de
> v/nrmt5h is being terminated because: Media verification failed
> * bootz:/nsr ! no output
> 01/07/03 12:23:08 savegrp: bootz:/nsr will retry 5 more time(s)
> 01/07/03 12:23:12 nsrd: bootz:/nsr done saving to pool 'bootz' (ALX377)
> ------
>
> So if nsrd knew the tape was full at 09:41:26, why was it trying to fsf
> for over 2 more hours?  Why didn't it mark it full?  Why is it full when
> there is so little of it used?  Did /tmp succeed because there was only 2K
> on it?
>
> So, another tape is mounted. (Note - other lines about the tape mounting
> were not included below for brevity)
>
> ------
> 01/07/03 12:28:15 nsrd: media info: loading volume PDB088 into /dev/nrmt5h
> 01/07/03 12:30:03 nsrd: bootz:/ saving to pool 'bootz' (PDB088)
> 01/07/03 12:30:04 nsrd: bootz:/usr saving to pool 'bootz' (PDB088)
> 01/07/03 12:30:04 nsrd: bootz:/nsr saving to pool 'bootz' (PDB088)
> 01/07/03 12:30:27 nsrd: bootz:/nsr done saving to pool 'bootz' (PDB088) 26 MB
> 01/07/03 12:31:57 nsrd: bootz:/ done saving to pool 'bootz' (PDB088) 340 MB
> 01/07/03 12:41:57 nsrd: media warning: /dev/nrmt5h writing: I/O error, at 
> file 4 record 1743
> 01/07/03 12:41:57 nsrd: media notice: tz89 tape PDB088 on /dev/nrmt5h is full
> 01/07/03 12:41:57 nsrd: media notice: tz89 tape PDB088 used 2045 MB of 40 GB 
> capacity
> ------
> 01/07/03 12:50:48 nsrd: media notice: Save set (453942018) bootz:/usr volume 
> PDB088 on /de
> v/nrmt5h is being terminated because: Media verification failed
> 01/07/03 12:50:48 nsrd: bootz:/usr done saving to pool 'bootz' (PDB088) 1669 
> MB
> 01/07/03 12:50:48 nsrd: write completion notice: Writing to volume PDB088 
> complete
> ------
> 01/07/03 12:50:06 savegrp: bootz:/usr will retry 4 more time(s)
> 01/07/03 12:50:08 nsrd: media warning: /dev/nrmt5h moving: fsr 1737: I/O error
> 01/07/03 12:50:09 nsrd: media emergency: could not position PDB088 to file 4, 
> record 1739
> 01/07/03 12:50:18 nsrd: media warning: /dev/nrmt5h reading: No such file or 
> directory
> ------
> Again, networker thinks the tape is full.  But this time it mounts another
> tape right away.  It starts saving, and again there is a tape full
> warning.
> ------
> 01/07/03 12:58:15 nsrd: media info: loading volume ALX383 into /dev/nrmt5h
> 01/07/03 13:00:13 nsrd: bootz:/usr saving to pool 'bootz' (ALX383)
> 01/07/03 13:09:02 nsrd: media warning: /dev/nrmt5h writing: I/O error, at 
> file 3 record 1746
> 01/07/03 13:09:02 nsrd: media notice: tz89 tape ALX383 on /dev/nrmt5h is full
> 01/07/03 13:09:03 nsrd: media notice: tz89 tape ALX383 used 1246 MB of 70 GB 
> capacity
> ------
> This happens with three more tapes, and finally it gives up.  When I
> restarted the backup (from the GUI, using the Groups button) it backed up
> to our AIT-2 jukebox.
>
> A further look in the log showed that the next DLT tape had the same
> problems.  This time, the second tape that was mounted for that pool was
> an AIT-2 tape, so there were no more notification errors.  Currently,
> nsrwatch shows that DLT tape is still mounted:
>
> /dev/nrmt5h  (J) tz89          PDW843       writing, data
>
> Does this sound to you all like there's a problem with the tape drive?
>
> --
> Note: To sign off this list, send a "signoff networker" command via email
> to listserv AT listmail.temple DOT edu or visit the list's Web site at
> http://listmail.temple.edu/archives/networker.html where you can
> also view and post messages to the list.
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=