On Wed, 8 Jan 2003, Carston Locher wrote:
> Penny,
>
> Also you should have specific error messages in /nsr/logs/daemon.log
> that should help explain what happened.
>
> Could you check if there are specific errors and provide them to the
> mailing list?
>
[long message]
I spent some time fishing through daemon.log, but I found some interesting
stuff.
Bootz was backing up to our DLT jukebox:
------
01/07/03 09:40:38 nsrd: bootz:/tmp saving to pool 'bootz' (ALX377)
01/07/03 09:40:39 nsrd: bootz:/usr saving to pool 'bootz' (ALX377)
01/07/03 09:40:39 nsrd: bootz:/ saving to pool 'bootz' (ALX377)
01/07/03 09:40:39 nsrd: bootz:/nsr saving to pool 'bootz' (ALX377)
01/07/03 09:40:42 nsrd: bootz:/tmp done saving to pool 'bootz' (ALX377) 2 KB
01/07/03 09:41:26 nsrd: media warning: /dev/nrmt5h writing: I/O error, at file
63 record 6
01/07/03 09:41:26 nsrd: media notice: tz89 tape ALX377 on /dev/nrmt5h is full
01/07/03 09:41:26 nsrd: media notice: tz89 tape ALX377 used 9771 MB of 40 GB
capacity
01/07/03 09:54:04 nsrd: media warning: /dev/nrmt5h moving: fsf 1: I/O error
01/07/03 10:07:26 nsrd: media warning: /dev/nrmt5h moving: fsf 63: I/O error
------
For some reason, it kept trying to fsf 63 every 15 minutes or so:
------
01/07/03 11:41:41 nsrd: media warning: /dev/nrmt5h moving: fsf 63: I/O error
01/07/03 11:41:41 nsrd: media emergency: could not position ALX377 to file 63,
record 65
------
01/07/03 12:23:00 nsrd: media warning: verification of volume "ALX377", volid
2834174209 f
ailed, can not read record 65 of file 63 on tz89 tape ALX377
01/07/03 12:23:00 nsrd: media notice: verification of volume "ALX377", volid
2834174209 fa
iled, volume is being marked as full.
01/07/03 12:23:07 nsrd: media notice: Save set (451340034) bootz:/nsr volume
ALX377 on /de
v/nrmt5h is being terminated because: Media verification failed
* bootz:/nsr ! no output
01/07/03 12:23:08 savegrp: bootz:/nsr will retry 5 more time(s)
01/07/03 12:23:12 nsrd: bootz:/nsr done saving to pool 'bootz' (ALX377)
------
So if nsrd knew the tape was full at 09:41:26, why was it trying to fsf
for over 2 more hours? Why didn't it mark it full? Why is it full when
there is so little of it used? Did /tmp succeed because there was only 2K
on it?
So, another tape is mounted. (Note - other lines about the tape mounting
were not included below for brevity)
------
01/07/03 12:28:15 nsrd: media info: loading volume PDB088 into /dev/nrmt5h
01/07/03 12:30:03 nsrd: bootz:/ saving to pool 'bootz' (PDB088)
01/07/03 12:30:04 nsrd: bootz:/usr saving to pool 'bootz' (PDB088)
01/07/03 12:30:04 nsrd: bootz:/nsr saving to pool 'bootz' (PDB088)
01/07/03 12:30:27 nsrd: bootz:/nsr done saving to pool 'bootz' (PDB088) 26 MB
01/07/03 12:31:57 nsrd: bootz:/ done saving to pool 'bootz' (PDB088) 340 MB
01/07/03 12:41:57 nsrd: media warning: /dev/nrmt5h writing: I/O error, at file
4 record 1743
01/07/03 12:41:57 nsrd: media notice: tz89 tape PDB088 on /dev/nrmt5h is full
01/07/03 12:41:57 nsrd: media notice: tz89 tape PDB088 used 2045 MB of 40 GB
capacity
------
01/07/03 12:50:48 nsrd: media notice: Save set (453942018) bootz:/usr volume
PDB088 on /de
v/nrmt5h is being terminated because: Media verification failed
01/07/03 12:50:48 nsrd: bootz:/usr done saving to pool 'bootz' (PDB088) 1669 MB
01/07/03 12:50:48 nsrd: write completion notice: Writing to volume PDB088
complete
------
01/07/03 12:50:06 savegrp: bootz:/usr will retry 4 more time(s)
01/07/03 12:50:08 nsrd: media warning: /dev/nrmt5h moving: fsr 1737: I/O error
01/07/03 12:50:09 nsrd: media emergency: could not position PDB088 to file 4,
record 1739
01/07/03 12:50:18 nsrd: media warning: /dev/nrmt5h reading: No such file or
directory
------
Again, networker thinks the tape is full. But this time it mounts another
tape right away. It starts saving, and again there is a tape full
warning.
------
01/07/03 12:58:15 nsrd: media info: loading volume ALX383 into /dev/nrmt5h
01/07/03 13:00:13 nsrd: bootz:/usr saving to pool 'bootz' (ALX383)
01/07/03 13:09:02 nsrd: media warning: /dev/nrmt5h writing: I/O error, at file
3 record 1746
01/07/03 13:09:02 nsrd: media notice: tz89 tape ALX383 on /dev/nrmt5h is full
01/07/03 13:09:03 nsrd: media notice: tz89 tape ALX383 used 1246 MB of 70 GB
capacity
------
This happens with three more tapes, and finally it gives up. When I
restarted the backup (from the GUI, using the Groups button) it backed up
to our AIT-2 jukebox.
A further look in the log showed that the next DLT tape had the same
problems. This time, the second tape that was mounted for that pool was
an AIT-2 tape, so there were no more notification errors. Currently,
nsrwatch shows that DLT tape is still mounted:
/dev/nrmt5h (J) tz89 PDW843 writing, data
Does this sound to you all like there's a problem with the tape drive?
--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
|