Networker

Re: [Networker] backup of server failed

2003-01-08 14:57:53
Subject: Re: [Networker] backup of server failed
From: Penny Jaye Deeney <penny AT TEMPLE DOT EDU>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Wed, 8 Jan 2003 14:57:49 -0500
On Wed, 8 Jan 2003, Carston Locher wrote:

> Penny,
>
> Also you should have specific error messages in /nsr/logs/daemon.log
> that should help explain what happened.
>
> Could you check if there are specific errors and provide them to the
> mailing list?
>

[long message]

I spent some time fishing through daemon.log, but I found some interesting
stuff.

Bootz was backing up to our DLT jukebox:
------
01/07/03 09:40:38 nsrd: bootz:/tmp saving to pool 'bootz' (ALX377)
01/07/03 09:40:39 nsrd: bootz:/usr saving to pool 'bootz' (ALX377)
01/07/03 09:40:39 nsrd: bootz:/ saving to pool 'bootz' (ALX377)
01/07/03 09:40:39 nsrd: bootz:/nsr saving to pool 'bootz' (ALX377)
01/07/03 09:40:42 nsrd: bootz:/tmp done saving to pool 'bootz' (ALX377) 2 KB
01/07/03 09:41:26 nsrd: media warning: /dev/nrmt5h writing: I/O error, at file 
63 record 6
01/07/03 09:41:26 nsrd: media notice: tz89 tape ALX377 on /dev/nrmt5h is full
01/07/03 09:41:26 nsrd: media notice: tz89 tape ALX377 used 9771 MB of 40 GB 
capacity
01/07/03 09:54:04 nsrd: media warning: /dev/nrmt5h moving: fsf 1: I/O error
01/07/03 10:07:26 nsrd: media warning: /dev/nrmt5h moving: fsf 63: I/O error
------

For some reason, it kept trying to fsf 63 every 15 minutes or so:
------
01/07/03 11:41:41 nsrd: media warning: /dev/nrmt5h moving: fsf 63: I/O error
01/07/03 11:41:41 nsrd: media emergency: could not position ALX377 to file 63, 
record 65
------
01/07/03 12:23:00 nsrd: media warning: verification of volume "ALX377", volid 
2834174209 f
ailed, can not read record 65 of file 63 on tz89 tape ALX377
01/07/03 12:23:00 nsrd: media notice: verification of volume "ALX377", volid 
2834174209 fa
iled, volume is being marked as full.
01/07/03 12:23:07 nsrd: media notice: Save set (451340034) bootz:/nsr volume 
ALX377 on /de
v/nrmt5h is being terminated because: Media verification failed
* bootz:/nsr ! no output
01/07/03 12:23:08 savegrp: bootz:/nsr will retry 5 more time(s)
01/07/03 12:23:12 nsrd: bootz:/nsr done saving to pool 'bootz' (ALX377)
------

So if nsrd knew the tape was full at 09:41:26, why was it trying to fsf
for over 2 more hours?  Why didn't it mark it full?  Why is it full when
there is so little of it used?  Did /tmp succeed because there was only 2K
on it?

So, another tape is mounted. (Note - other lines about the tape mounting
were not included below for brevity)

------
01/07/03 12:28:15 nsrd: media info: loading volume PDB088 into /dev/nrmt5h
01/07/03 12:30:03 nsrd: bootz:/ saving to pool 'bootz' (PDB088)
01/07/03 12:30:04 nsrd: bootz:/usr saving to pool 'bootz' (PDB088)
01/07/03 12:30:04 nsrd: bootz:/nsr saving to pool 'bootz' (PDB088)
01/07/03 12:30:27 nsrd: bootz:/nsr done saving to pool 'bootz' (PDB088) 26 MB
01/07/03 12:31:57 nsrd: bootz:/ done saving to pool 'bootz' (PDB088) 340 MB
01/07/03 12:41:57 nsrd: media warning: /dev/nrmt5h writing: I/O error, at file 
4 record 1743
01/07/03 12:41:57 nsrd: media notice: tz89 tape PDB088 on /dev/nrmt5h is full
01/07/03 12:41:57 nsrd: media notice: tz89 tape PDB088 used 2045 MB of 40 GB 
capacity
------
01/07/03 12:50:48 nsrd: media notice: Save set (453942018) bootz:/usr volume 
PDB088 on /de
v/nrmt5h is being terminated because: Media verification failed
01/07/03 12:50:48 nsrd: bootz:/usr done saving to pool 'bootz' (PDB088) 1669 MB
01/07/03 12:50:48 nsrd: write completion notice: Writing to volume PDB088 
complete
------
01/07/03 12:50:06 savegrp: bootz:/usr will retry 4 more time(s)
01/07/03 12:50:08 nsrd: media warning: /dev/nrmt5h moving: fsr 1737: I/O error
01/07/03 12:50:09 nsrd: media emergency: could not position PDB088 to file 4, 
record 1739
01/07/03 12:50:18 nsrd: media warning: /dev/nrmt5h reading: No such file or 
directory
------
Again, networker thinks the tape is full.  But this time it mounts another
tape right away.  It starts saving, and again there is a tape full
warning.
------
01/07/03 12:58:15 nsrd: media info: loading volume ALX383 into /dev/nrmt5h
01/07/03 13:00:13 nsrd: bootz:/usr saving to pool 'bootz' (ALX383)
01/07/03 13:09:02 nsrd: media warning: /dev/nrmt5h writing: I/O error, at file 
3 record 1746
01/07/03 13:09:02 nsrd: media notice: tz89 tape ALX383 on /dev/nrmt5h is full
01/07/03 13:09:03 nsrd: media notice: tz89 tape ALX383 used 1246 MB of 70 GB 
capacity
------
This happens with three more tapes, and finally it gives up.  When I
restarted the backup (from the GUI, using the Groups button) it backed up
to our AIT-2 jukebox.

A further look in the log showed that the next DLT tape had the same
problems.  This time, the second tape that was mounted for that pool was
an AIT-2 tape, so there were no more notification errors.  Currently,
nsrwatch shows that DLT tape is still mounted:

/dev/nrmt5h  (J) tz89          PDW843       writing, data

Does this sound to you all like there's a problem with the tape drive?

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=