Networker

Re: [Networker] Trouble with a clone tape?

2003-09-18 14:26:52
Subject: Re: [Networker] Trouble with a clone tape?
From: Robert Maiello <robert.maiello AT MEDEC DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Thu, 18 Sep 2003 14:26:45 -0400
Wow your log file has quite a few errors in it.  It looks like you have a
storage node..  The "nsrmmd restarting"  is troubling; the backup server
is losing communication with the storage node or the storage node software
is being restarted.  I don't know if that could cause an "elidgability"
error with the clones.

I'll take a wild guess at a possible explanation though:

a.)  The data starts cloning a saveset.. it completes or maybe doesn't
complete.
b.) The server loses communication to the tape drive/nsrmmd on the storage
node.
c.)  The server continues to clone after the tape drive comes back.
Problem is, if  saveset A is cloned to tape MAS_c004, and then it goes to
clone again it won't work to this tape. Networker will not clone a saveset
twice to the same clone tape..it will always ask for another tape.

That's all I can think of as to why it didn't like the clone tape?


Robert Maiello
Thomson Medical Economics

On Wed, 17 Sep 2003 14:24:27 -0400, George Sinclair
<George.Sinclair AT NOAA DOT GOV> wrote:

>Hi,
>
>I was in the process of cloning a volume (MAS005) as:
>
>nsrclone -v -s server -b 'MAS Clone' MAS005
>
>There was one writeable clone tape named 'MAS_c004'. The cloning
>operation was running smoothly, but when I came in the next morning, I
>saw the following message in the nwadmin window:
>
>Wed 10:36:35 AM media notice: Volume `MAS_c004' ineligible for this
>operation; Need a different volume from pool `MAS Clone'
>
>I should note that I was also running a recover on the same library, but
>not using the same tapes. There were also some backups running
>throughout the night, but once again, none of the tapes involved in the
>recovery or the cloning operation overlapped each other or anything
>having to do with the backups. Still, there was a lot of stuff going on.
>Maybe not such a hot idea to be doing all that at once.
>
>I checked the volumes window, and LTO volume MAS_c004 is only at 64%
>usage. It should have had plenty of space for the data on LTO volume
>MAS005 since MAS005 is only 7% full. This is why I didn't bother to
>prepare another clone tape before leaving for the night. I ran the
>mminfo command to also check the status of the tape, and it shows is at
>64% usage. I then checked the log files (/nsr/messages, /nsr/daemon.log)
>on the primary server, and there were some problems as shown below.
>Apparently, volume MAS005 was created back when we did not have our
>blocking size set correctly on the Red Hat Linux storage node server
>that manages the LTO library. In other words, this volume was created
>before we had an /etc/stinit.def file. I could reboot the host without
>the stinit.def file, but doing this seemed more suitable if I were only
>recovering the data, but since I'm trying to also write it to a new
>volume it didn't seem to make sense to do that. I considered creating
>another clone volume as requested, but I'm curious if the previous clone
>volume MAS_c004 is still valid if NetWorker won't append to it and
>insists on wanting a new clone volume? The volumes listing shows every
>saveset on MAS_c004 as 'cloned' with 09/16/03 as the date, times vary.
>Should I add a new clone volume and re-start the clone operation, or
>delete volume MAS_c004 and re-start over? I'm concerned that if
>NetWorker refuses to append to MAS_c004 then maybe the cloned savesets
>on there are no good?
>
>Looks like an I/O error occurred on one of the volumes that NetWorker
>was loading to run a backup during this time. I've provided the
>appropriate entries from the messages and deamon.log files below. Any
>volumes beginning with 'FUL' pertain to either the backups or the
>recovery operation. Those beginning with MAS pertain to the cloning
>operation.
>
>Would appreciate any help or advice.
>
>George
>George.Sinclair AT noaa DOT gov
>
><<< /nsr/messages >>>
>Sep 17 04:08:38 server root: [ID 702911 daemon.notice]
>Sep 17 04:21:31 server root: [ID 702911 daemon.notice] NetWorker media:
>(waiting) waiting for LTO Ultrium tape FUL402 on snode_server
>Sep 17 04:22:38 server root: [ID 702911 daemon.notice] NetWorker Media:
>(info) loading volume FUL402 into rd=snode_server:/dev/nst5
>Sep 17 04:24:58 server root: [ID 702911 daemon.notice] NetWorker media:
>(warning) rd=snode_server:/dev/nst5 reading: Input/output error
>Sep 17 04:24:59 server root: [ID 702911 daemon.notice] NetWorker media:
>(warning) rd=snode_server:/dev/nst5 reading: Tape label read:
>Input/output error
>Sep 17 04:25:54 server root: [ID 702911 daemon.notice] NetWorker Media:
>(info) loading volume FUL402 into rd=snode_server:/dev/nst5
>Sep 17 04:28:25 server root: [ID 702911 daemon.notice] NetWorker media:
>(warning) rd=snode_server:/dev/nst5 reading: Input/output error
>Sep 17 04:28:26 server root: [ID 702911 daemon.notice] NetWorker media:
>(warning) rd=snode_server:/dev/nst5 reading: Tape label read:
>Input/output error
>Sep 17 04:36:32 server root: [ID 702911 daemon.notice] NetWorker media:
>(critical) waiting for LTO Ultrium tape FUL402 on snode_server
>Sep 17 04:58:12 server root: [ID 702911 daemon.notice] NetWorker media:
>(alert) waiting for LTO Ultrium tape FUL402 on snode_server
>Sep 17 09:25:56 server root: [ID 702911 daemon.notice] NetWorker media:
>(info) Labeling a new writable volume for pool 'MAS Clone'
>Sep 17 09:25:57 server root: [ID 702911 daemon.notice] NetWorker media:
>(waiting) Waiting for 1 writable volumes to backup pool 'MAS Clone'
>tape(s) on snode_server
>Sep 17 09:25:58 server root: [ID 702911 daemon.notice] NetWorker media:
>(waiting) waiting for LTO Ultrium tape MAS005 on snode_server
>Sep 17 09:26:01 server root: [ID 702911 daemon.notice] NetWorker Media:
>(info) loading volume MAS005 into rd=snode_server:/dev/nst5
>Sep 17 09:32:18 server root: [ID 702911 daemon.notice] NetWorker media:
>(info) restarting nsrmmd #2 on server in 2 minute(s)
>Sep 17 09:32:19 server root: [ID 702911 daemon.notice] NetWorker media:
>(info) restarting nsrmmd #3 on server in 2 minute(s)
>Sep 17 09:32:19 server root: [ID 702911 daemon.notice] NetWorker media:
>(info) restarting nsrmmd #1 on server in 2 minute(s)
>Sep 17 09:33:51 server ntpd[258]: [ID 702911 daemon.notice] ntpd
>4.0.99k23 Mon May  7 23:34:42 EDT 2001 (2)
>Sep 17 09:33:51 server ntpd[258]: [ID 702911 daemon.notice] kernel time
>discipline status 0040
>Sep 17 09:33:56 server root: [ID 702911 daemon.notice] NetWorker
>snode_server: (notice) started
>Sep 17 09:36:50 server root: [ID 702911 daemon.notice] NetWorker media:
>(notice) Volume "MAS005" on device "rd=snode_server:/dev/nst5": Block
>size is 512 bytes not 65536 bytes. Verify the device configuration. Tape
>positioning by record is disabled.
>Sep 17 10:31:24 server root: [ID 702911 daemon.notice] NetWorker media:
>(waiting) waiting for LTO Ultrium tape MAS005 on snode_server
>Sep 17 10:31:25 server root: [ID 702911 daemon.notice] NetWorker media:
>(info) Labeling a new writable volume for pool 'MAS Clone'
>Sep 17 10:31:26 server root: [ID 702911 daemon.notice] NetWorker media:
>(waiting) Waiting for 1 writable volumes to backup pool 'MAS Clone'
>tape(s) on snode_server
>Sep 17 10:31:31 server root: [ID 702911 daemon.notice] NetWorker Media:
>(info) loading volume MAS005 into rd=snode_server:/dev/nst4
>Sep 17 10:36:35 server root: [ID 702911 daemon.notice] NetWorker media:
>(notice) Volume `MAS_c004' ineligible for this operation; Need a
>different volume from pool `MAS Clone'
>Sep 17 10:36:40 server root: [ID 702911 daemon.notice] NetWorker media:
>(info) Labeling a new writable volume for pool 'MAS Clone'
>Sep 17 10:46:30 server root: [ID 702911 daemon.notice] NetWorker media:
>(critical) Waiting for 1 writable volumes to backup pool 'MAS Clone'
>tape(s) on snode_server
>Sep 17 12:57:22 server root: [ID 702911 daemon.notice] NetWorker media:
>(waiting) waiting for sdlt tape AT1107 on snode_server
>
>
><<< /nsr/daemon.log >>>
>09/17/03 04:22:38 AM nsrd: media info: loading volume FUL402 into
>rd=snode_server:/dev/nst5
>09/17/03 04:22:57 AM nsrd: rd=snode_server:/dev/nst5 Verify label
>operation in progress
>09/17/03 04:24:58 AM nsrd: media warning: rd=snode_server:/dev/nst5
>reading: Input/output error
>09/17/03 04:24:59 AM nsrd: media warning: rd=snode_server:/dev/nst5
>reading: Tape label read: Input/output error
>09/17/03 04:25:02 AM nsrd: rd=snode_server:/dev/nst5 Eject operation in
>progress
>09/17/03 04:25:48 AM nsrd: Jukebox 'rd=snode_server:STK_L80' failed:
>Tape label read: Input/output error
>09/17/03 04:25:54 AM nsrd: media info: loading volume FUL402 into
>rd=snode_server:/dev/nst5
>09/17/03 04:26:13 AM nsrd: rd=snode_server:/dev/nst5 Verify label
>operation in progress
>09/17/03 04:28:25 AM nsrd: media warning: rd=snode_server:/dev/nst5
>reading: Input/output error
>09/17/03 04:28:26 AM nsrd: media warning: rd=snode_server:/dev/nst5
>reading: Tape label read: Input/output error
>09/17/03 04:28:29 AM nsrd: rd=snode_server:/dev/nst5 Eject operation in
>progress
>09/17/03 04:29:15 AM nsrd: Jukebox 'rd=snode_server:STK_L80' failed:
>Tape label read: Input/output error
>09/17/03 04:36:32 AM nsrd: media critical event: waiting for LTO Ultrium
>tape FUL402 on snode_server
>09/17/03 04:58:13 AM nsrd: media alert event: waiting for LTO Ultrium
>tape FUL402 on snode_server
>09/17/03 09:23:28 AM nsrd: snode_server:root done browsing
>09/17/03 09:23:37 AM nsrd: client:/home (1/26/03) done reading 232 GB
>09/17/03 09:23:37 AM nsrd: media event cleared: LTO Ultrium tape FUL402
>not used
>09/17/03 09:25:56 AM nsrd: media info: Labeling a new writable volume
>for pool 'MAS Clone'
>09/17/03 09:25:57 AM nsrd: media waiting event: Waiting for 1 writable
>volumes to backup pool 'MAS Clone' tape(s) on snode_server
>09/17/03 09:25:58 AM nsrd: media waiting event: waiting for LTO Ultrium
>tape MAS005 on snode_server
>09/17/03 09:26:01 AM nsrd: media info: loading volume MAS005 into
>rd=snode_server:/dev/nst5
>09/17/03 09:26:20 AM nsrd: rd=snode_server:/dev/nst5 Verify label
>operation in progress
>09/17/03 09:27:00 AM nsrd: rd=snode_server:/dev/nst5 is now write
>enabled
>09/17/03 09:27:00 AM nsrd: rd=snode_server:/dev/nst5 Mount operation in
>progress
>09/17/03 09:27:06 AM nsrd: media event cleared: confirmed mount of
>MAS005 on rd=snode_server:/dev/nst5
>09/17/03 09:32:06 AM nsrd: media event cleared: Waiting for 1 writable
>volumes to backup pool 'MAS Clone' tape(s) on snode_server
>09/17/03 09:32:06 AM nsrd: cloning session:save sets done reading
>09/17/03 09:32:18 AM nsrd: shutting down remote daemons
>09/17/03 09:32:18 AM nsrd: nsrindexd has exited with status 1
>09/17/03 09:32:18 AM nsrd: media info: restarting nsrmmd #2 on server in
>2 minute(s)
>09/17/03 09:32:19 AM nsrd: media info: restarting nsrmmd #3 on server in
>2 minute(s)
>09/17/03 09:32:19 AM nsrd: media info: restarting nsrmmd #1 on server in
>2 minute(s)
>09/17/03 09:32:20 AM nsrd: nsrmmdbd has exited with status 0
>09/17/03 09:36:50 AM nsrd: media notice: Volume "MAS005" on device
>"rd=snode_server:/dev/nst5": Block size is 512 bytes not 65536 bytes.
>Verify the device configuration. Tape positioning by record is disable
>
>--
>Note: To sign off this list, send a "signoff networker" command via email
>to listserv AT listmail.temple DOT edu or visit the list's Web site at
>http://listmail.temple.edu/archives/networker.html where you can
>also view and post messages to the list.
>=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

<Prev in Thread] Current Thread [Next in Thread>