Networker

Re: [Networker] nsrclone stalls

2008-03-17 14:07:11
Subject: Re: [Networker] nsrclone stalls
From: Jim Thorsley <thor AT CSD.UWO DOT CA>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 17 Mar 2008 13:59:22 -0400
I resolved this. Seems there was an errant entry in /etc/hosts
on our DNS server. Strikes me as very odd symptoms. Oh well...

-Jim

On Wed, 2008-03-12 at 14:47 -0400, Jim Thorsley wrote:
> Hi.
> 
> I'm trying to clone some savesets and it fails. I've been doing these
> clones every month to provide off-site backup and they have been working
> fine. Last Friday the network manager changed the IP addresses on the
> backup server and storage node. Now I can't clone. Regular incremental
> backup seem to work but when I try to clone the monthly fulls it stalls.
> 
> The command I use is:
> 
> nsrclone -b "$ClonePool" -S `mminfo -t '4 weeks ago' -q
> level=full,pool="$Pool" -r ssid`
> 
> which gets any full backups performed over the last 4 weeks and clones
> them.
> 
> What happen is that when I go to clone some savesets the library loads
> the tape to read and one to write then just sits there. Occassionally,
> nsrclone will spit out "nsrclone command has retried 2 times.". When I
> add a -v to the nsrclone command it would say:
> 
> nsrclone: NSR server giles.csd.uwo.ca: busy
> nsrclone: waiting 30 seconds then retrying
> 
> and it would continue to do this forever. I've included a chunk of the
> nsr daemon log below.
> 
> The server (giles.csd.uwo.ca) is a Sun Enterprise 450 running Solaris 8.
> The storage node (arion.csd.uwo.ca) is a SunFire V880 running Solaris
> 10.
> The version of Networker on both machines is Sun Enterprise Backup 7.4.
> The version of networker for the clients varies from 5.5 to 7.4 but
> since this is just between the server and the storage node I can't see
> that makes any difference.
> 
> Any ideas on how to fix this?
> 
> -Jim
> 
> 
> -start-daemon.log-------------------------------------------------------------
> 0 12/03/08 11:28:51 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd Operation
> 50 started : Load volume `FIL341L1', volume id `4258369864'.. 
> 42502 12/03/08 11:28:51 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd media
> waiting event: Waiting for 1 writable volumes to backup pool 'Gaul
> Clone' tape(s) on arion.csd.uwo.ca 
> 0 12/03/08 11:28:51 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd Operation
> 51 started : Load volume `FKA833L1', volume id `3878346835'.. 
> 42502 12/03/08 11:28:51 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd media
> waiting event: waiting for LTO Ultrium-2 tape FKA833L1 on
> arion.csd.uwo.ca 
> 38752 12/03/08 11:29:31 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> rd=arion.csd.uwo.ca:/dev/rmt/3cbn Verify label operation in progress 
> 38752 12/03/08 11:29:43 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> rd=arion.csd.uwo.ca:/dev/rmt/3cbn Mount operation in progress 
> 42504 12/03/08 11:29:44 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd media
> event cleared: Waiting for 1 writable volumes to backup pool 'Gaul
> Clone' tape(s) on arion.csd.uwo.ca 
> 12361 12/03/08 11:29:47 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> [Jukebox `rd=arion.csd.uwo.ca:hellmouth', operation # 50]. Finished with
> status: succeeded 
> 38752 12/03/08 11:30:01 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> rd=arion.csd.uwo.ca:/dev/rmt/1cbn Verify label operation in progress 
> 38752 12/03/08 11:30:11 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> rd=arion.csd.uwo.ca:/dev/rmt/1cbn Mount operation in progress 
> 42504 12/03/08 11:30:15 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd media
> event cleared: confirmed mount of FKA833L1 on
> rd=arion.csd.uwo.ca:/dev/rmt/1cbn 
> 11348 12/03/08 11:30:15 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> deactivating mmd #0 
> 11349 12/03/08 11:30:15 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> Calling mm_deactivate for mmd 2 thats using device
> rd=arion.csd.uwo.ca:/dev/rmt/3cbn with volume FIL341L1 on host
> arion.csd.uwo.ca 
> 42506 12/03/08 11:30:15 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd write
> completion notice: Writing to volume FIL341L1 complete 
> 11388 12/03/08 11:30:16 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> re-setting overall number of authorized save sessions, changing value
> from (1) to (0) 
> 12361 12/03/08 11:30:19 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> [Jukebox `rd=arion.csd.uwo.ca:hellmouth', operation # 51]. Finished with
> status: succeeded 
> 42502 12/03/08 11:30:45 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd media
> waiting event: waiting for LTO Ultrium-2 tape FKA833L1 on
> arion.csd.uwo.ca 
> 11348 12/03/08 11:35:16 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> deactivating mmd #0 
> 11349 12/03/08 11:35:16 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> Calling mm_deactivate for mmd 1 thats using device
> rd=arion.csd.uwo.ca:/dev/rmt/1cbn with volume FKA833L1 on host
> arion.csd.uwo.ca 
> 42504 12/03/08 11:35:28 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd media
> event cleared: confirmed mount of FKA833L1 on
> rd=arion.csd.uwo.ca:/dev/rmt/1cbn 
> 39078 12/03/08 11:35:28 AM  0 0 2 4 20635 0 giles.csd.uwo.ca nsrjobd
> Jobs error: Unable to find record for job 59515 during an attempt to
> send message to it  
> 11348 12/03/08 11:35:28 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> deactivating mmd #0 
> 11349 12/03/08 11:35:28 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> Calling mm_deactivate for mmd 3 thats using device
> rd=arion.csd.uwo.ca:/dev/rmt/3cbn with volume FIL341L1 on host
> arion.csd.uwo.ca 
> 42506 12/03/08 11:35:28 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd write
> completion notice: Writing to volume FIL341L1 complete 
> 11388 12/03/08 11:35:36 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> re-setting overall number of authorized save sessions, changing value
> from (1) to (0) 
> 42502 12/03/08 11:35:58 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd media
> waiting event: waiting for LTO Ultrium-2 tape FKA833L1 on
> arion.csd.uwo.ca 
> 11348 12/03/08 11:40:35 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> deactivating mmd #0 
> 11349 12/03/08 11:40:35 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> Calling mm_deactivate for mmd 2 thats using device
> rd=arion.csd.uwo.ca:/dev/rmt/1cbn with volume FKA833L1 on host
> arion.csd.uwo.ca 
> 42504 12/03/08 11:40:39 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd media
> event cleared: confirmed mount of FKA833L1 on
> rd=arion.csd.uwo.ca:/dev/rmt/1cbn 
> 11348 12/03/08 11:40:39 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> deactivating mmd #0 
> 11349 12/03/08 11:40:39 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> Calling mm_deactivate for mmd 3 thats using device
> rd=arion.csd.uwo.ca:/dev/rmt/3cbn with volume FIL341L1 on host
> arion.csd.uwo.ca 
> 42506 12/03/08 11:40:39 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd write
> completion notice: Writing to volume FIL341L1 complete 
> 11388 12/03/08 11:40:45 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> re-setting overall number of authorized save sessions, changing value
> from (1) to (0) 
> 42502 12/03/08 11:41:09 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd media
> waiting event: waiting for LTO Ultrium-2 tape FKA833L1 on
> arion.csd.uwo.ca 
> 39078 12/03/08 11:44:52 AM  0 0 2 4 20635 0 giles.csd.uwo.ca nsrjobd
> Jobs error: Unable to find record for job 59515 during an attempt to
> send message to it  
> 39078 12/03/08 11:44:52 AM  0 0 2 4 20635 0 giles.csd.uwo.ca nsrjobd
> Jobs error: Unable to find record for job 59515 during an attempt to
> send message to it  
> 11348 12/03/08 11:45:40 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> deactivating mmd #0 
> 11349 12/03/08 11:45:40 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> Calling mm_deactivate for mmd 1 thats using device
> rd=arion.csd.uwo.ca:/dev/rmt/1cbn with volume FKA833L1 on host
> arion.csd.uwo.ca 
> 42504 12/03/08 11:45:50 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd media
> event cleared: confirmed mount of FKA833L1 on
> rd=arion.csd.uwo.ca:/dev/rmt/1cbn 
> 11348 12/03/08 11:45:50 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> deactivating mmd #0 
> 11349 12/03/08 11:45:50 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> Calling mm_deactivate for mmd 3 thats using device
> rd=arion.csd.uwo.ca:/dev/rmt/3cbn with volume FIL341L1 on host
> arion.csd.uwo.ca 
> 42506 12/03/08 11:45:50 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd write
> completion notice: Writing to volume FIL341L1 complete 
> 11388 12/03/08 11:45:55 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> re-setting overall number of authorized save sessions, changing value
> from (1) to (0) 
> 11348 12/03/08 11:50:52 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> deactivating mmd #0 
> 11349 12/03/08 11:50:52 AM  2 0 0 1 20621 0 giles.csd.uwo.ca nsrd
> Calling mm_deactivate for mmd 2 thats using device
> rd=arion.csd.uwo.ca:/dev/rmt/1cbn with volume FKA833L1 on host
> arion.csd.uwo.ca 
> -end-------------------------------------------------------------------------
> 
> -- 
> Jim Thorsley <thor AT csd.uwo DOT ca>
> Dept. of Computer Science, U.W.O.
> 
> To sign off this list, send email to listserv AT listserv.temple DOT edu and 
> type "signoff networker" in the body of the email. Please write to 
> networker-request AT listserv.temple DOT edu if you have any problems with 
> this list. You can access the archives at 
> http://listserv.temple.edu/archives/networker.html or
> via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
-- 
Jim Thorsley <thor AT csd.uwo DOT ca>
Dept. of Computer Science, U.W.O.

To sign off this list, send email to listserv AT listserv.temple DOT edu and 
type "signoff networker" in the body of the email. Please write to 
networker-request AT listserv.temple DOT edu if you have any problems with this 
list. You can access the archives at 
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER

<Prev in Thread] Current Thread [Next in Thread>