Networker

Re: [Networker] aborted savesets when cloning?

2003-05-12 05:01:13
Subject: Re: [Networker] aborted savesets when cloning?
From: Joaquin Camp <joaquin.camp AT PROACT DOT SE>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Mon, 12 May 2003 10:59:29 +0200
  Hi George,

  Yes, I think this is correct; at least that is what we suspect too.
It seems like NetWorker sometimes doesn't manage to clean up the status
of the savesets.  You would see this on almost every client running
NetWorker releases prior to 6.x, or with BSMs that write 2 GB chunks.
Some sites just seem to manage to avoid this issue... but I do not
know how!?

  Thanks again.
/Joaquin Camp.


On Fri, 2003-05-09 at 17:14, George Sinclair wrote:
> We are not running 6.1.3. We're running 6.1.1 under Solaris 8. It was
> explained to me that this business of "aborted" is simply NetWorker's
> weird way of telling you that it's still working on it. When I first
> started experimenting with cloning, I was using rather small savesets. I
> didn't notice anything amiss because they completed so fast. You check
> the volume listing and you see nothing. The next time you check they're
> done. However, when I started testing large savesets, again I would not
> see anything in the volumes listing at first. After a while, though, the
> current cloning sessions (4 in our case, since I was cloning 9 or so
> and the drive's sessions value was set to 4) would show up with a status of
> "aborted". I thought this meant that NetWorker had aborted the cloning
> operation, but what I failed to notice was that it had run out of tape
> and was requesting another writable clone volume. When I re-tested, I
> added another tape and the "aborted" message continued to show up until
> the affected savesets were completed. Once the cloning operation was
> done on those, the status changed to "recoverable", but again, the next
> ones that continued after that likewise showed the "aborted" status until
> they completed. I would have thought NetWorker would use something more
> sane, like "in-progress", but I guess not, or at least not in our
> version.
>
> George
>
> Robert Maiello wrote:
> >
> > I do quite a bit of cloning with 6.1.2 on Solaris.  I don't seem to have
> > this issue, at least on my clone tapes.  Shouldn't they be hard to
> > find later, since nsrck will remove or clean them up?
> >
> > I did have some backup tapes that weren't recycling. When I listed them,
> > I saw that there were savesets on them that were listed as aborted or
> > in-progress.  I thought nsrck was supposed to clean these up, but I guess
> > some manual running of nsrck with options and nsrim is needed every so
> > often.   I simply recycled these tapes.   I'm not showing any now.
> >
> > The indication that this is a 6.1.3 issue is disturbing.  That version is
> > always recommended to me on my support calls... it should be the one with
> > the most fixed issues... sigh.
> >
> > Robert Maiello
> > Thomson Medical Economics
> >
> > On Fri, 9 May 2003 10:23:35 +0200, Joaquin Camp <joaquin.camp AT PROACT DOT SE> wrote:
> >
> > >  Hi George,
> > >
> > >  Yes, this is a known issue by now.  We have some customers that have
> > >this issue.  We have reproduced the issue in our lab environment.
> > >Legato is informed and working on a fix for this.
> > >This is what we have sorted out so far:
> > >
> > >Savesets marked with ssflags "ca" (complete aborted)
> > >The best way to prevent savesets being marked "ca" is to run Networker
> > >6.1.2. This applies to the clients as well.
> > >
> > >
> > >When do they occur?...
> > >They occur when staging or cloning savesets from disk to tape. All
> > >versions of Networker prior to 6.x are affected, as are all BSMs that
> > >make 2 GB chunks.
> > >
> > >We have seen this problem even with Networker 6.x, and we can see it for
> > >sure with Networker 6.1.3. Therefore, I recommend installing 6.1.2 for
> > >best results.
> > >
> > >NetWare clients are affected as well. Even if the Networker software is
> > >"new", there haven't been major changes in the Networker software for
> > >NetWare. Therefore it behaves like a pre-6.x release, as its release
> > >version also indicates.
> > >
> > >
> > >What could happen?...
> > >The tests run with savesets marked as "ca" show that we have been
> > >able to recover both individual files and whole savesets. So even if
> > >this seems to be more of a cosmetic issue, big problems can occur.
> > >For example, the retention policies are no longer valid for these
> > >savesets, so the volumes can be overwritten before they should be. You
> > >will also no longer be able to clone/stage those savesets once they are
> > >marked with "ca".
> > >
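To spot savesets stuck in the "ca" state described above, one option is to filter a saved report of saveset IDs and flags. This is a minimal sketch, assuming the report has already been dumped as whitespace-separated lines of the form `ssid flags name` (for example, a report you saved from mminfo); that column layout is an assumption, not mminfo's documented output, and the function name `list_ca_savesets` is hypothetical.

```shell
#!/bin/sh
# list_ca_savesets (hypothetical helper): read "ssid flags name" lines on
# stdin and print the ssid of every saveset whose flags field is exactly
# "ca" (complete, aborted). The input layout is an assumption; adjust the
# field numbers to match whatever report you actually saved.
list_ca_savesets() {
    awk '$2 == "ca" { print $1 }'
}
```

For instance, `list_ca_savesets < saved_report.txt` prints one ssid per line, which is the same one-per-line shape used for an nsrclone input file.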
> > >  My recommendation is to open a support call with Legato.  That way
> > >they will have to speed up the process of making a working fix.
> > >
> > >  Thanks and have a nice day!
> > >/Joaquin Camp.
> > >
> > >
> > >On Thu, 2003-05-08 at 17:29, George Sinclair wrote:
> > >> When I say reported as "aborted" I mean running something like:
> > >>
> > >> 'mminfo -av -s server clone_volume_name'
> > >>
> > >> shows them as having an "aborted" status. Also, they show up in the
> > >> volumes window with a status of "aborted". You may be right, though,
> > >> because it did in fact run out and was requesting another tape, but
> > >> here's where I have a problem with this theory as the culprit. The drive
> > >> has a max sessions value of 4. There were 10 pathnames listed in my
> > >> input file. The first 5 are rather small, about 4 GB, and the next 5 are
> > >> much larger at around 30 GB. The first 5 completed with no problems. I
> > >> know that because these all were listed with a status of "recoverable"
> > >> by the time the cloning process began the first of the big guns. Now,
> > >> this was a brand new LTO tape with 100 GB of native capacity. 5 savesets
> > >> at 4 GB each only adds up to 20 GB. That still leaves at least 80 GB
> > >> remaining. I have a hard time believing that the next saveset at 30 GB
> > >> could not fit on that tape. It should have, in which case the space
> > >> problem should not have been an issue with the first of the large
> > >> savesets, so I wouldn't have expected this to be a cause of the problem.
> > >> On the other hand, I do only see 4 of the larger savesets listed for the
> > >> volume, along with the 5 small ones for a total of 9. Obviously, it
> > >> never started on the 5th large one. So, it would appear that all 4 of
> > >> the larger ones were multiplexed to the tape, in which case I guess it
> > >> could not finish any "one" before it ran out of space. Maybe I'll try
> > >> again, and only list two of the large savesets and see what happens. I'm
> > >> sure you're right about the "out of space" theory causing the aborted
> > >> problem, but since I'd seen this before when cloning several small
> > >> savesets, where space on the tape was never an issue, I became
> > >> concerned. I find it odd that NetWorker doesn't report these as
> > >> "in-progress" or something more meaningful.
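George's capacity argument can be checked with a little shell arithmetic. This sketch uses the figures from his message (100 GB native LTO capacity, five savesets of about 4 GB already written, four savesets of about 30 GB multiplexed together); it assumes no hardware compression and that the four large streams are interleaved evenly, which is a simplification of how NetWorker actually multiplexes.

```shell
#!/bin/sh
# Figures from the message above, in GB. Assumes native capacity (no
# compression) and an even split of the remaining space between streams.
tape_capacity=100
small_count=5; small_size=4
large_count=4; large_size=30

used=$((small_count * small_size))        # space taken by the small savesets
remaining=$((tape_capacity - used))       # space left for the large ones
needed=$((large_count * large_size))      # total the multiplexed large savesets need
per_stream=$((remaining / large_count))   # rough share per stream (integer GB)

echo "used=${used} remaining=${remaining} needed=${needed} per_stream=${per_stream}"
if [ "$per_stream" -lt "$large_size" ]; then
    echo "no large saveset completes before the tape fills"
fi
```

Under these assumptions each large saveset gets only about 20 GB of the 80 GB left, so none of the four can finish on the first tape, which matches the "out of space" explanation.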
> > >>
> > >> Thanks for your response! I will re-test.
> > >>
> > >> George
> > >>
> > >> Carl Farnsworth wrote:
> > >> >
> > >> > What do you mean by "reported as aborted"?  You also mentioned the
> > >> > tape is filled up.  Is it possible the clone session is still active
> > >> > and waiting for a second tape?
> > >> >
> > >> > I also got confused by this the first time I used scripted cloning.
> > >> > The volumes display GUI for the clone volume will show an "a" flag for
> > >> > the savesets that are currently being cloned.  At first I thought this
> > >> > meant "a" for active, but I later realized that NetWorker sets the
> > >> > flag to "a" for aborted until the clone has successfully completed!
> > >> >
> > >> > (P.S.  This is how I first realized that the cloning operation was not
> > >> > de-multiplexing the savesets.)
> > >> >
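Carl's point is that the same "a" letter covers both a genuinely aborted saveset and one whose clone is still running. A small decoder for two-character flag strings such as "ca" can make that ambiguity explicit. The meanings used here for c/h/m/t and b/r/E/a are my reading of the mminfo man page, not something stated in this thread, so treat them as assumptions and check them against the man page for your release; the function name `describe_sumflags` is hypothetical.

```shell
#!/bin/sh
# describe_sumflags (hypothetical helper): translate a two-character saveset
# flag string such as "ca" into words. Letter meanings are assumed from the
# mminfo man page; verify them for your NetWorker release.
describe_sumflags() {
    flags=$1
    part=$(printf '%s' "$flags" | cut -c1)    # which part of the saveset
    state=$(printf '%s' "$flags" | cut -c2)   # saveset status
    case $part in
        c) part_desc="complete" ;;
        h) part_desc="head" ;;
        m) part_desc="middle" ;;
        t) part_desc="tail" ;;
        *) part_desc="unknown part" ;;
    esac
    # Per this thread, "a" is also shown while a clone is still in progress.
    case $state in
        b) state_desc="browsable" ;;
        r) state_desc="recoverable" ;;
        E) state_desc="eligible for recycling" ;;
        a) state_desc="aborted (or still being cloned)" ;;
        *) state_desc="unknown state" ;;
    esac
    printf '%s, %s\n' "$part_desc" "$state_desc"
}
```

For example, `describe_sumflags ca` prints `complete, aborted (or still being cloned)`, a reminder to check whether a clone session is still waiting for a tape before deleting anything.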
> > >> > HTH
> > >> > Carl Farnsworth
> > >> > DigiDyne Inc.
> > >> >
> > >> > On Tue, 6 May 2003 16:58:11 -0400, George Sinclair
> > >> > <George.Sinclair AT NOAA DOT GOV> wrote:
> > >> >
> > >> > >Hi,
> > >> > >
> > >> > >I've noticed that whenever I create a file containing a list of ssids,
> > >> > >and I run the clone command as:
> > >> > >
> > >> > >nsrclone -s server -b 'Clone Pool Name' -S -f input_file
> > >> > >
> > >> > >or if I pass them in as:
> > >> > >
> > >> > >nsrclone -s server -b 'Clone Pool Name' -S ssid1 ssid2 ....
> > >> > >
> > >> > >then the operation works, but there's always one, and sometimes several,
> > >> > >that are reported as "aborted". Deleting them and re-running them
> > >> > >wouldn't be so bad except that in this case, they're like 30 GB
> > >> > >savesets! I can't reclaim my space on the tape. The last time I tried to
> > >> > >clone 9 savesets, 5 of them were aborted when I came back. I'm going to
> > >> > >have to just end up re-labeling the tape because it's filled up, and the
> > >> > >only ones that were not aborted are only like 7 GB each, for a total of
> > >> > >maybe 35 GB. Not really worth sacrificing the whole tape just for those
> > >> > >few. Anyone have any ideas what causes this abort business?
> > >> > >
> > >> > >I notice that if I run them one at a time, I don't see this problem. I'm
> > >> > >not seeing problems during backups, and I'm not seeing errors on the
> > >> > >devices.
> > >> > >
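Since cloning the savesets one at a time avoided the problem here, a simple workaround is to loop over the input file and call nsrclone once per ssid, using only the `-s`, `-b` and `-S` options shown above. This is a sketch, not a tested NetWorker procedure: the function name `clone_one_at_a_time` and the `NSRCLONE` override variable are hypothetical additions so the loop can be dry-run without touching a real server.

```shell
#!/bin/sh
# clone_one_at_a_time (hypothetical helper): clone each ssid listed in a file
# (one per line) with a separate nsrclone run instead of passing them all
# at once with -f. Set NSRCLONE to another command for a dry run.
# Usage: clone_one_at_a_time server 'Clone Pool Name' input_file
clone_one_at_a_time() {
    server=$1; pool=$2; list=$3
    status=0
    while read -r ssid; do
        [ -z "$ssid" ] && continue        # skip blank lines
        # </dev/null keeps the clone command from consuming the ssid list
        if ! ${NSRCLONE:-nsrclone} -s "$server" -b "$pool" -S "$ssid" </dev/null; then
            echo "clone of ssid $ssid failed" >&2
            status=1
        fi
    done < "$list"
    return $status
}
```

Running the savesets serially gives up multiplexing on the clone volume, so it is slower, but each saveset finishes (and should leave the "aborted" state) before the next one starts.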
> > >> > >Thanks.
> > >> > >
> > >> > >George
> > >> >
> > >> > --
> > >> > Note: To sign off this list, send a "signoff networker" command via 
> > >> > email
> > >> > to listserv AT listmail.temple DOT edu or visit the list's Web site at
> > >> > http://listmail.temple.edu/archives/networker.html where you can
> > >> > also view and post messages to the list.
> > >> > =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
> > >>
> > >
> >
>
--
        Joaquin Camp
        ProAct Legato Core Team

        ProAct IT Sweden AB
        Phone +46 8 410 667 14
        joaquin.camp AT proact DOT se
        www.proact.se

        ----------------------------
        The InfoStructure Specialist
        ----------------------------
