Networker

Re: [Networker] Identifying suspect savesets and clflags attribute?

2004-07-23 17:31:01
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 23 Jul 2004 17:33:06 -0400

Thanks, Darren. I forgot that every saveset is a clone of itself, but
I'm still shaky on the suspect thing.

If I understand correctly: suppose we had NEVER recovered any data from
any of those tapes, and the saveset was flagged as 'browsable suspect'
immediately after the original backup completed. What command would have
shown us this? mminfo -vq 'ssid=value,suspect' or
mminfo -vq 'ssid=value' -r 'clflags'?

Are you actually saying that if we had NEVER read those tapes, the
saveset would NOT have shown up in the GUI as 'browsable suspect'? I
know I've seen this happen a few times in the past, but in this case,
mminfo does not report it as incomplete. I run mminfo with something
like -r 'ssflags,sumflags', and it reports the three instances of that
saveset as vF for the ssflags value and hb, mb and tb respectively for
the sumflags, but I don't see anything to indicate recycled (E) or
incomplete (i) or anything else that you'd expect if the saveset had
failed, and I would think that's what you'd see if it was listed in the
volumes window as suspect, immediately after completing.
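
A minimal sketch for decoding a clflags value, assuming the man page
wording quoted later in this thread (a = aborted, i = incomplete,
s = suspect). decode_clflags is a hypothetical helper name for
illustration, not a NetWorker command, and the ssflags and sumflags
letters are deliberately left out because the thread does not define
them fully.

```shell
#!/bin/sh
# Decode a clflags string into words.
# Letter meanings are taken from the man page text quoted in this thread:
#   a = aborted, i = incomplete, s = suspect (read error).
# decode_clflags is a hypothetical helper, not part of NetWorker.
decode_clflags() {
    flags=$1
    result=""
    case $flags in *a*) result="$result aborted" ;; esac
    case $flags in *i*) result="$result incomplete" ;; esac
    case $flags in *s*) result="$result suspect" ;; esac
    # Drop the leading space; print "none" when no flag letter is present.
    result=${result# }
    echo "${result:-none}"
}
```

For the three clones shown in the quoted report, decode_clflags s would
print "suspect".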

Can a saveset be listed in the volumes window as suspect if you've never
read the tape? I'm just a little confused here, and I wanna put together
a script that I can run each morning to check for any of these bad
apples that ran the night before, and we will not have read from them or
cloned them. We can only afford to clone certain special data sets,
but I can always re-run a backup for a failed or suspect saveset.
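
A rough sketch of such a morning check, built around the
mminfo -av -q 'suspect' query suggested in the reply quoted below. The
function names, the MMINFO override variable and the choice of report
columns are assumptions made for illustration. It also assumes mminfo
prints one header line before the data rows and prints nothing on
stdout when the query matches nothing. Note that, per the quoted reply,
a clone is normally marked suspect only after a read error, so an empty
result does not prove that a never-read tape is good.

```shell
#!/bin/sh
# Sketch of a morning check for suspect clones.
# MMINFO can be pointed at a stub for testing; by default it is mminfo.
MMINFO=${MMINFO:-mminfo}

# Print one line per suspect clone, skipping the report header.
# Assumption: the first line of mminfo output is a column header.
list_suspect() {
    "$MMINFO" -av -q 'suspect' -r 'volume,client,name,ssid,clflags' 2>/dev/null |
        awk 'NR > 1 && NF > 0'
}

# Print a summary; return 0 when nothing is suspect, 1 otherwise.
check_suspect() {
    rows=$(list_suspect)
    if [ -z "$rows" ]; then
        echo "no suspect clones found"
        return 0
    fi
    count=$(printf '%s\n' "$rows" | wc -l | tr -d ' ')
    echo "$count suspect clone(s):"
    printf '%s\n' "$rows"
    return 1
}
```

Calling check_suspect from cron and mailing its output when it returns
non-zero would give the daily report; the exit status makes it easy to
stay quiet on clean mornings.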

Thanks.

Darren Dunham wrote:
>
> > Hi,
> >
> > Is there a command or a way to script or query the database for
> > "suspect" savesets?
>
> Yes.  But note that 'suspect' does not apply to an entire saveset, but
> instead to a particular clone.
>
> > Can the clflags attribute really show suspect
> > savesets that have not been cloned?
>
> Yes.   Every saveset has at least one "clone".  If you've "clone"d a
> saveset, then it will have more than one.
>
> clflags and the other clone reports will report on each clone of a
> saveset separately.
>
> > If so, seems like a contradictory
> > attribute name if you didn't clone the saveset(s).
>
> Perhaps.
>
> > Here's the deal. We had a large (137 GB) full saveset (3 volumes) that
> > was identified in the GUI saveset recover and volumes window with a
> > status of 'browsable suspect'. However, nobody noticed this at the time.
> > Recently, we had to recover this saveset. Everything went smoothly on
> > the first tape, and it read part of the data on the second tape and then
> > generated a read error:
> >
> > Jul 18 06:51:29 server root: [ID 702911 daemon.notice] NetWorker Media:
> > (info) loading volume FUL718
> > into rd=snode:/dev/nst5
> > Jul 18 09:38:13 server root: [ID 702911 daemon.notice] NetWorker media:
> > (info) can not read record
> > 6328 of file 135 on sdlt tape FUL718
> >
> > It then immediately moved on to the third (last) tape, read everything
> > there okay and completed, but the recovery was incomplete due to the
> > tape 2 problem. We got our data back by going back to a previous full
> > with subsequent incrementals. I guess the read error is not surprising
> > given that the status was 'suspect'.
>
> Was it suspect before the restore?  When you do a restore and it
> generates errors, it will mark the clone suspect at that time.
>
> > Anyway, after all this, I'd really
> > like a way to identify this in the future by possibly scripting
> > something to check the database so I don't have to crawl through every
> > saveset in the recover window.
>
> mminfo -av -q 'suspect' -r '...<whatever report you want here>...'
>
> > First, we do have "Auto media verify" turned on for all pools,
>
> Okay, but that's not going to verify most data.  That really only checks
> that the last few bytes were written properly (could be a problem with
> some tape drives and buffering).  It doesn't go back and physically try
> to re-read all the data.  That's probably best done by cloning.
>
> > and I
> > checked the savegroup completion notification for the group, and that
> > saveset was listed with a 'V' in front of it, not that that means much.
>
> Right, it doesn't.
>
> > Next,
> > I checked the man page for mminfo, and I see that there is a 'suspect'
> > option, but this seems to be used for identifying savesets that were
> > reported as such during recovery and not backup.
>
> That's the only time it should occur.  If there's a write problem during
> backup, the session is aborted and marked failed.  Clones only become
> suspect when problems occur during a later read.
>
> > Also, I see this
> > clflags option that can report suspect stuff, but that seems to be for
> > clones.
>
> Which is what you want.  Think 'instance' instead of 'clone' if it makes
> more sense.
>
> > The man page describes clflags as: "The clone flags summary, from
> > the set ais for aborted, incomplete and suspect (read error),
> > respectively." I know cloning can validate the readability of a saveset
> > since it has to read the data as it's cloning it, but this data we'd
> > prefer not to clone. We can re-run a backup if we know it's suspect,
> > though, and we'd like to know as soon after it completes, before too
> > much time has passed.
>
> With only one copy, you are simply depending on nothing bad happening to
> the tape in the future.
>
> You need to either have multiple copies, clones of this copy, or live
> with the possibility of a bad tape causing problems in the future.
>
> > Anyway, I tried running mminfo, using the ssid of the saveset both with
> > and without the clflags option, and I do see an 's' reported, but only
> > when running with the clflags, and these tapes have never been cloned.
>
> Right.
>
> > mminfo -vq 'ssid=3834391297,valid' -r
> > 'volume,client,name,state,ssflags,sumflags,clflags'
> >  volume        client   name                              ssflags fl clflg
> > FUL716         client1  /1-raid5/exports/www              vF     hb s
> > FUL718         client1  /1-raid5/exports/www              vF     mb s
> > FUL720         client1  /1-raid5/exports/www              vF     tb s
> >
> > It seems odd that clflags would report something that has not been
> > cloned, but if that's the magic ticket then that's fine by me. Is this
> > the best or only way to do this? Does "Auto media verify" factor into
> > this at all?
>
> -q 'suspect' will limit your query to all suspect clones in the database.
>
> AMV won't help you here (in general)
>
> --
> Darren Dunham                                    ddunham AT taos DOT com
> Senior Technical Consultant         TAOS            http://www.taos.com/
> Got some Dr Pepper?                           San Francisco, CA bay area
>          < This line left intentionally blank to confuse you. >
>
> --
> Note: To sign off this list, send a "signoff networker" command via email
> to listserv AT listmail.temple DOT edu or visit the list's Web site at
> http://listmail.temple.edu/archives/networker.html where you can
> also view and post messages to the list.
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
