Re: [Networker] Identifying suspect savesets and clflags attribute?

2004-07-23 16:48:51
From: Darren Dunham <ddunham AT TAOS DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 23 Jul 2004 13:50:10 -0700
> Hi,
>
> Is there a command or a way to script or query the database for
> "suspect" savesets?

Yes.  But note that 'suspect' does not apply to an entire saveset, but
instead to a particular clone.

> Can the clflags attribute really show suspect
> savesets that have not been cloned?

Yes.   Every saveset has at least one "clone".  If you've "clone"d a
saveset, then it will have more than one.

clflags and the other clone reports will report on each clone of a
saveset separately.
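
The relationship can be pictured with a small model. This is only an illustrative sketch, not NetWorker's actual schema: the class names, the `has_good_copy` helper, and the volume name CLN001 are all hypothetical. It shows that "suspect" is a property of one instance (clone) of a saveset, and that a saveset that was never explicitly cloned still has exactly one instance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CloneInstance:
    """One physical copy of a saveset (hypothetical model)."""
    volumes: List[str]          # tapes this instance spans
    suspect: bool = False       # set when a later read hits an error


@dataclass
class SaveSet:
    ssid: int
    instances: List[CloneInstance] = field(default_factory=list)

    def has_good_copy(self) -> bool:
        # The saveset is recoverable if at least one instance is not suspect.
        return any(not inst.suspect for inst in self.instances)


# The saveset from this thread: one instance across three tapes, marked suspect.
original = CloneInstance(["FUL716", "FUL718", "FUL720"], suspect=True)
ss = SaveSet(3834391297, [original])
print(ss.has_good_copy())       # False

# Adding a second instance (CLN001 is a made-up volume) changes the picture.
ss.instances.append(CloneInstance(["CLN001"]))
print(ss.has_good_copy())       # True
```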

> If so, seems like a contradictory
> attribute name if you didn't clone the saveset(s).

Perhaps.

> Here's the deal. We had a large (137 GB) full saveset (3 volumes) that
> was identified in the GUI saveset recover and volumes window with a
> status of 'browsable suspect'. However, nobody noticed this at the time.
> Recently, we had to recover this saveset. Everything went smoothly on
> the first tape, and it read part of the data on the second tape and then
> generated a read error:
>
> Jul 18 06:51:29 server root: [ID 702911 daemon.notice] NetWorker Media:
> (info) loading volume FUL718
> into rd=snode:/dev/nst5
> Jul 18 09:38:13 server root: [ID 702911 daemon.notice] NetWorker media:
> (info) can not read record
> 6328 of file 135 on sdlt tape FUL718
>
> It then immediately moved onto the third  (last) tape, read everything
> there okay and completed, but the recovery was incomplete due to the
> tape 2 problem. We got our data back by going back to a previous full
> with subsequent incrementals. I guess the read error is not surprising
> given that the status was 'suspect'.

Was it suspect before the restore?  When you do a restore and it
generates errors, it will mark the clone suspect at that time.

> Anyway, after all this, I'd really
> like a way to identify this in the future by possibly scripting
> something to check the database so I don't have to crawl through every
> saveset in the recover window.

mminfo -av -q 'suspect' -r '...<whatever report you want here>...'
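
For a scheduled check, that query could be wrapped in a short script. The following is a minimal sketch under a few assumptions: the report columns are the ones from the mminfo command quoted later in this thread, the first non-blank output line is taken to be the column header, and volume names contain no whitespace. The function names are made up for illustration.

```python
import subprocess

# Report columns borrowed from the mminfo command quoted in this thread.
REPORT = "volume,client,name,ssflags,sumflags,clflags"


def build_suspect_query(report=REPORT):
    """Return the argument list for the suspect-clone query."""
    return ["mminfo", "-av", "-q", "suspect", "-r", report]


def suspect_volumes(output):
    """Return the sorted, de-duplicated volume names found in mminfo output.

    Assumes the first non-blank line is the header and the volume name is
    the first whitespace-separated field of every following line.
    """
    lines = [ln for ln in output.splitlines() if ln.strip()]
    return sorted({ln.split()[0] for ln in lines[1:]})


def list_suspect_volumes():
    """Run mminfo (must be on PATH) and return the volumes holding suspect clones."""
    result = subprocess.run(build_suspect_query(),
                            capture_output=True, text=True)
    # When nothing matches, stdout is expected to be empty, giving [].
    return suspect_volumes(result.stdout)
```

Calling `list_suspect_volumes()` from a daily cron job and mailing any non-empty result would avoid having to browse each saveset in the recover window.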

> First, we do have "Auto media verify" turned on for all pools,

Okay, but that's not going to verify most data.  That really only checks
that the last few bytes were written properly (could be a problem with
some tape drives and buffering).  It doesn't go back and physically try
to re-read all the data.  That's probably best done by cloning.

> and I
> checked the savegroup completion notification for the group, and that
> saveset was listed with a 'V' in front of it, not that that means much.

Right, it doesn't.

> Next,
> I checked the man page for mminfo, and I see that there is a 'suspect'
> option, but this seems to be used for identifying savesets that were
> reported as such during recovery and not backup.

That's the only time it should occur.  If there's a write problem during
backup, the session is aborted and marked failed.  Clones only become
suspect when problems occur during a later read.

> Also, I see this
> clflags option that can report suspect stuff, but that seems to be for
> clones.

Which is what you want.  Think 'instance' instead of 'clone' if it makes
more sense.

> The man page describes clflags as: "The clone flags summary, from
> the set ais for aborted, incomplete and suspect (read error),
> respectively." I know cloning can validate the readability of a saveset
> since it has to read the data as it's cloning it, but this data we'd
> prefer not to clone. We can re-run a backup if we know it's suspect,
> though, and we'd like to know as soon after it completes, before too
> much time has passed.

With only one copy, you are simply depending on nothing bad happening to
the tape in the future.

You need to either have multiple copies (separate backups or clones of
this copy) or live with the possibility of a bad tape causing problems
in the future.

> Anyway, I tried running mminfo, using the ssid of the saveset both with
> and without the clflags option, and I do see an 's' reported, but only
> when running with the clflags, and these tapes have never been cloned.

Right.
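
To make the letters in that column easier to read in a script, the flags can be decoded using the man page text quoted above (a, i, s for aborted, incomplete, suspect). This is a small sketch; the table and the `decode_clflags` name are illustrative, and only the three letters named in that man page excerpt are handled.

```python
# Letter meanings as given in the mminfo man page excerpt quoted above.
CLFLAGS = {
    "a": "aborted",
    "i": "incomplete",
    "s": "suspect (read error)",
}


def decode_clflags(flags):
    """Translate a clflags string such as 's' into readable labels.

    Letters not listed in CLFLAGS are ignored rather than guessed at.
    """
    return [CLFLAGS[ch] for ch in flags if ch in CLFLAGS]


print(decode_clflags("s"))
```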

> mminfo -vq 'ssid=3834391297,valid' -r 'volume,client,name,state,ssflags,sumflags,clflags'
>  volume        client   name                              ssflags fl clflg
> FUL716         client1  /1-raid5/exports/www              vF     hb s
> FUL718         client1  /1-raid5/exports/www              vF     mb s
> FUL720         client1  /1-raid5/exports/www              vF     tb s
>
> It seems odd that clflags would report something that has not been
> cloned, but if that's the magic ticket then that's fine by me. Is this
> the best or only way to do this? Does "Auto media verify" factor into
> this at all?

-q 'suspect' will limit your query to all suspect clones in the database.

Auto media verify (AMV) won't help you here, in general.

--
Darren Dunham                                           ddunham AT taos DOT com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
