Networker

Subject: Re: [Networker] How deleted files affect saveset recover versus nwrecover?
From: Davina Treiber <Treiber AT HOTPOP DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Thu, 29 Jul 2004 10:15:31 +0100

George Sinclair wrote:
This has me thinking: nwrecover is a slick, fast and swell solution 90%
of the time, but it has the disadvantage of recovering previously
deleted files that you may not want back. So why not just have a script
that runs every night and produces some kind of recursive file listing
of the given path, which would itself get backed up? Then, if you ever
had to recover the data, you could just run a saveset recover for
whatever instances were needed, and then recover the file listing (maybe
as part of the same recover) as of the date of the last saveset recover
instance. Finally, you run a special script that reads through all the
data and uses the listing that you recovered to figure out what's out
there now that wasn't there then, and deletes it. Of course, this is
basically what nwrecover accomplishes, but this would get around the
slowness and limited browse time of nwrecover and would solve the
problem of figuring out which recovered files you don't want.
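A minimal sketch of that idea (the .filelist name is hypothetical, and a
real version would need to cope with directories and odd filenames):

    # nightly cron job: snapshot a sorted recursive listing of the path
    cd /0/path && find . -print | sort > .filelist

    # after the saveset recovers (including the recovered .filelist),
    # list what's there now and delete anything that wasn't there then
    cd /0/path && find . -print | sort > /tmp/now.list
    comm -13 .filelist /tmp/now.list | xargs rm -f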

Whacha think?

You seem to be getting the terminology a little confused above.
nwrecover and recover are functionally the same thing, except that one
is a GUI and the other is a command-line tool. The real distinction is
between index-based and save set recoveries.
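Roughly, on the command line (server name, date and ssid here are just
placeholders):

    # index-based: browse the client file index at a point in time
    recover -s mynsrserver
    recover> changetime "07/28/2004"   # set the browse time
    recover> add /0/path               # mark files as of that time
    recover> recover

    # save set based: pull back an entire save set by its ssid
    recover -s mynsrserver -S 1234567890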

The way I look at it, index-based recovery is the correct and normal way
to do all recoveries, unless there is some reason that it can't be done.

Save set recovery is fine for a straightforward full backup but falls
apart when you have any incrementals or differentials in the picture.
IMO recovering all those deleted files is unacceptable. There are many
examples where this would be damaging, such as a mail spool or print
spool. Apart from that, it's just plain not the right thing to do.
Additionally, the process of recovering and overwriting the same files
(the ones that change every day) many times is potentially very
time-consuming, depending on your system. Another point is that bringing
back old files might mean that there is no longer sufficient space in
the filesystem to complete the recovery.

I don't quite see why you would want to replace Legato's tried and
tested indexing system with some clunky script system that sounds like
it would involve an error-prone manual process to achieve a point-in-time
recovery. If performance of index-based recoveries is the issue, then
this should be addressed by better hardware and by pushing Legato to
optimise their code. If limited browse time is the problem, you can
implement a system where you keep occasional index backups for a long
time, hence allowing you to use nsrck -L7 to make old backups browsable
again. This can be a very effective strategy, allowing you to have
almost unlimited browse time.
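For example (the client name and date are placeholders):

    # recover the client file index as it stood at a given time
    nsrck -L7 -t "07/01/2004" myclient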

I think you ought to reconsider the design of your backups with respect
to the number of successive incrementals you run. IMHO you shouldn't
have more than about 4 or 5 incrementals (10 is definitely too many)
before you run the next full or a level backup, due to the amount of
time required to run the recoveries. You should perhaps consider
switching to differentials, or slot in a differential every few days if
running more fulls is unacceptable. If you just want a monthly full,
perhaps add a weekly L5 into the cycle to reduce the number of volumes
required for a full point-in-time recovery. Something like:
F-i-i-i-i-i-i-5-i-i-i-i-i-i-5-i.... would work quite well.



George Sinclair wrote:

Thanks, Darren!

Yes, I was referring to having 'store index entries' set to 'yes'. You
confirmed what I thought about the indexes. If you have this feature set
to yes, then using nwrecover is obviously nice because it doesn't
recover previously deleted files, but it can be slow as a dog at times,
making saveset recover a faster option in many cases and preferred even
though you get the deleted files back that you don't want. I think for
the most part, getting back the deleted files is okay for us and is
certainly a small price to pay in exchange for getting the other data back
that was not meant to be deleted!

But there are a few cases wherein one would really not want to recover
previously deleted files. For example, consider data that's going to be
ingested into a database or some such thing and is sitting in a holding
area pending further action. Maybe somebody wipes out some or all of
the data, so you have to recover it. If you use saveset recover, you
end up with a bunch of previously deleted files, and someone will then
have to get rid of those before the data can be ingested, which means
searching through the whole directory to figure out what should no
longer be there. That could be a lot of work, but the question is: "Is
that more work than waiting for nwrecover to finally wake up?" LOL! I
guess it's a trade-off. Of course, if it were just a small directory
under the main
one then you could just specify that on the recover paths window of
nwrecover and this wouldn't be so bad, but having to recover the whole
enchilada might result in a lot of effort to locate and remove those
"previously deleted" files since you might be dealing with a lot of
them. I guess I can always tell people the advantages and disadvantages
of each method and let them decide. As long as they understand then it's
fine by me.

George

Darren Dunham wrote:

Hi,

Here's a question that I think was explained to me one time in the past,
but I find myself lacking an answer now and I'm confused.

Let's suppose you run a full against something like /0/path and you have
indexing turned on for the given pool.

What exactly do you mean by that?  Are you talking about having 'store
index entries' set to 'yes' in the pool?
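That attribute lives on the pool resource; you can check it with
nsradmin, roughly like this (using the Default pool as an example):

    nsradmin> print type: NSR pool; name: Default
    ...
    store index entries: Yes;
    ...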


You then run say 10 incrementals
over the next 10 days. Prior to the backups, folks are creating,
modifying and deleting data under /0/path. Now, the incrementals should
be capturing all the files that have either been added or modified since
the previous incremental, but what happens to the deleted files? Is the
index updated with this info? Here's where I'm going with this. If I use
nwrecover to change the date to the last incremental then I should see
an exact picture of the way /0/path looked as of the last incremental,
so any previously deleted files from before that should NOT show up but
any ones deleted after that should, correct? Isn't this the whole point
of the index in that it allows you to change the browse time to reflect
the way the path looked at that time never mind what happened after or
before?

The indexes store which files are in which saveset.  If you had indexes
off, you would only know that it was a (for example) level 9 backup on
Tuesday at 4:34 am.  You would not know any of the contents.  As a
result, you could not browse the backup in nwrecover.


If, in this example, you instead used saveset recover to recover the
original full instance of /0/path, and then you recovered all 10
incrementals (overwriting any identical file names), then the deleted
files would be recovered and would not be removed as you progressed
through the recovers, so you'd end up with the original, the latest
versions of the modified files and all the files that were ever deleted
since the full, right? This seems kind of bogus. I mean, you now have
extra files, right? So what are you to do with those, and how would you
identify them? It seems clear to me that within the browse policy,
saveset recover and nwrecover do not really achieve exactly the same
thing. The files recovered from nwrecover are a subset of what saveset
recover would result in, right? Am I correct in saying that in the
example above, saveset recover = nwrecover + all the deleted files? Hmm.

Maybe someone can straighten me out here. Why use saveset recover if you
end up with extra files?

1) It might not matter for you.
2) You might not have a choice (you're past the browse period).
3) It could be faster/easier than doing a 'nwrecover', especially for
  savesets with large numbers of files.
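In practice that multi-pass sequence looks something like this (ssids
come from mminfo; the client, server and ssid values are placeholders):

    # list the full and subsequent incrementals for the path
    mminfo -avot -q "client=myclient,name=/0/path" -r "ssid,level,savetime"

    # recover the full first, then each incremental in date order,
    # answering Yes to overwrite prompts (-iY)
    recover -s mynsrserver -iY -S <full-ssid>
    recover -s mynsrserver -iY -S <first-inc-ssid>
    # ...and so on through the latest incremental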


What's the advantage of saveset recover *IF*
you have an index and you're recovering data within the browse policy?

It might be much faster than messing with 'nwrecover'.  You might
understand that for your application, recovering previously deleted
files will not cause a problem.


One reason I can think of is that a huge index can take a long time to
load, but saveset recover is a no-brainer that requires no real mental
anguish on the part of the client or the server. We had to resort to a
saveset recover recently because nwrecover just sat there all day and
never loaded.

Exactly.
