Networker

[Networker] SUMMARY: determining what files to backup based on modification times.

2003-10-29 18:42:02
Subject: [Networker] SUMMARY: determining what files to backup based on modification times.
From: Craig Ruefenacht <Craig.Ruefenacht AT US.USANA DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Wed, 29 Oct 2003 16:39:52 -0700
Howdy,

This is a four-month-old message that I posted (and got some feedback on),
but I noticed I never posted a summary.  Since I'm cleaning out old messages
I've saved for various reasons, I'll summarize now.

For reference, I've included my original post after my summary.  If you are
interested in this topic, you may want to review my original post in order
to make sense out of the summary.

My summary:

For starters, I failed to mention in my original post that the computer
systems I was discussing were running HP-UX 11i (both the machine with the
Oracle database and the Networker server).  Some pointed out that on Windows
platforms, Networker uses the archive bit on files to help determine what to
back up.  Unfortunately, the archive bit doesn't exist on Unix.  We are also
using Networker v6.1.3.

A few other people mentioned that I could do a full on Sunday and then a
level-1 on all other days instead of the incremental, thus catching all the
files that have changed since the full on Sunday.  This would only solve the
problem during the week, because we sync up our BCVs Saturday morning at
about 12:30am and do the full save on Sunday at 6:00am.  There is no BCV sync
in between; the next one occurs at 4:00am Monday morning.  A subsequent
level-1 on Monday at 6:00am would miss all files that had a modification time
between 12:30am Saturday and 6:00am Sunday (the gap between Saturday's BCV
sync and Sunday's full backup).  Of course, we could always change our backup
times to handle this better.
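
To make the timing concrete, here is a rough Python sketch of that window.
The calendar dates are hypothetical (picked only so the weekdays match), and
the function is a simplified model of the schedule described above, not
anything Networker itself runs.

```python
from datetime import datetime

# Hypothetical dates; only the weekdays and clock times come from the text.
bcv_sync_sat = datetime(2003, 6, 21, 0, 30)   # Saturday 12:30am BCV sync
full_sun = datetime(2003, 6, 22, 6, 0)        # Sunday 6:00am full backup
bcv_sync_mon = datetime(2003, 6, 23, 4, 0)    # Monday 4:00am BCV sync


def missed_by_level1(mtime):
    """True if a file with this mtime first appears on Monday's BCV but is
    skipped by a level-1 that only saves files newer than Sunday's full."""
    on_saturday_bcv = mtime <= bcv_sync_sat   # already saved by the full
    on_monday_bcv = mtime <= bcv_sync_mon
    newer_than_full = mtime > full_sun
    return on_monday_bcv and not on_saturday_bcv and not newer_than_full


# Length of the window in which a modification time gets a file skipped.
gap_hours = (full_sun - bcv_sync_sat).total_seconds() / 3600
print(gap_hours)  # 29.5
```

Under these assumptions, any file stamped inside that 29.5-hour window is on
Monday's BCV but looks "older than the full" to the level-1.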

Others mentioned that having Networker actually compare what's in the
on-line index against what's on the filesystem, and decide what to back up
based on those differences, would create huge overhead, and I agree.

So, the fix that I implemented to get around the problem is what I've stuck
with, for now.  See original post below.


-----Original Message-----
From: Craig Ruefenacht [mailto:Craig.Ruefenacht AT US.USANA DOT COM]
Sent: Wednesday, June 25, 2003 11:56 AM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: [Networker] determining what files to backup based on
modification times.


Hi,

We discovered a perplexing problem recently (and have since implemented a
work-around solution), but I don't really like our work-around.  I'd like to
hear what others have to suggest.

The problem centered around not understanding how Networker determines what
files are to be backed up during the current backup session.

To demonstrate the problem we were having, here is a simple example.

Let's say that on Sunday morning at 6:00am, a full backup is performed of
filesystem ABC.  On Monday morning at 6:00am, an incremental backup is
performed of the same filesystem.  During the middle of the day on Monday, a
file, let's call it XYZ, is placed on the ABC filesystem with a modification
timestamp of Sunday 8:30pm.

On Tuesday morning at 6:00am, an incremental backup is performed of
filesystem ABC.  We noticed that file XYZ was not backed up on Tuesday, even
though it didn't exist when Monday's incremental ran and, hence, is a new
file to the filesystem.  Shouldn't the incremental backup on Tuesday back up
this new file?

The logical answer is yes, but, if you look at the definition of an
incremental backup, it saves everything that has changed since the last
backup ran.  The keyword here is "changed".  How does Networker know that a
file has "changed" since the last backup?  By looking at the file's
modification timestamp.  Because XYZ's modification time predated Monday's
incremental backup, Networker assumed when Tuesday's incremental ran that
Monday's backup had already saved the file, so it skipped the file on
Tuesday.  Networker has no way to know that the file didn't even exist when
Monday's incremental ran - it assumes it did because the modification time
says it did.
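
The selection rule described above can be sketched in a few lines of Python.
This is only a simplified model of the observed behavior (compare each
file's modification time against the previous backup time), not Networker's
actual code; the dates and the second file name are made up for
illustration.

```python
from datetime import datetime


def select_for_incremental(files, last_backup_time):
    """Return the names whose mtime is later than the previous backup."""
    return sorted(
        name for name, mtime in files.items() if mtime > last_backup_time
    )


# Hypothetical dates: Monday 6:00am incremental, examined again on Tuesday.
monday_backup = datetime(2003, 6, 23, 6, 0)
files_on_tuesday = {
    # Placed on the filesystem Monday midday, but stamped Sunday 8:30pm.
    "XYZ": datetime(2003, 6, 22, 20, 30),
    # An ordinary file edited Monday afternoon.
    "edited_monday_afternoon": datetime(2003, 6, 23, 14, 0),
}

selected = select_for_incremental(files_on_tuesday, monday_backup)
print(selected)  # ['edited_monday_afternoon']
```

XYZ never makes the list, because nothing in a pure timestamp comparison
records when the file arrived on the filesystem.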

We have this kind of situation happen every day, because we have an EMC
Symmetrix with BCV volumes that we sync up once a day with our production
Oracle database.  A couple of hours after the BCV sync, we back up the BCV
volumes via Networker.  After a BCV sync has occurred (we do an establish,
let the volumes sync up, and then do a split), new files get written to the
production filesystems (and existing ones get modified).  By the time
Networker backs up the BCVs, the snapshot contained on them is a couple of
hours old.  So if the modification times of these files on the production
filesystems fall after the BCV sync but before the Networker backup of that
BCV, then after the next morning's BCV sync these files will be on the BCV
volumes, but their modification times will predate the Networker backup
performed the previous morning.  The next Networker incremental backup will
therefore not back up these files.

As a work-around, I wrote my own save script (Networker calls my save
script, not the Networker supplied save command).  My script takes the
command-line arguments that Networker passes to it and moves the time given
with the "-t" option back a few hours.  It then passes the original
command-line arguments, with the -t option modified, to Networker's save
command.  This in effect tells the save command to back up all files that
have changed since a few hours before the last backup.
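
For anyone wanting to try the same thing, the argument rewriting might look
roughly like the Python sketch below.  This is not my actual script.  It
assumes the value following -t arrives as an integer number of seconds since
the epoch (check what your Networker version really passes), and the server
name and path in the example are made up.

```python
def shift_t_option(argv, hours):
    """Return a copy of argv with the value after "-t" moved back by `hours`.

    Assumption: the value after -t is an integer count of seconds since the
    epoch.  If it is in some other format, the arguments are left unchanged.
    """
    adjusted = list(argv)
    for i, arg in enumerate(adjusted[:-1]):
        if arg == "-t":
            try:
                adjusted[i + 1] = str(int(adjusted[i + 1]) - hours * 3600)
            except ValueError:
                pass  # unknown time format: leave it alone
            break
    return adjusted


# Illustrative arguments only; the server name and path are hypothetical.
example = ["-s", "backupserver", "-l", "incr", "-t", "1056369600", "/oracle/bcv"]
print(shift_t_option(example, 3))
```

A real wrapper would then hand the adjusted list to the genuine save binary
(for example with os.execv), whose location is site-specific.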

I know that there are other ways of dealing with this problem, including
just doing a full backup each day, or doing various "level" backups instead
of doing an incremental backup, or forking out money to use the Oracle
module for Networker itself.  But even doing various "level" backups each
day can exhibit the same problem, if, for example, you do a level 3 backup
on Monday, a level 5 backup on Tuesday, and a level 4 backup on Wednesday.
It should be that Wednesday's level 4 backup would contain all files changed
since Monday's level 3 backup, but what if a file was placed on the
filesystem on Tuesday with a modification time that predates the level 3
backup on Monday?  You wouldn't put that file there manually, but if the
filesystem was part of a BCV, it could happen if the BCV is simply a
snapshot of a filesystem at a given time.  If the time between when the
snapshot was taken and the Networker backup was done is long enough, there
could be files that appear on the next snapshot which have a modification
time that predates the Networker backup.
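
The level example can be modeled the same way.  The sketch below assumes, as
in the paragraph above, that a level backup saves files changed since the
most recent earlier backup with a lower level number.  The dates are
hypothetical and the function is only an illustration of that rule.

```python
from datetime import datetime


def reference_backup(history, level):
    """Return (time, level) of the latest earlier backup with a lower level."""
    for when, prev_level in reversed(history):
        if prev_level < level:
            return when, prev_level
    return None


history = [
    (datetime(2003, 6, 23, 6, 0), 3),  # Monday, level 3
    (datetime(2003, 6, 24, 6, 0), 5),  # Tuesday, level 5
]

# Wednesday's level 4 skips Tuesday's level 5 and measures from Monday.
cutoff, base_level = reference_backup(history, 4)

# A file that appeared Tuesday but carries a Sunday 8:30pm mtime.
late_arrival_mtime = datetime(2003, 6, 22, 20, 30)
picked_up = late_arrival_mtime > cutoff
print(base_level, picked_up)  # 3 False
```

The file arrived after Monday's level 3 ran, yet its timestamp keeps it out
of Wednesday's level 4 as well.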

With all of this in mind, does there exist any kind of methodology to deal
with such a situation other than manually modifying the command-line
arguments to the Networker save command to decrement the -t option so that
the Networker save command will catch these files?

This scenario would apply to anyone who uses BCV volumes to mirror a
filesystem and then backs up the BCV volume at some later time.  I only used
Oracle as an example because it's the application we use BCVs to back up.









Craig Ruefenacht
UNIX Systems Administrator
USANA Health Sciences, Inc.
(801) 954-7559



--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

