Subject: [Veritas-bu] possible problem with cluster backups
From: jmeyer AT ptc DOT com (Jonathan Meyer)
Date: Wed, 11 Jul 2001 17:14:17 -0400
   From: scott.kendall AT abbott DOT com
   Date: Wed, 11 Jul 2001 14:01:18 -0500

   If it is a client, no problem.  Either back up each node and all of
   its drives using the ALL_LOCAL_DRIVES file directive

   OR

   back up each node's local drives and then back up the alias's
   (virtual server's) drives through its name.

   For example, you can back up Server1 and Server2 in a class that
   contains System_State:\ and C:\.  Then back up VirtualServer1 in a
   class that contains a file list of E:\.

   I prefer the second because it makes restores easier.  I can open
   the backup/restore window for the alias and see all of its backups,
   regardless of which node owned the resource when each backup
   occurred.  If you use the first method, you could potentially need
   to restore a full from Server1, a few incrementals from Server2,
   and possibly a few more incrementals from Server1 again.  And if
   you missed one because you didn't realize an incremental was on
   Server2 (because Server1 normally owns the resource), you would be
   missing files/changes after the restore.
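
As a sketch of the second approach, the two classes might carry file
lists along these lines (class names are illustrative, not required by
NetBackup; ALL_LOCAL_DRIVES would replace the explicit paths in the
first method):

    # Class "Cluster-Node-Local" -- clients: Server1, Server2
    System_State:\
    C:\

    # Class "Cluster-Virtual" -- client: VirtualServer1 (the alias)
    E:\

With this layout, restores for the shared drive are always browsed
under VirtualServer1, no matter which physical node held E:\ at
backup time.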


I have not thought about this issue very carefully, but I have a gut
feeling that there is a problem with the first scenario listed above
for cluster backups.  I can tell you definitely that I back up my UNIX
clusters using method two and it works nicely.

The plan which concerns me is summarized as:

    back up each node and all of its drives using the
    ALL_LOCAL_DRIVES file directive

Here is an example of a possible problem.  Imagine a cluster with
nodes A and B, and a shared filesystem called shared_fs.  Shared_fs is
on node A to start with.

You do a full backup of cluster nodes A and B, but they don't start at
the same instant.  The full backup for node A begins at 12:00; B's
backup doesn't kick off until two minutes later, at 12:02.

Suppose file1 on shared_fs changed at 12:01, between the start times
of the two backups.  Also, suppose that file1 had been captured on
node A's full backup at 12:00:30, just before it changed.

After those two full backups finish, we have a valid copy of file1 as
it existed at 12:00:30, but the changes which occurred at 12:01 are
not on tape.

Now, a day later we go to run incrementals.  If shared_fs is still on
node A, then file1 will be included in the incremental.  However, if
the cluster has failed over to node B, there could be a problem.  I
think that an incremental for node B will only get files which changed
since node B's last full backup at 12:02.  File1 last changed at
12:01, so it would not be included.
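
The concern above can be sketched as a toy model.  This is only a
simplified illustration of mtime-based incremental selection, assuming
(as described above) that each node's incremental uses that node's own
last full backup as its reference time; the function and variable
names are hypothetical, not NetBackup internals:

```python
from datetime import datetime

# Reference time recorded per node: when its last full backup began.
last_full = {
    "A": datetime(2001, 7, 11, 12, 0),   # node A's full started 12:00
    "B": datetime(2001, 7, 11, 12, 2),   # node B's full started 12:02
}

# file1 on shared_fs last changed at 12:01 -- after node A's full
# read it (at 12:00:30) but before node B's full began.
file1_mtime = datetime(2001, 7, 11, 12, 1)

def incremental_includes(node, mtime):
    """Simplified model: an incremental picks up files modified
    since the owning node's last full backup."""
    return mtime > last_full[node]

# If shared_fs is still on node A, file1 is included:
print(incremental_includes("A", file1_mtime))  # True

# After failover to node B, file1's change predates B's reference
# time, so it is skipped:
print(incremental_includes("B", file1_mtime))  # False
```

The gap exists because the reference time travels with the node, not
with the shared filesystem, which is exactly what backing up through
the virtual server name avoids.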

I am not certain that I have accurately described the behavior of
NetBackup's incrementals in the above scenario, but it seems like a
possible concern.

--------------------------------------------------
Jonathan Meyer
(781)370-6594
UNIX Systems Administrator
Parametric Technology Corporation
--------------------------------------------------
