Networker

Re: [Networker] Problem restoring an older NDMP backup

2004-06-10 15:34:48
Subject: Re: [Networker] Problem restoring an older NDMP backup
From: Stan Sander <ssande AT SANDIA DOT GOV>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Thu, 10 Jun 2004 13:34:28 -0600
Stan Horwitz wrote:

Each night, we use our NetWorker 6.1.3 server (on Solaris 9) to do a full
disk image of each of our five Mirapoint message stores. These backups are
retained for seven days and then recycled, and we do periodic restore
testing to an auxiliary Mirapoint message store.

On Saturday, May 29 at 1:18am, our NetWorker server crashed and rebooted
itself. At the point of the reboot, three of our Mirapoint message stores
were being backed up via NDMP. The server rebooted and restarted fine. My
suspicion (from looking at the logs) is that some software had a memory
leak that caused the server to crash. The log file shows a warning about
not enough memory being available a few seconds prior to the crash.

We need to restore from the backup of a Mirapoint system we call po-d,
which is one of the three Mirapoint message stores whose backups were
interrupted when our NSR server crashed. For some reason, the po-d backup
restarted on its own and completed without any errors. The backups of the
other message stores that were in progress at the time of the crash never
restarted, but they left files in /nsr/tmp. The backup that we want to
restore from consists of three tapes and a total of 210GB of data, and it
is from the po-d message store. The other NDMP backups that were in
progress also involve backing up a lot more data than po-d holds, so maybe
that's why they did not restart.

As it happens, one of our users made the bad decision to delete all her
email later that same day and as luck would have it, this user's email
resides on po-d. On June 1, one of our help desk consultants was able to
restore some of the user's email from the trash can that the Mirapoint
message stores maintain, but she is missing a lot of email.

We are now trying to recover that May 29th backup. My colleague who
manages our email servers starts up the "recover" utility on our NetWorker
server and changes the time to 05/29/04; the utility acknowledges this
request without any complaints. He then directs the recovered data to our
spare Mirapoint server, and still no problem. But when the actual recover
process begins, NetWorker proceeds to recover the most recent data instead
of the data from the May 29th backup.

What's really a pain in the neck is if we interrupt an NDMP recover, it
corrupts the destination system, which requires a complete OS install so
we let the recover process run to completion, which takes several hours.
This is what is happening as I am typing this message. In addition, the
save set in question is listed as browsable in nwadmin, but it doesn't
seem to really be recoverable.

While it is not the end of the world if we cannot complete this restore
properly, I am wondering if somehow, the backup on the 29th is corrupt,
considering its time stamp is about three minutes before the system
crashed and it should have four tapes instead of three.  Would that
explain why the recover utility insists on recovering the most recent
saveset? I am assuming the answer is yes and I have already asked the help
desk manager who has been working with this user to inform her that she
will most likely not get the remainder of her email back, but we are still
trying.

Unfortunately, our NSR server crashed at the start of the Memorial Day
weekend and I did want to reboot the server before the weekend since it
had been running for 35 days, but some people needed to do recoveries so I
didn't get a chance to do that. We also do not run a 24x7 shop and since I
am the only person on staff here who manages our NetWorker server, I did
not investigate the crash until Tuesday, June 1. Our backups also ran to
completion for the remainder of that weekend. Just as a precaution, I
manually shut down the NSR daemons, manually cleared out the contents of
/nsr/tmp, and then rebooted our NSR server. Everything came up fine and it
has been working fine since. We have also done some other restores since
June 1 and they worked fine.

Oddly enough, we also have a request where someone deleted her entire
calendar from her boss's Mirapoint account, but from a different save set
on a different day (which I extended via nsrmm) and on a different
Mirapoint server. We try to avoid doing NDMP restores at all costs because
we do not really have the resources for them, but I need to put some data
on our auxiliary Mirapoint server to test a new tape drive, so I
encouraged these requests. Restoring that person's calendar data worked
fine, although the user turned out to have given us the wrong day to
restore from, but that's another story.

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=


Stan,

What does mminfo show for the ssflags?  A normal NDMP backup would be
vNF.  Have you tried nsrck -L6 po-d?
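
Something along these lines might help narrow it down. It is only a rough
sketch: the one-day query window, the variable names, and the
decode_ssflags helper are my own assumptions, and the flag meanings are as
I recall them from the mminfo man page, so check them against the man page
for your release. The query lists each po-d save set from that day with
its ssid, flags, and volumes, so you can see whether the May 29 save set
really is complete.

```shell
#!/bin/sh
# Sketch only: names and the query window below are assumptions.
CLIENT="po-d"          # client name from the thread
DAY_START="05/29/04"   # day of the backup in question
DAY_END="05/30/04"     # assumed upper bound of the query window

# Spell out the ssflags letters that matter here (meanings as recalled
# from the mminfo man page; verify against your own release).
decode_ssflags() {
    flags=$1
    while [ -n "$flags" ]; do
        rest=${flags#?}          # everything after the first character
        c=${flags%"$rest"}       # the first character itself
        case $c in
            v) echo "v: valid" ;;
            N) echo "N: NDMP generated" ;;
            F) echo "F: finished" ;;
            i) echo "i: incomplete" ;;
            I) echo "I: in progress" ;;
            *) echo "$c: see the mminfo man page" ;;
        esac
        flags=$rest
    done
}

# Read-only media database query; skipped where mminfo is not installed.
if command -v mminfo >/dev/null 2>&1; then
    mminfo -avot \
        -q "client=$CLIENT,savetime>=$DAY_START,savetime<=$DAY_END" \
        -r "ssid,savetime,level,totalsize,sumflags,ssflags,volume,name"
fi

decode_ssflags vNF
```

If the May 29 save set shows something other than vNF (an "i" or "I", for
example), that would fit with the crash having cut it short.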

--
Stan Sander - CSU Special Projects
Unix Systems/Server Administrator
(505)284-4915
