Re: [Networker] Power failure and DB corruption - RESOLVED

2013-03-11 12:50:45
Subject: Re: [Networker] Power failure and DB corruption - RESOLVED
From: Michael Leone <Michael.Leone AT PHA.PHILA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 11 Mar 2013 12:42:55 -0400
So it looks like I may have dodged half a bullet, at least. Along with 
Tech Support (who was extremely helpful, BTW), I was able to do a mmrecov. 
It doesn't look like I need to do any nsrck -L7s, thankfully (although later 
I may do some nsrck -L6, just to be safe). That got me back my media 
history, so my mminfo queries are now showing what they should 
(I haven't had time to see if it's all browsable or not). Luckily, I only 
had 2 AFTD devices that had data written to them on Sat/Sun, and I am in 
the process of scanning those, and will then manually clone them. I also 
have about 10 tapes that were written to between Sat and Sun, which I will 
have to scan if I want to have a record of those backups.
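
For anyone wanting to check a recovered media history for holes, here is a rough Python sketch of the idea. It assumes you have already pulled the backup dates for one client out of your own mminfo report by whatever means you prefer; the function name and the sample dates are made up for illustration and are not anything NetWorker provides.

```python
from datetime import date, timedelta


def missing_days(backup_dates, start, end):
    """Return every day in [start, end] that has no backup recorded."""
    seen = set(backup_dates)
    gaps = []
    day = start
    while day <= end:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


if __name__ == "__main__":
    # Hypothetical data: days on which one client has at least one save set.
    dates = [date(2013, 3, 4), date(2013, 3, 5), date(2013, 3, 7)]
    for gap in missing_days(dates, date(2013, 3, 4), date(2013, 3, 8)):
        print(gap.isoformat())
```

With the sample data this lists 2013-03-06 and 2013-03-08 as the days with nothing recorded. Whether a missing day is a real hole depends on your schedules, so treat the output as a list of things to look at, not a verdict.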

As I say, I was lucky my bootstrap backup finished when it did, and I have 
it email me a report, so doing the mmrecov was pretty simple, especially 
since the latest bootstrap was *not* in the media db before I started (due 
to the corruption). 
HINT: Be sure to save those emails! And send a copy offsite, to another 
mailbox/cloud storage, too.
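
If you keep those report emails as text files, something like the following Python sketch could pull out the newest bootstrap entry when you need it in a hurry. It assumes the report body has whitespace-separated columns in the order date, time, level, ssid, file, record, volume, with dates like 03/09/13; check that against your own reports, since the layout and date format may differ by version and locale. The names and sample values below are hypothetical.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Bootstrap(NamedTuple):
    when: datetime
    level: str
    ssid: str
    file: int
    record: int
    volume: str


def parse_bootstrap_report(text: str) -> List[Bootstrap]:
    """Parse report lines; silently skip headers and anything unrecognised."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 7:
            continue
        try:
            # Assumed date/time format; adjust to match your own reports.
            when = datetime.strptime(parts[0] + " " + parts[1], "%m/%d/%y %H:%M:%S")
            entry = Bootstrap(when, parts[2], parts[3],
                              int(parts[4]), int(parts[5]), parts[6])
        except ValueError:
            continue  # column header or unrelated line
        entries.append(entry)
    return entries


def latest_bootstrap(text: str) -> Optional[Bootstrap]:
    """Return the most recent bootstrap entry, or None if none were found."""
    entries = parse_bootstrap_report(text)
    return max(entries, key=lambda e: e.when) if entries else None


if __name__ == "__main__":
    # Made-up sample report, not real output.
    sample = """\
date     time     level  ssid        file  record  volume
03/08/13 11:58:02 full   4000000001  12    0       EXAMPLE.001
03/09/13 12:01:44 full   4000000002  15    0       EXAMPLE.002
"""
    print(latest_bootstrap(sample))
```

The ssid, file, record and volume of the newest entry are the details you will want at hand when running mmrecov.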

TS also told me how to export the media DB using the "nsrmmdbasm" command, 
which they say can then also be used to transfer the media from one server 
to another.
HINT: set that up as a scheduled task, too, just to be on the safe side.
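
A minimal Python sketch of a wrapper you could point a scheduled task at. It deliberately does not hard-code the nsrmmdbasm arguments; pass in the exact command line Tech Support gave you. It assumes the export command writes its data to standard output; if yours writes to a file itself, this would need adapting. The function name and file naming scheme are my own invention.

```python
import subprocess
import sys
from datetime import datetime
from pathlib import Path


def run_export(command, out_dir, prefix="mediadb-export"):
    """Run `command` and save its stdout to a timestamped file in out_dir.

    `command` is a list such as ["program", "arg1", ...], supplied by the
    caller. Returns the path of the file that was written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = out_dir / f"{prefix}-{stamp}.out"
    with open(target, "wb") as fh:
        result = subprocess.run(command, stdout=fh, stderr=subprocess.PIPE)
    if result.returncode != 0:
        # Do not leave a partial export lying around looking like a good one.
        target.unlink()
        raise RuntimeError(
            f"export failed with exit code {result.returncode}: "
            f"{result.stderr.decode(errors='replace').strip()}"
        )
    return target


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: script.py <output-dir> <export command and its arguments...>
    print(run_export(sys.argv[2:], sys.argv[1]))
```

Copy the resulting files offsite along with the bootstrap emails, and prune old ones on whatever retention suits you.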

So outside of a lot of scanning for the 2 days of weekend jobs that 
executed after the crash and corruption, I *should* be good to go.

-- 
Michael Leone
Network Administrator, ISM
Philadelphia Housing Authority
2500 Jackson St
Philadelphia, PA 19145
Tel:  215-684-4180
Cell: 215-252-0143
<mailto:michael.leone AT pha.phila DOT gov>


EMC NetWorker discussion <NETWORKER AT LISTSERV.TEMPLE DOT EDU> wrote on 
03/11/2013 09:21:32 AM:

> From: Michael Leone <Michael.Leone AT PHA.PHILA DOT GOV>
> To: NETWORKER AT LISTSERV.TEMPLE DOT EDU, 
> Date: 03/11/2013 09:23 AM
> Subject: [Networker] Power failure and DB corruption
> Sent by: EMC NetWorker discussion <NETWORKER AT LISTSERV.TEMPLE DOT EDU>
> 
> Gotta love Mondays ...
> 
> On Sat around 1PM, apparently our electrical power decided it didn't feel 
> like hanging out with us anymore, and left to go hang out somewhere else. 
> So our UPS kicked in. The generator is supposed to kick in at this point, 
> and the batteries are there to only hold us over until the generator 
> finishes coming back online.
> 
> This has worked in the past, all smoothly and as expected. But not that 
> day ...
> 
> The generator decided it didn't want to be bothered, and never turned 
> on ... (well, it had apparently had a coolant leak that we didn't know 
> about, and it wouldn't turn on because of that. A cascade of failures 
> ...). And so there was no electrical main power, and no generator, and the 
> UPS was left with its batteries.  And when the batteries got too low ...
> 
> (you see where I'm going with this, right?)
> 
> CRASH. Everything came down hard.  Luckily (?)  the mains kicked in right 
> around then. And it took so long for the main network switch to come up 
> that DNS resolution was failing (among other things). So NW assigned new 
> client IDs to its clients. And (apparently) purged a lot of history 
> while it was at it, since my log and index drive went from 8G free to 85G 
> free ...
> 
> <SIGH>
> 
> I have a severity 1 call into EMC, just waiting for a call back. I foresee 
> having to do a DR-level recovery, a mmrecov, in order to straighten things 
> back out. Luckily, I had a full bootstrap and CFI backup (savegrp -O -l 
> full)  that finished about an hour before this all happened, so - if I do 
> have to do a mmrecov - I will just lose the stuff that happened (or tried 
> to happen) after the crash, from Sat 1PM onward. But as long as I can get 
> back all my client histories for the last 5 years, I'll be happy. (right 
> now, I don't have everything - there are holes in the backup history for 
> various clients, etc, days and months missing, etc).
> 
> (there was a decision that we didn't need to run the UPS software on the 
> servers, to gracefully shut them down if/when the batteries go out; that's 
> why nothing shut down gracefully)
> 
> -- 
> Michael Leone
> Network Administrator, ISM
> Philadelphia Housing Authority
> 2500 Jackson St
> Philadelphia, PA 19145
> Tel:  215-684-4180
> Cell: 215-252-0143
> <mailto:michael.leone AT pha.phila DOT gov>