Subject: Re: Keeping a handle on client systems' large drives
From: Dan Foster <dsf AT GBLX DOT NET>
Date: Fri, 14 Jun 2002 05:55:57 +0000
Hot Diggety! Seay, Paul was rumored to have written:
> Ask them where they were on 9-11-2001.  Are they totally brain dead?

Ahhh, so that's what you referred to in passing in the other post.

That's all right, and understandable.

I have a first-rate appreciation of this. If you'll allow me to indulge
briefly in a tangentially related (but not entirely unrelated) issue on
this list, just once...

I used to be a VMS admin. Best, most robust OS that I ever worked with -
probably also true of the IBM mainframes, but I didn't work much with them,
alas. (A little OS/400, DOS/VSE, and one or two other related OSes.)

Anyway, come post-9/11, a *lot* of financial firms were in a world of
hurt. The ones who planned and re-tested over and over again, each year,
for an alternate site a good distance away from NYC, were able to reopen
for business only a few days later. Many were based in NJ or about an
hour west/north of NYC... one was even based not too far from home, with
their DR site about 4-5 hours northwest of NYC.

Around this time, I heard that Compaq (the company that bought out DEC)
was making a lot of frantic calls all around the country seeking out
high-end machines such as the AlphaServer 8400s and VAX 7000s... machines
that had been discontinued perhaps 10 years earlier... because a lot of
customers were suddenly calling in for warranty replacements (under their
expensive support contracts) in NYC and DC -- you can guess what kind of
"customer" it was in DC. How desperate was Compaq? They were calling up
even third-level resellers of used equipment that they would normally
never, ever think of talking to.

Compaq was in a nasty hole, because they had run out of set-aside reserve
spares. The fab plants had *long* since shut down... they couldn't just
"take the original plans and re-fab", since the engineers were no longer
there... I'm not sure how they eventually resolved that... my guess is
they offered newer machines to customers and provided migration assistance
at Compaq's cost.

But what the bean counters don't realize is that it doesn't take a
catastrophic national event to hurt the business bottom line, which I find
unfortunate. It can be all sorts of more 'mundane' (albeit not very common)
events, such as that train which burned in a Baltimore tunnel and closed a
part of downtown near Oriole Park at Camden Yards. My company (which also
used to own a telco) was personally affected by a homeless man burning
something in an abandoned former railroad tunnel; it melted fiber optics
and took out OC-12 service to the area for 12+ hours, with a good number
of servers based out of here.

It doesn't have to be a corporation for a nasty disaster to mean bad
things for the bottom line. I am well reminded of a colossal failure at
an academic institution almost a decade ago: a chain of events ultimately
resulting in the failure of a critical drive in a RAID-5 array, and the
tapes weren't really usable for recovery... which they found out the hard
way. An entire semester of classwork was effectively disrupted, with much
data lost, before they were finally able to convince DEC to send out their
very best people to recover about 80% of the data off the RAID-5 array
through some custom work. So many classes, projects, research papers, etc.
were affected that it simply isn't funny. Same place where, if the IBM
mainframe ever went down, school was closed for the day. (Happened only
once ever, to the best of my knowledge.)

...and that is truly unfortunate: the people who are actually tasked
with making things happen, like us, understand and appreciate this,
whereas others higher up may not share the same view, knowledge, and
experience.

In a D/R scenario, it also behooves you to know your power sources, how
they kick in, at what levels, how fast and when, your evacuation plans,
how to configure PBXes, to have emergency equipment handy (e.g.
flashlights), and a million other details. Hardware that can be quickly
hooked up/activated, a written step-by-step plan nearby, software CDs
handy if needed, dry runs done, backups/restores/app operation verified,
and all of this tested once or twice a year depending on the level of
need and impact, etc.

Still, I resolve to do the best I realistically can. :)

With that said, I now return you to the normal *SM discussions. ;)
(with the reason for copy stgpools driven home ;) )

-Dan Foster
IP Systems Engineering (IPSE)
Global Crossing Telecommunications