ADSM-L

Re: How do you verify the Completion and Accuracy of Backups and Restores?

2006-11-08 14:36:05
Subject: Re: How do you verify the Completion and Accuracy of Backups and Restores?
From: "Prather, Wanda" <Wanda.Prather AT JHUAPL DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 8 Nov 2006 14:35:14 -0500
Ditto.

To start, set up the TSM Operational Reporter (TOR).
It is free as part of the product.
It works really well right out of the box, but can also be customized to
do some clever things.

If your TSM folks are running a non-Windows TSM server, they may not be
familiar with it, as it is a Windows application.
If your TSM server is Windows, TOR gets installed when the server is
installed, but you still need to configure it.

If your TSM server isn't Windows, you'll need to install TOR separately
on a Windows host.
But it doesn't have to be a Windows server or anything fancy; you can
run it on your desktop.

It will tell you, every day, EXACTLY which backup schedules completed or
did not, which clients had missed files, and what those files were.
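If you want to see roughly what that part of the report is checking, here is a minimal Python sketch of the same check done by hand. It assumes the schedule event records have already been exported as comma-separated text (the administrative client's `-commadelimited` output is one possible source) in a hypothetical column order of scheduled start, actual start, schedule name, node name, status. Check the real column order on your own server before relying on it. Anything whose status is not "Completed" gets flagged.

```python
import csv
import io

# Statuses treated as success; everything else (e.g. Missed, Failed) is flagged.
OK_STATUSES = {"Completed"}

def problem_events(csv_text):
    """Return (schedule, node, status) for every event that did not complete.

    Assumed (hypothetical) column order per row:
    scheduled start, actual start, schedule name, node name, status.
    """
    problems = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 5:
            continue  # skip blank or truncated lines
        schedule, node, status = (field.strip() for field in row[2:5])
        if status not in OK_STATUSES:
            problems.append((schedule, node, status))
    return problems

# Made-up sample rows for illustration only.
sample = """11/07/2006 22:00:00,11/07/2006 22:01:12,NIGHTLY,WEB01,Completed
11/07/2006 22:00:00,,NIGHTLY,DB01,Missed
11/07/2006 22:00:00,11/07/2006 22:03:40,NIGHTLY,FILE02,Failed
"""

for schedule, node, status in problem_events(sample):
    print(f"{node}: {schedule} -> {status}")
```

With the sample rows above this prints one line each for DB01 (Missed) and FILE02 (Failed).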

It also scrapes the TSM activity log for any server-end messages that
need attention (although there are also frequently nuisance messages
that you will want to filter out, using the customization available in
TOR).
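The filtering idea can be sketched in a few lines of Python. This is not how TOR does it internally, just the same idea by hand. TSM server messages carry an ID of the form ANRnnnnX, where the last letter is the severity (I, W, E or S). The sketch keeps warnings, errors and severe messages and drops anything on an ignore list. The ignore-list entry and the sample log lines are made-up placeholders.

```python
import re

# Server message IDs: ANR + four digits + severity letter (I, W, E, S).
MSG_ID = re.compile(r"\b(ANR\d{4}([IWES]))\b")

# Hypothetical nuisance list: IDs you have reviewed and chosen to ignore.
IGNORE = {"ANR2034E"}

def needs_attention(lines, ignore=IGNORE):
    """Return the lines that carry a W/E/S server message not in `ignore`."""
    flagged = []
    for line in lines:
        match = MSG_ID.search(line)
        if match is None:
            continue  # no server message ID on this line
        msg_id, severity = match.group(1), match.group(2)
        if severity != "I" and msg_id not in ignore:
            flagged.append(line)
    return flagged

# Placeholder lines; real activity-log text will differ.
actlog = [
    "11/08/2006 02:00:01 ANR1111I (sample informational message)",
    "11/08/2006 02:10:44 ANR2222W (sample warning)",
    "11/08/2006 02:15:00 ANR2034E (sample message on the ignore list)",
    "11/08/2006 03:05:12 ANR3333E (sample error)",
    "a line with no message ID",
]

for line in needs_attention(actlog):
    print(line)
```

Keeping the ignore list as explicit message IDs means every suppressed message was a deliberate decision, which is the point of reviewing the report daily.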

You can have the reports generated as HTML that is available for
browsing, or mailed to you.
Sounds like nobody has done this yet.  
SOMEBODY SHOULD REVIEW THIS REPORT EVERY DAY.  
AND ACTUALLY ATTEND TO THE THINGS THAT NEED ATTENDING.

You can read about TOR in the "monitoring your server" section of the
TSM Administrator's Guide.

If the missed backups you are referring to are databases that are being
backed up using a TSM Data Protection agent (backing up through the
API), you may have to be creative about gathering the reports from those
logs (esp. with Oracle - I think you have to actually view the RMAN logs
to guarantee that those worked correctly.)  But I have had success
writing very small scripts (e.g. perl) that scrape the information out
of those logs, and send it to be displayed in the TOR Daily report.
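A comparable minimal sketch in Python, under two assumptions worth checking against your own logs: errors show up as lines containing RMAN-nnnnn or ORA-nnnnn codes, and a session that ran to the end prints "Recovery Manager complete." Both conditions are checked, since that closing line alone does not prove the backup succeeded. The summary line it returns is just text. How you feed it into the TOR daily report is up to you and is not shown here. The sample log text is made up.

```python
import re

# Oracle/RMAN error codes look like RMAN-03009 or ORA-19511.
ERROR_CODE = re.compile(r"\b(?:RMAN|ORA)-\d{5}\b")
# Assumed closing line of an RMAN session that ran to completion.
DONE_MARKER = "Recovery Manager complete."

def check_rman_log(text, name="rman.log"):
    """Return (ok, summary_line) for one RMAN log's text."""
    errors = [ln.strip() for ln in text.splitlines() if ERROR_CODE.search(ln)]
    finished = DONE_MARKER in text
    ok = finished and not errors  # must finish AND be error-free
    status = "OK" if ok else "CHECK"
    summary = f"{status} {name}: finished={finished}, error lines={len(errors)}"
    return ok, summary

# Made-up log excerpts for illustration only.
good = "(sample backup output)\n\nRecovery Manager complete.\n"
bad = ("RMAN-03009: (sample error text)\n"
       "ORA-19511: (sample media manager error text)\n"
       "\nRecovery Manager complete.\n")

for log_name, text in (("good.log", good), ("bad.log", bad)):
    print(check_rman_log(text, log_name)[1])
```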

Wanda Prather
"I/O, I/O, It's all about I/O"  -(me)
 

 

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Mark Stapleton
Sent: Wednesday, November 08, 2006 11:44 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: How do you verify the Completion and Accuracy of Backups
and Restores?

From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Wesley Smith
>       My problem is that they (that sister agency) do not seem to have
>a reliable way of verifying that all backups have been properly
>completed. They don't even seem to have a way to know that all files
>(that need to be backed up) are being backed up.  I've seen the reports
>that get generated during the backup process and I am definitely
>unimpressed.  Backups start and backups complete.  There doesn't seem to
>be anything that says how many rows are copied or how large the files
>are or anything else that could be used for verifying the accuracy of
>the backups.  They tell my folks that we should trust Tivoli is doing
>the job correctly.  Trust is the problem....

Let's start there. The dsmsched.log file contains a record of all
scheduled backups and their outcomes. In it you should find which files
were backed up, the size of each file, and timestamps that give an idea
of how long each file took to back up. (This assumes that the QUIET
option is not set in the client options file or in the client option set
assigned to that TSM client.) If you're using the specialized TSM agents
for databases or mail apps, their scheduled backup logs contain fairly
granular information about individual file backups. What more do you
need?
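To put numbers on it, the summary statistics the client writes at the end of each scheduled backup can be pulled out of dsmsched.log. Here is a minimal Python sketch, assuming the summary lines read like "Total number of objects inspected: 45,210". Verify the exact wording against your own client's log. It returns the counters it finds and flags a non-zero "failed" count. The sample lines are made up.

```python
import re

# Assumed summary-line shape, e.g. "Total number of objects failed:   3"
STAT = re.compile(r"Total number of objects ([a-z ]+):\s+([\d,]+)")

def backup_totals(log_text):
    """Map counter name -> value; later summaries overwrite earlier ones."""
    totals = {}
    for name, value in STAT.findall(log_text):
        totals[name] = int(value.replace(",", ""))
    return totals

# Made-up excerpt for illustration.
sample_log = """\
11/08/2006 02:31:10 Total number of objects inspected:   45,210
11/08/2006 02:31:10 Total number of objects backed up:      312
11/08/2006 02:31:10 Total number of objects failed:           3
"""

totals = backup_totals(sample_log)
if totals.get("failed", 0) > 0:
    print(f"{totals['failed']} object(s) failed; look for the error lines in the log")
```

Because later summaries overwrite earlier ones, feeding it the whole log gives you the most recent run. Split the log per session first if you want every run.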

>       We have needed to have restores done on just a few databases in
>the past and the restores were not complete and up to date.  In each
>case we were able to rebuild the data using logs maintained within the
>applications but that should not have been necessary.  Each recovery was
>done at a point after a backup and before additional processing had been
>done within the apps so they should have been complete.  In each case,
>the folks who run Tivoli for us were able to track down and show that
>problems had occurred during the processing of the backups.  They did
>this through circumstantial evidence and in each case once again said
>that they have no way of verifying that the backups are actually good.
>I hear a lot about the difficulty of trying to write a program to
>process the Tivoli log files.
>
>       I think I'm at wit's end with these folks and the product.  I
>know that the people are competent and I suspect that the product (like
>other things available from IBM) really is weak on the reporting and
>verification issue. 

While TSM itself does lack some reporting functionalities (particularly
when it comes to client backups and restores), I have to say this:

On every properly maintained and monitored TSM system I have touched in
the 12 years I've administered and engineered this product, I have
*never* lost a single byte of information. Period. If you cannot do a
restore because of "lost" data, something is happening during backups
that is not being caught at the time of the backups.

>I'm hoping that someone out there in the Big Wide
>World has already solved this problem with an in-house or third-party
>solution.  Sorry for being so long winded.  Any ideas...?

I think what is needed here is greater familiarity with TSM and its
proper administration. Proper verification of good backups is best done
by regular DR practice of planned bare-metal restores of chosen
machines. If you can take data backed up by TSM and restore a given
machine in a DR environment, and the machine comes back properly, you
know the job is being done right. If it doesn't, *then* you dig into
*why*.
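One way to make "the machine comes back properly" concrete for data files is to compare checksums of a known-good copy against what the DR restore produced. This is a plain checksum comparison, not a TSM feature. The minimal Python sketch below assumes both trees are reachable as local or mounted paths. Files that changed after the backup ran will also show up as "changed", so compare against a snapshot or a quiet period.

```python
import hashlib
from pathlib import Path

def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests

def compare_digests(original, restored):
    """Return (missing, extra, changed) relative paths, each sorted."""
    missing = sorted(original.keys() - restored.keys())
    extra = sorted(restored.keys() - original.keys())
    changed = sorted(p for p in original.keys() & restored.keys()
                     if original[p] != restored[p])
    return missing, extra, changed
```

Usage, with hypothetical paths: `compare_digests(tree_digests("/data/app"), tree_digests("/restore/app"))`. Three empty lists mean every file came back byte-for-byte. Anything else is where you start digging.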

BTW, there are responses to this thread advocating ServerGraph and
Bocada for reporting and monitoring. Be aware that those applications do
a fine job of monitoring server operations. (Well, ServerGraph does,
anyway.) Their reporting, however, is not granular enough to indicate
whether a given file is being backed up properly.

--
Mark Stapleton (mark.s AT evolvingsol DOT com)
Senior TSM consultant