ADSM-L

FW: [ADSM-L] How do you verify the Completion and Accuracy of Backups and Restores?

2006-11-09 12:01:35
Subject: FW: [ADSM-L] How do you verify the Completion and Accuracy of Backups and Restores?
From: Wesley Smith <Wesley.Smith AT LA DOT GOV>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 9 Nov 2006 11:00:32 -0600
 Thanks, Wanda.

I believe the TOR tool is what they are currently using.  I think a
large part of their problem is that the reports are so large, it is
impossible for one person to go through the report every day with any
amount of reliability.  I know that they are responsible for handling
the backups of well over 100 servers and that it is being done by just
one person.  I've seen the report as currently generated and noted a
number of problems with it.  The report runs at a scheduled time rather
than having job triggers that would kick it off after the successful
completion of all backups.  As a result, the report will show backups
as started without showing that they have completed.  On some
days, there will be very few of these.  On other days, quite a few.
Throw stuff like that into the mix with the real errors and other
"pseudo errors" and you find yourself trying to chase down a lot of
non-errors.

I will be passing along to the appropriate people that perhaps there is
some additional filtering that could be done to these reports to reduce
their size to something that is more manageable.  I'm hoping that we
will be able to come up with some filtering and scripting aids that will
help to automate this process as much as possible and reduce to a
minimum the need for the Tivoli support person to spend a lot of time
every day just reviewing the night's work.
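
As a rough idea of the kind of filtering meant here, the sketch below
sorts report entries into "needs a person", "recheck later" and "fine".
It is only an illustration: the (node, schedule, status) tuple layout,
the node names and the triage() helper are hypothetical, and the status
words would need to match whatever the real report emits.

```python
# Statuses that usually just mean the report ran before the backup ended.
# These values are assumptions; adjust them to the real report's wording.
PROBABLY_STILL_RUNNING = {"Started", "Restarted", "Pending"}


def triage(events):
    """Sort (node, schedule, status) tuples into three buckets."""
    buckets = {"attention": [], "recheck": [], "ok": []}
    for node, schedule, status in events:
        entry = (node, schedule, status)
        if status == "Completed":
            buckets["ok"].append(entry)
        elif status in PROBABLY_STILL_RUNNING:
            # Not an error yet: look again once the backup window closes.
            buckets["recheck"].append(entry)
        else:
            # Known failures and anything unrecognized go to a human.
            buckets["attention"].append(entry)
    return buckets


if __name__ == "__main__":
    # Made-up entries, for illustration only.
    sample = [
        ("NODE_A", "DAILY_INCR", "Completed"),
        ("NODE_B", "DAILY_INCR", "Started"),
        ("NODE_C", "DAILY_INCR", "Failed"),
    ]
    for name, entries in triage(sample).items():
        print(name, entries)
```

Only the "attention" bucket would need to be read every morning; the
"recheck" bucket could be re-run after the backup window has closed.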

Thanks again for your time and help.

Wesley

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Prather, Wanda
Sent: Wednesday, November 08, 2006 1:35 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] How do you verify the Completion and Accuracy of
Backups and Restores?

Ditto.

To start, set up the TSM Operational Reporter (TOR).
It is free as part of the product.
It works real well right out of the box, but can also be customized to
do some clever things.  

If your TSM folks are running a non-Windows TSM server, they may not be
familiar with it, as it is a Windows application.
If your TSM server is Windows, TOR gets installed when the server is
installed, but you still need to configure it.

If your TSM server isn't Windows, you'll need to install TOR separately
on a Windows host.
But it doesn't have to be a Windows server or anything fancy; you can
run it on your desktop.

It will tell you, every day, EXACTLY which backup schedules completed or
did not; which clients had missed files, and what those files were.

It also scrapes the TSM activity log for any server-end messages that
need attention (although there are also frequently nuisance messages
that you will want to filter out, using the customization available in
TOR).

You can have the reports generated as HTML that is available for
browsing, or mailed to you.
Sounds like nobody has done this yet.  
SOMEBODY SHOULD REVIEW THIS REPORT EVERY DAY.  
AND ACTUALLY ATTEND TO THE THINGS THAT NEED ATTENDING.

You can read about TOR in the "monitoring your server" section of the
TSM Administrator's Guide.

If the missed backups you are referring to are databases that are being
backed up using a TSM Data Protection agent (backing up through the
API), you may have to be creative about gathering the reports from those
logs (esp. with Oracle - I think you have to actually view the RMAN logs
to guarantee that those worked correctly.)  But I have had success
writing very small scripts (e.g. perl) that scrape the information out
of those logs, and send it to be displayed in the TOR Daily report.
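
A minimal sketch of that kind of log scraping, written in Python rather
than perl, might look like the following.  It assumes the RMAN log marks
problems with lines that begin with an RMAN-nnnnn: or ORA-nnnnn: code;
the find_errors() name and the sample log text are made up for
illustration, and how the result gets into the daily report is left out.

```python
import re

# Lines that start with an RMAN-nnnnn: or ORA-nnnnn: message code.
ERROR_LINE = re.compile(r"^[ \t]*((?:RMAN|ORA)-\d{5}):[ \t]*(.*)$", re.MULTILINE)


def find_errors(log_text):
    """Return (code, message) pairs for each error line in the log text."""
    errors = []
    for code, message in ERROR_LINE.findall(log_text):
        # Skip the "=====" banner lines that frame the error stack.
        if message.startswith("="):
            continue
        errors.append((code, message.strip()))
    return errors


# Made-up log excerpt, for illustration only.
SAMPLE = """\
Starting backup at 08-NOV-06
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 11/08/2006 22:14:03
ORA-19502: write error on file "db_01", block number 1 (block size=8192)
"""

if __name__ == "__main__":
    for code, message in find_errors(SAMPLE):
        print(code, message)
```

An empty result is the interesting case: it only says that no error
codes were found, so a log that is missing or truncated still needs a
separate check.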

Wanda Prather
"I/O, I/O, It's all about I/O"  -(me)
 

 

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Mark Stapleton
Sent: Wednesday, November 08, 2006 11:44 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: How do you verify the Completion and Accuracy of Backups
and Restores?

From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Wesley Smith
>       My problem is that they (that sister agency) do not seem to have a
>reliable way of verifying that all backups have been properly
>completed. They don't even seem to have a way to know that all files
>(that need to be backed up) are being backed up.  I've seen the reports
>that get generated during the backup process and I am definitely
>unimpressed.  Backups start and backups complete.  There doesn't seem to
>be anything that says how many rows are copied or how large the files
>are or anything else that could be used for verifying the accuracy of
>the backups.  They tell my folks that we should trust Tivoli is doing
>the job correctly.  Trust is the problem....

Let's start there. When you look at the dsmsched.log file, which contains
a record of all scheduled backups and their outcomes, you should have a
record of what files are backed up and the size of the files, and the
timestamps give an idea of how long it took to back each file up. (This
assumes that the QUIET option is not present in the client options
file or the client option set designated for that TSM client.) If you're
using the specialized TSM agents for databases or mail apps, the
scheduled backup logs contain fairly granular information about
individual file backups. What more do you need?
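
To make that concrete, here is a small sketch of pulling the summary
counts and error messages out of a dsmsched.log.  The line layouts it
looks for ("Total number of objects ...:" summary lines and ANSnnnnE /
ANSnnnnS message codes) are what such logs typically contain, but treat
them as assumptions and check them against your own log; summarize() and
the sample text are made up for illustration.

```python
import re

# Summary lines such as "Total number of objects failed:      1".
STAT_LINE = re.compile(r"Total number of objects (\w[\w ]*?):\s+([\d,]+)")
# Client messages with an error (E) or severe (S) suffix, e.g. ANS1228E.
ERROR_MSG = re.compile(r"\b(ANS\d{4}[ES])\s+(.*)")


def summarize(log_text):
    """Return (stats, errors) found in the text of a schedule log.

    If the log holds several backup sessions, the counts from the last
    summary block win, because later lines overwrite earlier ones.
    """
    stats = {}
    errors = []
    for line in log_text.splitlines():
        match = STAT_LINE.search(line)
        if match:
            stats[match.group(1)] = int(match.group(2).replace(",", ""))
            continue
        match = ERROR_MSG.search(line)
        if match:
            errors.append((match.group(1), match.group(2).strip()))
    return stats, errors


# Made-up log excerpt, for illustration only.
SAMPLE = """\
11/08/2006 22:00:05 Normal File-->         4,096 /home/app/data.db [Sent]
11/08/2006 22:00:09 ANS1228E Sending of object '/home/app/lock.tmp' failed
11/08/2006 22:01:00 Total number of objects inspected:   12,345
11/08/2006 22:01:00 Total number of objects backed up:        1,200
11/08/2006 22:01:00 Total number of objects failed:               1
"""

if __name__ == "__main__":
    stats, errors = summarize(SAMPLE)
    print(stats)
    print(errors)
```

A nonzero "failed" count, or any entry in the error list, is the cue to
open the full log for that client.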

>       We have needed to have restores done on just a few databases in the
>past and the restores were not complete and up to date.  In each case
>we were able to rebuild the data using logs maintained within the
>applications but that should not have been necessary.  Each recovery was
>done at a point after a backup and before additional processing had been
>done within the apps so they should have been complete.  In each case,
>the folks who run Tivoli for us were able to track down and show that
>problems had occurred during the processing of the backups.  They did
>this through circumstantial evidence and in each case once again said
>that they have no way of verifying that the backups are actually good.
>I hear a lot about the difficulty of trying to write a program to
>process the Tivoli log files.
>
>       I think I'm at wit's end with these folks and the product.  I know
>that the people are competent and I suspect that the product (like
>other things available from IBM) really is weak on the reporting and
>verification issue.

While TSM itself does lack some reporting functionalities (particularly
when it comes to client backups and restores), I have to say this:

On every properly maintained and monitored TSM system I have touched in
the 12 years I've administered and engineered this product, I have
*never* lost a single byte of information. Period. If you cannot do a
restore because of "lost" data, something is happening during backups
that is not being caught at the time of the backups.

>I'm hoping that someone out there in the Big Wide World has already 
>solved this problem with an in-house or third-party solution.  Sorry 
>for being so long winded.  Any ideas...?

I think what is needed here is greater familiarity with TSM and its
proper administration. Proper verification of good backups is best done
by regular DR practice of planned bare-metal restores of chosen
machines. If you can take data backed up by TSM and restore a given
machine in a DR environment, and the machine comes back properly, you
know the job is being done right. If it doesn't, *then* you dig into
*why*.

BTW, there are responses to this thread advocating ServerGraph and
Bocada for reporting and monitoring. Be aware that those applications do
a fine job of monitoring server operations. (Well, ServerGraph does,
anyway.) Their reporting, however, is not granular enough to indicate
whether a given file is being backed up properly.

--
Mark Stapleton (mark.s AT evolvingsol DOT com)
Senior TSM consultant
