ADSM-L

Re: [ADSM-L] TSM 6.2.2. restore from Data Domain 880

2011-08-26 11:48:08
Subject: Re: [ADSM-L] TSM 6.2.2. restore from Data Domain 880
From: Ben Bullock <BBullock AT BCIDAHO DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 26 Aug 2011 15:44:20 +0000
Well, I guess we will just hijack this thread... sorry.

There are a few things that can help with that. The warm stand-by TSM server 
at the DR site has its own dsmserv.opt file with "DISABLESCHEDS YES" at the 
end. When I restore the production TSM database, that setting keeps it from 
running any server or client TSM schedules while it's up, which prevents most 
of the annoying things.
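
A pre-start check for that setting could look like the minimal sketch below. It assumes the options file is plain text with one option per line, that lines starting with "*" are comments, and that the last occurrence of an option wins (an assumption of this sketch, not verified server behavior). The function name and the sample file contents are hypothetical.

```python
def schedules_disabled(opt_text: str) -> bool:
    """Return True if the options text sets DISABLESCHEDS to YES.

    Assumptions (not verified against the server): one option per line,
    case-insensitive names, '*' starts a comment line, last setting wins.
    """
    disabled = False
    for raw in opt_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # skip blank lines and comment lines
        parts = line.split()
        if parts[0].upper() == "DISABLESCHEDS" and len(parts) > 1:
            disabled = parts[1].upper() == "YES"
    return disabled


# Hypothetical contents of the DR-site options file.
sample = """* hypothetical DR-site dsmserv.opt
COMMMETHOD TCPIP
DISABLESCHEDS YES
"""
print(schedules_disabled(sample))
```

A restore script could refuse to start the standby server when this check returns False, so a copied-over production options file does not silently re-enable schedules.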

As for command routing, I only have one TSM server and no server-to-server 
communications, so I haven't had to consider/deal with that.

Also, I restore the TSM database (with a DSMSERV RESTORE DB command) but with 
no "commit", and then I don't leave it up. I just restore it, check the return 
codes, and leave it down, but ready. I run a full TSM DB dump on production 
once a day, and then an incremental every 3 hours. Depending on what time 
of day I have my disaster, I would apply the remaining incremental dumps, with 
a "commit" on the last one, and I would be ready to go. I could get the TSM 
server back up to a point within the last 3 hours (meeting our SLA).

The worst-case scenario is that the disaster happens during the first 3 
hours after I restore the full DB dump without a commit, because then I 
would need to re-restore the full with the "commit" to actually get it into 
working condition. But even in that case, the full 150GB dump takes about 60 
minutes to restore and each incremental about 3 to 10 minutes (depending on the 
activity of the production DB). So I can get up and running from scratch in 
under 2 hours in any case, and it will be from 0 to 3 hours out of 
sync with the prod server.
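
The ordering and timing described above can be sketched as follows. This is only an illustration: the dump labels and function names are hypothetical, the 60-minute and 3-to-10-minute figures are the ones quoted above, and the sketch assumes the dumps are applied strictly in order with the commit on the final step only.

```python
FULL_MINUTES = 60        # full 150GB dump, figure quoted above
INCR_MINUTES = (3, 10)   # per-incremental range, figure quoted above


def restore_plan(incrementals: int) -> list[tuple[str, bool]]:
    """List (dump, commit) steps: the full first, then each incremental.

    Only the final step carries commit=True.
    """
    steps = ["full"] + [f"incr{i}" for i in range(1, incrementals + 1)]
    return [(name, i == len(steps) - 1) for i, name in enumerate(steps)]


def restore_minutes(incrementals: int) -> tuple[int, int]:
    """Low and high estimate, in minutes, for a from-scratch restore."""
    low, high = INCR_MINUTES
    return (FULL_MINUTES + incrementals * low,
            FULL_MINUTES + incrementals * high)


# 0 incrementals is the "re-restore the full with commit" case.
for n in (0, 2):
    print(n, restore_plan(n), restore_minutes(n))
```

With no incrementals available yet, the plan is a single full restore with the commit, which is the re-restore case described above.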

Obviously, I have to run some audit volumes at the DR site for things that 
changed at the prod site within the last 3 hours, and invalidate any data still 
on the disk pools (we don't sync the disk pools offsite; we keep them just for 
staging before data moves to the DD, and they are relatively empty most of the 
time).

It's not bulletproof, but it works pretty well and is a lot quicker than what 
we had in the past. We have tested it weekly.

Feel free to point out any gotchas and poke holes in my DR plan I might have 
missed. I'd much rather know now than when we have a disaster.

Ben
 
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Rick Adamson
Sent: Friday, August 26, 2011 7:11 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] TSM 6.2.2. restore from Data Domain 880

Jim, my apologies for getting away from your question/subject here, but I had 
a question for Ben...

I have given serious thought to handling my DR site in a similar fashion (warm 
stand-by) but have reluctantly decided not to because of the automated tasks, 
such as admin schedules and command routing, that I would not want to run on 
the DR data and DR TSM servers, as they would leave the DR data out of sync 
with production. If you can comment on how you addressed these issues, it would 
be much appreciated.

Thanks


~Rick


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Ben Bullock
Sent: Thursday, August 25, 2011 4:32 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] TSM 6.2.2. restore from Data Domain 880

We are doing the same thing here, although with an older version of TSM: 
TSM v5.5.5.0 on AIX 6.1.

We dump the TSM DB to a DD690 and replicate it to a DD580 at a DR site. Then we 
kick off a script to restore the DB to a standby TSM server at the DR site.
In testing we did fulls, fulls+incr, and commits, and we have not had any 
errors. We've only been doing it for about a month now, but the process runs 
every morning to load the full DB backup (to keep the DR site warmer and 
readier to go) and I haven't had a failure yet. The DB is 150GB, and we load it 
from a single 150GB file.

My script wipes out the DB every time before loading the DB dump. I don't think 
it's necessary, but I do it just for a clean slate. The command looks something 
like this:
dsmserv format 1 /dev/rlogvol 4 /dev/rdbvol1 /dev/rdbvol2 /dev/rdbvol3 /dev/rdbvol4

Not sure what the issue could be to cause it to fail intermittently. Hopefully 
it's not related to the TSM server version, because we will eventually have to 
go to 6.X.

Sorry I'm no help in this case.

Ben

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Schneider, Jim
Sent: Thursday, August 25, 2011 1:19 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] TSM 6.2.2. restore from Data Domain 880

Is anybody else performing Disaster Recovery testing of AIX 6.1 TSM 6.2.2.0 by 
restoring the database directly from a Data Domain 880 file device?

My problem is that I can restore the database some of the time, and get 
"ANR4522E RESTORE DB failed with LOG file error." the rest of the time.  I have 
opened PMR 36668,122 but the tech seems to think the problem is that I'm using 
a script to generate the db restore command, despite the fact that I've 
explained that the problem is intermittent.

I've copied the database files to local storage prior to a restore attempt.  If 
I can restore from the local copy, I can restore directly from the DD880.  If 
the local copy restore fails with a log file error, the DD880 restore has the 
same problem.  It looks like corrupted DB backup files are being written 
(sometimes).  I'm searching through logs and would appreciate any suggestions.
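
One way to confirm that a local copy is byte-identical to the file on the DD880, so that the copy step can be ruled out, is to compare checksums. The following is a minimal sketch only; the function names are hypothetical and the file paths are left for the caller to supply.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a large DB backup need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True when both files hash to the same SHA-256 value."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Recording the checksum right after each DB backup is written and comparing it again before a restore would also show whether a file changed after it was written, though it cannot tell whether the file was already bad at write time.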

Jim Schneider
United Stationers

The BCI Email Firewall made the following annotations
---------------------------------------------------------------------
*Confidentiality Notice: 

This E-Mail is intended only for the use of the individual or entity to which 
it is addressed and may contain information that is privileged, confidential 
and exempt from disclosure under applicable law. If you have received this 
communication in error, please do not distribute, and delete the original 
message. 

Thank you for your compliance.

You may contact us at:
Blue Cross of Idaho
3000 E. Pine Ave.
Meridian, Idaho 83642
1.208.345.4550

---------------------------------------------------------------------

