Hello everyone!
This may sound a bit like a rant but it is not.
I am really trying to get something done here and I'd like your opinion(s) on how I could proceed.
This will be quite a long post but I have a feeling many TSM users are in a similar situation.
First allow me to state the premise:
I am trying to implement a strategy to satisfy some business requirements that say we must be able to recover our data 'as it was' as far back as 5 years ago. Mostly this is for financial, medical and PR data.
What I need is not assurance that the data has been backed up and is recoverable. What I need is assurance that we can go back in the past should legal issues arise and so forth. There is no possibility of building this functionality into the applications because of the risk of intentional fraud/destruction/alteration and so on.
I know it is impossible to 'catch' every single datum, but with other media/job/policy-based backup software it is at least possible for everyone in the shop to agree on key dates/days, run monthly/yearly jobs at those times with, say, a 5-year retention, and make the auditors/users/management happy. But... TSM is data-based and manages its sauce using file versions and associated expiration rules. That's superb for everyday operations, protecting data in the short term, and recovering systems in a DR scenario, but that way of doing things, at least for me, really falls on its face when it comes to long-term retention.
So I took a look at how I could achieve what is expected of our data protection solution using TSM.
I tried Backupsets and Archives, and I am now gearing up to try "cheatin' it".
Backupsets
Sounds great on paper. Create some nodegroups, generate backupsets on those nodegroups, use the needed retention, problem solved!
But I crossed out that possibility because...
TSM will not generate 'a' backupset from the required data. It generates individual backupsets for each and every node in the group, sequentially, loading each source tape multiple times for every node as it walks through its data. You need TOC pools. Backupsets are not tracked by DRM, and TSM does not use scratch pools for them, so offsiting the media requires cleverness to prevent orphaned tapes. Generating the backupsets interferes with daily operations, since the generation process seems to, for lack of a better way of saying it, 'freeze' the node data it is working on, and this leads to endless-loop migration processes and other nastiness. The output of Q BACKUPSET quickly becomes cluttered, and management scripts are needed because no human can make heads or tails of it after a few monthlies or yearlies have been run.
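For reference, this is roughly the kind of monthly run I was attempting. A sketch only: the nodegroup, device class, and TOC pool names below are made up for illustration.

```
/* Hypothetical names: nodegroup FIN_GRP, device class LTO_CLASS, TOC pool TOC_POOL */
/* Even when given a nodegroup, the server generates one backupset per member node, */
/* walking the source tapes again for each node in turn.                            */
GENERATE BACKUPSET FIN_GRP MONTHLY_SET * DEVCLASS=LTO_CLASS -
  RETENTION=1825 TOC=YES TOCDESTINATION=TOC_POOL WAIT=YES
```

The 1825-day retention matches the 5-year requirement, but note the TOCDESTINATION pool has to exist and be sized for it first.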
Archives
Again sounds great on paper. Define archive copygroups and pools; that's easy. I can even dedicate a library to that task so as not to impact daily operations. Sweet!
But I crossed out that possibility because...
TSM does not allow you to archive the backup domain: that's strictly for incrementals. So you cannot simply archive the same data you back up. TSM expects a user sitting at the node typing 'archive /thisfile.please', but what we have is one or two backup admins and a bunch of schedules. Scheduling archive jobs for the whole shop would be challenging to say the least, but you could distribute local scripts on every node that interpret the contents of dsm.opt, generate a tag with the current date, and call dsmc to... ARGH! MY HEAD HURTS!
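For what it's worth, a server-side schedule with ACTION=Archive can spare you the per-node scripts, but it illustrates the exact pain point: you still have to spell out the filespecs yourself instead of reusing the backup domain. All names and paths below are hypothetical.

```
/* Hypothetical names: domain STANDARD, mgmt class LONGTERM_MC, nodes SRV01/SRV02 */
/* The OBJECTS list must enumerate what to archive - it cannot say "the domain".  */
DEFINE SCHEDULE STANDARD YEARLY_ARCH ACTION=Archive -
  OBJECTS="/finance/* /medical/*" -
  OPTIONS="-subdir=yes -archmc=LONGTERM_MC" -
  STARTDATE=12/31/2025 STARTTIME=22:00 PERUNITS=Years
DEFINE ASSOCIATION STANDARD YEARLY_ARCH SRV01,SRV02
```

Every node with different paths needs its own schedule or a very generous wildcard, which is how the head-hurting starts.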
cheatin' it
So this is where I'm at. Coaxing the product into giving me what I need.
In order to provide the same protection for everyone, I have to consider the worst-case scenario of a file getting updated/changed each day, so to provide the requested protection 'The TSM Way' I'd need to build copygroups with VEREXISTS, VERDELETED, RETEXTRA, and RETONLY all set to 1825. No way this can fly.
The least painful way I can imagine doing something like this is creating two nodes for each computer I need to protect with TSM, using the first node as intended and 'cheating' the second one. By 'cheating' I mean using the same domain as the 'straight' node but having a schedule back it up only once a month to a copygroup with VEREXISTS=12, RETEXTRA=365, VERDELETED=12, RETONLY=400. Then add a third node with a yearly schedule backing up to something like VEREXISTS=5, RETEXTRA=1825, VERDELETED=5, RETONLY=1825.
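In server terms, the 'cheated' monthly node would look something like this. All domain, pool, and node names are hypothetical; the management class and copygroup layout is just one way to cut it.

```
/* Hypothetical names: domain LT_MONTHLY, policy set LT_PS, pool LT_POOL, node SRV01_M */
DEFINE DOMAIN LT_MONTHLY
DEFINE POLICYSET LT_MONTHLY LT_PS
DEFINE MGMTCLASS LT_MONTHLY LT_PS STANDARD
/* 12 monthly versions kept a year; deleted files linger 400 days */
DEFINE COPYGROUP LT_MONTHLY LT_PS STANDARD STANDARD TYPE=BACKUP -
  DESTINATION=LT_POOL VEREXISTS=12 VERDELETED=12 RETEXTRA=365 RETONLY=400
ASSIGN DEFMGMTCLASS LT_MONTHLY LT_PS STANDARD
ACTIVATE POLICYSET LT_MONTHLY LT_PS
REGISTER NODE SRV01_M secret DOMAIN=LT_MONTHLY
/* One incremental per month against the same dsm.opt domain as the 'straight' node */
DEFINE SCHEDULE LT_MONTHLY MONTHLY_INC ACTION=Incremental -
  STARTDATE=01/31/2026 PERUNITS=Months
DEFINE ASSOCIATION LT_MONTHLY MONTHLY_INC SRV01_M
```

The yearly node would be the same shape with VEREXISTS=5/RETEXTRA=1825, at the cost of a second (or third) scheduler instance on every client.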
This might work, and I might get away with doing just the yearlies if management agrees, but I'm still going to have a hard time moving this offsite unless I back it up to a copy pool in order to use DRM.
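If it helps anyone picture it, the copy-pool route I'm considering boils down to the following sketch (pool and device class names hypothetical):

```
/* Hypothetical names: primary pool LT_POOL, copy pool LT_COPY, device class LTO_CLASS */
DEFINE STGPOOL LT_COPY LTO_CLASS POOLTYPE=COPY MAXSCRATCH=100
BACKUP STGPOOL LT_POOL LT_COPY WAIT=YES
/* Unlike backupset volumes, copy-pool volumes are tracked by DRM */
/* and can be cycled offsite with the usual state transitions:    */
MOVE DRMEDIA * WHERESTATE=MOUNTABLE TOSTATE=VAULT
```

That doubles the tape spend for the long-term data, which is the part I expect management to choke on.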
Has anyone 'cheated it' with any measure of success?
Any other ideas on how to approach this issue/requirement?