TDP for Lotus Domino - best practice and retaining mail indefinitely

el.b00ty (ADSM.ORG Member, joined Mar 12, 2008)
At my company, we were recently directed to start retaining our Domino backups forever... anyone out there have any suggestions on how we should set this up? Forgive my ignorance, but TSM was basically installed and set up at our shop by consultants, so there's a lot we have to learn about this product. I'll describe our current setup as best I can...

About 50 servers (a couple of Windows 2000, mostly 2003) backing up to our TSM server. ~1 TB diskpool. Using LTO4 tapes for our tapepool (10 tapes right now, but we have room for more in our library if necessary), and LTO3 tapes for our offsite copy pool.

Honestly, I'm not entirely sure how our Domino guy does his backups - he's a little scatter-brained and I can't for the life of me understand what he's talking about sometimes. He claims to be doing incrementals during the week and full backups during the weekend. I don't know enough about Domino or Tivoli to know what the best practice here is... I deal mostly with the other application and file servers, where a TDP client isn't necessary.

For now, I believe he (the Domino admin) basically changed the Domino Policy's retention to 9999 days. Isn't this going to eat through tapes like crazy?

I suppose what I'm looking for are suggestions on how we should handle our Domino/mail backups. Anyone have any suggestions? Let me know if there is more information needed and where I can get that (what commands output you need, etc)...

Thanks in advance for any help anyone out there can provide.
 
The daily incremental with a weekly full is not a bad thing. I'm not the Domino guy, but I've been working with our Domino guy on this. Basically, Domino is a DB server like SQL Server (no comments please, I'm generalizing). So the daily incremental is really nothing more than a backup of the transaction logs for Domino, plus full DB backups for those DBs whose DBIID (Database Instance Identifier) has changed. A full is required if that ID changes, but the TDP will take care of this for you. You can perform incremental backups many times a day if necessary, depending on load and how granular you want to be.

The problem is that you don't want to have too many incremental backups or it will take hours if not days to restore a single maildb. So we do a weekly full backup. It's just like any other backup utility. If I need to restore to a point in time, the Domino TDP will grab the previous full and restore to the point in time with the transactional logs.
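For concreteness, the weekly-full/daily-incremental cycle described above is normally driven by scheduled command files containing a handful of domdsmc invocations. A sketch, assuming a Windows .cmd script; the log file names are placeholders:

```
rem Weekend: full (selective) backup of every database
domdsmc selective '*' -logfile=domsel.log

rem Weekdays: archive the transaction logs, then incremental
rem (the incremental also takes fulls for any DB whose DBIID changed)
domdsmc archivelog -logfile=domarc.log
domdsmc incremental '*' -logfile=dominc.log

rem Mark logs no longer needed for recovery so TSM can expire them
domdsmc inactivatelogs -logfile=domina.log
```

Your Domino admin's actual .cmd files may differ; this is just the usual shape of them.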

I'm curious about the data retention and management classes though. I've started experimenting with those to make sure I have my brain wrapped around them for the Domino TDP (it doesn't seem to be exactly like the BA Client for retention). I have no data at this time however.

Now, as for keeping mail indefinitely, this sounds like a regulatory requirement. A TSM backup will not meet your regulatory requirements in the least. What you will find is that it will be very very costly to provide ALL email sent and received during a litigation claim because folks delete mail and that is updated in the transaction logs. It could take you weeks or months to build a maildb that was complete with no deletions.

A solution (though this is only hearsay on my part at this point) is the use of CommonStore. It is my understanding that it will provide the audit trail and content to meet regulatory requirements. But that is part of another beast entirely (Enterprise Content Management).

Finally, if you are worried about tapes, create a domino pool just for domino backups. This does a couple of things.

1. You know exactly how many tapes these backups are taking and thus can report a true COST to management for their decision.

2. It keeps the domino data isolated for recovery purposes. Think space reclamation.

We have moved to this model and our full restores are actually a bit faster because I don't have data spread across hundreds of tapes over the life of the servers (then again, my predecessor did nothing but transactional backups too).
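If you go the dedicated-pool route, it's only a few admin commands on the TSM server. The pool, device class, domain, and class names below are invented for illustration:

```
def stgpool DOMINO_TAPE LTO4CLASS maxscratch=20 reclaim=60
upd copygroup DOM_DOMAIN STANDARD DOMDATA STANDARD type=backup destination=DOMINO_TAPE
activate policyset DOM_DOMAIN STANDARD
```

New Domino backups then land in their own pool; existing data can be drained over with move data if desired.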

Good Luck.
 
I've seen this before. 9999 days is ridiculous and won't work. After a while you will have such a performance problem it won't be funny. Couple that with the amount of data you will need to retain, the multiple servers you'll need to expand to in order to retain it, etc., and it's just not feasible unless you're willing to spend hundreds of thousands of dollars just to retain this data.

Following data management best practices will always be the right way to do things. So:

A backup is not an archive, and your folks are trying to force the TSM TDP for Domino backup to act as an archive. A reasonable retention policy for backup is 30 days. In order to archive Domino mail for 7+ years and be able to retrieve it, you will need a mail archive product; DB2 CommonStore is a good choice and is what we use. It has multiple views and is very powerful and cost effective. I recommend you contact your IBM folks and ask about this, and get some feedback on what they're proposing so you have some ammunition against what they're trying to do - nothing good will come of a 9999-day policy.
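As a sketch, a 30-day backup policy like the one suggested above would be set with something along these lines (domain, policy set, and class names are hypothetical):

```
upd copygroup DOM_DOMAIN STANDARD DOMDATA STANDARD type=backup -
    verexists=nolimit verdeleted=nolimit retextra=30 retonly=60
activate policyset DOM_DOMAIN STANDARD
```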
 
I bet you'll never find out whether they actually need those Domino backups 27 years later :p
Use CSLD (Commonstore) it's easy to set up and works well for archives...
 
Sorry for resurrecting my own thread here, but it provides the background...

Sooooo... fast forward to today and, despite my best efforts to convince them otherwise, management decided to keep forcing a square peg through a round hole. The results? Of course, we ran out of space after a few months, administration became a nightmare, backups failed... armageddon. Again, I told them we need a different product. But the economy is bad so they don't want to spend the money. They want options. End result? They changed retention from 9999 days to 180 days, based on storage estimates.

Not good enough. Things are blowing up again.

Now, in the meantime, they've decided to go ahead and purchase an archive solution. Good. But... they fired our Domino guy before it was completely implemented. Now there are consultants in trying to figure it all out and clean up the mess.

Anyway, bottom line is we have this rolling '180' days of mail being stored in Tivoli that has to find its way into the archiving product. We're capturing all live mail, so after 6 months that data will be useless anyway. What I want to try to do is move it out of Tivoli or at least change how Tivoli is storing it. It's a ton of duplicate data... probably 500-600 GB of actual mail that's taking up ~20 TB worth of tapes.

Can we move it? Copy it somewhere else then delete it from TSM? If so, can we consolidate (de-dupe) it? Can we de-dupe it by restoring it somewhere else? In other words, can I take, say, 2 full backups, restore one, then the other, and have it 'combine' the two?

They're not ready to 'ingest' it into the archive solution yet, and may not be for a couple months still. I'm just looking for ideas here. What a freakin' mess. Anyone out there reading this... do WHATEVER IT TAKES to keep management from putting you in this situation. I fought hard, but should have fought harder.
 
For my company... we retain all email forever... but it's not done through TSM; it's done through a write-once, retain-forever device with built-in dedup. Rather than try to import all the data we already had, we decided to retain everything from the point in time that the device was put in place.

You could spin off a one-time retain-forever archive of your Lotus servers. Then configure the normal backup just like any other backup, allowing it to expire. Meanwhile the WORM mail catcher does all the work of retaining everything forever.

We only keep our mail backups for 7 days.
 
So this 180 days is a point-in-time restore requirement? That's the only way it makes sense in my head at the moment. You're fubar'd if that is the requirement - there is no way to restore all possible variants of PIT for every NSF without having all of the DB backups and transaction logs. I suppose you could dump them all out of TSM, but the Domino TSM agent is fairly well embedded - I don't know how to play translogs manually against every full you have without TSM involved (never tried it).

You say you're capturing all live mail, presumably either via something non-Domino or via mail journals. Keeping 180 days' worth of mail transactions in Domino databases is a bit mad, but if you're already keeping journals via whatever method, why can't you just play those against the archival solution and start purging TSM now?

Dedupe won't help you here - the domino NSF structure is resistant to such things. If you run crypto on your NSFs (like here) you're doubly snafu'd.

I imagine you're using Domino NSFs with archival rather than circular logging. If so the main TSM database overhead will be translogs - a lot can get generated in 6 months. Conceivably your management could relax their restore requirement, permitting PIT up to 30 days, and weekly snapshot restores out to 180 days - that would let you expire 5/6th of your translogs but keep the selectives.

It's a tricky one. Let me know about the environment composition if any of the above sparks an idea or two.

T
 
Tony - I don't know if I'd really call it a PIT requirement so much as just an attempt to capture and retain as much mail as possible. It was understood that if someone received a message and deleted it in the same day, between backups, it would not be backed up. However, I guess in the scenario where a user maybe just cleans up their inbox once a week, this would retain the deleted emails. I'm not the Domino guy - I know very little about it, really - but from what I did understand I knew this was going to get ugly. Sometimes I hate being right.

Anyway, thanks for your reply - one thing you said jumps out at me right away. What we are doing right now is weekly full backups with log backups each weekday. Is there a way we can set the log backups to expire after, say, a week or two without affecting the full backups? Would we have to setup 2 different policy domains? Can you elaborate - am I misunderstanding your comment?

The other thing you mentioned was journaling and 'playing' them against the archival solution. Again, Domino (and databases in general) are not a strong area for me, but that was kind of my question to the consultants... why can't we start just pumping this stuff into the archive solution (yes, it's non-Domino/IBM)? For one, none of them really understand what's happening on the Tivoli/TDP side either, and, secondly... correct me if I'm wrong, but we still need to restore/stage the data somehow/somewhere. That's where I'm at right now... trying to figure out how we get this data from TSM to the archiving solution where it should be, then getting TDP back to the point where it's simply being used for backup/recovery.

Does that clarify things at all?
 
Tossing my hat into the ring

First a few points, some already mentioned
  • extended retention of email with TSM is just plain silly as you have discovered
  • and if it is backed up it is fair game in ediscovery proceedings which means hundreds or thousands of possible restores. I know this from experience
  • since management went from 9999 to 180 days, this is not likely a regulatory issue, just someone's knee-jerk reaction to some event
  • while no expert I doubt there is any industry that is required to keep email forever. There are industries that are required to keep "some" communications for "selected" employees for protracted periods of time, which is where email and other archiving products come into play. BTW 180 days is not one of the increments familiar to me.
  • Also, TDP Domino does actually use the copygroup settings, mostly, which differs from some other database clients. The exception has to do with the translog archives: it will not delete any translog archives that it thinks it needs in order to recover to the furthest-back point your settings allow.
  • I am assuming that you are doing transaction logging. If you are doing circular logging you really do need to perform log archives regularly in order to ensure your Domino servers do not crash from filling their translog filesystems/drives
    • Also if you are doing translogs then yes you can recover an email that was received and deleted in a short span of time, if you know when to look
  • if you look at the schedules used by the Domino clients you should see that there is an object referenced in the Command schedules. These are scripts that should contain one or more "domdsmc" commands, such as:
    domdsmc archivelog -logfile=...
    domdsmc incremental '*' -logfile=...
    domdsmc inactivatelogs -logfile=...
    BTW, if you are not performing the inactivatelogs step you may have extra useless data on tape
Now to the meat of your question: what to do with the already backed-up data. First, I do not know of any product that can take backed-up data from TSM and somehow ingest it. (If such a product exists that does not require restores first, I am interested to know what it is.) Thus, as you indicated, restores are needed. But you can't restore every possible minute of every day if you are doing translogs. Best bet is to somehow segregate the old data from the new data.

Unfortunately I have experience in this area also. For reasons I can't get into I had to essentially rename all of my existing Domino clients and recreate them with their original names but in a different domain/policy with a different retention. Thus I went from infinite to something reasonable while keeping the infinite data available. I also segregate all of my email data from other data into unique primary and copy pools.

The new data is ingested into an archiving application to meet ediscovery needs and I only back up what is required for DR purposes.
The easiest thing to do with the old data is eject it and store it until such time as you are allowed to scratch those tapes out. And that part is easy as all you would need to do is delete the nodes/filespaces of the renamed older node registrations.

If management insists on keeping the data on- or near-line then you can move the older data from its tape pool (move data or move nodedata, or migration to a nextpool) into a disk pool/dedupe appliance and see if you can make it smaller. As indicated in a previous post I doubt that will gain you much. It might be an expensive experiment as well.
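For reference, those movements run per volume or per node; the pool, volume, and node names below are hypothetical:

```
/* one tape at a time */
move data A00017L4 stgpool=DEDUPE_DISK
/* or everything belonging to one renamed node */
move nodedata LGL-DOMCLIENT1 fromstgpool=DOMINO_TAPE tostgpool=DEDUPE_DISK
```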

In either case if you are doing translogs see if you can get permission to delete the translog filespaces (DOMLOGS) for these older renamed clients and just keep the incremental filespace (DOMDBS). A nice side benefit of this is that all you can offer during an ediscovery proceeding is the weekly incremental backups and the restore burden is much lessened.
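Dropping just the translog filespaces is one admin command per node; check the exact filespace names with q filespace first (the node name below follows the renaming example in this post):

```
q filespace LGL-DOMCLIENT1
del filespace LGL-DOMCLIENT1 *DOMLOGS*
```

Wildcards in the filespace name let you catch the DOMLOGS filespace whatever its full path turns out to be.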

Please note that I am assuming you are not currently already under some kind of legal ediscovery activity. If you are then all is lost as you will likely not be able to delete anything and you may not even be able to change retentions.
 
rmazzon - thanks for the great info. You're dead on - this was basically a knee-jerk reaction by the previous director. There was a legal issue, old mail was requested by the legal team, and we didn't have it. Nothing is currently going on, so we should be able to change retention policies.

Yes, we are doing transaction, not circular, logging. Also, I took a look at the primary mail server and did not see a cmd file that uses the inactivatelogs option, so we'll have to take a look at that for future backups.

On to the rest of your comments... your solution/idea sounds promising. Ideally, I could just convince everyone that this data will never be worth restoring due to the effort involved in actually doing anything useful with it, and we could just get rid of it. But I don't think I'll be winning that battle. Now, getting rid of the translogs... there might be some potential there. I did a 'q filespace' to try to see how much space we'd be saving, but apparently that's not the command I'm looking for (the Domino nodes show 0 for capacity and utilization). How can I see the amount of space used by the logs vs. DBs?

So, essentially, I'd be creating a new policy with sane retention policies, then re-configure the clients as nodes in this new policy, more or less orphaning the old data? If they wanted to allow me to delete the data earlier, I could just delete those filespaces as you mentioned, but eventually (~180 days) they'd expire anyway, yes?

Still soaking all this up...
 
Found what I was looking for... 'q occ'...

Another (dumb?) question - could I delete/clear the log filespaces before doing any of the rest of this? Will they just be recreated on the next backup? If/when we move the existing data to different storage pool(s), then there would be about 20% less data to move...
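For anyone following along, q occ reports usage per filespace, so the logs-versus-databases split falls straight out (node name hypothetical):

```
q occ DOMCLIENT1
```

Sum the rows for the DOMLOGS filespaces versus the DOMDBS ones to see how much of the total the translogs account for.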
 
Well you are not going to reconfigure the clients so much as recreating them. Just so I am clear the steps you want to perform are as follows
  1. Rename the existing client to a new name, maybe preface each nodename with LGL-
  2. Register a new node but using the same name as used by the original node and put this node into a new domain/policy. Thus you end up with, as an example, DOMCLIENT1 and LGL-DOMCLIENT1
    1. Keep in mind that the next backup the new DOMCLIENT1 will perform is going to be a full backup so you may need to stage all of these renames/registrations over time depending on your environment. This may cause short term storage capacity pain.
    2. You will also need to recreate the schedules in the new domain
    3. And you will also want to remove the original nodes (now starting with LGL- in this example) from any schedule associations.
  3. Assuming you need to keep all 180 days' worth of the data associated with LGL-DOMCLIENT1, you will need to increase the copygroup settings of the original domain/policy to infinite across the board; otherwise expiration may still delete data from this dataset. If you don't care about keeping all 180 days, then change nothing in the original policy, and in time you will be left with just the data dictated by the verdel and retonly values. (I am assuming that the mail clients are in their own domain and thus you can make changes to the copygroup settings.)
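Steps 1 and 2 above boil down to a pair of admin commands per client, plus re-associating schedules; the node, password, domain, and schedule names here are placeholders:

```
rename node DOMCLIENT1 LGL-DOMCLIENT1
register node DOMCLIENT1 newpassword domain=DOM_NEW_DOMAIN
def association DOM_NEW_DOMAIN DOM_INCR_SCHED DOMCLIENT1
```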
Also you can run the inactivatelogs on the old data right now as I suspect the logs from the time that you were running with 9999 retention are still there. This should be done at least weekly if not daily. We do it as part of every incremental schedule run by our Domino clients.

We don't run full backups. Our Domino admins perform housekeeping including what is called a "compact big B" on subsets of our mail data and that changes the DBIID and as a result an incremental will essentially perform a "full" backup of that data subset.

You don't want to clear the logs until you separate the old data from the new data otherwise you will destroy your ability to recover anything to a PIT until after your next full/incremental.
 
Hi,

Chiming in with a couple of thoughts:

We also do a weekly full with translogs in between (every 4 hours). We also do a daily incremental to catch new databases (to be more precise, new DBIIDs - each Domino database has a unique ID that matches the reference in the transaction logs). It's a good idea to take daily incrementals, but you'll need to ensure your Domino admins don't run a compact task with the -B option (which changes the DBIID). A compact -b is fine on a daily basis though. I personally like to run a weekly selective rather than rely on compact changing DBIIDs, because a couple of key databases can't be compacted in that manner without shutting down Domino first - I prefer not to rely on external admin operations to ensure I have a good full.

Domino is capable of writing mail journals (a separate NSF, or rather a series of NSFs), but if you've not been collecting them (have a look for filenames starting with MJ) that doesn't do you a lot of good now (in terms of collecting mail into an archive).

In essence, selective backups are capable of being restored as is, or they can be restored in an inactive state and then transaction logs applied to bring the NSF to any point in time. In theory this could be used to acquire a complete record of all mail, but in practice it can't. You'd need to restore every NSF to every PIT to capture any deleted mails etc. It's a no-go in terms of playing the restored NSFs against an archival solution.
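The restore-then-roll-forward sequence described here is two domdsmc steps. The option spellings below are from memory rather than a manual, so verify them with domdsmc help before relying on them; the database path, PIT, and log file names are placeholders:

```
domdsmc restore mail\user.nsf -activate=no -logfile=domres.log
domdsmc activatedbs -applylogs -pit=06/15/2009,17:00:00 -logfile=domact.log
```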

Transaction logs are marked inactive when the Domino TSM client runs in that mode. TSM then expires the logs according to the bound mgmt class. If you have been storing your NSF files in one mgmt class and your S*.TXN files in a different mgmt class, you can expire them on different criteria. As we do here, for example - we store databases out to 1 year (retonly) if needs be, but we only keep 30 days of translogs. We can then do a PIT restore out to 30 days, and a restore of a deleted NSF out to 1 year (as at the time of its last backup). If your xlogs and DBs are in the same mgmt class then you don't have that level of flexibility.
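The split binding described above is done with include statements in the TDP client's options file. The patterns below are illustrative ('...' is the literal TSM any-directory wildcard), so match them against your own filespace layout:

```
* dsm.opt for the Domino TDP instance
INCLUDE ...\*          DATA
INCLUDE ...\S*.TXN     LOGS
```

TSM evaluates include/exclude lists bottom-up, so keep the more specific S*.TXN line last.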

If it helps here are the copygroup settings used here:

Code:
TYPE       CLASS_NAME             VEREXISTS     VERDELETED     RETEXTRA     RETONLY      SERIALIZATION          DESTINATION       
------     ------------------     ---------     ----------     --------     --------     ------------------     ------------------
Backup     DATA                   NOLIMIT       NOLIMIT        30           365          SHRSTATIC              DATABASE_D        
Backup     DATA_SAN               NOLIMIT       NOLIMIT        30           365          SHRSTATIC              DATABASE          
Backup     LOGS                   NOLIMIT       NOLIMIT        30           30           SHRSTATIC              LOGS_META_D       
Backup     TDP_JRNL               NOLIMIT       NOLIMIT        NOLIMIT      NOLIMIT      SHRSTATIC              DATABASE_D
NB: TDP_JRNL is the class used here for mail journals, which have an annoyingly infinite retention due to fairly stringent legal requirements.

T
 
Guys - I can't tell you how much I appreciate your help here. Your expertise is truly appreciated. I have more questions... lots more... but will try to focus on one or two at a time. Again, I'm not a Domino guy (or DBA), so there are most likely some basic concepts that I just don't have a grasp on - I apologize for any lack of rudimentary knowledge in these areas... I'm a quick study, though, so hopefully it won't be too painful.

Also, before I go further... let me know if there are queries I can/should run that would be helpful. I'll post the output if it will clarify things.

So, for one, I've found that one of our mail servers was *not* using transaction logs. That particular server only has about 28 GB of data... yet a 'q auditocc' (ran an 'audit license' right before) shows over 9 TB of used storage. I'm assuming what's happening is we're taking a 'full' backup every day and retaining it for 210 days (which we haven't reached yet).

So, I suppose I need a better understanding of what's truly happening. I started poking around on the mail servers... looking, but not touching, of course... our main mail server has about 460 GB in the data/mail directory... another 8 GB of .txn files that are around 65 MB w/ date/time starting right after the last scheduled archive log backup (makes sense).

I took a look at the schedules associated w/ that node... there's 3... looks like an incremental M-F, a selective on Saturday, and archivelog every day. Haven't dug into each .cmd file yet - that's next.

So, I guess 'selective' is like a full DB backup - is that right? So every weekend, we're grabbing a copy of every DB on the server - whether it's changed or not? Then we have the incrementals - which would end up creating a somewhat redundant copy if the DBIID changes (from a compacting job)? I get the transaction log backups... for the most part... except I'm still not quite sure I understand the inactivatelogs option.

Regarding that server that's not doing transaction logging... any reason why someone would set up a server that way? The guy's not here anymore (I think I mentioned that above), otherwise I'd just ask him... :)

Guess that's enough questions for now. Back to the admin guide...
 
oh, one more... would I actually need to create a whole new policy domain (and storage pool), or could I just create a different management class/copy group and assign the new nodes that way (referring to rmazzon's suggestions). If not, can you clarify the difference?
 
Hey again,

Aye, a selective is a full. On servers which perform circular transaction logging an incremental is also effectively a full :(

The daily incrementals are typically needed (on an archive-logging server) to pick up any new databases... the downside being you also pick up newly compacted databases. If you can control compaction by ensuring your Domino admins are sync'd with your full backup, you're good to go. Make very sure that compact doesn't run at the same time as backup... it's bad (50% perf hit).

Domino is an interesting beast because it clusters internally. The reason why you might not run archive logging (i.e. use circular mode) is that you may not need to. Here we define one site as primary for Domino data and another as secondary. All Domino data is replicated at a Domino level between sites. The primary site runs archival logging and gets the full gamut of backups. The secondary site runs circular logging and does not back up any bulk data. Data gets restored to the primary.

The inactivatelogs option is a reconciliation tool between Domino and TSM. The TSM server has no intelligence about the contents of the data it stores - it merely has retention parameters for each object. It has no way of determining whether a stored transaction log is useful in any way (i.e. can be played against any existing NSF object). All Domino transaction logs are stored in TSM as active objects (i.e. excluded from expiration processing). The inactivatelogs option permits the Domino instance/API to scan the stored transaction logs and, for any log which cannot be played against an active Domino database, mark that log as inactive. At that point TSM expiration processing kicks in.

T
 
last one for the day (most likely)...

I don't know much about either of these options (still reading), but would colocation or active data pools apply to my predicament? Pros/cons?
 
So, Tony - re: inactivatelogs... running that type of backup will purge the inactive logs regardless of retention settings in the management class? I'm getting a little confused on terminology, but... which databases would be considered 'active' in a scenario such as ours where we've essentially set everything to 210 (versions & retention)?
 
It won't actually purge the transaction logs - it will just mark them inactive so that TSM can purge them during expiration (assuming they match the copygroup purge criteria).

An active database (in TSM) is one which has an active NSF file with the same DBIID on the Domino server. Databases in TSM go inactive if either they are deleted from the domino server, or if the DBIID is altered at the Domino server.

re collocation - it can help with the transaction logs, certainly. Applying translogs is a pain in the neck and you don't need media wait into the bargain.
 