Unexplained database growth on replication target

lteBCK

Hi,

I have a TSM replication source/target pair on version 8.1.8 that has been running with a near-constant backup volume for quite a while. Recently the database on the replication target has started to grow slowly but steadily. The backup and replication volumes have stayed constant, and the database size on the source server is also constant. If the growth continues, I will run out of space for the database in about a month.
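To put a number on "about a month", a simple linear extrapolation is usually enough. The sketch below assumes steady growth; the function name and all figures are made up for illustration and are not taken from this server.

```python
def days_until_full(used_mb: float, capacity_mb: float, growth_mb_per_day: float) -> float:
    """Linear estimate of the days left before the DB file system fills up."""
    if growth_mb_per_day <= 0:
        # No growth (or shrinking): the linear model never reaches capacity.
        return float("inf")
    remaining_mb = max(capacity_mb - used_mb, 0.0)
    return remaining_mb / growth_mb_per_day


# Hypothetical figures, only to show the call.
print(days_until_full(used_mb=900_000, capacity_mb=1_000_000, growth_mb_per_day=3_000))
```

Feeding it two or three daily readings of the database size gives a growth rate that is easy to track over time.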

The start of growth seems to coincide with some new error messages in dsmffdc.log.
Three messages like these occur every 10 minutes:

[05-19-2020 10:10:36.982][ FFDC_GENERAL_SERVER_ERROR ]: (sdrefcount.c:916) Encountered error 9973 processing range 2184654291979925338 to 2234952175654497472
[05-19-2020 10:10:40.224][ FFDC_GENERAL_SERVER_ERROR ]: (sdrefcount.c:916) Encountered error 9973 processing range -9223372036854775808 to 2234952175654497472
[05-19-2020 10:10:45.230][ FFDC_GENERAL_SERVER_ERROR ]: (sdrefcount.c:916) Encountered error 9973 processing range -9223372036854775808 to 2234952175654497472


Those come from the servermon scripts and are probably mostly harmless in themselves, but if, as they seem to indicate, some 64-bit counter has overflowed, there may be problems elsewhere.
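For what it is worth, the lower bound in two of the three messages, -9223372036854775808, is exactly the smallest signed 64-bit integer (-2^63). That may be a sentinel for "start of range" rather than a real overflow, which the log alone cannot settle. A minimal sketch for pulling the ranges out of dsmffdc.log and flagging that value follows; the regular expression is modelled only on the three lines quoted above, and the function name is an assumption.

```python
import re

INT64_MIN = -2**63  # -9223372036854775808

# Pattern modelled on the dsmffdc.log lines quoted above (an assumption,
# not an official log format description).
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\[\s*(?P<kind>\w+)\s*\]:\s*"
    r"\((?P<src>[^)]+)\)\s*Encountered error (?P<err>\d+) "
    r"processing range (?P<lo>-?\d+) to (?P<hi>-?\d+)"
)


def parse_range_errors(log_text: str) -> list[dict]:
    """Return one dict per matching log line, with the range bounds as ints."""
    results = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        lo, hi = int(m.group("lo")), int(m.group("hi"))
        results.append({
            "timestamp": m.group("ts"),
            "error": int(m.group("err")),
            "lo": lo,
            "hi": hi,
            # True when the lower bound is the smallest signed 64-bit value.
            "lo_is_int64_min": lo == INT64_MIN,
        })
    return results
```

Counting how many entries per day have `lo_is_int64_min` set, and when they first appeared, makes it easier to line the errors up with the start of the growth.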

My suspicion is that the normal database reorganization and cleanup is not running as intended.
Is there some way to manually trigger database cleanup?

Any ideas on where to look further would be appreciated!

thanks.
 
Hi, I've just seen this one. Did you get it explained or fixed?
 
Thanks for following up, but unfortunately, no, I haven't found any solution to this.
As it stands, I am planning to wipe the replication server and start from scratch.
 
How's your housekeeping running on the replication server? Does expiration run to completion every day on both servers?
 
"As it stands, I am planning to wipe the replication server and start from scratch."
That leaves you with some exposure, since you won't have a second copy of your data. Maybe it's an issue with reorgs that's causing the DB on the target to be much larger.

It's a long read, but you should start here: https://www.ibm.com/support/pages/r...oli-storage-manager-v711200-and-later-servers

Also, if your database grows larger on disk and you then expire a bunch of data, the online reorgs free up database pages for reuse, but they don't release space to the operating system. There's a section in the link I referenced on releasing that space back to the operating system. Typically, that's only needed if you need to reuse the space elsewhere.

And when you say "larger", are you referring to Space Used by Database(MB) or Used Pages? The former is how much disk space the database files occupy, but there can be free pages inside those files. Used Pages is how many pages in the DB actually hold data.
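One way to apply that distinction is to take two readings of both values some days apart and see which one moved. The sketch below is only an illustration: the class, the function, and the 1% tolerance are assumptions, and the readings have to be copied in by hand from the server's database query output.

```python
from dataclasses import dataclass


@dataclass
class DbSample:
    space_used_mb: float  # "Space Used by Database(MB)"
    used_pages: int       # "Used Pages"


def classify_growth(before: DbSample, after: DbSample, tolerance: float = 0.01) -> str:
    """Say which of the two figures grew by more than `tolerance` (a fraction)."""
    space_grew = after.space_used_mb > before.space_used_mb * (1 + tolerance)
    pages_grew = after.used_pages > before.used_pages * (1 + tolerance)
    if pages_grew:
        # More pages actually hold data: real content growth inside the DB.
        return "used pages growing"
    if space_grew:
        # Files on disk grew while data pages stayed flat: free pages inside.
        return "disk files growing, used pages flat"
    return "no significant growth"
```

If only the on-disk figure grows, the reorg and space-reclaim angle is the more likely lead; if Used Pages grows, something is genuinely accumulating in the database.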
 
Thank you for that vast piece of information. I remember having read pieces of that before, but I will have a more careful read. I do think it is the internal database maintenance that is failing, and the reorg part is a possible candidate. I'll have a closer look and report back later.
 
Hello!
I'm apparently a year late, but a few days ago I started having the exact same issue with a replication target server running the exact same 8.1.8.0 server version. My servers exclusively use a single directory-container storage pool. Currently the replication target DB is over 15% larger than the source, and this seems to have started after some large filespace deletions on the source.

Did you ever find a solution?

Thanks.
 
Check the REORG settings in the server's options file and make sure automatic reorg is not turned off.
Mass deletions can cause some fragmentation, but it should balance out over time.
There have been some cases where an offline reorg was needed because the database was that far out of whack.
I believe recent versions (8.1.10 and later) might have addressed some of this.
Also make sure expiration is up to date.
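A quick way to see which reorg-related options are set explicitly is to scan the options file. The sketch below assumes the usual dsmserv.opt layout of one `OPTION value` pair per line with `*` starting a comment. The option names listed are the reorg options I know of from the IBM documentation; check them against the documentation for your server level.

```python
# Reorg-related server options; verify this list against your server level.
REORG_OPTIONS = (
    "ALLOWREORGTABLE",
    "ALLOWREORGINDEX",
    "REORGBEGINTIME",
    "REORGDURATION",
    "DISABLEREORGTABLE",
    "DISABLEREORGINDEX",
)


def reorg_settings(opt_text: str) -> dict:
    """Map each reorg option to its explicit value, or None if it is not set."""
    found = {name: None for name in REORG_OPTIONS}
    for raw in opt_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # blank line or comment
        parts = line.split(None, 1)
        name = parts[0].upper()
        if name in found:
            found[name] = parts[1].strip() if len(parts) > 1 else ""
    return found
```

A value of None only means the option is absent from the file, in which case the server default applies. Read the file contents yourself (for example with `open(path).read()`) and pass the text in.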
 
"Currently the replication target DB is over 15% larger than the source, and this seems to have started after some large filespace deletions on the source."
Did you also delete those filespaces on the target server? Replication cannot replicate a filespace that is deleted, so the changes are not propagated to the target.

  • When replication is configured for a file space, the DELETE FILESPACE command deletes only the file space on the server where you issued the command. If you issue the REPLICATE NODE command, the file space is not deleted on the other replication server.
source: https://www.ibm.com/docs/en/spectru...filespace-delete-client-node-data-from-server
 
Thanks for the replies. I have verified that expiration is running, that reorgs are not disabled, and that all replicated filespaces that were deleted were deleted on both sides.
Next I'm going to try and compare the occupancy numbers on both servers to hopefully get a clue.
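A minimal sketch of such a comparison, assuming each server's occupancy has been exported to comma-separated text with three columns (node name, filespace name, size in MB). How the export is produced is left open, and the column layout and function names here are assumptions.

```python
import csv
import io


def load_occupancy(csv_text: str) -> dict:
    """Sum the MB column per (node, filespace) from 'node,filespace,mb' rows."""
    totals = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or short lines
        key = (row[0].strip(), row[1].strip())
        totals[key] = totals.get(key, 0.0) + float(row[2])
    return totals


def occupancy_diff(source_csv: str, target_csv: str, min_diff_mb: float = 1.0) -> list:
    """List (node, filespace, source_mb, target_mb) where the two sides differ."""
    src = load_occupancy(source_csv)
    tgt = load_occupancy(target_csv)
    diffs = []
    for key in sorted(set(src) | set(tgt)):
        s, t = src.get(key, 0.0), tgt.get(key, 0.0)
        if abs(t - s) >= min_diff_mb:
            diffs.append((key[0], key[1], s, t))
    return diffs
```

Entries that exist only on the target (source_mb of 0.0) are the interesting ones here, since they would point to filespaces that were removed on the source but are still held on the target.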
 