Expiration Woes

jgraul

We seem to have the slowest big ass TSM on the planet. Why in the world would a 32-core Windows 2008 R2 server with 64 GB of RAM, and the DB running on SSDs (4 SSDs configured as RAID 5), be so slow? Please, no Windows jokes; we are having problems completing our daily jobs.

We have about 530 nodes in our enterprise, almost entirely Windows servers. We are now expiring about 9 million files a day, but we never get more than 3.5 million objects expired per hour. Over the last six days, expiration time has increased daily: day 1 took 162 minutes, day 2 took 196, day 3 217, day 4 252, day 5 274, and day 6 374. As I write this on day seven, it's still running after 300 minutes and has only deleted 5.5 million objects.
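A quick back-of-the-envelope sketch of those runtimes (this assumes each run expired roughly the stated ~9 million objects, which is an approximation) shows the throughput falling steadily:

```python
# Throughput trend for the six completed expiration runs.
# Assumption: each day's run expired roughly 9 million objects.
minutes_per_run = [162, 196, 217, 252, 274, 374]
objects_per_day = 9_000_000

for day, m in enumerate(minutes_per_run, start=1):
    per_hour = objects_per_day / (m / 60)
    print(f"day {day}: {m} min -> {per_hour / 1e6:.2f}M objects/hour")
```

By day 6 the effective rate has dropped from about 3.3M to about 1.4M objects per hour, under that assumption.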

Has anyone experienced anything like this?
 
Hi,

Just a few questions/pointers:
What is your TSM version? Have you tried running multiple expiration processes in parallel (expire inventory resource=X)? Are the nodes roughly equal in the number of files expired, or is the expiration of a single node/filespace the problem?
Anything in the db2diag.log? Do you use deduplication?

Harry
 
We are on TSM 6.3.3.0 and we use resource=8. We don't use deduplication because we have a Data Domain as our primary storage; it has built-in deduplication. The Windows 2008 servers definitely have a lot more system state files than Windows 2003, but I don't think that would explain the 2x increase in expiration time over 6 days, given that the number of files examined and processed differs by less than 500K.
 
One question:

Is compression turned ON for the clients?

I have a much bigger number of nodes than you (> 530) and expiration runs fast. We are on Linux and TSM 6.1.5, and we use a Data Domain with devclass=file for primary backup.
 
We don't have compression turned on.
 
We have 10Gb nics on the TSM server and the DD, which is a DD670.
 
I wouldn't think expiration would have a dependency on the stg pool device, but rather the TSM DB which is on SSD disks.
 
I wouldn't think expiration would have a dependency on the stg pool device, but rather the TSM DB which is on SSD disks.

True but I have seen weird things on Windows.

I also want to get an understanding of what your setup looks like.
 
Here is one thing to check.

Do you have the latest drivers for the SSD, or the RAID card (I am assuming hardware RAID)?

Have you done an off-TSM test with the SSD environment? Are the results as fast as expected?
 
If you run expiration like this:

expire inventory node=a,b,c resource=3

Would this be faster than:

expire inventory node=a,b,c (resource defaults to 1)

I am looking at the possibility of a bug or flaw in 6.3.3 assuming all of your hardware checks out.
 
We normally run 'expire inventory skipdirs=no resource=8'. Earlier in the year we experimented with the resource setting and found that we could expire about 8.5 million objects in 140 or so minutes pretty consistently. We really didn't see any improvement with higher resource values.
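For comparison, that earlier baseline of ~8.5 million objects in ~140 minutes works out to roughly the 3.5M-objects-per-hour ceiling mentioned at the top of the thread (a rough sketch, using the approximate figures quoted):

```python
# Baseline rate from earlier in the year: ~8.5M objects in ~140 minutes.
objects = 8_500_000
minutes = 140
rate_per_hour = objects / (minutes / 60)
print(f"{rate_per_hour / 1e6:.2f}M objects/hour")  # ~3.64M objects/hour
```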

Lately we haven't been able to get consistent results. I rebooted the server yesterday after expiration completed. Very interested to see how it runs today.
 
So the SSD disks are internal (not SAN)? If so, what make and model server are they in? Just wondering if it's an internal PCI bus bottleneck you're running into.
 
It's an IBM x3650 M4. Internally we have 2 SAS mirrored drives for the OS, swap, and applications, 2 SAS mirrored drives for the archive logs, and 4 SSD drives in RAID 5 for the database and active logs. We have an IBM ServeRAID M5110e controller installed.
 
So yesterday it took 398 minutes to process 8,857,527 objects. It had processed over 7 million after 35 minutes. What the heck is it doing the rest of the time?

I should point out that this is actually a 16 core system. The 32 comes from hyperthreading...
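The run is heavily front-loaded. A quick sketch using the numbers above (taking "over 7 million" as approximately 7 million) shows how far the rate collapses in the tail:

```python
# Front-loaded expiration run: fast first phase, then a slow tail.
total_objects = 8_857_527
total_minutes = 398
fast_objects = 7_000_000  # assumption: "over 7 million" taken as ~7M
fast_minutes = 35

tail_objects = total_objects - fast_objects  # ~1.86M objects left
tail_minutes = total_minutes - fast_minutes  # the remaining 363 minutes

print(f"first phase: {fast_objects / fast_minutes:,.0f} objects/min")
print(f"tail phase:  {tail_objects / tail_minutes:,.0f} objects/min")
```

Under those assumptions the rate drops from about 200,000 objects/min to roughly 5,000 objects/min, a ~40x slowdown for the last ~20% of the objects.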
 
So yesterday it took 398 minutes to process 8,857,527 objects. It had processed over 7 million after 35 minutes. What the heck is it doing the rest of the time?

I should point out that this is actually a 16 core system. The 32 comes from hyperthreading...

I am making a guess here.

Since we don't know the ins and outs of how TSM interfaces with the new DB2 environment, I am guessing that there must be some reorganization of the TSM metadata, or TSM is 'looking' for the other 1.8+ million objects to expire.
 
I'm starting to think it may be something to do with expiring Windows 2008 and Windows 7 system state files. They are taking a really long time. One Windows 7 PC's system state took over 90 minutes to expire.
 