Migrating disk to tape closes all but one of 6 processes

sandragon
ADSM.ORG Member
We have a single domain with a 10.5 TB disk pool. The disk pool fills up every night (we cannot allocate more disk at this time). To mitigate this, we run migrations fairly frequently to keep the amount of data in the pool down. However, the system is behaving oddly. The disk pool is configured with 6 migration processes and a high migration threshold of 75%.
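In dsmadmc terms the relevant settings look roughly like this (the LOWMIG value below is just a placeholder, not our exact setting):

    /* sketch only - LOWMIG value here is a placeholder */
    update stgpool DISKPOOL01 highmig=75 lowmig=30 migprocess=6
    query stgpool DISKPOOL01 format=detailed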

I have 6 LTO6 drives, all attached via 8Gb fibre, in an IBM TS3500 library.

This same behavior occurred when we were using LTO5 media.

The disk pool will be somewhere above 50% full. A manual migration is triggered, which spawns 6 migration processes from disk to tape.
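The manual kick is nothing fancy, just a migrate command against the pool, along these lines (LOWMIG=0 here is only an example target):

    /* drive the pool down as far as it will go; WAIT=NO returns the console immediately */
    migrate stgpool DISKPOOL01 lowmig=0 wait=no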

Despite the pool showing a Pct Migr of over 50% (more than 5 TB), 2 of the processes end almost right away with a completion state of "Success" after moving only about 200-300 MB of data.

Within 30 minutes, all but 1 migration process has completed "successfully," and I'll still have 40-50% of the disk pool migratable.
Canceling the remaining process and restarting the migration spawns 6 processes again, which follow the same pattern all over again.

What logic is TSM using to decide it only needs a single migration stream when there is that much data, and how can I work around it? I'm filling my disk pool every night and failing over to direct-to-tape backups, as well as flat out having backup failures, because even an automatically triggered migration follows the same pattern: it starts all 6 migrations, and within 30 minutes there are maybe 2 processes left, but usually just 1.
 
Data migration is done per node.
If one node has a lot of data, or some very large objects such as a TDP backup, its migration will take some time to complete, while the other nodes' migrations finish quickly because few files have changed since the last backup.
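To see how the data in the pool breaks down by node, a select against the occupancy table should show it; a sketch, substituting your disk pool name for DISKPOOL01:

    /* total MB held in the disk pool per node, largest first - pool name is an example */
    select node_name, sum(physical_mb) as phys_mb from occupancy where stgpool_name='DISKPOOL01' group by node_name order by 2 desc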

Good Luck,
Sias
 
Moon-buddy: it's a variety of servers, from file servers to application servers. All of them are file system only.

led888: so the implication here is that the 5 TB of data generated in one night is just one server's filesystem? Is there a way to check, or to see how much data in a domain came from each node in the last 24 hours?
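One thing that looks like it should answer that is a select against the summary table; a sketch, assuming a DB2-based (6.x/7.x) server and that summary retention covers at least a day:

    /* bytes backed up per node over the last 24 hours - ENTITY is the node name */
    select entity, sum(bytes) as backed_up_bytes from summary where activity='BACKUP' and start_time>(current_timestamp - 24 hours) group by entity order by 2 desc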
 
I think I found the culprit.

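The output below is from an occupancy query scoped to the disk pool (trimmed to the relevant nodes), something like:

    query occupancy stgpool=DISKPOOL01
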
Node Name            Type   Filespace Name   FSID   Storage Pool Name   Number of Files   Physical Space Occupied (MB)   Logical Space Occupied (MB)
CRXUSADB303_ORACLE   Bkup   /adsmorc         1      DISKPOOL01          2                 23.262                         23.262
NWTDWDB30_ORACLE     Bkup   /adsmorc         1      DISKPOOL01          519               4,760,162                      4,760,162

A single Oracle database pushed over 5 TB of data last night, and it does it every night. The DBAs configured only a full Level 0 script for the system using TDP. So you were right, led888, it's a single node. And since one migration process can only move one node's data, I'm stuck dog-paddling after this server every night and never catching up.
 
So it really is one big file, as I said.

Files, in the strictest sense, do not have to be plain files. They can also be DB files.
 