What's the scoop on hard links?

ldmwndletsm


Hi, I hunted around and found a few old discussions of this here, plus some IBM documentation, but I still have several questions. We have a large collection of data that makes extensive use of hard links. As a test, I ran a first-time backup of a 1.7 TB file system (one of over a hundred such file systems), and it used up 2.6+ TB of tape! This was puzzling until I figured out that TSM had backed up every hard link as its own full copy of the file. Sheesh!

I then tried to restore it, but the restore didn't rebuild the hard links, and we ran out of space on the target file system. I then found an IBM document (https://www.ibm.com/support/pages/apar/IT02889) stating that resourceutilization needs to be set to 1 (the default for restores) to allow the links to be reestablished. We had it at a higher value to speed up backups. With it set to 1, the restore worked, and everything looks correct. One source also suggested using a no-query restore as opposed to a classic restore. I used a classic restore, so I haven't tested that.
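For anyone wanting to gauge this kind of gap on their own file systems before running a backup, here is a minimal Python sketch. It is not a TSM tool, and the function name `link_overhead` is made up for illustration. It walks a tree and compares the total size when every path is counted separately (roughly what a per-link backup would send) against the total when each inode is counted once.

```python
import os
import stat

def link_overhead(root):
    """Return (per_path_bytes, per_inode_bytes) for regular files under root."""
    per_path = 0    # every path counted, so hard-linked data is counted repeatedly
    per_inode = 0   # each (device, inode) pair counted only once
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, devices, etc.
            per_path += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                per_inode += st.st_size
    return per_path, per_inode
```

Calling `link_overhead("/some/filesystem")` (the path is a placeholder) returns the two byte counts. A large difference between them suggests that hard links account for the extra tape. The numbers are logical file sizes, so sparse files and client-side compression would shift the real figures.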

1. Is there any way to turn off this blanket "back up each hard link instance" behavior and instead back up only the hard link metadata plus one copy of the data per inode, not every link?

This seems an utter waste of tape, time and resources.

2. Why does TSM have this behavior?

I don't see this with EMC NetWorker. If I back up a 1.7 TB file system, the size of the backup is 1.7 TB. Restoring the data rebuilds the links, and as I recall, there is the same caveat, as with TSM, in that all the links need to be restored simultaneously, not in groups or just some of them. Otherwise, it works just dandy with no redundant copies.

3. Is there some option that we could set that would skip all but one (for the given inode) on the backups and still allow us to rebuild the hard links?

We need to be able to rebuild the hard links from a restore. We don't want to have to manually recreate them. They do NOT have predictable naming conventions.
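Purely as an illustration of what rebuilding links outside the backup product might involve, here is a hedged Python sketch. It is not a TSM feature, and `find_link_groups` and `relink` are hypothetical names. The assumptions are that the link groups are recorded on the source before the backup, that every path in a group gets restored onto the same file system, and that the restored copies within a group really are identical (the sketch does not verify content).

```python
import os
import stat

def find_link_groups(root):
    """Return lists of relative paths under root that share one inode."""
    groups = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                groups.setdefault(key, []).append(os.path.relpath(full, root))
    # Only keep inodes for which more than one path was found inside root.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]

def relink(root, groups):
    """Re-create hard links: each path in a group is linked to the group's first path."""
    for paths in groups:
        keeper = os.path.join(root, paths[0])
        for rel in paths[1:]:
            target = os.path.join(root, rel)
            if os.path.exists(target) and os.path.samefile(keeper, target):
                continue  # already a link to the kept copy
            tmp = target + ".relink-tmp"  # hypothetical temp-name convention
            os.link(keeper, tmp)          # new link to the kept copy
            os.replace(tmp, target)       # swap it over the duplicate copy
```

The output of `find_link_groups` could be saved (for example as JSON) alongside the backup and passed to `relink` after a restore. This only helps once the restore has completed, so it does nothing for the out-of-space problem during a restore that writes every link as a full copy.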

4. I thought I saw something in the IBM documentation that suggested that an archive, as opposed to a backup, does not do this? I can't find that page now, but I may have misunderstood.

We need to be able to run incrementals, so an archive would not work as that would force a full. Regardless, does anybody know if the behavior for archiving is the same for hard links?
 

Marclant,

Thanks for your response. :) I had seen that earlier (I should have noted that in my post). I haven't spoken directly with IBM about this. Maybe they can shed a little more light on it. If I come across anything, I'll follow up here with an update.

Perhaps there are relatively few sites these days that use hard links heavily enough for the TSM behavior to be a major imposition. My understanding is that the product has passed between three companies over the years, with IBM being the third, so maybe it's a legacy from an earlier owner. I don't know; I'm just surmising. Then again, since TSM tracks every file separately in its database, maybe there are some limitations, but it still seems odd that it would have to back up every path sharing an inode as its own entry.
 