A question on hard links?

ldmwndletsm

ADSM.ORG Senior Member
I found a little discussion of this on the forum, and I did read this: https://www.ibm.com/support/knowledgecenter/en/SSEQVQ_8.1.4/client/c_bac_hardlnkunx.html

This is on Linux, using Spectrum Protect 8.1.3.

I conducted a test last night in which I created two hard links (hlink2_file1, hlink2_duplicate_file1) to an already existing file (file1) in the same directory. As expected, all three had the same inode number (`ls -li`), the link count was 3, and the modtimes, attributes and MD5 digests all matched. The parent file system (ext4) had previously been backed up a number of times with 'incr', and file1 existed before the very first backup; the two hard links had never existed before. I then restored the file system to another file system (xfs). All three files were restored, but one of them was assigned a different inode from the other two: file1 got inode 6381833 with a link count of 1, while hlink2_file1 and hlink2_duplicate_file1 got inode 6291572 with a link count of 2. Otherwise, the attributes, digests and modtimes are identical across all three.
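For anyone who wants to repeat the test, the setup can be scripted. This is a minimal sketch that uses only the Python standard library (`os.link`, `os.lstat`); the helper names `make_hardlink_test` and `describe` are hypothetical, and it only builds and inspects the directory. It does not drive the backup or the restore.

```python
import os
import tempfile


def make_hardlink_test(directory):
    """Create file1 plus two hard links to it, as in the test described above."""
    original = os.path.join(directory, "file1")
    with open(original, "w") as f:
        f.write("sample content\n")
    # os.link adds another directory entry for the same inode; no data is copied.
    for name in ("hlink2_file1", "hlink2_duplicate_file1"):
        os.link(original, os.path.join(directory, name))


def describe(directory):
    """Return {name: (inode, link_count)}, the two columns of interest in `ls -li`."""
    result = {}
    for name in sorted(os.listdir(directory)):
        st = os.lstat(os.path.join(directory, name))
        result[name] = (st.st_ino, st.st_nlink)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        make_hardlink_test(d)
        for name, (ino, nlink) in describe(d).items():
            print(f"{ino:>10} {nlink} {name}")
```

Pointing `describe()` at the restored copy of the directory makes a split like the one above (one inode with a link count of 1, another with a link count of 2) easy to spot.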

Of course, restored files are expected to get new inode numbers, but I was a little surprised that TSM (or the OS?) didn't preserve the link count of 3 by giving all three restored files the same new inode. I ran a similar test with EMC NetWorker on a different Linux box (source files on an ext3 file system, restored to a different ext3 file system), and hard links are restored as expected there. One problem I think I ran into with NetWorker in the past was that if you didn't recover all of the hard links, mischief could ensue, but that wasn't the case in these tests.
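For comparison, the usual way a copy or restore tool keeps hard links together is to remember the first restored path for each source (device, inode) pair and to hard-link every later name for that inode to it; `rsync -H` aims for the same result. The sketch below is a generic illustration of that idea, not how TSM or NetWorker actually implement restores. `copy_tree_preserving_hardlinks` is a hypothetical name, and it handles only directories and regular files.

```python
import os
import shutil
import stat


def copy_tree_preserving_hardlinks(src_root, dst_root):
    """Copy regular files from src_root to dst_root, recreating hard-link groups.

    Files that share a (device, inode) pair in the source are copied once; every
    later name for that inode becomes a hard link to the first copy. Symlinks and
    special files are skipped to keep the sketch short.
    """
    first_copy = {}  # (st_dev, st_ino) in the source -> path already written in dst
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            st = os.lstat(src)
            if not stat.S_ISREG(st.st_mode):
                continue
            key = (st.st_dev, st.st_ino)
            if key in first_copy:
                # Another name for an inode already restored: link, don't copy.
                os.link(first_copy[key], dst)
            else:
                shutil.copy2(src, dst)  # contents plus permission bits and timestamps
                if st.st_nlink > 1:
                    first_copy[key] = dst
```

The grouping can only be rebuilt for names that are part of the same restore. A name whose siblings are left out comes back as an independent file, which fits the "recover all the hard links" caveat above.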

Does anyone know what might have happened here? Is this expected TSM behavior, where 2 of 3 (or 4 of 5, etc.) are restored as hard links but one is not? Unless I'm missing something obvious, could anyone test this?
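A quick way to check a restore is to compare hard-link groups (sets of paths sharing one inode) between the original tree and the restored tree. This is a minimal sketch, assuming every name of each hard-linked file lives under the tree being compared; `hardlink_groups` and `compare_hardlinks` are hypothetical helper names.

```python
import os
import stat
from collections import defaultdict


def hardlink_groups(root):
    """Return hard-link groups under root as a set of frozensets of relative paths.

    Only regular files with a link count above 1 are included, so a file whose
    links were broken apart on restore drops out of (or forms a smaller) group.
    """
    by_inode = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                by_inode[(st.st_dev, st.st_ino)].add(os.path.relpath(path, root))
    return {frozenset(paths) for paths in by_inode.values()}


def compare_hardlinks(original_root, restored_root):
    """Return (groups lost in the restore, groups that exist only in the restore)."""
    before = hardlink_groups(original_root)
    after = hardlink_groups(restored_root)
    return before - after, after - before
```

For the test above, the original would report one group of three names; the restore would report the three-name group as lost and a two-name group (hlink2_file1, hlink2_duplicate_file1) as new.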


BACKGROUND
We have a data archive in which, when someone checks out a unit directory, makes changes, and checks it back in, a new version subdirectory is created. In that new subdirectory, files that have not changed are hard links to their counterparts in the original subdirectory (version-01), while any new files are created in the new version directory (02, 03, etc.). A given archive unit directory can have many version subdirectories. Symbolic links could have been used instead, but I think the hard linking is a side effect of rsync, or some similar tool, rather than a deliberate design choice.
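The version layout described above can be sketched as follows. This is a hypothetical illustration, not the archive's real tooling: `create_version` and its `changed` argument (a mapping from relative path to new contents) are made-up names. For comparison, rsync's `--link-dest` option produces a similar layout by hard-linking files that are unchanged relative to a reference directory.

```python
import os


def create_version(prev_dir, new_dir, changed):
    """Build new_dir from prev_dir: unchanged files become hard links, changed files are new.

    `changed` maps a normalized relative path (e.g. "sub/b.txt") to its new
    contents as bytes. Hypothetical helper for illustration only.
    """
    for dirpath, _dirnames, filenames in os.walk(prev_dir):
        rel_dir = os.path.relpath(dirpath, prev_dir)
        os.makedirs(os.path.normpath(os.path.join(new_dir, rel_dir)), exist_ok=True)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if rel not in changed:
                # Unchanged: another directory entry for the same inode, no new data.
                os.link(os.path.join(dirpath, name), os.path.join(new_dir, rel))
    for rel, data in changed.items():
        dst = os.path.join(new_dir, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        with open(dst, "wb") as f:  # changed or new file gets its own inode
            f.write(data)
```

Each additional version that leaves a file untouched raises that file's link count by one, which is how most files in a unit with 10 version subdirectories end up with a link count of 10.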

Anyway, based on the behavior I see from TSM, suppose we restore one of these units, with a total size of 250 GB, 10 version subdirectories, and most files having a link count of 10. We would still end up with 10 subdirectories, but most of the files would have a link count of 9, and the files in one of the version subdirectories would have a link count of 1. Functionally everything would be fine, since the content, attributes, etc. would remain intact. However, as far as the file system is concerned, we'd have a second, redundant copy of the files (a full duplicate, not a subset or superset). Probably nobody would notice unless they compared the link counts against a manifest of the original data attributes. So nothing would break, but we'd end up with 500 GB of data instead of 250 GB.
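The 250 GB versus 500 GB estimate comes from counting each inode only once, which is also how GNU `du` treats hard-linked files by default. A minimal sketch of that accounting, assuming apparent file sizes (`st_size`) rather than allocated blocks; `unique_inode_bytes` is a hypothetical helper name.

```python
import os


def unique_inode_bytes(root):
    """Sum file sizes under root, counting each (device, inode) pair once.

    Extra hard links to an inode already counted add nothing, so a fully
    hard-linked set of version directories costs about one copy of the data.
    """
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_size
    return total
```

On the original unit, where most files are shared by all 10 version subdirectories, this total stays close to the size of a single copy. On the restore described above, the version subdirectory whose files have a link count of 1 adds a second copy, which is where the roughly doubled figure comes from.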
 