Networker

Re: [Networker] Questions on backing up hard links?

2005-08-19 18:26:23
Subject: Re: [Networker] Questions on backing up hard links?
From: "Brian O'Neill" <oneill AT OINC DOT NET>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 19 Aug 2005 18:22:26 -0400
Here is a bit of technical info on how hard links work on a "standard" UNIX filesystem (i.e. BSD type, although most others mimic this).

Let's call each data object that would be stored on a filesystem a "file". This includes directories, device files, named pipes, symbolic links, etc. A "hard link" is NOT a data object in itself.

Each file is stored on the filesystem in available data blocks, and an inode is allocated that stores the pointers to the data blocks, and everything there is to know about the file - except the name.

Names are associated with files via special files called directories. They are nothing more than files that the system treats a little differently, and all they contain is a table that lists inodes and the names we want to associate with them.

A "hard link" is nothing more than an additional directory entry that points to the same inode. So two or more different names refer to the same inode, and thus the same data blocks.

When you create a hard link, the system increments the "link count" value in the inode (which starts at one). When you remove a directory entry ("rm" the file), it decrements it. If the link count reaches zero (and no process still has the file open), the data blocks and inode are marked unused.
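The link count behaviour is easy to check for yourself. Here is a minimal Python sketch, assuming a POSIX filesystem that supports hard links; the file names TEST and TEST.ln and the function name link_count_demo are just placeholders for illustration.

```python
import os
import tempfile

def link_count_demo():
    """Return the link counts seen while adding and then removing a name."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "TEST")
        link = os.path.join(d, "TEST.ln")
        with open(original, "w") as f:
            f.write("some data\n")

        counts = [os.stat(original).st_nlink]   # only one name so far
        os.link(original, link)                 # add a second directory entry
        same_inode = os.stat(original).st_ino == os.stat(link).st_ino
        counts.append(os.stat(original).st_nlink)

        os.remove(original)                     # drop the first name
        counts.append(os.stat(link).st_nlink)   # inode survives via TEST.ln
        with open(link) as f:
            survived = f.read() == "some data\n"
        return counts, same_inode, survived

if __name__ == "__main__":
    print(link_count_demo())
```

Removing the "first" name does nothing to the data: the remaining name is every bit as much the file as the one that was deleted.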

Now, as far as what Networker does, I'm not sure how it handles them specifically. However, Networker does keep track of the inodes of the files that are being backed up (you can see them with "ls -la" in recover), and when it encounters another directory entry to a file it has already backed up, it simply adds another index entry, referring to the same data.
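To illustrate the general idea (this is NOT NetWorker's actual code, which I haven't seen), here is a rough Python sketch of how a backup tool could notice hard links by remembering the device and inode number of each file it has already saved. The function name scan_for_backup and the shape of its return value are made up for this example.

```python
import os
import tempfile

def scan_for_backup(root):
    """Walk root and split names into data entries and link-only entries.

    Returns (data_entries, link_entries) where data_entries is the list of
    paths whose contents would be written out, and link_entries maps each
    additional name to the path that was saved first for the same inode.
    """
    seen = {}            # (st_dev, st_ino) -> first path saved
    data_entries = []
    link_entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # deterministic walk order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                link_entries[path] = seen[key]   # index entry only, no data
            else:
                seen[key] = path
                data_entries.append(path)
    return data_entries, link_entries

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "TEST"), "wb") as f:
            f.write(b"x" * 10240)
        os.link(os.path.join(d, "TEST"), os.path.join(d, "TEST.ln"))
        print(scan_for_backup(d))
```

Note that which name ends up as the "data" entry depends only on the order of the walk, which fits the point that neither name is the original.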

When you recover the file, it isn't going to "recover the original and rename it", since which exactly _is_ the original? That information is not stored anywhere. If you only recover one particular entry to the file, a new file is created (inode and data blocks), and a new directory entry is created. It won't get its original inode number (except by amazing luck) because that inode may be in use already. If you recover more than one link to the same file, only one "file" is created - but each directory entry recovered will point to the same inode - so there is still only one "copy" of the data.
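The recover side can be sketched the same way. This is only an illustration of the behaviour described above, under the assumption that the index gives each saved name some identifier for the data it shares; the restore function, the file_id field and the tuple layout are all hypothetical.

```python
import os
import tempfile

def restore(entries, dest):
    """Recreate the requested names under dest.

    entries is an iterable of (name, file_id, data) tuples. Names that share
    a file_id were hard links to one another at backup time.
    """
    restored = {}    # file_id -> path already created in this session
    for name, file_id, data in entries:
        target = os.path.join(dest, name)
        if file_id in restored:
            # Another name for a file we just recreated: link, don't copy.
            os.link(restored[file_id], target)
        else:
            # First (or only) name requested: new inode, new data blocks.
            with open(target, "wb") as f:
                f.write(data)
            restored[file_id] = target
    return restored

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        restore([("TEST", 1, b"data"), ("TEST.ln", 1, b"data")], d)
        print(os.stat(os.path.join(d, "TEST")).st_nlink)
```

If only one of the names is requested, the first branch never runs and you simply get a standalone file with a link count of one.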

What if you have multiple links to the same file, delete one of them, update the data through another, then recover the one that was deleted? I'm not 100% sure, but I believe the result will be that the recovered version will be a separate file unto itself, and not related to the originals any more, so now you have two distinct copies.

What if, in the same scenario, you didn't delete the link, and recover with the "overwrite" option? Well, that may depend on the file operations used - either it truncates and rewrites the existing data, or it unlinks the one you are recovering and you end up with a separate file as above. I'd have to run a test to be sure, but can't at the moment.
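The two possibilities can be compared directly. The Python sketch below shows the difference between the two file operations; it says nothing about which one NetWorker actually uses, and the function names are invented for this example.

```python
import os
import tempfile

def overwrite_in_place(path, data):
    """Truncate and rewrite: the inode is kept, so every link sees the data."""
    with open(path, "wb") as f:
        f.write(data)

def overwrite_by_replace(path, data):
    """Unlink and recreate: this name now points at a brand new inode."""
    os.remove(path)
    with open(path, "wb") as f:
        f.write(data)

def demo(strategy):
    """Return (still_linked, contents_seen_through_the_other_name)."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "TEST")
        b = os.path.join(d, "TEST.ln")
        with open(a, "wb") as f:
            f.write(b"current")
        os.link(a, b)
        strategy(b, b"recovered")            # "recover" TEST.ln over itself
        with open(a, "rb") as f:
            contents_a = f.read()
        return os.path.samefile(a, b), contents_a

if __name__ == "__main__":
    print(demo(overwrite_in_place))     # names still share one inode
    print(demo(overwrite_by_replace))   # names have gone their separate ways
```

With the first strategy the recovered data shows up under both names; with the second, TEST keeps its current contents and TEST.ln becomes a separate file.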

Note that symbolic links are completely and utterly different and are nothing more than files that contain a path to the destination file, and that are backed up and recovered as a regular file would be - except they are very small.
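For contrast, here is a small Python sketch of what a symbolic link looks like to the system, assuming a POSIX system where creating symlinks needs no special privileges. The names TEST and TEST.sym and the function symlink_demo are placeholders.

```python
import os
import tempfile

def symlink_demo():
    """Create a 10 KB file plus a symlink to it and report what lstat sees."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "TEST")
        link = os.path.join(d, "TEST.sym")
        with open(target, "wb") as f:
            f.write(b"x" * 10240)
        os.symlink(target, link)

        info = os.lstat(link)                # the link itself, not the target
        return {
            "stored_path": os.readlink(link),
            "target": target,
            "own_inode": info.st_ino != os.stat(target).st_ino,
            "link_size": info.st_size,
            "target_nlink": os.stat(target).st_nlink,
        }

if __name__ == "__main__":
    print(symlink_demo())
```

The symlink has its own inode, its contents are just the path string, and the target's link count is not touched at all.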

George Sinclair wrote:
Does anyone know how NetWorker handles the backup/recovery of hard links or how it's supposed to?

I tested backing up two files (TEST and TEST.ln). TEST was approx 10 KB. The second file is a hard link to the first. The one thing I noticed is that NetWorker seems to indicate that only 10 KB was backed up. If I run nwrecover, I can see entries for both files, however. I guess this makes sense because I wouldn't think NetWorker would actually back up files to tape that are hard links since they share the same inode, and that would be redundant data on tape, plus they are actually the same file as far as the OS is concerned, anyway, and NetWorker only sees what the OS tells it. In other words, if I have 10 original files totaling 50 MB, and I then create a hard link to each one (link count now = 2 for all 20 files), and I back up all 20 files, NetWorker will indicate that only 50 MB was backed up, not 100 MB even though recover and verbose show all 20 pathnames.

So is NetWorker merely updating the client index with information about the hard link but not writing it to tape?

Another thing I notice is that if I remove the hard link (TEST.ln), and then I recover the hard link, NetWorker indicates that it's reading 10 KB, and when it restores it, while it has the same mtime, and same name (TEST.ln), it now has a new inode, so it's technically no longer a hard link to TEST. A diff between the files shows no differences, but editing one is not reflected in the other, so NetWorker did not recover it as a hard link. It appears to be just an ordinary copy with the original name.

How is it able to recover it? Does it just recover the original and rename it TEST.ln? Is this the behavior one would expect?

Does it make any sense to back up hard links? My testing shows that they are not recovered as hard links, so it seems pointless to do so. You'd have your pathnames back, but they would just be copies taking up space, not actual hard links. You'd have to re-create the hard links from scratch. We're running an older 6.1.1 release on a Solaris 2.8 primary server. I was doing my tests on a Linux client, but the backups and recovers were from our Linux storage node.

Thanks.

George

To sign off this list, send email to listserv AT listserv.temple DOT edu and type "signoff networker" in the body of the email. Please write to networker-request AT listserv.temple DOT edu if you have any problems with this list. You can access the archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
