Re: [Networker] hypothetical question

2011-01-21 17:15:23
Subject: Re: [Networker] hypothetical question
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 21 Jan 2011 17:13:09 -0500
Tim Mooney wrote:
In regard to: Re: [Networker] hypothetical question, Valere Binet said (at...:

We did the first transfer with star and run rsync once a day.
I wrote "hoping" because we didn't test the promotion of a storage node to
server ... yet.
We have NetWorker 7.4.4-1 on all our systems. The server and storage nodes
are running CentOS 5.5.

From this conversation, it seems our strategy could be doomed to fail
because:
1) rsync runs on a live server (we don't shut down nsr)
2) we don't use the -S option

Well, it wouldn't hurt to use any options that preserve sparse files, but
as I said earlier, these days I'm not even certain any of the indexes will
ever be sparse.  I think the only place there would be any danger is
(unfortunately) the media database, but those files are so small that
I think the danger is minimal.  It might be worth a google to "detect
sparse files" and then follow that procedure, to see if any of your db
files are sparse.

This command seems to do the trick:

cd /nsr
find . -type f | xargs -i /bin/sh -c 'echo -n {} " "; stat -c "%s - (%b*%B)" {} | bc' | awk '{print $2 " " $1}' | egrep -v '^-|^0'

OK, I googled this, and from what I can infer, a file is sparse if its reported size is larger than blocksize*numblocks; if the size is smaller than or equal to that product then the file is *not* sparse. I found a good source of information on sparse files at:

http://administratosphere.wordpress.com/2008/05/23/sparse-files-what-why-and-how/

So I'm on RH linux, and I have a file named /tmp/snapshot3.jpg, and I ran the following three commands:

/bin/ls -ls /tmp/snapshot3.jpg
24 -rw-r--r-- 1 owner group 21790 Jan 20 19:41 /tmp/snapshot3.jpg

du -sh /tmp/snapshot3.jpg
24K     /tmp/snapshot3.jpg

stat -c "%s %b %B" /tmp/snapshot3.jpg
21790 48 512

It looks as if the number of blocks reported by stat is 48 at 512 bytes per block, whereas the number of blocks reported by 'ls' is 24 at 1024 bytes per block. The two differ only in units (GNU 'ls -s' counts 1024-byte blocks by default, while stat reports %b in blocks of %B bytes), and 48*512=24*1024 or 24576 in either case. Now, 24576 is larger than the file size of 21790 so the file is not sparse.

Next, I created a sparse file, using the following command as shown in the google/URL source above:

dd if=/dev/zero of=/tmp/sparse bs=1 count=1 seek=1024k
1+0 records in
1+0 records out
1 byte (1 B) copied, 4.2e-05 seconds, 23.8 kB/s

I then ran the same three commands as before:

/bin/ls -ls /tmp/sparse
8 -rw-r--r-- 1 owner group 1048577 Jan 21 16:24 /tmp/sparse

du -sh /tmp/sparse
8.0K    /tmp/sparse

stat -c "%s %b %B" /tmp/sparse
1048577 16 512

Either way, 16*512=8*1024=8192 is less than 1048577 so the file is indeed sparse.
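
The comparison used in both checks above can be wrapped in a small helper. This is only a sketch: the function names sparse_deficit and is_sparse are made up for illustration, and is_sparse assumes GNU stat (for the -c format string).

```shell
# Hypothetical helpers; the names are illustrative, not part of any tool.

# sparse_deficit SIZE BLOCKS BLOCKSIZE
# Prints SIZE - BLOCKS*BLOCKSIZE. A positive result means the file claims
# more bytes than are allocated on disk, i.e. it contains holes.
sparse_deficit() {
    echo $(( $1 - $2 * $3 ))
}

# is_sparse FILE
# Exit status 0 if FILE looks sparse, 1 if not, 2 if stat failed.
is_sparse() {
    info=$(stat -c '%s %b %B' "$1") || return 2
    set -- $info                      # $1=size $2=blocks $3=block size
    [ "$(sparse_deficit "$1" "$2" "$3")" -gt 0 ]
}
```

With the numbers above, 'sparse_deficit 21790 48 512' prints -2786 (not sparse) and 'sparse_deficit 1048577 16 512' prints 1040385 (sparse).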

So, I ran this command on our primary backup server:

cd /nsr
find . -type f | xargs -i /bin/sh -c 'echo -n {} " "; stat -c "%s - (%b*%B)" {} | bc' | awk '{print $2 " " $1}' | egrep -v '^-|^0'

and it reported the following:

20729856 ./cores/nsrmmgd/core.4974
28672 ./ftype_devices/server_ftype2/volume
28672 ./ftype_devices/server_ftype_orion_DR/volume
28672 ./ftype_devices/server_ftype6/volume
28672 ./ftype_devices/server_ftype5/volume
28672 ./ftype_devices/server_ftype4/volume
28672 ./ftype_devices/server_ftype/volume
28672 ./ftype_devices/server_ftype3/volume
28672 ./ftype_devices_mmrecov/server_ftype_DR_orion2/volume

As you can see, the only sparse files, other than the one core file, are the volume files that are under the directories for our file type devices. A typical directory looks like this:

cd /nsr/ftype_devices/server_ftype2
/bin/ls -l
total 661412
-rw------- 1 root root        47 Mar 15  2010 .nsr
-rw------- 1 root root 313032704 Mar 15  2010 4003329465.0
-rw------- 1 root root  24117248 Mar 15  2010 4020106043.0
-rw------- 1 root root 312999936 Mar 15  2010 4154322682.0
-rw------- 1 root root  26411008 Mar 15  2010 4171099824.0
-rw------- 1 root root     65536 Mar 15  2010 volume

That was all that was reported. Nothing under /nsr/mm, /nsr/index, etc. The command 'stat -c "%s - (%b*%B)" fname | bc' reports a negative or 0 value for every other file.
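
The find/xargs pipeline above starts a shell per file and can trip over file names containing spaces or quotes. A possible alternative, sketched here under the assumption that GNU find is available (its -printf reports %s as the size in bytes and %b as the allocation in 512-byte blocks), does the arithmetic in a single awk pass. The function names filter_sparse and list_sparse are hypothetical.

```shell
# filter_sparse: reads "SIZE BLOCKS PATH" lines (BLOCKS in 512-byte units)
# and prints "DEFICIT PATH" for every file whose size exceeds its allocation.
filter_sparse() {
    awk '{
        deficit = $1 - $2 * 512
        if (deficit > 0) {
            path = $0
            sub(/^[0-9]+ [0-9]+ /, "", path)   # keep spaces inside the path
            print deficit " " path
        }
    }'
}

# list_sparse [DIR]: scan DIR (default: current directory). Assumes GNU find.
list_sparse() {
    find "${1:-.}" -type f -printf '%s %b %p\n' | filter_sparse
}
```

Running 'cd /nsr; list_sparse' should produce the same kind of "deficit path" listing as the original command. File names containing newlines would still confuse it.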

1. Maybe we should have our file type devices located on another file system or directory other than /nsr so they won't be impacted if we tar /nsr somewhere? Maybe it's moot if we're using the '--sparse' option?

2. I did read the man pages for GNU tar, rsync and cp, and they all support sparse files via a '--sparse' option. It appears that cp detects sparse source files by default (--sparse=auto), but you can force sparse output with '--sparse=always' or prevent it with '--sparse=never'. I tried copying the sparse file that I created, and the copy is sparse too, with the same allocated size of 8K. Finally, I tried copying it without preserving the sparseness as:

/bin/cp --sparse=never sparse notsparse

My checks showed that the copy of the file was no longer sparse, and now 'du -sh' reports a size of 1.1M, and '/bin/ls -ls' reports 1032 blocks. However, the original and the copy still have the same MD5 checksum.

3. I do notice, however, that if I create a tar file not using the --sparse option then I end up with a non-sparse copy wherein 'du -sh' reports 1.1M for a size, but if I use the sparse option then I get a copy wherein 'du -sh' reports 0 bytes but the same file size. Odd?
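
The experiments in items 2 and 3 can be repeated in one go with something like the following. It is a rough sketch, assuming GNU cp, tar, stat and dd; the function names and the file names created under the work directory are invented for the example. For each copy it prints the apparent size, the bytes actually allocated, and the MD5 checksum.

```shell
# Hypothetical demo; every file is created inside the given work directory.

# alloc_bytes FILE: bytes actually allocated on disk (GNU stat).
alloc_bytes() {
    set -- $(stat -c '%b %B' "$1")
    echo $(( $1 * $2 ))
}

# sparse_roundtrip WORKDIR: build a sparse file, copy it with cp and tar,
# and print "SIZE ALLOCATED MD5 NAME" for the original and each copy.
# The body runs in a subshell so the caller's directory is untouched.
sparse_roundtrip() (
    cd "$1" || exit 1
    dd if=/dev/zero of=sparse bs=1 count=1 seek=1024k 2>/dev/null
    cp --sparse=always sparse cp_sparse
    cp --sparse=never  sparse cp_dense
    mkdir -p untar
    tar --sparse -cf sparse.tar sparse     # archive with hole detection
    tar -xf sparse.tar -C untar            # extraction recreates the holes
    for f in sparse cp_sparse cp_dense untar/sparse; do
        sum=$(md5sum < "$f" | cut -d' ' -f1)
        echo "$(stat -c %s "$f") $(alloc_bytes "$f") $sum $f"
    done
)
```

Something like 'sparse_roundtrip "$(mktemp -d)"' should show the same size and checksum on every line, with only the allocated column differing. The allocated values depend on the file system, so treat them as indicative.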

Clearly, tarring up /nsr, untarring it somewhere else, and validating it with checksums is not going to prove that you have an *exact* copy in terms of the blocks; you might not have preserved the sparse files. That could create problems in two ways: 1. the copies will take up more disk space, and 2. an application that expects a database file to be sparse might not be happy. I'm not sure about the second point; maybe the application wouldn't care. Hmm ...

George


NetWorker doesn't technically need to be down, you just need to have the
databases quiescent during the entire sync.  The problem is that it's not
always easy to predict when NetWorker is going to kick off an automated
index checking process, so it's hard to predict when you're going to have
an appropriate window to perform your sync.

You could do things like checksumming, log scraping, or a second "dry run"
rsync to try to detect whether any of the db files changed, but any of those
methods adds complexity. Pre-running a full consistency check could also help:
if you force the consistency checks to run on a schedule you can predict,
they're less likely to kick off during your rsync.
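
The checksumming idea could look roughly like this. It is a sketch, not a tested procedure: the function names are hypothetical, it assumes GNU find, sort and xargs, and it only detects that something changed during the sync; it does nothing to prevent it.

```shell
# snapshot_sums DIR: print a sorted "checksum  path" manifest for DIR.
snapshot_sums() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum )
}

# unchanged_during DIR COMMAND...
# Runs COMMAND and returns 0 only if it succeeded and no file under DIR
# changed, appeared, or vanished while it ran.
# Returns 2 if a manifest could not be built, 3 if COMMAND itself failed.
unchanged_during() {
    dir=$1; shift
    before=$(snapshot_sums "$dir") || return 2
    "$@" || return 3
    after=$(snapshot_sums "$dir") || return 2
    [ "$before" = "$after" ]
}
```

For example, 'unchanged_during /nsr/mm rsync -aS /nsr/mm/ standby:/nsr/mm/' (where "standby" is a made-up host name) would return non-zero if the media database files changed mid-transfer, in which case the sync would need to be repeated.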

Whatever you do, don't make it the only thing you have for disaster
recovery.  That's certain to bite you at a time when you least need
additional trouble.

Tim


--
George Sinclair
Voice: (301) 713-3284 x210
- The preceding message is personal and does not reflect any official or unofficial position of the United States Department of Commerce -
- Any opinions expressed in this message are NOT those of the US Govt. -
