Re: [Networker] hypothetical question

Tim Mooney wrote:

In regard to: Re: [Networker] hypothetical question, Valere Binet said(at...:

We did the first transfer with star and run rsync once a day.
I wrote "hoping" because we didn't test the promotion of a storagenode to
server ... yet.
We have NetWorker 7.4.4-1 on all our systems. The server and storagenodes
are running CentOS 5.5.
From this conversation, it seems our strategy could be doomed to fail
because:
1) rsync runs on a live server (we don't shut down nsr)
2) we don't use the -S option


Well, it wouldn't hurt to use any options that preserve sparse files, but
as I said earlier, these days I'm not even certain any of the indexes will
ever be sparse.  I think the only place there would be any danger is
(unfortunately) the media database, but those files are so small that
I think the danger is minimal.  It might be worth a google to "detect
sparse files" and then follow that procedure, to see if any of your db
files are sparse.


This command seems to do the trick:

cd /nsr

find . -type f | xargs -i /bin/sh -c 'echo -n {} " "; stat -c "%s -(%b*%B)" {} | bc' | awk '{print $2 " " $1}' | egrep -v '^-|^0'

OK, I googled this, and from what I can infer, if the reported file size is smaller than blocksize*numblocks then the file is *not* sparse. Otherwise, if the file size is larger than blocksize*numblocks then the file is sparse. A good source of information on sparse files was found at:


http://administratosphere.wordpress.com/2008/05/23/sparse-files-what-why-and-how/
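The rule above can be written down as a small shell helper. This is only a sketch, assuming GNU stat (for the -c format option); the names looks_sparse and is_sparse are made up for illustration and are not part of NetWorker or any standard tool.

```shell
#!/bin/sh
# looks_sparse SIZE BLOCKS BLOCKSIZE
# Pure arithmetic: succeeds (exit 0) when the file size in bytes is larger
# than the space allocated to it (BLOCKS * BLOCKSIZE).
looks_sparse() {
    [ "$1" -gt $(( $2 * $3 )) ]
}

# is_sparse FILE
# Hypothetical wrapper around GNU stat:
#   %s = size in bytes, %b = blocks allocated, %B = bytes per block.
# Returns 2 if stat fails (for example, the file does not exist).
is_sparse() {
    info=$(stat -c '%s %b %B' "$1") || return 2
    # Unquoted on purpose: split the three numbers into separate arguments.
    looks_sparse $info
}
```

For example, `is_sparse /tmp/sparse && echo sparse` would print "sparse" for a file whose size exceeds its allocation.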

So I'm on RH Linux, and I have a file named /tmp/snapshot3.jpg, and I ran the following three commands:


/bin/ls -ls /tmp/snapshot3.jpg
24 -rw-r--r-- 1 owner group 21790 Jan 20 19:41 /tmp/snapshot3.jpg

du -sh /tmp/snapshot3.jpg
24K     /tmp/snapshot3.jpg

stat -c "%s %b %B" /tmp/snapshot3.jpg
21790 48 512

It looks as if the number of blocks reported by stat is 48 at 512 bytes per block, whereas the number of blocks reported by 'ls' is 24 at 1024 bytes per block. Not sure why they differ, but 48*512=24*1024 or 24576 in either case. Now, 24576 is larger than the file size of 21790 so the file is not sparse.
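On why the two block counts differ: as far as I know, GNU ls -s counts in 1024-byte units by default, while stat's %b counts in units of %B bytes (512 here), so both describe the same allocation. A minimal sketch of the arithmetic, using the numbers from the output above (allocated_bytes is a made-up helper name):

```shell
#!/bin/sh
# allocated_bytes BLOCKS UNIT
# Bytes on disk for a block count expressed in UNIT-byte blocks.
allocated_bytes() {
    echo $(( $1 * $2 ))
}

# Numbers copied from the snapshot3.jpg output above.
allocated_bytes 48 512     # stat: %b blocks of %B bytes each
allocated_bytes 24 1024    # ls -s: assumed 1024-byte units (GNU ls default)
```

Both calls print 24576, which is the figure to compare against the file size of 21790.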

Next, I created a sparse file, using the following command as shown in the google/URL source above:


dd if=/dev/zero of=/tmp/sparse bs=1 count=1 seek=1024k
1+0 records in
1+0 records out
1 byte (1 B) copied, 4.2e-05 seconds, 23.8 kB/s

I then ran the same three commands as before:

/bin/ls -ls /tmp/sparse
8 -rw-r--r-- 1 owner group 1048577 Jan 21 16:24 /tmp/sparse

du -sh /tmp/sparse
8.0K    /tmp/sparse

stat -c "%s %b %B" /tmp/sparse
1048577 16 512

Either way, 16*512=8*1024=8192 is less than 1048577 so the file is indeed sparse.


So, I ran this command on our primary backup server:

cd /nsr

find . -type f | xargs -i /bin/sh -c 'echo -n {} " "; stat -c "%s -(%b*%B)" {} | bc' | awk '{print $2 " " $1}' | egrep -v '^-|^0'


and it reported the following:

20729856 ./cores/nsrmmgd/core.4974
28672 ./ftype_devices/server_ftype2/volume
28672 ./ftype_devices/server_ftype_orion_DR/volume
28672 ./ftype_devices/server_ftype6/volume
28672 ./ftype_devices/server_ftype5/volume
28672 ./ftype_devices/server_ftype4/volume
28672 ./ftype_devices/server_ftype/volume
28672 ./ftype_devices/server_ftype3/volume
28672 ./ftype_devices_mmrecov/server_ftype_DR_orion2/volume

As you can see, the only sparse files, other than the one core file, are the volume files that are under the directories for our file type devices. A typical directory looks like this:


cd /nsr/ftype_devices/server_ftype2
/bin/ls -l
total 661412
-rw------- 1 root root        47 Mar 15  2010 .nsr
-rw------- 1 root root 313032704 Mar 15  2010 4003329465.0
-rw------- 1 root root  24117248 Mar 15  2010 4020106043.0
-rw------- 1 root root 312999936 Mar 15  2010 4154322682.0
-rw------- 1 root root  26411008 Mar 15  2010 4171099824.0
-rw------- 1 root root     65536 Mar 15  2010 volume

That was all that was reported. Nothing under /nsr/mm, /nsr/index, etc. The command 'stat -c "%s - (%b*%B)" fname | bc' reports a negative or 0 value for every other file.
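The find/xargs pipeline above starts a shell, stat and bc for every file and splits paths on whitespace. A variant that does the same comparison in one pass might look like the sketch below. It assumes GNU find, whose -printf supports %s (size in bytes), %b (allocated 512-byte blocks) and %p (path); filter_sparse and list_sparse are made-up names. Paths containing newlines would still confuse it.

```shell
#!/bin/sh
# filter_sparse
# Reads "SIZE BLOCKS PATH" lines on stdin (BLOCKS in 512-byte units) and
# prints "UNALLOCATED_BYTES PATH" for files whose size exceeds their allocation.
filter_sparse() {
    awk '{
        size = $1
        alloc = $2 * 512
        if (size > alloc) {
            # Drop the two numeric fields; the rest of the line is the path,
            # which may itself contain spaces.
            sub(/^[0-9]+ [0-9]+ /, "")
            printf "%.0f %s\n", size - alloc, $0
        }
    }'
}

# list_sparse DIR
# Hypothetical wrapper: scan every regular file under DIR.
list_sparse() {
    find "$1" -type f -printf '%s %b %p\n' | filter_sparse
}
```

Running `cd /nsr && list_sparse .` should produce output in the same "difference path" form as the listing above.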

1. Maybe we should have our file type devices located on another filesystem or directory other than /nsr so they won't be impacted if we tar /nsr somewhere? Maybe it's moot if we're using the '--sparse' option?

2. I did read the man pages for GNU tar, rsync and cp, and they all support sparse files, using the '--sparse' option. It appears that cp does this by default, but you can force it to with '--sparse=always' or not to with '--sparse=never'. I tried copying the sparse file that I created, and it does create a sparse copy with the same on-disk size of 8K. Finally, I tried copying it without preserving the sparseness as:


/bin/cp --sparse=never sparse notsparse

My checks showed that the copy of the file was no longer sparse: 'du -sh' now reports a size of 1.1M, and '/bin/ls -ls' reports 1032. However, both files still have the same MD5 checksum.

3. I do notice, however, that if I create a tar file without the --sparse option then I end up with a non-sparse copy for which 'du -sh' reports a size of 1.1M, but if I use the --sparse option then I get a copy for which 'du -sh' reports 0 bytes even though the file size is the same. Odd?
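The cp and tar experiments in items 2 and 3 can be repeated side by side with something like the sketch below. It assumes GNU dd, cp, tar, stat and md5sum; sparse_copy_demo is a made-up name, and the directory argument should be an absolute path to an empty scratch directory. One possible explanation for the 0 in item 3, offered as an assumption rather than something verified against that tar version: the test file consists entirely of zero bytes (even the single byte written came from /dev/zero), so a tar that finds holes by scanning for zero-filled blocks can restore it with no data blocks at all.

```shell
#!/bin/sh
# sparse_copy_demo DIR
# Creates a 1 MiB + 1 byte sparse file in DIR, copies it with cp and tar,
# with and without hole preservation, then prints sizes and checksums.
sparse_copy_demo() {
    dir=$1
    dd if=/dev/zero of="$dir/sparse" bs=1 count=1 seek=1024k 2>/dev/null

    # cp: keep the holes, then expand them into real zero blocks.
    cp --sparse=always "$dir/sparse" "$dir/cp_sparse"
    cp --sparse=never  "$dir/sparse" "$dir/cp_full"

    # tar: archive without and with --sparse, extract each elsewhere.
    mkdir "$dir/tar_full" "$dir/tar_sparse"
    tar -C "$dir" -cf "$dir/full.tar" sparse
    tar -C "$dir" --sparse -cf "$dir/sparse.tar" sparse
    tar -C "$dir/tar_full"   -xf "$dir/full.tar"
    tar -C "$dir/tar_sparse" -xf "$dir/sparse.tar"

    # Columns: size in bytes, allocated blocks (in %B-byte units), name.
    stat -c '%s %b %n' "$dir/sparse" "$dir/cp_sparse" "$dir/cp_full" \
        "$dir/tar_full/sparse" "$dir/tar_sparse/sparse"

    # The checksums match in every case, holes or not.
    md5sum "$dir/sparse" "$dir/cp_sparse" "$dir/cp_full" \
        "$dir/tar_full/sparse" "$dir/tar_sparse/sparse"
}
```

For example, `sparse_copy_demo "$(mktemp -d)"` runs it in a fresh temporary directory. The sizes and checksums should agree across all five files, while the allocated block counts depend on the copy method, the filesystem and the tar version.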

Clearly, tarring up /nsr and untarring it somewhere else, and validating it with checksums, is not going to prove that you have an *exact* copy in terms of the blocks; you might not have preserved the sparse files. That could create problems in that (1) those copies will now take up more disk space, which could be problematic, and (2) some type of application that is expecting a database file to be sparse might not be happy? Not sure about that. Maybe that wouldn't matter, or the application wouldn't care? Hmm ...


George


NetWorker doesn't technically need to be down, you just need to have the
databases quiescent during the entire sync.  The problem is that it's not
always easy to predict when NetWorker is going to kick off an automated
index checking process, so it's hard to predict when you're going to have
an appropriate window to perform your sync.

You could do things like checksumming, log scraping, or a second "dry run"
rsync to try to detect whether any of the db files changed, but any of those
methods add complexity. Another option is pre-running a full consistency
check: if you force the consistency checks to run on a schedule you can
predict, they're less likely to kick off during your rsync.
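The checksumming idea could be sketched roughly as follows: record checksums of the source tree, run the sync, record them again, and treat any difference as a sign that the databases were not quiescent. This assumes GNU find, md5sum and sort; snapshot_sums and sync_is_consistent are made-up names, and the approach reads every file twice, so it is not cheap on a large /nsr. It also cannot notice a file that changed and then changed back.

```shell
#!/bin/sh
# snapshot_sums DIR
# Prints one "checksum  ./path" line per regular file, sorted by path.
snapshot_sums() {
    ( cd "$1" && find . -type f -exec md5sum {} + | sort -k 2 )
}

# sync_is_consistent SRC COMMAND [ARGS...]
# Runs COMMAND (the sync) between two checksum passes over SRC.
# Exit 0: nothing under SRC changed while COMMAND ran.
# Exit 1: at least one file changed, appeared or disappeared.
# Exit 2: a checksum pass or COMMAND itself failed.
sync_is_consistent() {
    src=$1
    shift
    before=$(snapshot_sums "$src") || return 2
    "$@" || return 2
    after=$(snapshot_sums "$src") || return 2
    [ "$before" = "$after" ]
}
```

A call might look like `sync_is_consistent /nsr rsync -a -S /nsr/ standby:/nsr/`, where "standby" is a placeholder host name; on a non-zero exit the sync would need to be repeated.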

Whatever you do, don't make it the only thing you have for disaster
recovery.  That's certain to bite you at a time when you least need
additional trouble.

Tim



--
George Sinclair
Voice: (301) 713-3284 x210

- The preceding message is personal and does not reflect any official or unofficial position of the United States Department of Commerce -

- Any opinions expressed in this message are NOT those of the US Govt. -
