Subject: Re: [Networker] hypothetical question
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 24 Jan 2011 16:31:45 -0500
jee wrote:
Hi George,

many thanks for this fantastic analysis. Nice one.

Jee, please see my responses and caution below, including my question on how to test my file type devices. I'd be curious to know what others would recommend.




Just a quick note about the find command:

It doesn't work with file names containing spaces and/or special characters (I am using AT&T's ksh on Debian Linux). I had some trouble with files like:
(1) "two words.txt"
(2) "two - words.txt"
(3) "two & words.txt"

I could fix the spaces problem as follows:
- append a colon and remove the space after the file name ({}) passed by xargs to sh (line 02)
- use the colon as the field separator with awk (line 03)

(awk expects 2 fields per line, but spaces in file names are treated as field separators and introduce more fields)


Ah, yes, good idea. We've all encountered files with spaces before - it's always a pain - but I've not seen NW use any spaces or special characters in file names. It's always possible, however, and your enhancement is a better universal check regardless of whether it's NW or some other data that you're checking.


I have split the command into a few lines and used line numbers to make it more readable:
+----------------------------------------------+
01: find . -type f |\
02:   xargs -i /bin/sh -c 'echo -n "{}:"; stat -c "%s - (%b*%B)" "{}" | bc' |\
03:   awk -F":" '{print $2 " " $1}' |\
04:   egrep -v '^-|^0'
+----------------------------------------------+



I am still playing with the special character "&" ... I haven't solved this yet (it may need some encapsulation using scripts or functions).

Not sure about that one. A script could certainly be made to deal with it.
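One way to sidestep the quoting problems entirely - spaces, '&', quotes, even newlines in names - might be to skip xargs and the intermediate shell and have find hand NUL-terminated names to a while loop instead. A rough, untested sketch, assuming GNU find/stat and a shell whose 'read' supports -d '' (bash does):

find . -type f -print0 |
while IFS= read -r -d '' f; do
    # apparent size minus allocated bytes; positive means the file is sparse
    excess=$(stat -c '%s - (%b*%B)' "$f" | bc)
    case "$excess" in
        -*|0) ;;                               # zero or negative: not sparse
        *) printf '%s %s\n' "$excess" "$f" ;;
    esac
done

Since the file name is never re-parsed by another shell or by awk, there's nothing left for the special characters to break.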

One thing I discovered is that the only files under /nsr that are sparse - at least in our shop - are the single 'volume' files that get created the first time we create a regular file type device. Otherwise, the rest of the data files are non-sparse. Also, there was one lone core file that was sparse.

*** CAUTION *** *** CAUTION ***  *** CAUTION *** *** CAUTION ***
I don't know about AFTDs, but I suspect it's similar. Each file type device will have one of these 'volume' files. I tried using 'tar' with the 'S' option to preserve the sparseness, and it works, *BUT* it did not preserve the allocated (on-disk) size when untarring. The original file occupied 36K on disk; the untarred copy occupied only 4K. Next, I tried it with 'cp' (the default is --sparse=auto). This also resulted in a 4K copy. I also tried forcing sparseness with --sparse=always. Again, it created a 4K copy. Finally, I tried 'rsync --sparse', and it created a 12K copy. In all cases, I ran these commands as root and created the copies on the same file system as /nsr, which is where our file type devices live. In all cases, however, the MD5 checksums matched the original.
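For anyone who wants to reproduce the comparison, the kind of check I ran looks roughly like this (the copy's path is just an example):

# apparent size vs. allocated blocks for the original and the copy
stat -c '%n: size=%s bytes, allocated=%b blocks of %B bytes' \
    /nsr/ftype_devices/server_ftype2/volume /tmp/volume.copy

# confirm the contents still match
md5sum /nsr/ftype_devices/server_ftype2/volume /tmp/volume.copy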

Clearly, none of these tools seems to be completely reliable in terms of preserving the original on-disk allocation - unless, of course, I'm just completely missing something here?

I would wager a very small amount (ahem) that if you copy a sparse file and don't preserve the sparseness, the application that uses that file will probably be OK - assuming, of course, that you don't use up too much disk space in the process. After all, why would it care about areas that are now occupied by, say, zeros, if it's not going to look there anyway? And if it does, chances are it would interpret those as nothing - maybe - just like before?

On the other hand, if you do preserve the sparseness but end up with a smaller allocated size - well, that could be very *bad*. The application might miss something when reading or interpreting that file. If I had a choice, I think I'd rather not preserve the sparseness and have the copy be bigger, versus preserving it but having the allocation be smaller than before. I think that could create more of a problem?

In the case of NW, I suspect it wouldn't matter with the 'volume' file, but with other applications, it very well might.

Question:
1. How can I test this with NW?

Should I tar off a copy of one of these file type devices (with or without the sparse option), untar it, change the configuration on NW to use the new copy, and then see if I can mount it or maybe also try running scanner against the copy? Would that suffice as a test? I would think that NW will try to read that volume file when I try to mount it or when it runs scanner, correct?

Or should I do something more involved, like cloning one of the save sets on that untarred file type device copy to a tape to see if it works? If so, NW would have had to read that volume file, and I don't think NW is going to write to the volume file, just read it. Also, maybe try extracting a save set from the clone tape, or even from the untarred copy of the device itself, using scanner -x with uasm, extracting in raw mode, and comparing it to the same save set extracted from the original FTD?

These file type devices contain the server bootstrap and indexes, but they're cloned to actual tape immediately after they're written to disk. They're also staged to a different tape pool after two weeks. Ordinarily, I wouldn't try to recover a server bootstrap or index except in an emergency. I don't want NW to end up attempting some kind of disaster recovery, but if I use uasm in raw recover mode via scanner's -x option, maybe it would recover the save set from the untarred FTD copy as a plain file somewhere else, safely, without trying to do anything further with it, so I could then compare it to the same save set recovered from the original FTD?
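Roughly, the test I have in mind would look something like this - the paths are made up, <ssid> is a placeholder, and the scanner/uasm syntax is from my reading of the man pages, so please verify it before trusting it:

# 1. copy one FTD directory, preserving sparseness as best tar can
cd /nsr/ftype_devices
tar -cSf /tmp/ftd_copy.tar server_ftype2
mkdir -p /nsr/ftype_test
tar -xf /tmp/ftd_copy.tar -C /nsr/ftype_test

# 2. repoint a test device at /nsr/ftype_test/server_ftype2 in NMC or
#    nsradmin, try to mount it, and run scanner against it
scanner /nsr/ftype_test/server_ftype2

# 3. extract the same save set in raw mode from the copy and the
#    original, then compare the results
scanner -S <ssid> /nsr/ftype_test/server_ftype2 -x uasm -rv -m /=/tmp/restore_copy
scanner -S <ssid> /nsr/ftype_devices/server_ftype2 -x uasm -rv -m /=/tmp/restore_orig
diff -r /tmp/restore_copy /tmp/restore_orig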

2. Otherwise, how can I reliably make a copy of /nsr and preserve the sparseness of those 'volume' files? Maybe it doesn't matter? They're only 65536 bytes anyway, and there are only a few devices, with one volume file per device, so there's no danger of filling up disk space. I wonder if NW would even care if a copy were non-sparse, or still sparse but with a smaller allocated size.
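Whatever tool ends up being used, the same find/stat check from above could be re-run against the copy to at least confirm which files came out sparse (sketch; /backup/nsr is a hypothetical destination):

cd /backup/nsr
find . -type f | xargs -i /bin/sh -c 'echo -n "{}:"; stat -c "%s - (%b*%B)" "{}" | bc' |
  awk -F":" '{print $2 " " $1}' | egrep -v '^-|^0'

If the copy lists the same volume files with the same differences, the sparseness at least survived, even if the exact allocation didn't.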

Thanks.

George




jee




On Friday 21 January 2011 22:13:09 George Sinclair wrote:
Tim Mooney wrote:
In regard to: Re: [Networker] hypothetical question, Valere Binet said (at...):
We did the first transfer with star and run rsync once a day. I wrote
"hoping" because we didn't test the promotion of a storage node to
server ... yet. We have NetWorker 7.4.4-1 on all our systems. The
server and storage nodes are running CentOS 5.5.

From this conversation, it seems our strategy could be doomed to fail because:
1) rsync runs on a live server (we don't shut down nsr)
2) we don't use the -S option
Well, it wouldn't hurt to use any options that preserve sparse files, but
as I said earlier, these days I'm not even certain any of the indexes
will ever be sparse.  I think the only place there would be any danger is
(unfortunately) the media database, but those files are so small that I
think the danger is minimal.  It might be worth a google to "detect
sparse files" and then follow that procedure, to see if any of your db
files are sparse.
This command seems to do the trick:

cd /nsr
find . -type f | xargs -i /bin/sh -c 'echo -n {} " "; stat -c "%s - (%b*%B)" {} | bc' |
  awk '{print $2 " " $1}' | egrep -v '^-|^0'

OK, I googled this, and from what I can infer, if the reported file size
is smaller than blocksize*numblocks then the file is *not* sparse.
Otherwise, if the file size is larger than blocksize*numblocks then the
file is sparse. A good source of information on sparse files was found at:

http://administratosphere.wordpress.com/2008/05/23/sparse-files-what-why-and-how/

So I'm on RH Linux, and I have a file named /tmp/snapshot3.jpg. I ran
the following three commands:

/bin/ls -ls /tmp/snapshot3.jpg
24 -rw-r--r-- 1 owner group 21790 Jan 20 19:41 /tmp/snapshot3.jpg

du -sh /tmp/snapshot3.jpg
24K     /tmp/snapshot3.jpg

stat -c "%s %b %B" /tmp/snapshot3.jpg
21790 48 512

It looks as if the number of blocks reported by stat is 48 at 512 bytes
per block, whereas the number of blocks reported by 'ls' is 24 at 1024
bytes per block. They differ because GNU ls -s counts in 1K units by
default, while stat's %b counts 512-byte blocks, so 48*512 = 24*1024 =
24576 either way. Now, 24576 is larger than the file size of 21790, so
the file is not sparse.

Next, I created a sparse file, using the following command as shown in
the google/URL source above:

dd if=/dev/zero of=/tmp/sparse bs=1 count=1 seek=1024k
1+0 records in
1+0 records out
1 byte (1 B) copied, 4.2e-05 seconds, 23.8 kB/s

I then ran the same three commands as before:

/bin/ls -ls /tmp/sparse
8 -rw-r--r-- 1 owner group 1048577 Jan 21 16:24 /tmp/sparse

du -sh /tmp/sparse
8.0K    /tmp/sparse

stat -c "%s %b %B" /tmp/sparse
1048577 16 512

Either way, 16*512 = 8*1024 = 8192, which is less than 1048577, so the
file is indeed sparse.

So, I ran this command on our primary backup server:

cd /nsr
find . -type f | xargs -i /bin/sh -c 'echo -n {} " "; stat -c "%s - (%b*%B)" {} | bc' |
  awk '{print $2 " " $1}' | egrep -v '^-|^0'

and it reported the following:

20729856 ./cores/nsrmmgd/core.4974
28672 ./ftype_devices/server_ftype2/volume
28672 ./ftype_devices/server_ftype_orion_DR/volume
28672 ./ftype_devices/server_ftype6/volume
28672 ./ftype_devices/server_ftype5/volume
28672 ./ftype_devices/server_ftype4/volume
28672 ./ftype_devices/server_ftype/volume
28672 ./ftype_devices/server_ftype3/volume
28672 ./ftype_devices_mmrecov/server_ftype_DR_orion2/volume

As you can see, the only sparse files, other than the one core file, are
the volume files that are under the directories for our file type
devices. A typical directory looks like this:

cd /nsr/ftype_devices/server_ftype2
/bin/ls -l
total 661412
-rw------- 1 root root        47 Mar 15  2010 .nsr
-rw------- 1 root root 313032704 Mar 15  2010 4003329465.0
-rw------- 1 root root  24117248 Mar 15  2010 4020106043.0
-rw------- 1 root root 312999936 Mar 15  2010 4154322682.0
-rw------- 1 root root  26411008 Mar 15  2010 4171099824.0
-rw------- 1 root root     65536 Mar 15  2010 volume

That was all that was reported. Nothing under /nsr/mm, /nsr/index, etc.
The command 'stat -c "%s - (%b*%B)" fname | bc' reports a negative or 0
value for every other file.

1. Maybe we should have our file type devices located on another file
system or directory other than /nsr so they won't be impacted if we tar
/nsr somewhere? Or maybe it's moot if we're using the '--sparse' option?

2. I did read the man pages for GNU tar, rsync and cp, and they all
support sparse files via the '--sparse' option. It appears that cp
handles sparseness by default, but you can force it with
'--sparse=always' or suppress it with '--sparse=never'. I tried copying
the sparse file that I created, and it does create a sparse copy with
the same 8K allocated size. Finally, I tried copying it without
preserving the sparseness:

/bin/cp --sparse=never sparse notsparse

My checks showed that the copy of the file was no longer sparse: 'du
-sh' now reports 1.1M, and '/bin/ls -ls' reports 1032 blocks. However,
both files still have the same MD5 checksum.

3. I do notice, however, that if I create a tar file without the
--sparse option, I end up with a non-sparse copy wherein 'du -sh'
reports 1.1M, but if I use the sparse option, I get a copy wherein 'du
-sh' reports 0 bytes yet the same file size. Odd?
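In command form, the comparison I'm describing is roughly (a sketch,
using the /tmp/sparse test file from above):

tar -cf /tmp/plain.tar -C /tmp sparse    # no -S: holes are read in and stored as zeros
tar -cSf /tmp/sparse.tar -C /tmp sparse  # -S: tar scans for holes and records them
mkdir -p /tmp/unpack
tar -xf /tmp/sparse.tar -C /tmp/unpack
du -sh /tmp/unpack/sparse   # 0 in my test; ~1.1M when extracted from the plain archive
ls -l /tmp/unpack/sparse    # apparent size (1048577) is the same either way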

Clearly, tarring up /nsr and untarring it somewhere else, then
validating it with checksums, is not going to prove that you have an
*exact* copy in terms of the blocks; you might not have preserved the
sparse files. That could create problems in that (1) those copies will
now take up more disk space, which could be problematic, and (2) some
type of application that expects a database file to be sparse might not
be happy? Not sure about that. Maybe it wouldn't matter, or the
application wouldn't care? Hmm ...

George

NetWorker doesn't technically need to be down; you just need to have the
databases quiescent during the entire sync.  The problem is that it's not
always easy to predict when NetWorker is going to kick off an automated
index checking process, so it's hard to predict when you're going to have
an appropriate window to perform your sync.

You could do things like checksumming, log scraping, or a second "dry
run" rsync to try to detect whether any of the db files changed, but any
of those methods adds complexity.  Even pre-running a full consistency
check can help -- if you force the consistency checks to run on a
schedule you can predict, they're less likely to kick off during your
rsync.
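For example, the "dry run" idea might look something like this (a
sketch; the destination path is just an example):

rsync -a --sparse /nsr/ /backup/nsr/
# second pass in dry-run mode: any itemized output means a file changed
# while (or since) the first pass ran
rsync -a --sparse --dry-run --itemize-changes /nsr/ /backup/nsr/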

Whatever you do, don't make it the only thing you have for disaster
recovery.  That's certain to bite you at a time when you least need
additional trouble.

Tim





--
George Sinclair
Voice: (301) 713-3284 x210
- The preceding message is personal and does not reflect any official or unofficial position of the United States Department of Commerce -
- Any opinions expressed in this message are NOT those of the US Govt. -

To sign off this list, send email to listserv AT listserv.temple DOT edu and type 
"signoff networker" in the body of the email. Please write to networker-request 
AT listserv.temple DOT edu if you have any problems with this list. You can access the 
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
