Subject: Re: [Veritas-bu] How to properly calculate the Catalog
From: "bob944" <bob944 AT attglobal DOT net>
To: <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Sun, 15 Mar 2009 21:46:37 -0400
> That formula in the manual is completely worthless.  I can't
> believe they still publish it.  The SIZE of the data you're
> backing up has NOTHING to do with the size of the index. What
> matters is the number of files or objects.
> [...]
> I could backup a 200 TB database with a smaller NBU catalog than

[snipping the obvious (though perhaps not to NetBackup beginners):
since 99% of the catalog is a list of paths and attributes of files
backed up, a list of a million tiny files and a list of a million
giant files are going to occupy about the same catalog size.]

> To get the real size of the index:
> 1.    Calculate number of files/objects [...] 
> (I say 200 bytes or so.  The actual number is based on the
> average length of your files' path names.  200 is actually
> large and should over-estimate.)

Um, to quote some guy...

> That formula [...] is completely worthless.

Just kidding.  Files-in-the-catalog times 200 is very old-school.
And right out of the older manuals which used 150, IIRC.
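
For what it's worth, the old formula at least has the virtue of being
easy to run.  The numbers below are made up; it's the per-entry byte
count that does all the work:

# a million files at ~200 bytes per catalog entry, whether the files
# are 1KB or 1GB apiece:
$ echo $((1000000 * 200 / 1024 / 1024))    # ~190MB per million files
190

Same million files, same ~190MB, no matter how big the files are--
which is the point of the snip above.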

There are a couple of things to take into account here which made me
move away from files*150--aside from the drudgery of figuring out
file-count stats per client per policy per schedule per retention.

1.  smaller sizes using the binary catalog introduced in 4.5.  No
idea what the file formats are, but in perusing various backups,
there appears to be a lot of deduplication of directory and file
names happening.

2.  catalog compression, which may or may not be important to the
calculations.  Using compression, IME, reduces catalog size by
two-thirds on average, thus tripling catalog capacity for users with
longer retentions (there's a quick arithmetic sketch after this
list).

3.  Full backups versus incrementals.  The *imgRecord0 file is
usually the largest binary-catalog file for a backup; in an
incremental it is not appreciably smaller than in a full.  So, in
the event that an incremental finds only, say, 10 changed files in a
100,000-file selection, the size of the catalog entry for that
incremental is nowhere near what one would expect from a small
backup--it's much closer to a full.
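
On the compression point, the arithmetic is simple, though the
two-thirds figure is only my observed average, so treat it as a
sketch:

# 30GB of uncompressed .f files squeezed to roughly 1/3 their size:
$ echo $((30 * 1024 / 3))    # MB on disk after compression
10240

Ten-ish GB holding what used to take thirty is how the "triple the
capacity" claim falls out.  The script below flags compressed images
by the presence of a .f.Z files file, so you can see how much of an
existing imageDB is already compressed.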

Though this is little predictive help for a new NetBackup
installation, getting a handle on catalog sizing for existing
systems is easy: the number of files backed up (NUM_FILES) and the
size of the files file (FILES_FILE_SIZE) are each lines in the image
metadata file.  Dividing size by files doesn't _really_ give you the
number of bytes per file entry, but it yields a great planning
metric.  This script:

#!/bin/sh
# Report FILES_FILE_SIZE / NUM_FILES for every image in the imageDB.
cd /usr/openv/netbackup/db/images
# image header files end in _FULL or _INCR, hence the *[LR] match
find . -type f -name '*[LR]' | \
while read metaname
do
    # a .f.Z next to the header means the files file is compressed
    if [ -f ${metaname}.f.Z ]
    then    COMPRESSED=C
    else    COMPRESSED=" "
    fi
    awk '
        /^NUM_FILES/       { num_files = $2 }
        /^FILES_FILE_SIZE/ { files_file_size = $2 }
        END { if ( num_files > 2 && files_file_size > 2 ) {
            printf "%4d (%s %11d / %11d ) %s\n", \
                files_file_size / num_files, \
                compressed, \
                files_file_size, num_files, FILENAME
            }
        }
    ' compressed="$COMPRESSED" $metaname
done

can be used to get a handle on catalog sizing.  Sample output:
(first column is files_file_size divided by files in the backup; "C"
is for a compressed catalog entry, followed by the files-file size,
number of files and the pseudo-backupID)

  33 (C      331651 /        9884 )
./u2/1235000000/prod-std_1235118647_FULL
  36 (C     1654789 /       45203 )
./u2/1235000000/prod-std_1235119960_FULL
  33 (C      331497 /        9884 )
./u2/1235000000/prod-std_1235202798_FULL
  36 (C     1655827 /       45223 )
./u2/1235000000/prod-std_1235203103_FULL
  33 (C       74293 /        2236 )
./u2/1235000000/prod-std_1235286142_INCR
  35 (C       79497 /        2212 )
./u2/1235000000/prod-std_1235286246_INCR
  33 (C      332661 /        9884 )
./u2/1235000000/prod-std_1235808812_FULL
  36 (C     1657187 /       45245 )
./u2/1235000000/prod-std_1235810235_FULL
  32 (C       73757 /        2236 )
./u2/1235000000/prod-std_1235890933_INCR
  35 (C       79389 /        2212 )
./u2/1235000000/prod-std_1235891054_INCR
 101 (      1001512 /        9884 )
./u2/1236000000/prod-std_1236498790_FULL
 102 (      4644469 /       45185 )
./u2/1236000000/prod-std_1236498992_FULL
 446 (      1001548 /        2243 )
./u2/1236000000/prod-std_1236664989_INCR
2092 (      4646723 /        2221 )
./u2/1236000000/prod-std_1236665069_INCR

Notice the last and third-last lines.  They are a full and a diff of
the same filesystem.  imgRecord0 makes up 3.25MB of the 4.64MB
files_file_size whether it's a full (45,185 files) or an incremental
(2,221 files).
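
If you want to check that on your own imageDB, the binary-catalog
pieces can be listed directly.  The exact layout varies by release,
so take the path and pattern below as assumptions rather than
gospel:

# size of every *imgRecord0 file under the imageDB
find /usr/openv/netbackup/db/images -name '*imgRecord0' -exec ls -l {} \;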

To loop back to the middle of this, I find that 100 bytes/file
uncompressed (35 compressed) is a good planning value for fulls on
most systems; the exceptions tend to be systems where the apps use
pathnames longer than any human would want to type.  (The imageDB
part of hot catalog backups produces a files_file_size / files
metric more like 170.)
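
To turn that planning value into a capacity estimate, something
along these lines works; the counts are invented, and the result is
only as good as your file-count guesses:

# hypothetical: 5 million files per full, 8 fulls plus 30 incrementals
# retained, 100 bytes/file; incrementals counted like fulls to stay on
# the safe side (see item 3 above)
awk 'BEGIN { printf "%.1f GB\n", (8 + 30) * 5000000 * 100 / (1024 * 1024 * 1024) }'
17.7 GB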

More than most people wanted to know.  


