This is a multi-part message in MIME format.
--------------090701060905080600060304
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Hal Skelly wrote:
> I started work on this script a while ago and haven't released it into
> the wild yet (I believe this was published in Sysadmin magazine a while
> ago). Thus I'm not going to guarantee it. BUT, if you are familiar
Hey -- I know your name. I got your script from Curtis Preston -- he's
got it up on the StorageMountain.com website at
<http://www.storagemountain.com/free-backup-software5.html>.
I tried using it but found that it was a bit slow and had problems
dealing with filenames that contain commas (since it uses the comma
character as a delimiter).
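To illustrate the comma problem, here is a minimal sketch with made-up file names. It is not code from either script; it only shows why a comma-joined list cannot be split back reliably, while a NUL-joined list (the separator the attached version uses internally) can.

```perl
use strict;
use warnings;

# Hypothetical file names; the second one contains a comma.
my @names = ('docs/report.txt', 'docs/budget,2003.xls');

# Joining with a comma makes the list ambiguous when it is split again.
my @from_comma = split /,/, join(',', @names);

# Joining with a NUL byte is safe, because NUL cannot occur in a file name.
my @from_nul = split /\0/, join("\0", @names);

printf "comma: %d entries, NUL: %d entries\n",
    scalar(@from_comma), scalar(@from_nul);
```

The comma version reports three entries for two files, which is the kind of mismatch I was running into.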
I did some fixing to the script and gave it to Curtis to put up, but it
looks like he didn't do that.
See attached README and improved version of the script with a couple of
added features.
I use it on a daily basis for a 1.5TB Windows fileserver. Let me know if
you have any problems with the new version and I'll be glad to do my
best to fix them.
Enjoy!
--
Michael L. Barrow
<michael AT mlbarrow DOT com>
--------------090701060905080600060304
Content-Type: text/plain;
name="README"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="README"
nbusplit.pl
Michael L. Barrow (michael AT mlbarrow DOT com)
2003-11-08
This Perl script splits a directory or filesystem into sane-sized
pieces for backing up with Veritas Netbackup. It is based on chopit.pl,
which is available on storagemountain.com, W. Curtis Preston's storage
information website.
The original program allowed the user to specify the total number of streams
and it would split the filesystem into that number of more or less equal
sized pieces. For our needs, we wanted to be able to specify the size of a
stream and have the program create however many streams of that size it
needed, so I went about modifying chopit.pl.
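As a rough sketch of how the two modes relate (the function names and the total below are hypothetical, not taken from the script): with a stream count, the size limit is derived from the total; with a stream size, the total only gives a lower bound on how many streams will be needed.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Stream-count mode: the per-stream size limit is the total divided by the count.
sub chunk_size_for_count {
    my ($total_bytes, $stream_count) = @_;
    return $total_bytes / $stream_count;
}

# Stream-size mode: at least this many streams are needed. This is only a
# lower bound; packing whole directories usually leaves some streams
# partly empty, so the real count can be higher.
sub min_streams_for_size {
    my ($total_bytes, $chunk_bytes) = @_;
    return ceil($total_bytes / $chunk_bytes);
}

my $total = 10 * 1024**3;    # hypothetical 10 GB filesystem
print chunk_size_for_count($total, 4), "\n";
print min_streams_for_size($total, 4 * 1024**3), "\n";
```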
Other modifications that I made include:
- Fixing the buildstreams() function, making it up to 113 times
faster than the original chopit.pl script
- Allowing the user to give several pathnames all at once to
include in a single includes file
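
The placement idea behind buildstreams() can be sketched as a first-fit loop. This is a simplified illustration over a flat list of hypothetical [name, size] pairs; the real function also descends into directories that are too large to fit in one stream.

```perl
use strict;
use warnings;

# First-fit placement: put each item into the first stream that still has
# room, and open a new stream when none does.
sub first_fit {
    my ($limit, @items) = @_;    # @items holds [name, size] pairs
    my @streams;
    ITEM:
    foreach my $item (@items) {
        my ($name, $size) = @$item;
        foreach my $stream (@streams) {
            if ($stream->{size} + $size <= $limit) {
                push @{ $stream->{list} }, $name;
                $stream->{size} += $size;
                next ITEM;
            }
        }
        # No existing stream had room (or the item alone exceeds the limit).
        push @streams, { list => [$name], size => $size };
    }
    return @streams;
}

# Hypothetical items and a limit of 10 units.
my @streams = first_fit(10, ['a', 6], ['b', 5], ['c', 4], ['d', 3]);
foreach my $s (@streams) {
    print join(' ', @{ $s->{list} }), " ($s->{size})\n";
}
```

Note that an item larger than the limit still gets a stream of its own, which matches what the script does for oversized single files.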
This script has become invaluable in backing up large filesystems and
directories. I hope it's useful for others.
Here's a sample invocation that shows me asking the script to traverse
the C:\ directory to build streams of up to 4GB in size:
C:\>nbusplit.pl -f c:\temp\inc.txt -s 4g c:\
Splitting filesystem into Netbackup streams
Filesystem: c:\
Determining directory sizes..../
Total size to divide up is 4443014315
Building streams...|
Completed in 39 second(s)
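
The output file starts each stream with a NEW_STREAM line and ends with a #EOF# marker that other scripts can use to check that the file is complete. A minimal sketch of such a check follows; read_streams() and the sample contents are hypothetical and only illustrate the file layout.

```perl
use strict;
use warnings;

# Parse an include list from a filehandle.
# Returns a completeness flag and a reference to a list of streams.
sub read_streams {
    my ($fh) = @_;
    my (@streams, $complete);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq '#EOF#') { $complete = 1; last; }
        next if $line eq '' || $line =~ /^#/;    # skip blanks and comments
        if ($line eq 'NEW_STREAM') { push @streams, []; next; }
        push @{ $streams[-1] }, $line if @streams;
    }
    return ($complete ? 1 : 0, \@streams);
}

# Hypothetical file contents, for illustration only.
my $sample = join("\n",
    '# nbusplit.pl split c:/ into 2 streams of 4294967296 bytes',
    'NEW_STREAM', 'c:\\temp', 'c:\\docs',
    'NEW_STREAM', 'c:\\data',
    '', '#EOF#', '');

open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
my ($complete, $streams) = read_streams($fh);
close($fh);

printf "complete=%d streams=%d\n", $complete, scalar(@$streams);
```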
* end *
--------------090701060905080600060304
Content-Type: text/plain;
name="nbusplit.pl"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="nbusplit.pl"
#!/usr/bin/env perl
# nbusplit.pl
# by Michael Barrow (michael AT mlbarrow DOT com)
# nbusplit.pl version 1.0
# This is a modified version of chopit.pl, by Harold F. Skelly, Jr.
# Copyright 2001, Harold F. Skelly jr.
use strict;
use Getopt::Std;
use File::Spec;
# Enable auto-flush
$|=1;
my ($streamtotal, $chunksize, %dirs, $streamct, $starttime, @STREAMS, $debug,
$verbose, $path, $filemode);
my ($totaltime, $duration) = (0, 0);
our ($opt_c, $opt_s, $opt_v, $opt_f, $opt_a, $opt_D);
# Figure out what the path separator is on this platform
my ($path_sep);
$path_sep = File::Spec->catfile('FOO', 'FOO');
$path_sep =~ s|FOO||g;
if (!getopts('c:s:f:vaD')) {
print "Invalid option specified.\n";
usage();
}
if ((!$opt_c && !$opt_s) || !$opt_f) {
print "You must specify chunksize or streamcount and output file.\n";
usage();
}
if ($opt_c && $opt_s) {
print "You can't specify streamtotal *and* chunksize.\n";
usage();
}
if ($opt_c) {
$streamtotal = $opt_c;
if ($streamtotal < 2) {
print "You must specify at least 2 streams.\n";
usage();
}
}
if ($opt_s) {
$chunksize = size2int($opt_s);
if (!defined($chunksize)) {
print "Chunksize must be a number, or a number followed by k, m, or g\n";
usage();
}
}
if ($opt_D) {
$debug = 1;
} else {
$debug = 0;
}
if ($opt_v) {
$verbose = 1;
} else {
$verbose = 0;
}
$filemode = ">";
if ($opt_a) {
$filemode = ">>";
}
if ($#ARGV < 0) {
print "You must specify one or more pathnames to be split.\n";
usage();
}
$totaltime = 0;
open (RSLTS,"$filemode $opt_f") || die "Unable to open for writing: $opt_f\n";
# Make it a binary file so that we don't get Windows line endings
# (this is a NOOP on Unix)
binmode(RSLTS);
print "Path sep is ${path_sep}\n" if ($debug);
foreach (@ARGV) {
$path = $_;
$starttime = time();
%dirs = ();
@STREAMS = ();
$streamct = 1;
print "Splitting filesystem into Netbackup streams\n";
print "Filesystem: $path\n";
# convert any \ to /, as required
if ($path_sep eq "\\") {
$path =~ s/\\/\//g;
}
printf("Determining directory sizes....");
summarize($path);
print "\nTotal size to divide up is $dirs{$path}\n\n";
# Dynamically set the chunksize if the user requested a total number of
# streams
if ($streamtotal) {
$chunksize = $dirs{$path} / $streamtotal;
}
printf("Building streams...");
buildstreams($path);
printstreams();
$duration = time() - $starttime;
$totaltime += $duration;
printf("\nCompleted in %d second(s)\n", $duration);
}
# Tack on the EOF marker that can be used by scripts to test if this file is
# complete
printf(RSLTS "\n#EOF#\n");
close(RSLTS);
if ($#ARGV > 0) {
printf("\nEntire execution took %d second(s)\n", $totaltime);
}
exit;
sub summarize {
# Subroutine to collect the sizes of the directories under a certain
# path
# Utilizes global variables: %dirs
# arguments:
my $dir = shift; # directory to process
# variables
my @entries; # list of all files in $dir
my $file; # loop index for @entries
my $re; # regexp to exclude certain files
my $dir_c; # directory name used to build path components
$dirs{$dir} = -s "$dir";
if (opendir(D, $dir)) {
# Collect a list of all of the files in the directory,
# excluding '.', '..', '.snapshot', and names starting with '~snapshot'
$re = '(^\.$)|(^\.\.$)|(^\.snapshot$)|(^~snapshot)';
@entries = grep(! /$re/, readdir(D));
closedir(D);
$dir_c = $dir;
# If the specified directory ends in a slash, get rid of the terminating
# slash, because later code will cause double slashes in a row
if ($dir_c =~ /\/$/) {
substr($dir_c, -1) = '';
}
# Now check each of the files in the directory
foreach $file (@entries) {
next if (-l "${dir_c}/${file}"); # ignore symlinks
if (-d "${dir_c}/${file}") {
summarize("${dir_c}/${file}");
$dirs{$dir}+= $dirs{"${dir_c}/${file}"};
} else {
$dirs{$dir}+= -s "${dir_c}/${file}";
}
spinner();
}
}
print "$dir ($dir_c) [$dirs{$dir}]\n" if ($debug);
}
sub printstreams {
# print out streams and chunk sizes
my ($k, $grandsum, $sz);
print RSLTS "# nbusplit.pl split $path into $streamct streams of $chunksize bytes\n";
if ($verbose) {print "Created $streamct streams of $chunksize bytes\n";}
for ($k=0; $k<$streamct; $k++) {
$sz = $STREAMS[$k]{size};
$grandsum+=$sz;
$STREAMS[$k]{list} =~ s/\0/\n/g;
$STREAMS[$k]{list} =~ s|/|${path_sep}|g;
# Use print, not printf, so that a '%' in a pathname is not treated as a
# format specifier
print RSLTS "NEW_STREAM\n";
print "NEW_STREAM\n" if ($verbose);
print RSLTS "$STREAMS[$k]{list}\n";
print "$STREAMS[$k]{list}\n" if ($verbose);
print "\tSIZE=$sz\n" if ($verbose);
}
print "\nThe Grand total of all streams is $grandsum bytes\n" if ($verbose);
}
sub buildstreams {
# take a directory and return a set of streams composed of
# various subdirs and files.
# Look at each directory from the top down. If the directory size
# is no larger than chunksize, then include it (and thus all of its
# subdirs and files) in a backup stream. If it is too large,
# then loop through all of the subdirs one level down.
my $indir = $_[0];
my ($i, $elem, $sz);
my @allelems;
my $streamlist;
my $indir_c;
print "buildstreams: streamcount=$streamct; looking at $indir\n" if ($debug);
spinner();
# check size of current dirname to see if it will fit in any existing
# stream and add it if so.
for ($i=0; $i<$streamct; $i++) {
if (!defined($STREAMS[$i]{size})) {
$STREAMS[$i]{size} = 0;
$STREAMS[$i]{list} = '';
}
if ( ($STREAMS[$i]{size} + $dirs{$indir}) <= $chunksize ) {
$STREAMS[$i]{list} .= "\0" . $indir;
$STREAMS[$i]{size} += $dirs{$indir};
return;
}
}
# We didn't find an existing stream large enough so either create
# a new stream (if it will fit in one) or descend to new subdirs.
if ( $dirs{$indir} <= $chunksize ) {
$STREAMS[$streamct]{list} = $indir;
$STREAMS[$streamct]{size} = $dirs{$indir};
$streamct++;
return;
} else {
#go down one level using opendir and readdir till done
opendir THISDIR, $indir or die "couldn't open $indir to recurse\n";
# get rid of . and .. and make all full path names
$indir_c = $indir;
$indir_c =~ s/\/$//;
@allelems = map("$indir_c/$_", grep(!/^\.\.?$/, readdir(THISDIR)));
closedir THISDIR;
# run the following loop twice to look at subdirs first, then files
# second, recursing on directories
foreach $elem (@allelems) {
next if (-f $elem);
# else we recurse on each subdir
if ( -d $elem ) {buildstreams($elem);}
}
ELEM:
foreach $elem (@allelems) {
next if (-d $elem); #we've already streamified dirs, right?
spinner();
$sz = -s $elem;
print "stuff: $elem ($sz)\n" if ($debug);
if ( -f $elem || -l $elem) {
# add to a stream if it will fit, else build a new stream
for ($i=0; $i < $streamct; $i++) {
if (($STREAMS[$i]{size} + $sz) <= $chunksize) {
print "addstream $i: $elem ($sz) $STREAMS[$i]{size}\n" if ($debug);
$STREAMS[$i]{list} .= "\0" . $elem;
$STREAMS[$i]{size} += $sz;
next ELEM; # we've placed it in a stream
}
}
if ( $sz > $chunksize) {
print "WARNING: Single file $elem exceeds $chunksize bytes\n" if ($verbose);
}
$STREAMS[$streamct]{list} = $elem;
$STREAMS[$streamct]{size} = $sz;
print "newstream $streamct: $elem ($sz) $STREAMS[$streamct]{size}\n" if ($debug);
$streamct++;
next ELEM;
}
}
}
}
sub usage {
# Prints the usage message and exits the script
print "\n\nUsage:\n nbusplit.pl [-v] [-a] -f <outfile> -c <stream count>|-s <stream size> <directory>...\n";
print "Where: <stream count> is the number of streams to divide the filesystem into,\n";
print "<stream size> is the maximum size of a stream, <directory> is the pathname\n";
print "that must be split, and <out file> is the filename to store the include list\n";
print "Option flags: -v for verbose output, -a for appending to outfile instead of\n";
print "overwriting it.\n";
exit(1);
}
sub size2int {
# converts supplied argument to an integer, performing multiplication
# as necessary if caller included k, m, g in the argument.
# arguments:
my $x = shift;
my $multiplier = 1;
$x = lc($x);
if ($x !~ /^[0-9]+[kmg]{0,1}$/) {
return(undef);
}
if ($x =~ /k/) {
$multiplier = 1024;
}
elsif ($x =~ /m/) {
$multiplier = 1024 ** 2;
}
elsif ($x =~ /g/) {
$multiplier = 1024 ** 3;
}
return (int($x) * $multiplier);
}
sub spinner {
return if ($verbose || $debug);
$| = 1;
my $spins = "|/-\\";
our ($y);
if (!defined($y)) {
$y = 0;
} else {
$y++;
$y = 0 if ($y > 3);
}
print substr($spins, $y, 1), "\b"
}
--------------090701060905080600060304--