Amanda-Users

Re: amanda inparallel not working on large filesystems

2003-08-15 04:22:37
Subject: Re: amanda inparallel not working on large filesystems
From: jason.walton AT nomadsoft.co DOT uk
To: amanda-users AT amanda DOT org
Date: Fri, 15 Aug 2003 09:06:41 +0100
Hello all,
Many thanks to all those who contributed to solving my dilemma. My dumps
are now down to under six hours. I've included my gtar wrapper script so
that anybody else who ever has the same symptoms can speed up their
backups.
My symptoms were:
1) backups suddenly started taking fourteen hours to complete
2) gtar was running incremental backups on a client, even though I am doing
full backups every day

The solution to (1) had nothing to do with amanda; it was a simple RAID
controller problem. Once the controller was repaired, the backups went back
to 7 hours.
This still left (2): gtar was doing incrementals, which take forever and
effectively force one tar to wait for the other to complete. The solution is
to strip out the --listed-incremental flag and its subsequent argument.
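
As a rough illustration of that fix, the sketch below drops
--listed-incremental and the argument that follows it while keeping every
other argument intact, even one that contains spaces. The function name
strip_listed_incremental is made up for this example, and also dropping the
--listed-incremental=FILE spelling is an extra assumption; in the case
described here the file name arrives as a separate argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amanda): print the arguments one per
# line, minus --listed-incremental and the file name that follows it.
strip_listed_incremental() {
    n=$#          # number of original arguments
    skip=0        # becomes 1 when the next argument must be dropped
    for arg in "$@"
    do
        if [ "$skip" -eq 1 ]
        then
            skip=0                  # this is the flag's file argument: drop it
        elif [ "$arg" = "--listed-incremental" ]
        then
            skip=1                  # drop the flag, then drop what follows
        else
            case $arg in
                --listed-incremental=*) ;;       # assumed one-word spelling: drop
                *) set -- "$@" "$arg" ;;         # keep: append to the list
            esac
        fi
    done
    shift "$n"                      # discard the original arguments
    for arg in "$@"
    do
        printf '%s\n' "$arg"
    done
}

# Example call:
#   strip_listed_incremental --create --listed-incremental /tmp/snap --file - .
```

The loop appends each kept argument to the positional parameters and then
shifts the originals away, so nothing is re-split on whitespace along the way.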

I've included the gtar wrapper's main skeleton and also my before and after
results from amanda. I was thinking of switching to a commercial product,
but I've seen the light. Amanda is truly a great product, that can only
keep on getting better. Many thanks to all who have helped develop it over
the years.

#!/bin/ksh
###
# This script is called by an Amanda client to run GNU tar.  We look
# through the arguments for what is being processed and (optionally)
# run extra things before and/or after GNU tar, or alter the actual
# command run (e.g. different GNU tar flags and options or a completely
# different program).
#
# Remember that this script runs as root under the Amanda runtar program.
# Exercise appropriate caution.
#
# To test, run with the DEBUG environment variable set to "echo".
# The command line need not look exactly like what Amanda issues, but
# must include the --directory <dir-to-back-up> and --file <outfile>
# flags as well as the trailing ".".  To test an estimate, set the
# output file to "/dev/null":
#
#   env DEBUG=echo ./gtar-wrapper.ksh --directory /whatever --file /dev/null .
#
# and to test the real dump, set it to "-":
#
#   env DEBUG=echo ./gtar-wrapper.ksh --directory /whatever --file - .
#
# John R. Jackson (jrj AT purdue DOT edu)
# Purdue University Computing Center
#
# 24-Mar-00: Initial version.
#
# 12-Sep-01: Fixed several bugs (sigh) and converted the code so it would
#          work under bash as well as ksh.
###

PN=${0##*/}                               # script name, used as the log prefix
#export DEBUG=echo

if [[ -z $DEBUG ]]
then
   log=/tmp/amanda/gtarwrapper.$$         # <<< change as needed
   rm -f $log
else
   log=/dev/tty
fi
GTAR=${GTAR:-/usr/local/bin/gtar}         # <<< change as needed
SED=${SED:-/usr/bin/sed}                  # <<< change as needed

###
# Define functions to be called at various execution points.  The name
# determines when the function is called:
#
#  pre_estimate_*       called before a GNU tar estimate
#  pre_real_*           called before a GNU tar real run
#  run_estimate_*       called to run a GNU tar estimate
#  run_real_*           called to run a GNU tar real run
#  post_estimate_*      called after a GNU tar estimate
#  post_real_*          called after a GNU tar real run
#
# The rest of the function name comes from the directory being processed
# by converting everything other than alphanumerics and the underscore
# to an underscore, e.g. /home/abc becomes _home_abc.
#
# If a function does not exist for a particular execution point, nothing
# special is done.
###

function run_estimate__
{
   echo "$PN: preparing to estimate /" >> $log
   a=$(df -k / |awk 'NR==2 {print $3*1024}')
   echo "Total bytes written:" $a >&2
   return $?
}

function run_estimate__boot
{
   echo "$PN: preparing to estimate /boot" >> $log
   a=$(df -k /boot |awk 'NR==2 {print $3*1024}')
   echo "Total bytes written:" $a >&2
   return $?
}


###
# Utility functions.
###

function IsFunction
{
   f=$1
   echo "$PN: looking for $f" >> $log
   if [[ -n "$BASH_VERSION" ]]
   then
      t=$(type -t "$f" 2> /dev/null)
      r=$?
   else
      t=$(whence -v "$f" 2> /dev/null)
      r=$?
   fi
   if [[ $r -eq 0 ]]
   then
      if [[ "$t" != *function* ]]
      then
         r=1
      fi
   fi
   return $r
}


###
# Start of main code.
###

###
# Set up a log file in /tmp/amanda.
###

echo "$PN: start: $(date)" >> $log
echo "$PN: pre-args:" "$@" >> $log
#
# JOW 20030813 now strip out the listed-incremental flag. Note that the next
# argument is the file it wishes to write to
#
POSTARGS=`echo $@ | /usr/bin/nawk '{for (i=1;i<=NF;i++) {if ("--listed-incremental" == $i) {i=i+1} else {printf("%s ",$i)}}}END {print}'`
echo "$PN: post-args:" "$POSTARGS" >> $log

###
# Find the directory Amanda is asking us to back up.  Also figure out
# if we are doing an estimate or the real thing.
###

typeset -i get_directory=0
typeset -i get_file=0
directory_arg=
file_arg=

for arg
do
   if ((get_directory))
   then
      directory_arg=$arg
      get_directory=0
   elif ((get_file))
   then
      file_arg=$arg
      get_file=0
   elif [[ X"$arg" = X"--directory" ]]
   then
      get_directory=1               # --directory dir-to-back-up
   elif [[ X"$arg" = X"--file" ]]
   then
      get_file=1              # --file file-to-write-to
   fi
done

echo "$PN: directory: $directory_arg" >> $log
echo "$PN: file: $file_arg" >> $log

if [[ X"$file_arg" = X"-" ]]
then
   type=real                        # real dump if output is stdout
elif [[ X"$file_arg" = X"/dev/null" ]]
then
   type=estimate              # estimate if output is /dev/null
else
   type=unknown                     # no idea what is going on
fi

###
# Make the directory name into something we can use as part of a function
# name by converting the slashes to underscores.  Hopefully there is
# not anything else annoying in the name.
###

d=$(echo $directory_arg | $SED 's/[^a-zA-Z0-9_]/_/g')

###
# See if there is something to be done before we call GNU tar.
###

if IsFunction pre_${type}_$d
then
   echo "$PN: running pre_${type}_$d" >> $log
   pre_${type}_$d
else
   echo "$PN: pre_${type}_$d not found so nothing special run" >> $log
fi

###
# Run GNU tar or a private function.
###

#
# JOW 20030813 note the use of POSTARGS, was using $@ but now we strip out the
# incremental argument
if IsFunction run_${type}_$d
then
   echo "$PN: running run_${type}_$d $POSTARGS" >> $log
   run_${type}_$d "$POSTARGS"
   exit_code=$?
else
   echo "$PN: running $GTAR $POSTARGS" >> $log
   $DEBUG $GTAR $POSTARGS
   exit_code=$?
   echo "$PN: exit code is $exit_code" >> $log
fi

###
# See if there is something to be done after we ran GNU tar.
###

if IsFunction post_${type}_$d
then
   echo "$PN: running post_${type}_$d" >> $log
   post_${type}_$d
else
   echo "$PN: post_${type}_$d not found so nothing special run" >> $log
fi

###
# Exit with the GNU tar exit code.
###

echo "$PN: end: $(date)" >> $log
exit $exit_code
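
To make the hook naming concrete, here is a minimal sketch of how a
directory maps to a function suffix, together with a generic df-based
estimate along the lines of run_estimate__ above. The names hook_suffix and
df_estimate are invented for this example, and using df -P (POSIX output
format, so a long device name does not wrap onto a second line) is my
assumption rather than something the script above does.

```shell
#!/bin/sh
# Hypothetical helper: turn a directory into the suffix used in hook names.
# Every character other than a letter, digit or underscore becomes "_".
hook_suffix() {
    printf '%s\n' "$1" | sed 's/[^a-zA-Z0-9_]/_/g'
}

# Hypothetical helper: report the used space of the filesystem holding $1
# in the "Total bytes written:" form that the run_estimate_* hooks above
# write to stderr.
df_estimate() {
    # Line 2 of "df -kP" is the data line; field 3 is used space in KB.
    used=$(df -kP "$1" | awk 'NR==2 {printf "%.0f", $3 * 1024}')
    [ -n "$used" ] || return 1
    echo "Total bytes written: $used" >&2
}

# Example calls:
#   hook_suffix /home/abc      # suffix for hooks such as pre_real__home_abc
#   df_estimate /boot
```

The leading slash also becomes an underscore, which is why the hooks for /
and /boot in the script are named run_estimate__ and run_estimate__boot.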


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dump from the end of July, before the RAID was fixed
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
STATISTICS:
                          Total       Full      Daily
                        --------   --------   --------
Estimate Time (hrs:min)    0:12
Run Time (hrs:min)        14:55
Dump Time (hrs:min)       41:46      41:46       0:00
Output Size (meg)       142504.2    142504.2        0.0
Original Size (meg)     142504.2    142504.2        0.0
Avg Compressed Size (%)     --         --         --
Filesystems Dumped           39         39          0
Avg Dump Rate (k/s)       970.5      970.5        --

Tape Time (hrs:min)        3:27       3:27       0:00
Tape Size (meg)         142505.4    142505.4        0.0
Tape Used (%)              71.3       71.3        0.0
Filesystems Taped            40         40          0
Avg Tp Write Rate (k/s) 11751.0    11751.0        --

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dump from the start of August, after the RAID was fixed
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
STATISTICS:
                          Total       Full      Daily
                        --------   --------   --------
Estimate Time (hrs:min)    0:12
Run Time (hrs:min)         7:04
Dump Time (hrs:min)       21:39      21:39       0:00
Output Size (meg)       140115.1    140115.1        0.0
Original Size (meg)     140115.1    140115.1        0.0
Avg Compressed Size (%)     --         --         --
Filesystems Dumped           39         39          0
Avg Dump Rate (k/s)      1841.4     1841.4        --

Tape Time (hrs:min)        3:24       3:24       0:00
Tape Size (meg)         140116.3    140116.3        0.0
Tape Used (%)              70.1       70.1        0.0
Filesystems Taped            39         39          0
Avg Tp Write Rate (k/s) 11696.8    11696.8        --

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dump from last night, after the RAID was fixed and gtar wrapper hacked
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
STATISTICS:
                          Total       Full      Daily
                        --------   --------   --------
Estimate Time (hrs:min)    0:14
Run Time (hrs:min)         5:42
Dump Time (hrs:min)       19:34      19:34       0:00
Output Size (meg)       140495.6    140495.6        0.0
Original Size (meg)     140495.6    140495.6        0.0
Avg Compressed Size (%)     --         --         --
Filesystems Dumped           39         39          0
Avg Dump Rate (k/s)      2041.7     2041.7        --

Tape Time (hrs:min)        3:26       3:26       0:00
Tape Size (meg)         140496.8    140496.8        0.0
Tape Used (%)              70.3       70.3        0.0
Filesystems Taped            39         39          0
Avg Tp Write Rate (k/s) 11641.2    11641.2        --