I'm so happy with Amanda that when an error occurs and a partition fails to
dump, I just wait a day or two and it's usually corrected. Maybe this is more
correctly labeled "laziness." However, I just noticed that the main web
directory on my main web server hasn't been backed up in a week, and I've lost
the level 0 backup. Now, I'm worried.
My Amanda tape server and the web server are the same machine, 'www'. The web
directory is pretty large:
amanda@www:/etc/amanda/Outside$ df -h /var/www/
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda8              18G   14G  4.3G  76% /var/www
amanda@www:/etc/amanda/Outside$
I was trying to back up this partition as a whole, using 'dump' with no
compression, but it often complained that it was too large. I'm backing up to
DDS-3 tapes, without hardware compression, on a Sony SDT-10000 drive built into
my Dell PowerEdge 2450 server. Here's the version information:
amanda@www:/var/amanda/Outside$ amadmin Outside version
build: VERSION="Amanda-2.4.2p2"
BUILT_DATE="Tue Apr 2 21:24:21 UTC 2002"
BUILT_MACH="Linux cyberhq 2.4.18pre2 #1 SMP Tue Jan 8 18:13:43 PST 2002 i686 unknown"
CC="gcc"
paths: bindir="/usr/sbin" sbindir="/usr/sbin"
libexecdir="/usr/lib/amanda" mandir="/usr/share/man"
AMANDA_TMPDIR="/tmp/amanda" AMANDA_DBGDIR="/tmp/amanda"
CONFIG_DIR="/etc/amanda" DEV_PREFIX="/dev/"
RDEV_PREFIX="/dev/r" DUMP="/sbin/dump"
RESTORE="/sbin/restore" SAMBA_CLIENT="/usr/bin/smbclient"
GNUTAR="/bin/tar" COMPRESS_PATH="/bin/gzip"
UNCOMPRESS_PATH="/bin/gzip" MAILER="/usr/bin/Mail"
listed_incr_dir="/var/lib/amanda/gnutar-lists"
defs: DEFAULT_SERVER="localhost" DEFAULT_CONFIG="DailySet1"
DEFAULT_TAPE_SERVER="localhost"
DEFAULT_TAPE_DEVICE="/dev/null" HAVE_MMAP HAVE_SYSVSHM
LOCKING=POSIX_FCNTL SETPGRP_VOID DEBUG_CODE
AMANDA_DEBUG_DAYS=4 BSD_SECURITY USE_AMANDAHOSTS
CLIENT_LOGIN="backup" FORCE_USERID HAVE_GZIP
COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"
amanda@www:/var/amanda/Outside$
I know that this is an old version of Amanda, but it's what Debian stable
distributes. When I have some free time, I'll upgrade it outside of the Debian
packaging system.
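To put numbers on the "too large" complaint, here's a rough capacity check as a small Python sketch. It assumes a DDS-3 tape holds its nominal 12 GB (10^9-byte gigabytes) uncompressed, that df's "14G" means 14 GiB, and that Amanda 2.4.x writes each dump image to a single tape rather than spanning tapes. The limit Amanda actually enforces is the tapetype length in amanda.conf, which isn't shown here.

```python
# Rough check: does an uncompressed level 0 of /var/www fit on one DDS-3 tape?
# Assumptions: DDS-3 native capacity is 12 GB (10**9 bytes), no compression,
# and df -h's "14G" means 14 GiB (2**30 bytes).

DDS3_NATIVE_BYTES = 12 * 10**9  # nominal DDS-3 capacity without compression


def gib_to_bytes(gib):
    """Convert a GiB figure (as printed by df -h) to bytes."""
    return int(gib * 2**30)


def fits_on_tape(dump_bytes, tape_bytes=DDS3_NATIVE_BYTES):
    """True if a single dump image fits on one tape (no tape spanning)."""
    return dump_bytes <= tape_bytes


if __name__ == "__main__":
    used = gib_to_bytes(14)  # "Used" column for /dev/sda8
    print(f"used={used} bytes, tape={DDS3_NATIVE_BYTES} bytes, "
          f"fits={fits_on_tape(used)}")
```

Under those assumptions, a level 0 of the whole partition (about 15.0e9 bytes) would not fit on one tape even if nothing else were written that night.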
Last night, I tried to split this partition up, with these entries in disklist
and amanda.conf:
amanda@www:/etc/amanda/Outside$ grep "/var/www" disklist
www /var/www/main/htdocs nocomp-highpri-tar -1 local #www:/var/www/main/htdocs
www /var/www/ www-sda8-exclude-htdocs-main -1 local #www:/var/www/ excluding /var/www/main/htdocs, using nocomp-highpri-tar
amanda@www:/etc/amanda/Outside$ egrep -A 5 'nocomp|highpri|tar|www-sda8' amanda.conf
define dumptype nocomp {
global
comment "No compression"
compress none
}
--
define dumptype highpri {
global
comment "High priority"
priority high
}
--
define dumptype tar {
global
comment "Using GNUTAR"
program "GNUTAR"
}
--
define dumptype nocomp-highpri {
nocomp
highpri
comment "No compression with high priority"
}
define dumptype nocomp-highpri-tar {
nocomp
highpri
tar
comment "No compression with high priority using GNUTAR"
}
define dumptype www-sda8-exclude-htdocs-main {
nocomp-highpri-tar
exclude "./htdocs/main"
comment "Special dumptype for www:sda8, excluding /var/www/htdocs/main, using nocomp-highpri-tar"
}
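To double-check what the two disklist entries actually end up with, here's a small Python sketch that models the dumptype inheritance above as dictionary merging. The model is my own illustration, not Amanda's code: it assumes parent dumptypes are applied in the order listed and that a dumptype's own settings override inherited ones. The contents of "global" aren't shown in the grep output, so it's left empty, and comment lines are omitted.

```python
# Hypothetical model of the dumptypes above: name -> (parents, own settings).
# Assumption: parents are applied in order; own settings override them.
DUMPTYPES = {
    "global": ([], {}),  # real contents not shown in the grep output
    "nocomp": (["global"], {"compress": "none"}),
    "highpri": (["global"], {"priority": "high"}),
    "tar": (["global"], {"program": "GNUTAR"}),
    "nocomp-highpri": (["nocomp", "highpri"], {}),
    "nocomp-highpri-tar": (["nocomp", "highpri", "tar"], {}),
    "www-sda8-exclude-htdocs-main": (["nocomp-highpri-tar"],
                                     {"exclude": "./htdocs/main"}),
}


def resolve(name, table=DUMPTYPES):
    """Return the effective settings for a dumptype after inheritance."""
    parents, own = table[name]
    merged = {}
    for parent in parents:
        merged.update(resolve(parent, table))  # earlier parents first
    merged.update(own)  # the dumptype's own settings win
    return merged


if __name__ == "__main__":
    for dt in ("nocomp-highpri-tar", "www-sda8-exclude-htdocs-main"):
        print(dt, resolve(dt))
```

Under that model, both entries resolve to GNUTAR with no compression and high priority, and only the second adds the exclude.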
I got these errors this morning:
These dumps were to tape Outside-15.
*** A TAPE ERROR OCCURRED: [[writing file: No space left on device]].
Some dumps may have been left in the holding disk.
Run amflush to flush them to tape.
The next tape Amanda expects to use is: Outside-16.
FAILURE AND STRANGE DUMP SUMMARY:
www /var/www/ lev 0 FAILED [disk /var/www/ offline on www?]
www /var/www/main/htdocs lev 0 FAILED [disk /var/www/main/htdocs offline on www?]
real sda4 lev 0 FAILED [out of tape]
real sda4 lev 0 FAILED ["data write: Connection reset by peer"]
real sda4 lev 0 FAILED [dump to tape failed]
Here are the lines from /var/amanda/Outside/amdump.1 that refer to /var/www:
amanda@www:/var/amanda/Outside$ grep -B 1 -A 2 "/var/www" amdump.1
setting up estimates for www:/var/www/main/htdocs
www:/var/www/main/htdocs overdue 12157 days for level 0
setup_estimate: www:/var/www/main/htdocs: command 0, options:
last_level -1 next_level0 -12157 level_days 0
getting estimates 0 (0) -1 (-1) -1 (-1)
setting up estimates for www:/var/www/
www:/var/www/ overdue 12157 days for level 0
setup_estimate: www:/var/www/: command 0, options:
last_level -1 next_level0 -12157 level_days 0
getting estimates 0 (0) -1 (-1) -1 (-1)
--
got result for host www disk /var/www/: 0 -> -1K, -1 -> -1K, -1 -> -1K
got result for host www disk /var/www/main/htdocs: 0 -> -1K, -1 -> -1K, -1 -> -1K
--
FAILED QUEUE:
0: www /var/www/
1: www /var/www/main/htdocs
--
planner: FAILED www /var/www/ 0 [disk /var/www/ offline on www?]
planner: FAILED www /var/www/main/htdocs 0 [disk /var/www/main/htdocs offline on www?]
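To pick these out of a larger amdump file, here's a small Python sketch that flags disks whose requested estimates came back negative. My understanding, which is an assumption I haven't checked against the planner source, is that a -1K result means the client returned no usable estimate for that level, that the "-1" slots are unused placeholders, and that the planner then reports the disk as "offline". The function name failed_estimates is just my own illustration.

```python
import re

# Matches e.g. "got result for host www disk /var/www/: 0 -> -1K, ..."
RESULT_RE = re.compile(r"got result for host (\S+) disk (\S+): (.*)$")
# Matches each "<level> -> <size>K" pair in the rest of the line.
PAIR_RE = re.compile(r"(-?\d+) -> (-?\d+)K")


def failed_estimates(lines):
    """Yield (host, disk) for each result line with a negative estimate
    at any requested level (level -1 slots are treated as unused)."""
    for line in lines:
        m = RESULT_RE.search(line)
        if not m:
            continue
        host, disk, rest = m.groups()
        pairs = [(int(lvl), int(size)) for lvl, size in PAIR_RE.findall(rest)]
        requested = [(lvl, size) for lvl, size in pairs if lvl >= 0]
        if any(size < 0 for _, size in requested):
            yield host, disk


if __name__ == "__main__":
    with open("amdump.1") as f:
        for host, disk in failed_estimates(f):
            print(f"no estimate: {host}:{disk}")
```

This only works on unwrapped lines like the ones above; it doesn't say why the estimate failed, only which disks to look at in the client-side debug files.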
Here are lines from log.20030415.0:
amanda@www:/var/amanda/Outside$ grep "/var/www" log.20030415.0
INFO planner Adding new disk www:/var/www/main/htdocs.
INFO planner Adding new disk www:/var/www/.
FAIL planner www /var/www/ 0 [disk /var/www/ offline on www?]
FAIL planner www /var/www/main/htdocs 0 [disk /var/www/main/htdocs offline on www?]
Obviously, /var/www/ isn't offline; it's on the tape server itself.
Any suggestions on how to troubleshoot this problem? After taking up this much
time writing all this down, I'm hoping it's not just some boneheaded typing
error, but if it is, I can't spot it.
Thanks so much for your suggestions.
-Kevin Zembower
-----
E. Kevin Zembower
Unix Administrator
Johns Hopkins University/Center for Communications Programs
111 Market Place, Suite 310
Baltimore, MD 21202
410-659-6139