Amanda-Users

Troubleshooting partition offline error

2003-04-15 12:15:58
Subject: Troubleshooting partition offline error
From: KEVIN ZEMBOWER <KZEMBOWE AT jhuccp DOT org>
To: amanda-users AT amanda DOT org
Date: Tue, 15 Apr 2003 10:11:03 -0400
I'm so happy with Amanda that when an error occurs and a partition fails to
dump, I just wait a day or two and it usually clears up on its own. Maybe this
is more accurately labeled "laziness." However, I just noticed that the main web
directory on my main web server hasn't been backed up in a week, and I've lost
the level 0 backup. Now I'm worried.

The Amanda tape server and the web server are the same machine, 'www'. The web
directory is pretty large:
amanda@www:/etc/amanda/Outside$ df -h /var/www/
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda8              18G   14G  4.3G  76% /var/www
amanda@www:/etc/amanda/Outside$ 

I had been trying to back up this partition as a whole, with no compression and
'dump', but Amanda often complained that it was too large. I'm backing up to
DDS-3 tapes, without hardware compression, on a Sony SDT-10000 drive built into
my Dell PowerEdge 2450 server. Here's the version information:
amanda@www:/var/amanda/Outside$ amadmin Outside version  
build: VERSION="Amanda-2.4.2p2"
       BUILT_DATE="Tue Apr 2 21:24:21 UTC 2002"
       BUILT_MACH="Linux cyberhq 2.4.18pre2 #1 SMP Tue Jan 8 18:13:43 PST 2002 i686 unknown"
       CC="gcc"
paths: bindir="/usr/sbin" sbindir="/usr/sbin"
       libexecdir="/usr/lib/amanda" mandir="/usr/share/man"
       AMANDA_TMPDIR="/tmp/amanda" AMANDA_DBGDIR="/tmp/amanda"
       CONFIG_DIR="/etc/amanda" DEV_PREFIX="/dev/"
       RDEV_PREFIX="/dev/r" DUMP="/sbin/dump"
       RESTORE="/sbin/restore" SAMBA_CLIENT="/usr/bin/smbclient"
       GNUTAR="/bin/tar" COMPRESS_PATH="/bin/gzip"
       UNCOMPRESS_PATH="/bin/gzip" MAILER="/usr/bin/Mail"
       listed_incr_dir="/var/lib/amanda/gnutar-lists"
defs:  DEFAULT_SERVER="localhost" DEFAULT_CONFIG="DailySet1"
       DEFAULT_TAPE_SERVER="localhost"
       DEFAULT_TAPE_DEVICE="/dev/null" HAVE_MMAP HAVE_SYSVSHM
       LOCKING=POSIX_FCNTL SETPGRP_VOID DEBUG_CODE
       AMANDA_DEBUG_DAYS=4 BSD_SECURITY USE_AMANDAHOSTS
       CLIENT_LOGIN="backup" FORCE_USERID HAVE_GZIP
       COMPRESS_SUFFIX=".gz" COMPRESS_FAST_OPT="--fast"
       COMPRESS_BEST_OPT="--best" UNCOMPRESS_OPT="-dc"
amanda@www:/var/amanda/Outside$ 

I know that this is an old version of Amanda, but it's what Debian stable
distributes. When I have some free time, I'll upgrade it outside the Debian
package system.
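The "too large" complaints are consistent with the df output above: about 14G is used on /var/www, while a DDS-3 tape holds 12 GB native, and Amanda of this vintage cannot split one dump across tapes. A minimal sketch of that comparison, assuming a 12 GB (12,000,000 KB) tape; `used_kb` and `fits_on_tape` are hypothetical helpers, and TAPE_KB should be adjusted to whatever "length" the tapetype in amanda.conf declares:

```shell
#!/bin/sh
# Sketch: does a filesystem's used space fit on one tape?
# Assumption: DDS-3 native capacity of 12 GB, expressed in 1K blocks.
# Override TAPE_KB to match the "length" of your Amanda tapetype.
TAPE_KB=${TAPE_KB:-12000000}

# used_kb MOUNTPOINT: print used space in 1K blocks.
# -P forces POSIX one-line-per-filesystem output; -k forces 1K units.
used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# fits_on_tape USED_KB CAPACITY_KB: print "fits" or "too large".
fits_on_tape() {
    if [ "$1" -le "$2" ]; then
        echo "fits"
    else
        echo "too large"
    fi
}

# Example (uncompressed level 0 of the whole partition):
#   fits_on_tape "$(used_kb /var/www)" "$TAPE_KB"
```

With roughly 14 GB used, the example prints "too large", which is what motivates splitting the partition into separate disklist entries.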

Last night, I tried to split this partition up, with these entries in disklist
and amanda.conf:
amanda@www:/etc/amanda/Outside$ grep "/var/www" disklist
www /var/www/main/htdocs nocomp-highpri-tar -1 local    #www:/var/www/main/htdocs
www /var/www/ www-sda8-exclude-htdocs-main -1 local     #www:/var/www/ excluding /var/www/main/htdocs, using nocomp-highpri-tar
amanda@www:/etc/amanda/Outside$ egrep -A 5 'nocomp|highpri|tar|www-sda8' amanda.conf
define dumptype nocomp {
    global
    comment "No compression"
    compress none
}
--
define dumptype highpri {
    global
    comment "High priority"
    priority high
}
--
define dumptype tar {
    global
    comment "Using GNUTAR"
    program "GNUTAR"
}
--
define dumptype nocomp-highpri {
    nocomp
    highpri
    comment "No compression with high priority"
}
define dumptype nocomp-highpri-tar {
    nocomp
    highpri
    tar
    comment "No compression with high priority using GNUTAR"
}
define dumptype www-sda8-exclude-htdocs-main {
    nocomp-highpri-tar
    exclude "./htdocs/main"
    comment "Special dumptype for www:sda8, excluding /var/www/htdocs/main, using nocomp-highpri-tar"
}
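With GNUTAR dumptypes, this Amanda generation runs tar with --directory set to the disk and "." as the file argument, so an exclude pattern is read relative to the disk's top directory (here /var/www). The disklist comment says the exclude should cover /var/www/main/htdocs, while the dumptype uses "./htdocs/main". A minimal sketch for checking what a pattern actually names on disk; `show_exclude_matches` is a hypothetical helper, and find -path only approximates GNU tar's default (unanchored, more permissive) --exclude matching:

```shell
#!/bin/sh
# show_exclude_matches ROOT PATTERN
# Print the paths under ROOT that PATTERN names when ROOT is archived as ".",
# which is how member names look to tar in that case ("./sub/dir").
# Approximation: find -path is stricter than GNU tar's --exclude matching.
show_exclude_matches() {
    (cd "$1" && find . -path "$2" -prune -print)
}

# Examples against the configuration above:
#   show_exclude_matches /var/www './htdocs/main'   # pattern in the dumptype
#   show_exclude_matches /var/www './main/htdocs'   # path in the disklist
```

If the first example prints nothing and the second prints ./main/htdocs, the exclude in www-sda8-exclude-htdocs-main does not point at the directory covered by the separate /var/www/main/htdocs entry.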

I got these errors this morning:
These dumps were to tape Outside-15.
*** A TAPE ERROR OCCURRED: [[writing file: No space left on device]].
Some dumps may have been left in the holding disk.
Run amflush to flush them to tape.
The next tape Amanda expects to use is: Outside-16.

FAILURE AND STRANGE DUMP SUMMARY:
  www        /var/www/ lev 0 FAILED [disk /var/www/ offline on www?]
  www        /var/www/main/htdocs lev 0 FAILED [disk /var/www/main/htdocs offline on www?]
  real       sda4 lev 0 FAILED [out of tape]
  real       sda4 lev 0 FAILED ["data write: Connection reset by peer"]
  real       sda4 lev 0 FAILED [dump to tape failed]

Here are lines from /var/amanda/Outside/amdump.1 which refer to /var/www:
amanda@www:/var/amanda/Outside$ grep -B 1 -A 2 "/var/www" amdump.1
setting up estimates for www:/var/www/main/htdocs
www:/var/www/main/htdocs overdue 12157 days for level 0
setup_estimate: www:/var/www/main/htdocs: command 0, options:
    last_level -1 next_level0 -12157 level_days 0
    getting estimates 0 (0) -1 (-1) -1 (-1)
setting up estimates for www:/var/www/
www:/var/www/ overdue 12157 days for level 0
setup_estimate: www:/var/www/: command 0, options:
    last_level -1 next_level0 -12157 level_days 0
    getting estimates 0 (0) -1 (-1) -1 (-1)
--
got result for host www disk /var/www/: 0 -> -1K, -1 -> -1K, -1 -> -1K
got result for host www disk /var/www/main/htdocs: 0 -> -1K, -1 -> -1K, -1 -> -1K
--
FAILED QUEUE:
  0: www        /var/www/
  1: www        /var/www/main/htdocs
--
planner: FAILED www /var/www/ 0 [disk /var/www/ offline on www?]
planner: FAILED www /var/www/main/htdocs 0 [disk /var/www/main/htdocs offline on www?]
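In the amdump excerpt, both "offline" failures coincide with every estimate for the disk coming back as -1K, i.e. no usable size was returned for any level. A minimal sketch that lists such disks from an amdump file, assuming each "got result" entry sits on a single line in the format shown; `failed_estimates` is a hypothetical helper:

```shell
#!/bin/sh
# failed_estimates AMDUMP_FILE
# Print host:disk for each "got result for host H disk D: ..." line whose
# size estimates are all -1K, i.e. no usable estimate came back for any level.
# Assumption: each result is on one line, as in the amdump excerpt above.
failed_estimates() {
    awk '/^got result for host / {
        host = $5
        disk = $7
        sub(/:$/, "", disk)          # strip the colon after the disk name
        usable = 0
        for (i = 8; i <= NF; i++) {
            f = $i
            sub(/,$/, "", f)         # strip the comma between levels
            if (f ~ /K$/ && f != "-1K") usable = 1
        }
        if (!usable) print host ":" disk
    }' "$1"
}

# Example:
#   failed_estimates /var/amanda/Outside/amdump.1
```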

Here are lines from log.20030415.0:
amanda@www:/var/amanda/Outside$ grep "/var/www" log.20030415.0           
INFO planner Adding new disk www:/var/www/main/htdocs.
INFO planner Adding new disk www:/var/www/.
FAIL planner www /var/www/ 0 [disk /var/www/ offline on www?]
FAIL planner www /var/www/main/htdocs 0 [disk /var/www/main/htdocs offline on www?]

Obviously, /var/www/ isn't offline; it's on the tape server itself.

Any suggestions on how to troubleshoot this problem? Having taken up this much
of your time writing it all down, I hope it isn't just some boneheaded typing
error, but if it is, I can't spot it.
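The version output above gives AMANDA_DBGDIR="/tmp/amanda" and listed_incr_dir="/var/lib/amanda/gnutar-lists", two places worth checking when GNUTAR estimates fail. A minimal sketch, assuming the client's estimate program writes debug files named sendsize.debug or sendsize.<timestamp>.debug in that directory; `latest_debug` is a hypothetical helper:

```shell
#!/bin/sh
# Paths taken from the "amadmin Outside version" output above.
DBGDIR=/tmp/amanda
LISTED_INCR_DIR=/var/lib/amanda/gnutar-lists

# latest_debug DIR PROGRAM: print the newest PROGRAM debug file in DIR.
# Assumption: files are named PROGRAM.debug or PROGRAM.<timestamp>.debug.
latest_debug() {
    ls -t "$1"/"$2".*debug 2>/dev/null | head -n 1
}

# Example session on the client:
#   f=$(latest_debug "$DBGDIR" sendsize)
#   [ -n "$f" ] && grep -n '/var/www' "$f"
#   ls -ld "$LISTED_INCR_DIR"   # check it exists and who owns it
```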

Thanks so much for your suggestions.

-Kevin Zembower

-----
E. Kevin Zembower
Unix Administrator
Johns Hopkins University/Center for Communications Programs
111 Market Place, Suite 310
Baltimore, MD  21202
410-659-6139