Amanda-Users

failure of backups after reboot

2009-08-24 14:56:35
Subject: failure of backups after reboot
From: Chris Hoogendyk <hoogendyk AT bio.umass DOT edu>
To: AMANDA users <amanda-users AT amanda DOT org>
Date: Mon, 24 Aug 2009 12:32:28 -0400
This is a recently configured Amanda 2.5.1p3 setup on a group of E250's running Solaris 9. It has been running smoothly for a few weeks now. Just before building Amanda, I had done a reconfigure reboot to get drivers running for a new tape library. This weekend, after the weekend backups had completed successfully and everything had flushed out to tape, I did a `touch /reconfigure; reboot` to get a new drive recognized (should have been a hot swappable drive, but seagate has changed their firmware so it doesn't work the way it once did). Anyway, all the services came up alright as far as I could tell. This is actually a departmental server with sendmail, apache, samba, etc. all running on it (they couldn't afford the parts to set up a dedicated backup server).

When I looked at the backup reports this morning, this server had:

*** THE DUMPS DID NOT FINISH PROPERLY!

The next tape Amanda expects to use is: geo-daily-05.
The next new tape already labelled is: geo-daily-03.

FAILURE AND STRANGE DUMP SUMMARY:
 eclogite.geo.umass.edu  /                       RESULTS MISSING
    ...<snip>...
 eclogite.geo.umass.edu  /var/mail               RESULTS MISSING
 gis.geo.umass.edu       /                       RESULTS MISSING
    ...<snip>...
 gis.geo.umass.edu       /usr/local              RESULTS MISSING
 mogollon.geo.umass.edu  /                       RESULTS MISSING
    ...<snip>...
 mogollon.geo.umass.edu  /usr/local              RESULTS MISSING
 driver: FATAL Did not get DATE line from planner


The debug files were sparce:

eclogite:/:root# cd /tmp/amanda
eclogite:/tmp/amanda:root# ls -l
total 16
drwx------   3 amandabackup amandabackup     179 Aug 24 00:45 server
eclogite:/tmp/amanda:root# cd server
eclogite:/tmp/amanda/server:root# ls -l
total 16
drwx------   2 amandabackup amandabackup     546 Aug 24 00:45 daily
eclogite:/tmp/amanda/server:root# cd daily
eclogite:/tmp/amanda/server/daily:root# ls -l
total 80
-rw-r-----   1 amandabackup amandabackup     223 Aug 24 00:45 
amlogroll.20090824004501.debug
-rw-r-----   1 amandabackup amandabackup     220 Aug 24 00:45 
amreport.20090824004501.debug
-rw-r-----   1 amandabackup amandabackup    3875 Aug 24 00:45 
amtrmidx.20090824004501.debug
-rw-r-----   1 amandabackup amandabackup     286 Aug 24 00:45 
amtrmlog.20090824004501.debug
-rw-r-----   1 amandabackup amandabackup     281 Aug 24 00:45 
driver.20090824004500.debug
eclogite:/tmp/amanda/server/daily:root# more driver.20090824004500.debug driver: debug 1 pid 12625 ruid 555 euid 555: start at Mon Aug 24 00:45:00 2009
driver: debug 1 pid 12625 ruid 555 euid 555: rename at Mon Aug 24 00:45:01 2009
driver: time 0.017: Did not get DATE line from planner
driver: time 0.017: pid 12625 finish time Mon Aug 24 00:45:01 2009
eclogite:/tmp/amanda/server/daily:root# date


And there were no debug files on either of the clients for that date/time. The cronlog entry shows a return code of 8:

 CMD: /usr/local/sbin/amdump daily -o reserve=100 -o tapedev="" -o tpchanger=""
 amandaba 12613 c Mon Aug 24 00:45:00 2009
<  amandaba 12613 c Mon Aug 24 00:45:02 2009 rc=8


Paradoxically, when I ran an amcheck this morning to see if I could get more detail, amcheck reported no problems. I've looked over everything I can think of. Since the critical event here is a server reboot, I particularly looked for things that might have been affected by that -- supposing, for example, I had set something up by hand, and on a reboot it didn't come back properly. But, amanda isn't susceptible to too much in that respect, since it runs off cron and isn't dependent on running daemons, etc. Various mount points and permissions seemed to be alright, /tmp/amanda was getting debug logs.

I suppose I could have tweaked the amanda entries for inetd incorrectly and forgot to HUP inetd. Then the reboot would cause it to fail. But the entries present match my other amanda servers. /etc/inetd.conf has the following entry:

amanda dgram udp wait amandabackup /usr/local/libexec/amandad amandad -auth=bsd amdump amindexd amidxtaped

and /etc/services has these entries:

amanda          10080/udp
amandaidx       10082/tcp
amidxtape       10083/tcp


And, amcheck reports no problems right now (and in fact produces debug logs on each of the clients in /tmp/amanda). So, to repeat myself -- this setup has been running smoothly for a couple of weeks. The reboot borked it. I'm stuck. Might it just work tonight? I kind of doubt it. Though something may have gotten poked by the amdump that failed and/or by the amcheck that succeeded.

Here's the output from the amcheck:

eclogite:/usr/local/etc/amanda/daily$ amcheck daily
Amanda Tape Server Host Check
-----------------------------
Holding disk /var/spool/amanda/disk2: 3631598 KB disk space available, using 
2095598 KB
Holding disk /var/spool/amanda/disk1: 3631598 KB disk space available, using 
2095598 KB
Holding disk /var/spool/amanda/disk3: 234427600 KB disk space available, using 
223941840 KB
slot 5: read label `geo-daily-05', date `20090801204500'
NOTE: skipping tape-writable test
Tape geo-daily-05 label ok
Server check took 106.699 seconds

Amanda Backup Client Hosts Check
--------------------------------
Client check: 3 hosts checked in 1.197 seconds, 0 problems found

(brought to you by Amanda 2.5.1p3)


Yeah, I know, the holding disk setup is weird. That's what the new drive is for.

And yup, I've reviewed all my own notes, looked over the troubleshooting on the wiki, and perused all the list archives and forum archives that google brought up when I searched on the error message "Did not get DATE line from planner". I'll sleep on it too, but that'll take another day. ;-) Just thought one of the sharper people on the list might see what I'm missing.


--
---------------

Chris Hoogendyk

-
  O__  ---- Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogendyk AT bio.umass DOT edu>

---------------
Erdös 4



<Prev in Thread] Current Thread [Next in Thread>