Amanda-Users

2003-04-15 13:30:46
Subject: Re: Troubleshooting partition offline error
From: Jon LaBadie <jon AT jgcomp DOT com>
To: amanda-users AT amanda DOT org
Date: Tue, 15 Apr 2003 11:56:41 -0400
On Tue, Apr 15, 2003 at 10:11:03AM -0400, KEVIN ZEMBOWER wrote:
> I'm so happy with Amanda that when an error occurs and a partition fails to 
> dump, I just wait a day or two and it's usually corrected. Maybe this is more 
> correctly labeled "laziness." However, I just noticed that the main web 
> directory on my main web server hasn't been backed up in a week, and I've 
> lost the level 0 backup. Now, I'm worried.
> 

...
> Last night, I tried to split this partition up, with these entries in disklist 
> and amanda.conf:
> amanda@www:/etc/amanda/Outside$ grep "/var/www" disklist         
> www /var/www/main/htdocs nocomp-highpri-tar -1 local    #www:/var/www/main/htdocs
> www /var/www/ www-sda8-exclude-htdocs-main -1 local     #www:/var/www/ excluding


Don't think this causes any problems, but why the trailing / in /var/www/?
It doesn't appear in the other entry.
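The trailing / would matter if Amanda keys each disk's history on the exact disk-name string, so that /var/www/ and /var/www count as two different disks. That is an assumption here, though it would fit the "Adding new disk" lines quoted further down. A minimal Python sketch, with hypothetical helper names, that strips # comments from disklist lines, splits them into the five fields used in the entries above (host, disk name, dumptype, spindle, interface), and flags disk names that differ only by a trailing slash:

```python
def parse_disklist_line(line):
    """Split one disklist line into named fields, dropping any '#' comment.

    Assumes the five-field layout of the entries quoted above:
    host, disk name, dumptype, spindle, interface.
    Returns None for blank or comment-only lines.
    """
    body = line.split("#", 1)[0]
    fields = body.split()
    if not fields:
        return None
    keys = ["host", "diskname", "dumptype", "spindle", "interface"]
    return dict(zip(keys, fields))


def trailing_slash_variants(names):
    """Return (first, later) pairs of disk names that differ only by a trailing '/'."""
    seen = {}
    pairs = []
    for name in names:
        key = name.rstrip("/") or "/"  # keep the root directory as "/"
        if key in seen and seen[key] != name:
            pairs.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return pairs


entries = [
    parse_disklist_line("www /var/www/main/htdocs nocomp-highpri-tar -1 local"),
    parse_disklist_line("www /var/www/ www-sda8-exclude-htdocs-main -1 local  #www:/var/www/ excluding"),
]
for entry in entries:
    print(entry["host"], entry["diskname"], entry["dumptype"])
```

Whether the old entry was really named /var/www is exactly the open question, so treat that name as a hypothetical input: trailing_slash_variants(["/var/www", "/var/www/"]) returns [("/var/www", "/var/www/")].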

> 
> I got these errors this morning:
> These dumps were to tape Outside-15.
> *** A TAPE ERROR OCCURRED: [[writing file: No space left on device]].
> Some dumps may have been left in the holding disk.
> Run amflush to flush them to tape.
> The next tape Amanda expects to use is: Outside-16.
> 
> FAILURE AND STRANGE DUMP SUMMARY:
>   www        /var/www/ lev 0 FAILED [disk /var/www/ offline on www?]
>   www        /var/www/main/htdocs lev 0 FAILED [disk /var/www/main/htdocs offline on www?]
>   real       sda4 lev 0 FAILED [out of tape]
>   real       sda4 lev 0 FAILED ["data write: Connection reset by peer"]
>   real       sda4 lev 0 FAILED [dump to tape failed]
> 
> Here are lines from /var/amanda/Outside/amdump.1 which refer to /var/www:
> amanda@www:/var/amanda/Outside$ grep -B 1 -A 2 "/var/www" amdump.1
> setting up estimates for www:/var/www/main/htdocs
> www:/var/www/main/htdocs overdue 12157 days for level 0
> setup_estimate: www:/var/www/main/htdocs: command 0, options:
>     last_level -1 next_level0 -12157 level_days 0
>     getting estimates 0 (0) -1 (-1) -1 (-1)
> setting up estimates for www:/var/www/
> www:/var/www/ overdue 12157 days for level 0
> setup_estimate: www:/var/www/: command 0, options:
>     last_level -1 next_level0 -12157 level_days 0
>     getting estimates 0 (0) -1 (-1) -1 (-1)
> --
> got result for host www disk /var/www/: 0 -> -1K, -1 -> -1K, -1 -> -1K
> got result for host www disk /var/www/main/htdocs: 0 -> -1K, -1 -> -1K, -1 -> -1K
> --
> FAILED QUEUE:
>   0: www        /var/www/
>   1: www        /var/www/main/htdocs
> --
> planner: FAILED www /var/www/ 0 [disk /var/www/ offline on www?]
> planner: FAILED www /var/www/main/htdocs 0 [disk /var/www/main/htdocs offline on www?]


You omitted the section where the "taper" report is located (is that in NOTES?).
Did it actually fill the tape, possibly after writing lots of other DLEs,
leaving insufficient space for either of these two entries?
Or at least not for the one it tried?

> Here are lines from log.20030415.0:
> amanda@www:/var/amanda/Outside$ grep "/var/www" log.20030415.0           
> INFO planner Adding new disk www:/var/www/main/htdocs.
> INFO planner Adding new disk www:/var/www/.

This concerns me: a new disk /var/www/?
Weren't you already backing up that partition?
Perhaps before it did not have the trailing /?
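The estimate lines quoted above point the same way. Both entries show "last_level -1" and are "overdue 12157 days for level 0", and 12157 is exactly the number of days from 1970-01-01 to 2003-04-15, the date of this run. That is consistent with the planner having no level 0 on record for either disk name and counting from the Unix epoch; this is an inference from the numbers, not something checked against the planner source. A quick check in Python:

```python
from datetime import date

# Date of the run, taken from log.20030415.0 in the quoted output.
run_date = date(2003, 4, 15)
# Assumption: with no recorded level 0, the "last dump" is effectively
# day 0 of the Unix epoch.
epoch = date(1970, 1, 1)

days_overdue = (run_date - epoch).days
print(days_overdue)  # 12157
```

If the old /var/www entry had a dump history, a renamed entry inheriting it would show a small overdue count instead of the full epoch distance.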

When it was not getting backed up (prior to last
evening), was it giving "offline" errors too?  You
may have multiple problems: the need to split the
partition, and network connectivity.  Did you make a
teeny tiny configuration change about 10 days to 2
weeks ago that couldn't possibly have affected Amanda
but did?
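One way to answer that from older runs is to scan earlier amdump files for "got result" lines like the ones quoted above. A minimal sketch, assuming the exact line layout shown in the quoted amdump.1 output and assuming that a size of -1K means the client returned no usable estimate (here those lines accompany the planner's "offline" failures); the function name failed_estimates is hypothetical:

```python
import re

# Assumed layout, copied from the quoted amdump.1 lines:
# "got result for host www disk /var/www/: 0 -> -1K, -1 -> -1K, -1 -> -1K"
RESULT_RE = re.compile(r"got result for host (\S+) disk (\S+): (.*)$")
PAIR_RE = re.compile(r"(-?\d+) -> (-?\d+)K")


def failed_estimates(lines):
    """Return (host, disk) pairs whose estimates all came back as -1K."""
    failed = []
    for line in lines:
        m = RESULT_RE.search(line)
        if not m:
            continue
        host, disk, rest = m.groups()
        # Each "level -> sizeK" pair; keep only the sizes.
        sizes = [int(size) for _level, size in PAIR_RE.findall(rest)]
        if sizes and all(size == -1 for size in sizes):
            failed.append((host, disk))
    return failed


sample = [
    "got result for host www disk /var/www/: 0 -> -1K, -1 -> -1K, -1 -> -1K",
    "got result for host www disk /var/www/main/htdocs: 0 -> -1K, -1 -> -1K, -1 -> -1K",
]
for host, disk in failed_estimates(sample):
    print(f"no estimate for {host}:{disk}")
```

Running it over each older amdump.N in turn would show when these disks first stopped returning estimates.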

-- 
Jon H. LaBadie                  jon AT jgcomp DOT com
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)