Amanda-Users

Re: Troubleshooting partition offline error

2003-04-15 16:23:03
From: KEVIN ZEMBOWER <KZEMBOWE AT jhuccp DOT org>
To: amanda-users AT amanda DOT org
Date: Tue, 15 Apr 2003 14:36:28 -0400
Jon, thank you so much for your questions and suggestions.

"Don't think this causes any problems, but why the trailing / in /var/www/? It 
doesn't appear in the other entry."

No other reason than some notes that I had made on another host had the 
trailing / in them. I see now that I wasn't consistent. I'll remove them for 
tonight's backup and see if there's a difference.

"You omitted the section where the "taper" report is located (is that notes?). 
Did it actually fill the tape, possibly after writing lots of other DLE's and 
not having sufficient space left for either of these two entries? At least, not 
the one it tried."

It looks like the first errors appeared in amdump.1 in the sections on GETTING 
ESTIMATES and FAILED QUEUE. There are no other results in amdump.1 referring to 
taper and these partitions. It did report that it filled the tape:
FAIL taper real sda4 0 [out of tape]
ERROR taper no-tape [[writing file: No space left on device]]

I hadn't thought of it, but yes, it could have filled the tape and not had room 
for the www partition. However, I still suspect another cause, because the 
estimates had already failed earlier in the run.
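
For anyone who wants to check the estimate results directly, here is a minimal 
Python sketch that scans amdump-style text for "got result" lines and lists the 
disks whose level 0 estimate came back as -1K. The function name and the regular 
expressions are illustrative only, the sample lines are copied from the amdump.1 
excerpt quoted below, and reading -1K as "no estimate returned" is an assumption 
rather than something confirmed in this thread.

```python
import re

# Sample lines copied from the amdump.1 excerpt quoted later in this message.
SAMPLE = """\
got result for host www disk /var/www/: 0 -> -1K, -1 -> -1K, -1 -> -1K
got result for host www disk /var/www/main/htdocs: 0 -> -1K, -1 -> -1K, -1 -> -1K
"""

# Hypothetical patterns, written to match the line layout shown above.
RESULT_RE = re.compile(r"^got result for host (\S+) disk (\S+): (.*)$")
PAIR_RE = re.compile(r"(-?\d+) -> (-?\d+)K")


def failed_level0_estimates(text):
    """Return (host, disk) pairs whose level 0 estimate is -1K."""
    failed = []
    for line in text.splitlines():
        match = RESULT_RE.match(line.strip())
        if not match:
            continue
        host, disk, rest = match.groups()
        for level, size_kb in PAIR_RE.findall(rest):
            # Assumption: a size of -1K means the client returned no estimate.
            if level == "0" and size_kb == "-1":
                failed.append((host, disk))
                break
    return failed


if __name__ == "__main__":
    for host, disk in failed_level0_estimates(SAMPLE):
        print(f"{host}:{disk} has no level 0 estimate")
```

If the two www entries show up here while the other disks do not, that would 
point at the estimate phase rather than at the tape filling up.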

"This concerns me, a new disk /var/www/ ? Weren't you already backing up that 
partition? Perhaps before it did not have the trailing / ?"

I was backing up that section as a whole (commented out in disklist):
#www sda8 nocomp-highpri -1 local               #www:/var/www/

This did work occasionally (had to uncomment in disklist to make amadmin info 
work):
amanda@www:/etc/amanda/Outside$ amadmin Outside info www sda8
Current info for www sda8:
  (Forcing to level 0 dump at next run)
  Stats: dump rates (kps), Full:  1255.0, 1188.0, 1238.0
                    Incremental:  1040.0, 1006.0, 974.0
          compressed size, Full:  60.7%, 60.2%, 61.0%
                    Incremental:  48.1%, 49.1%, 46.1%
  Dumps: lev datestmp  tape             file   origK   compK secs
          0  20030327  Outside-12          9 12428500 7543936 6010
          1  20030328  Outside-13          7  383180  177792  190
          2  20030329  Outside-14          6  174290   61024   98
          3  20030402  Outside-16          9  610500  315264  376
          4  20030411  Outside-13          8  755470  363040  349
amanda@www:/etc/amanda/Outside$ 

I just noticed that this listing shows I must have been using compression, too. 
I'll try that tonight, also.
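
As a quick cross-check that compression really was in effect, the compK and 
origK columns of the "Dumps:" table can be compared directly. The following is 
a small sketch in Python; the numbers are copied from the amadmin listing above, 
and the helper name compressed_percent is made up for the illustration.

```python
# (level, origK, compK) copied from the "Dumps:" table in the amadmin output.
DUMPS = [
    (0, 12428500, 7543936),
    (1, 383180, 177792),
    (2, 174290, 61024),
    (3, 610500, 315264),
    (4, 755470, 363040),
]


def compressed_percent(orig_kb, comp_kb):
    """Compressed size as a percentage of the original size."""
    return 100.0 * comp_kb / orig_kb


if __name__ == "__main__":
    for level, orig_kb, comp_kb in DUMPS:
        pct = compressed_percent(orig_kb, comp_kb)
        print(f"level {level}: {pct:.1f}% of original size")
```

The level 0 entry works out to about 60.7%, which agrees with the first 
"compressed size, Full" figure in the Stats lines.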

"When it was not getting backed up (prior to last evening) was it giving 
"offline" errors too?  You may have multiple problems, the need to split and 
network connectivity.  Did you make a teeny tiny configuration change about 10 
days to 2 weeks ago that couldn't have possibly affected amanda but did?"

No, it was giving 'disk too large' errors. I had forgotten that, when no 
compression is used, the media has to be larger than the largest partition.

I'll make the changes to the filename and compression and see what happens 
tonight. Thanks, again, Jon, for your thoughtful questions.

-Kevin

>>> Jon LaBadie <jon AT jgcomp DOT com> 04/15/03 11:56AM >>>
On Tue, Apr 15, 2003 at 10:11:03AM -0400, KEVIN ZEMBOWER wrote:
> I'm so happy with Amanda that when an error occurs and a partition fails to 
> dump, I just wait a day or two and it's usually corrected. Maybe this is more 
> correctly labeled "laziness." However, I just noticed that the main web 
> directory on my main web server hasn't been backed up in a week, and I've 
> lost the level 0 backup. Now, I'm worried.
> 

...
> Last night, I tried to split this partition up, with these entries in disklist 
> and amanda.conf:
> amanda@www:/etc/amanda/Outside$ grep "/var/www" disklist         
> www /var/www/main/htdocs nocomp-highpri-tar -1 local    #www:/var/www/main/htdocs
> www /var/www/ www-sda8-exclude-htdocs-main -1 local     #www:/var/www/ excluding


Don't think this causes any problems, but why the trailing / in /var/www/?
It doesn't appear in the other entry.

> 
> I got these errors this morning:
> These dumps were to tape Outside-15.
> *** A TAPE ERROR OCCURRED: [[writing file: No space left on device]].
> Some dumps may have been left in the holding disk.
> Run amflush to flush them to tape.
> The next tape Amanda expects to use is: Outside-16.
> 
> FAILURE AND STRANGE DUMP SUMMARY:
>   www        /var/www/ lev 0 FAILED [disk /var/www/ offline on www?]
>   www        /var/www/main/htdocs lev 0 FAILED [disk /var/www/main/htdocs offline on www?]
>   real       sda4 lev 0 FAILED [out of tape]
>   real       sda4 lev 0 FAILED ["data write: Connection reset by peer"]
>   real       sda4 lev 0 FAILED [dump to tape failed]
> 
> Here are lines from /var/amanda/Outside/amdump.1 which refer to /var/www:
> amanda@www:/var/amanda/Outside$ grep -B 1 -A 2 "/var/www" amdump.1
> setting up estimates for www:/var/www/main/htdocs
> www:/var/www/main/htdocs overdue 12157 days for level 0
> setup_estimate: www:/var/www/main/htdocs: command 0, options:
>     last_level -1 next_level0 -12157 level_days 0
>     getting estimates 0 (0) -1 (-1) -1 (-1)
> setting up estimates for www:/var/www/
> www:/var/www/ overdue 12157 days for level 0
> setup_estimate: www:/var/www/: command 0, options:
>     last_level -1 next_level0 -12157 level_days 0
>     getting estimates 0 (0) -1 (-1) -1 (-1)
> --
> got result for host www disk /var/www/: 0 -> -1K, -1 -> -1K, -1 -> -1K
> got result for host www disk /var/www/main/htdocs: 0 -> -1K, -1 -> -1K, -1 -> -1K
> --
> FAILED QUEUE:
>   0: www        /var/www/
>   1: www        /var/www/main/htdocs
> --
> planner: FAILED www /var/www/ 0 [disk /var/www/ offline on www?]
> planner: FAILED www /var/www/main/htdocs 0 [disk /var/www/main/htdocs offline on www?]


You omitted the section where the "taper" report is located (is that notes?).
Did it actually fill the tape, possibly after writing lots of other DLE's
and not having sufficient space left for either of these two entries?
At least, not the one it tried.

> Here are lines from log.20030415.0:
> amanda@www:/var/amanda/Outside$ grep "/var/www" log.20030415.0           
> INFO planner Adding new disk www:/var/www/main/htdocs.
> INFO planner Adding new disk www:/var/www/.

This concerns me, a new disk /var/www/ ?
Weren't you already backing up that partition?
Perhaps before it did not have the trailing / ?
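
A side calculation that seems to support the "new disk" reading: the amdump.1 
excerpt says both entries are "overdue 12157 days for level 0". A short Python 
sketch shows what date that implies when counted back from the run date. The 
run date is taken from the log file name log.20030415.0; the interpretation 
that a disk with no recorded level 0 is dated from the Unix epoch is an 
assumption, not something stated in the logs.

```python
from datetime import date, timedelta

OVERDUE_DAYS = 12157          # from "overdue 12157 days for level 0" in amdump.1
RUN_DATE = date(2003, 4, 15)  # from the log file name log.20030415.0

# Count back from the run date to the implied date of the last level 0.
implied_last_level0 = RUN_DATE - timedelta(days=OVERDUE_DAYS)

if __name__ == "__main__":
    print(implied_last_level0)  # 1970-01-01
```

A date of 1970-01-01 is what one would expect if there were no dump history 
at all under these disk names, rather than a real dump that is long overdue.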

When it was not getting backed up (prior to last
evening) was it giving "offline" errors too?  You
may have multiple problems, the need to split and
network connectivity.  Did you make a teeny tiny
configuration change about 10 days to 2 weeks ago
that couldn't have possibly affected amanda but did?

-- 
Jon H. LaBadie                  jon AT jgcomp DOT com 
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)