Amanda-Users

RE: amdump waits forever for estimates from one host

2005-06-12 01:54:22
Subject: RE: amdump waits forever for estimates from one host
From: Frank Smith <fsmith AT hoovers DOT com>
To: "Lengyel, Florian" <FLengyel AT gc.cuny DOT edu>, "'amanda-users AT amanda DOT org'" <amanda-users AT amanda DOT org>
Date: Sun, 12 Jun 2005 00:32:54 -0500
--On Saturday, June 11, 2005 23:50:49 -0400 "Lengyel, Florian" <FLengyel AT gc.cuny DOT edu> wrote:

> It was a firewall problem, though it's a little odd that one client had no
> problem and another did, since the firewall was on the backup server. With
> the firewall disabled (I'll attend to that later; the server and all the
> hosts are on a non-routable "inside" network) there's a new problem: a tape
> error. So I guess I have to do
> 
> amadmin Daily tape
> amflush -f Daily
> 
> These are the errors (slightly edited):
> Subject: CUNY Graduate Center AMANDA MAIL REPORT FOR June 11, 2005
> 
> *** A TAPE ERROR OCCURRED: [new tape not found in rack].

Amanda is looking for a new tape, so evidently you have fewer tapes
labeled than what is specified in tapecycle.  You can have more than
that labeled, but Amanda won't reuse an existing tape until at least
that many have been written.
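A simplified model of the reuse rule described above, as a Python sketch. The Tape record, the next_tape helper and the labels Daily001/Daily002 are hypothetical (Daily028 and Daily029 are the labels from the report); Amanda's real tapelist handling covers more cases, so this only illustrates the "tapecycle tapes must be written first" rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tape:
    label: str
    # Datestamp of the last run that wrote this tape (e.g. "20050611"),
    # or None if the tape is labeled but has never been written.
    last_written: Optional[str] = None


def next_tape(tapes: List[Tape], tapecycle: int) -> str:
    """Hypothetical simplified model: a written tape only becomes eligible
    for overwrite once at least `tapecycle` tapes have been written; until
    then Amanda asks for a new (never-written) tape."""
    written = sorted(
        (t for t in tapes if t.last_written is not None),
        key=lambda t: t.last_written,
    )
    if len(written) < tapecycle:
        return "a new tape"
    # The oldest written tape is the first candidate for reuse.
    return written[0].label


if __name__ == "__main__":
    tapes = [Tape("Daily001", "20050610"), Tape("Daily002", "20050611"),
             Tape("Daily028"), Tape("Daily029")]
    # Only 2 tapes written and tapecycle is 10, so a new tape is wanted.
    print(next_tape(tapes, tapecycle=10))
```

With fewer written tapes than tapecycle, the model keeps answering "a new tape", which matches the "a new tape, a new tape" line in the report.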

> Some dumps may have been left in the holding disk.
> Run amflush to flush them to tape.

Maybe, maybe not.  Just because you see this message doesn't mean there
actually are any.  You can check for files in your holding disk, or run
amflush and see.  (In the past, though, Amanda would mark a tape as used
when running amflush even if there was nothing to flush; perhaps that has
been fixed in newer versions.)
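One way to check the holding disk before running amflush, as a Python sketch. It assumes leftover dumps are ordinary files somewhere below the holding directory; the default path /dumps/amanda is hypothetical, so pass the directory from the holdingdisk section of your own amanda.conf.

```python
import os
import sys
from typing import List


def holding_disk_files(holding_dir: str) -> List[str]:
    """Return sorted paths of every regular file below holding_dir.

    An empty result suggests there is nothing for amflush to write.
    """
    found = []
    for root, _dirs, files in os.walk(holding_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical default path; override it on the command line.
    directory = sys.argv[1] if len(sys.argv) > 1 else "/dumps/amanda"
    leftovers = holding_disk_files(directory)
    if leftovers:
        for path in leftovers:
            print(path)
    else:
        print("no files in holding disk; nothing to flush")
```

If the list is empty, skipping amflush avoids the risk of a tape being marked as used for nothing.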

> The next 2 tapes Amanda expects to used are: a new tape, a new tape.
> The next 2 new tapes already labelled are: Daily029, Daily028.
> 
> FAILURE AND STRANGE DUMP SUMMARY:
>   neptune-gw hda1 lev 0 FAILED [can't switch to incremental dump]
>   m254.gc.cu sda1 lev 0 FAILED [can't switch to incremental dump]
>   neptune-gw hda7 lev 0 FAILED [can't switch to incremental dump]
>   neptune-gw hda6 lev 0 FAILED [can't switch to incremental dump]
>   m254.gc.cu sda5 lev 0 FAILED [can't switch to incremental dump]
>   neptune-gw hda5 lev 0 FAILED [can't switch to incremental dump]

Your 'reserve' parameter is set too high for a level 0 dump to go to the
holding disk when you have no tape.  Possibly it is at its default value
of 100, which means no fulls will occur without a tape.  Evidently there
was no previous level 0 either, so Amanda can't fall back to an
incremental on disk, since there is nothing to increment against.
  If you have adequate holding disk space, you might want to set this
lower.  The idea of the parameter is that if your tape drive fails, you
want to maximize the number of days you can still run backups by not
filling your holding disk with level 0s.  The real problem with using
'100' as a value is that new DLEs won't get backed up at all.
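The arithmetic behind 'reserve', as a simplified Python sketch: reserve percent of the holding disk is kept for incrementals when there is no tape, and level 0 dumps can use only the remainder. The 10 GB holding disk is a hypothetical figure; the 6361840k estimate is neptune-gw hda5 from the amstatus output in this thread. The real planner weighs all DLEs together, so this checks a single dump only.

```python
def full_dump_space_without_tape(holding_kb: int, reserve_pct: int) -> int:
    """KB of holding disk that level 0 dumps may use with no tape (simplified)."""
    if not 0 <= reserve_pct <= 100:
        raise ValueError("reserve must be between 0 and 100")
    return holding_kb * (100 - reserve_pct) // 100


def full_fits_without_tape(estimate_kb: int, holding_kb: int,
                           reserve_pct: int) -> bool:
    """True if one level 0 of estimate_kb fits in the space left for fulls."""
    return estimate_kb <= full_dump_space_without_tape(holding_kb, reserve_pct)


if __name__ == "__main__":
    holding_kb = 10_000_000       # hypothetical 10 GB holding disk
    hda5_estimate_kb = 6_361_840  # neptune-gw hda5 estimate from amstatus
    for reserve in (100, 30):
        fits = full_fits_without_tape(hda5_estimate_kb, holding_kb, reserve)
        print(f"reserve {reserve}: level 0 fits = {fits}")
```

With reserve 100 no space at all is left for fulls, whatever the holding disk size, which is why brand-new DLEs fail with "can't switch to incremental dump".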

Frank
 
> ...
> 
> STATISTICS:
>                           Total       Full      Daily
>                         --------   --------   --------
> Estimate Time (hrs:min)    0:03
> Run Time (hrs:min)         0:07
> Dump Time (hrs:min)        0:00       0:00       0:00
> Output Size (meg)           0.0        0.0        0.0
> Original Size (meg)         0.0        0.0        0.0
> Avg Compressed Size (%)     --         --         --
> Filesystems Dumped            0          0          0
> Avg Dump Rate (k/s)         --         --         --
> 
> Tape Time (hrs:min)        0:00       0:00       0:00
> Tape Size (meg)             0.0        0.0        0.0
> Tape Used (%)               0.0        0.0        0.0
> Filesystems Taped             0          0          0
> Avg Tp Write Rate (k/s)     --         --         --
> 
> DUMP SUMMARY:
>                                      DUMPER STATS            TAPER STATS
> HOSTNAME     DISK        L ORIG-KB OUT-KB COMP% MMM:SS  KB/s MMM:SS  KB/s
> -------------------------- --------------------------------- ------------
> m254.gc.cuny sda2        0 FAILED ---------------------------------------
> m254.gc.cuny sda3        0 FAILED ---------------------------------------
> ....
> neptune-gw.g hda6        0 FAILED ---------------------------------------
> everything in disklist   0 FAILED ---------------------------------------
> 
> (brought to you by Amanda version 2.4.4p3)
>  
> 
> -----Original Message-----
> From: Frank Smith
> To: Lengyel, Florian; 'amanda-users AT amanda DOT org'
> Sent: 6/11/2005 11:21 PM
> Subject: RE: amdump waits forever for estimates from one host
> 
> --On Saturday, June 11, 2005 23:03:22 -0400 "Lengyel, Florian"
> <FLengyel AT gc.cuny DOT edu> wrote:
> 
>> This is what I have in m254:/tmp/amanda/amandad.20050611172037000.debug
>> 
>>  ...
>> /home/m254/yfsong/ 0 SIZE 91630
>> /home/m254/yzhu/ 0 SIZE 10
>> ----
>> 
>> amandad: time 260.577: dgram_recv: timeout after 10 seconds
>> amandad: time 260.577: waiting for ack: timeout, retrying
>> amandad: time 270.577: dgram_recv: timeout after 10 seconds
>> amandad: time 270.577: waiting for ack: timeout, retrying
>> amandad: time 280.577: dgram_recv: timeout after 10 seconds
>> amandad: time 280.577: waiting for ack: timeout, retrying
>> amandad: time 290.577: dgram_recv: timeout after 10 seconds
>> amandad: time 290.577: waiting for ack: timeout, retrying
>> amandad: time 300.577: dgram_recv: timeout after 10 seconds
>> amandad: time 300.577: waiting for ack: timeout, giving up!
>> amandad: time 300.577: pid 15364 finish time Sat Jun 11 17:25:37 2005
>> [root@m254 amanda]#
> 
>> 
>> I previously set
>> 
>> etimeout 400
>> 
>> up slightly from the original 300 seconds.
>> 
>> So it looks like a UDP packet never made it...Oh woe.
> 
> Looks like a firewall problem.  Do you have one on either machine
> and/or one in between them?
> 
> Frank
>> 
>> -----Original Message-----
>> From: Frank Smith
>> To: Lengyel, Florian; 'amanda-users AT amanda DOT org'
>> Sent: 6/11/2005 10:35 PM
>> Subject: Re: amdump waits forever for estimates from one host
>> 
>> --On Saturday, June 11, 2005 17:45:17 -0400 "Lengyel, Florian"
>> <FLengyel AT gc.cuny DOT edu> wrote:
>> 
>>> Amanda version: amanda-2.4.4p3-1
>>> OS: CentOS
>>> Kernel: uname -a
>>> Linux amanda.grid.cuny.edu 2.6.9-5.0.3.ELsmp #1 SMP Sat Feb 19 19:38:02 CST 2005 i686 i686 i386 GNU/Linux
>>> 
>>> Trouble: amanda is set up on a tape server; there are two clients so far.
>>> One is running RH Linux 7.3 but has the latest (2.4.4) amanda code built
>>> from source; the other is using an older RPM under RH Linux 9. The
>>> source-built machine gives estimates for its DLEs, and the other seems to
>>> want to wait until grass grows under its mounting bracket, according to
>>> amstatus Daily, part of which reads:
>>> 
>>> m254.gc.cuny.edu:/home/m254/yzhu/                        getting estimate
>>> m254.gc.cuny.edu:/home/www                               getting estimate
>>> m254.gc.cuny.edu:sda1                                    getting estimate
>>> m254.gc.cuny.edu:sda2                                    getting estimate
>>> m254.gc.cuny.edu:sda3                                    getting estimate
>>> m254.gc.cuny.edu:sda5                                    getting estimate
>>> m254.gc.cuny.edu:sda7                                    getting estimate
>>> neptune-gw.gc.cuny.edu:hda1                  0     8390k estimate done
>>> neptune-gw.gc.cuny.edu:hda5                  0  6361840k estimate done
>>> neptune-gw.gc.cuny.edu:hda6                  0  1361030k estimate done
>>> neptune-gw.gc.cuny.edu:hda7                  0   163620k estimate done
>>> 
>>> I'm checking through the documentation...amcheck succeeds. Have I made one
>>> of the usual configuration oversights?
>> 
>> Try checking the debug files on m254.gc.cuny.edu (default is in
>> /tmp/amanda) and see if there is more information there.
>> 
>> One possibility is a firewall blocking the estimate response, since the
>> response usually occurs long after most firewall connection timeouts.
>> Look for 'no response' errors in the client debug files.
>> 
>> Just so your backups don't hang forever, you might want to make sure your
>> etimeout isn't set larger than necessary. Don't forget it is multiplied by
>> the number of DLEs on the host, so a setting of 1 hour on your m254 host
>> could result in a wait of up to 7 hours. If you use a negative number, it
>> will be the per-host timeout instead of per-DLE.
>> 
>> Frank
>> 
>> 
>> --
>> Frank Smith
>> fsmith AT hoovers DOT com
>> Sr. Systems Administrator                                 Voice: 512-374-4673
>> Hoover's Online                                             Fax: 512-374-4501
> 



--
Frank Smith                                                fsmith AT hoovers DOT com
Sr. Systems Administrator                                 Voice: 512-374-4673
Hoover's Online                                             Fax: 512-374-4501
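The etimeout arithmetic from the earlier quoted reply, as a small Python sketch. A positive etimeout applies per DLE and is multiplied by the client's DLE count; a negative value is taken as the total for the whole client. The helper names are hypothetical, and the DLE count of 7 is the figure used in the "up to 7 hours" example.

```python
def max_estimate_wait(etimeout: int, num_dles: int) -> int:
    """Worst-case seconds the planner waits for one client's estimates."""
    if etimeout < 0:
        # Negative: the absolute value is a per-client total.
        return -etimeout
    # Positive: per-DLE timeout, summed over the client's DLEs.
    return etimeout * num_dles


def as_hours_minutes(seconds: int) -> str:
    """Format seconds as hrs:min, like the Amanda report's time columns."""
    hours, rem = divmod(seconds, 3600)
    return f"{hours}:{rem // 60:02d}"


if __name__ == "__main__":
    dles_on_m254 = 7  # DLE count from the 7-hour example in the thread
    for setting in (400, 3600, -3600):
        wait = max_estimate_wait(setting, dles_on_m254)
        print(f"etimeout {setting}: up to {as_hours_minutes(wait)} (hrs:min)")
```

With the etimeout 400 setting mentioned in the thread and 7 DLEs, the worst case is 2800 seconds (about 0:46); a setting of -3600 caps the whole client at one hour no matter how many DLEs it has.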