Amanda-Users

2009-07-01 10:32:30
Subject: Re: zfs issue ?
From: Brian Cuttler <brian AT wadsworth DOT org>
To: amanda-users AT amanda DOT org, Chris Knight <knight AT wadsworth DOT org>
Date: Wed, 1 Jul 2009 10:16:44 -0400
I'm still seeing intermittent problems backing up some of
the DLEs on the client Lyra.

I googled "amanda dup2 bad file number", which led me to this discussion
between Darin Perusich and Jean-Louis Martineau:

http://www.google.com/search?hl=en&q=amanda+dup2+bad+file+number&aq=f&oq=&aqi=

Checking for zombie processes on the client, I found many dozens
stemming from a Jun 19th job, amandad, which I hup'd and got rid of.
The zombies cleared and I'm keeping my fingers crossed.
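
For anyone who wants to repeat the zombie check, here is a minimal sketch
in Python. It assumes input shaped like the output of
`ps -eo pid,ppid,s,comm` (state letter in the third column, `Z` for a
zombie). The function name and the sample data are made up for
illustration and are not part of Amanda.

```python
from collections import Counter


def zombies_by_parent(ps_output):
    """Count zombie processes per parent PID.

    Assumes lines shaped like `ps -eo pid,ppid,s,comm` output:
    PID, PPID, one-letter state, command name.
    """
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        _pid, ppid, state = fields[:3]
        if state == "Z":
            counts[int(ppid)] += 1
    return counts


# Hypothetical sample data, not taken from the client in question.
SAMPLE = """\
  PID  PPID S COMMAND
  812     1 S amandad
 1001   812 Z <defunct>
 1002   812 Z <defunct>
 1200     1 S sshd
"""

for parent, count in zombies_by_parent(SAMPLE).items():
    print(f"parent {parent}: {count} zombie(s)")
```

A parent PID with a large count is the process to look at (here it would
be the stale amandad).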

         This takes me back, though. I was running amanda fine until
         I attempted to increase max_dumps (per client), at which
         time I think I inadvertently allowed more clients than I have
         available UDP ports.

I compiled amanda with --with-port restrictions because I don't
usually recompile for FW and NON-FW installations; one kit (for
each architecture) usually does the job for me.

I'm wondering if amanda allowed me to shoot myself in the foot.

Does amanda check port restrictions against max_client? Should it?
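
The arithmetic behind that question can be sketched as follows. This is
only an illustration: the port range, the number of ports each dump
consumes, and the function names are all assumptions made for the
example, not values taken from Amanda or from this installation.

```python
def port_range_size(low, high):
    """Number of ports in an inclusive range low..high."""
    if high < low:
        raise ValueError("high must be >= low")
    return high - low + 1


def range_is_sufficient(low, high, concurrent_dumps, ports_per_dump):
    """True if the range can serve every concurrent dump at once.

    ports_per_dump is an assumed figure; check what your own
    build and auth method actually use.
    """
    needed = concurrent_dumps * ports_per_dump
    return needed <= port_range_size(low, high)


# Hypothetical numbers: a 10-port range and 3 ports per dump.
print(range_is_sufficient(850, 859, concurrent_dumps=3, ports_per_dump=3))
print(range_is_sufficient(850, 859, concurrent_dumps=4, ports_per_dump=3))
```

With these made-up numbers, three parallel dumps fit in the range and a
fourth does not, which is the kind of mismatch I suspect I created.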

                                                thank you,

                                                Brian

On Tue, Jun 23, 2009 at 01:38:42PM -0400, Brian Cuttler wrote:
> 
> I am running amanda server 2.6.1p1 on a Solaris x86 x4500 to an
> SL24/LTO4 jukebox.
> 
> I'm seeing a couple of odd (and intermittent) problems with one
> of the clients, another Solaris box with a ZFS file system.
> 
> Most partitions back up OK, but here are some extracts from last
> night's amdump run.
> 
> The system I'm most concerned about is "LYRA"
> 
> Despite the failures on mailserv, it backed up OK.
> The errors on the squid* machines are likely simply files
> that are being actively written to, and I will ignore those
> issues for now.
> 
> I would not expect the types of errors I'm seeing on Lyra, since
> I'm using ZFS snapshots. I see from the client logs that the
> snapshots are being created and destroyed, and I can also see (from
> this poorly chosen example) that we are in fact tar'ing up the
> snapshot and not the active file system.
> 
> Errors occur on different ZFS mount points on different days,
> seemingly at random.
> 
> 1245694962.255172: amgtar: GNUTAR-PATH /usr/sfw/bin/gtar
> 1245694962.255197: amgtar: GNUTAR-LISTDIR /usr/local/var/amanda/gnutar-lists
> 1245694962.255219: amgtar: DIRECTORY /db1/.zfs/snapshot/amanda-_db1-check
> 1245694962.255239: amgtar: ONE-FILE-SYSTEM yes
> 1245694962.255258: amgtar: SPARSE yes
> 
>                                               thanks in advance,
> 
>                                               Brian
> 
> 
> FAILURE DUMP SUMMARY:
>    lyra     /db4  lev 0  FAILED [missing size line from sendbackup]
>    lyra     /db4  lev 0  FAILED [too many dumper retry: "[request failed: timeout waiting for REP]"]
>    mailserv /usr1 lev 1  FAILED [missing size line from sendbackup]
>    mailserv /usr1 lev 1  was successfully retried
> 
> STRANGE DUMP SUMMARY:
>    squidzone2 /sqcache2/var lev 0  STRANGE (see below)
>    squidzone1 /sqcache1/var lev 0  STRANGE (see below)
>    loki       /             lev 1  STRANGE (see below)
> 
> 
> FAILED DUMP DETAILS:
> 
> /--  lyra /db4 lev 0 FAILED [missing size line from sendbackup]
> ? dumper: strange [missing size line from sendbackup]
> \--------
> 
> /--  mailserv /usr1 lev 1 FAILED [missing size line from sendbackup]
> sendbackup: start [mailserv:/usr1 level 1]
> sendbackup: info BACKUP=/usr/sbin/ufsdump
> sendbackup: info RECOVER_CMD=/bin/gzip -dc |/usr/sbin/ufsrestore -f... -
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> |   DUMP: Date of this level 1 dump: Mon Jun 22 20:08:44 2009
> |   DUMP: Date of last level 0 dump: Tue Apr 28 18:48:36 2009
> |   DUMP: Dumping /dev/rdsk/c3t0d0s6 (mailserv:/usr1) to standard output.
> |   DUMP: Mapping (Pass I) [regular files]
> |   DUMP: Mapping (Pass II) [directories]
> |   DUMP: Mapping (Pass II) [directories]
> |   DUMP: Mapping (Pass II) [directories]
> |   DUMP: Writing 32 Kilobyte records
> |   DUMP: Estimated 46966288 blocks (22932.76MB) on 0.34 tapes.
> |   DUMP: Dumping (Pass III) [directories]
> |   DUMP: Dumping (Pass IV) [regular files]
> |   DUMP: 12.94% done, finished in 1:07
> |   DUMP: 25.56% done, finished in 0:58
> |   DUMP: 38.54% done, finished in 0:47
> |   DUMP: 50.83% done, finished in 0:38
> |   DUMP: 64.93% done, finished in 0:27
> |   DUMP: 79.73% done, finished in 0:15
> |   DUMP: Warning - block 3502400118 is beyond the end of `/dev/rdsk/c3t0d0s6'
> |   DUMP: Warning - block 3402129124 is beyond the end of `/dev/rdsk/c3t0d0s6'
> |   DUMP: Warning - block 3403203312 is beyond the end of `/dev/rdsk/c3t0d0s6'
> |   DUMP: Warning - block 1086749412 is beyond the end of `/dev/rdsk/c3t0d0s6'
> 
> Several pages of similar warnings removed.
> 
> |   DUMP: Warning - block 3268987620 is beyond the end of `/dev/rdsk/c3t0d0s6'
> |   DUMP: Warning - block 1086880458 is beyond the end of `/dev/rdsk/c3t0d0s6'
> |   DUMP: Warning - block 1981420608 is beyond the end of `/dev/rdsk/c3t0d0s6'
> ?   DUMP: More than 32 block read errors from dump device `/dev/rdsk/c3t0d0s6'
> |   DUMP: NEEDS ATTENTION: Do you want to attempt to continue? ("yes" or "no")   DUMP: The ENTIRE dump is aborted.
> ??error [/usr/sbin/ufsdump returned 3]
> ? dumper: strange [missing size line from sendbackup]
> \--------
> 
> 
> DUMP SUMMARY:
>                                        DUMPER STATS               TAPER STATS 
> HOSTNAME     DISK        L ORIG-MB  OUT-MB  COMP%  MMM:SS   KB/s MMM:SS   KB/s
> -------------------------- ------------------------------------- -------------
> c110         /           0    1165     619   53.2   10:32 1003.5   1:02 10216.1
> c110         /opt        1       0       0    1.6    0:40    0.1   0:00  826.3
> curie        /           0    7752    3255   42.0    9:02 6153.2   4:28 12423.7
> curie        /export     1   91553   70776   77.3  115:39 10444.4  98:31 12261.1
> curie        /thump/flar 1       0       0    --     0:02    0.5   0:00  102.3
> curie        -ump/source 0   36459   29865   81.9   50:00 10194.4  41:21 12324.8
> curie        -p/vmfs-bak 1       0       0    --     0:01    0.9   0:00   72.0
> dnix         /dev/sda1   1    1684    1684    --     4:32 6350.3   2:57 9760.1
> everest      /images3    1       0       0    --     0:01    1.1   0:00  250.0
> finsen       /           1     193      31   15.9    2:20  224.5   0:02 12809.2
> finsen       /export     0   33076   18247   55.2   75:01 4151.2  24:50 12543.7
> gatem        /           1     726     726    --     8:27 1466.3   1:11 10499.8
> gatem        /usr1       0   63830   63830    --   291:37 3735.5  86:08 12648.1
> h220         /           1     103       6    6.1    1:56   55.5   0:01 12744.1
> h220         /opt        0    2683     991   36.9   20:33  823.1   1:20 12643.7
> huginn       /           1   21710   21710    --    45:33 8133.6  49:15 7523.0
> ldap1        /           0    7787    2525   32.4   18:09 2374.9   3:29 12387.6
> ldap1        -xport/home 1       0       0    1.0    0:05    1.1   0:00 1514.2
> ldap1        /usr1       1     656      57    8.6    1:07  863.9   0:05 12767.4
> loki         /           1     509     101   19.8    0:51 2023.0   0:14 7192.7
> lyra         /           1    1521     101    6.6    4:48  359.2   0:08 12707.7
> lyra         /3rdparty   1      15       1    4.9    2:55    4.2   0:00 12254.4
> lyra         /db1        1   13318    1954   14.7   30:25 1096.1   2:38 12660.0
> lyra         /db2        1   19806    2225   11.2   39:38  958.1   3:09 12047.3
> lyra         /db3        1   21213    3406   16.1   47:32 1222.9   4:53 11883.9
> lyra         /db4        0 FAILED --------------------------------------------
> lyra         -port/home0 1       2       0    8.3    0:29    4.8   0:00 5825.7
> lyra         /ndevelop   1      48       5   11.3    3:56   23.6   0:00 11855.1
> lyra         /space      0   38440   15668   40.8  155:36 1718.6  21:12 12609.1
> mailserv     /           1    2806     560   20.0    7:11 1330.2   0:59 9729.5
> mailserv     /usr1       1   22918   12059   52.6   68:54 2987.0  16:27 12505.8
> muninn       /           0   26917   12660   47.0   57:22 3766.2  17:34 12303.5
> muninn       /var        1     895      75    8.4    1:34  817.4   0:09 8459.9
> ngato        /           0    6132    2431   39.6   12:10 3412.1   3:17 12647.8
> nlascar      /           1       0       0   31.5    0:19    6.2   0:00 11515.1
> nlascar      /boot       0       9      15  160.7    0:04 4079.6   0:01 12799.7
> nlascar      /data       0     833    1658  199.0    4:16 6627.0   2:16 12508.1
> nlascar      /var        1     132      40   30.1    0:30 1375.2   0:03 12681.9
> panther      /           0    2585    1051   40.7   54:29  329.3   1:26 12520.8
> panther      /data       1       0       0    --     0:02    0.6   0:00  195.5
> pavlov       /           1       2       0    4.9    2:01    0.9   0:00 6766.2
> squidone     /           1     138      17   12.1    5:20   53.6   0:01 12804.4
> squidtwo     /           1      54       4    7.7    5:07   13.7   0:00 12087.1
> squidzone1   -squidguard 1     815     815    --     1:34 8890.4   1:21 10345.2
> squidzone1   -cache1/var 0    7905    7905    --    13:36 9922.6  13:18 10150.1
> squidzone2   -squidguard 1     958     958    --     1:55 8534.3   1:33 10546.2
> squidzone2   -cache2/var 0    9006    9006    --    15:07 10168.6  23:06 6653.8
> trel         /Users      1       3       0    6.3    0:07   22.4   0:00 10902.6
> trel         /trel       1      60      60    --     4:42  218.9   0:05 12810.4
> 
> (brought to you by Amanda version 2.6.1p1)
> 
> ----- End forwarded message -----
> ---
>    Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
>    Computer Systems Support        (v) 518 486-1697
>    Wadsworth Center                (f) 518 473-6384
>    NYS Department of Health        Help Desk 518 473-0773
> 
---
   Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773





