Amanda-Users

Re: zfs backup failure

2009-03-27 15:16:28
Subject: Re: zfs backup failure
From: Brian Cuttler <brian AT wadsworth DOT org>
To: Jean-Louis Martineau <martineau AT zmanda DOT com>
Date: Fri, 27 Mar 2009 14:57:11 -0400
Jean-Louis,

On Fri, Mar 27, 2009 at 01:44:50PM -0400, Jean-Louis Martineau wrote:
> Post the output of: amadmin <config> disklist grifserv /griffyp/hiu
> and the client amandad.*.debug, sendbackup.*.debug and zfs*.*.debug

I re-ran the amdump, the estimate phase seems to have taken 74 minutes,

/usr/local/sbin/amdump     grifserv grifserv /griffyp/hiu

Hostname: grifserv
Org     : Grifserv
Config  : grifserv
Date    : March 27, 2009


The next tape Amanda expects to use is: GRIFSERV07.
FAILURE DUMP SUMMARY:
grifserv /griffyp/hiu lev 0  FAILED [too many dumper retry: "[request failed:
+timeout waiting for REP]"]


STATISTICS:                                                                     
                          Total       Full      Incr.                           
                        --------   --------   --------                          
Estimate Time (hrs:min)    1:14                                                 
Run Time (hrs:min)         1:20                                                 
Dump Time (hrs:min)        0:00       0:00       0:00                           
Output Size (meg)           0.0        0.0        0.0                           
Original Size (meg)         0.0        0.0        0.0                           
Avg Compressed Size (%)     --         --         --                            
Filesystems Dumped            0          0          0                           
Avg Dump Rate (k/s)         --         --         --                            
                                                                                
Tape Time (hrs:min)        0:00       0:00       0:00
Tape Size (meg)             0.0        0.0        0.0
Tape Used (%)               0.0        0.0        0.0
Filesystems Taped             0          0          0
Chunks Taped                  0          0          0

DUMP SUMMARY
:                          DUMPER STATS               TAPER STATS
HOSTNAME     DISK        L ORIG-MB  OUT-MB  COMP%  MMM:SS   KB/s MMM:SS   KB/s
-------------------------- ------------------------------------- ------------
grifserv     -riffyp/hiu 0 FAILED ------------------------------------------ 

(brought to you by Amanda version 2.6.1)


I will include the debug files now.

There are several Amzfs_snapshot*.debug files, they are short but
I've only included 3.

Hum, there is no actual amandad*debug file in /tmp/amanda...

> cat sendbackup.20090326220435.debug 
1238119475.139805: sendbackup: pid 3433 ruid 110 euid 110 version 2.6.1: start 
at Thu Mar 26 22:04:35 2009
1238119475.140292: sendbackup: Version 2.6.1
1238119475.146269: sendbackup: pid 3433 ruid 110 euid 110 version 2.6.1: rename 
at Thu Mar 26 22:04:35 2009
1238119475.157173: sendbackup:   Parsed request as: program `amgtar'
1238119475.157215: sendbackup:                      disk `/griffyp/ivcp'
1238119475.157231: sendbackup:                      device `/griffyp/ivcp'
1238119475.157246: sendbackup:                      level 0
1238119475.157259: sendbackup:                      since NODATE
1238119475.157273: sendbackup:                      options `'
1238119475.157859: sendbackup: pipespawnv: stdoutfd is 0
1238119475.158484: sendbackup: Spawning 
"/usr/local/libexec/amanda/application/amzfs-snapshot amzfs-snapshot 
PRE-DLE-BACKUP --execute-where client --config grifserv --host grifserv --disk 
/griffyp/ivcp --device /griffyp/ivcp --level 0 --df-path /usr/sbin/df --pfexec 
YES --pfexec-path /usr/bin/pfexec --zfs-path /usr/sbin/zfs" in pipeline
1238119481.492440: sendbackup: script: PROPERTY directory 
/griffyp/ivcp/.zfs/snapshot/amanda-_griffyp_ivcp-current
1238119481.492921: sendbackup: pipespawnv: stdoutfd is 50
1238119481.493064: sendbackup: Spawning "/usr/bin/gzip /usr/bin/gzip --fast" in 
pipeline
1238119481.502456: sendbackup: gnutar: pid 3441: 
/usr/bin/gzip1238119481.502555: sendbackup: pid 3441: /usr/bin/gzip --fast
1238119481.502750: sendbackup: pipespawnv: stdoutfd is 5120
1238119481.503062: sendbackup: Spawning 
"/usr/local/libexec/amanda/application/amgtar amgtar support --config grifserv 
--host grifserv --disk /griffyp/ivcp --device /griffyp/ivcp" in pipeline
1238119481.539401: sendbackup: support line: CONFIG YES
1238119481.539488: sendbackup: support line: HOST YES
1238119481.539511: sendbackup: support line: DISK YES
1238119481.539531: sendbackup: support line: MAX-LEVEL 9
1238119481.539571: sendbackup: support line: INDEX-LINE YES
1238119481.539592: sendbackup: support line: INDEX-XML NO
1238119481.539614: sendbackup: support line: MESSAGE-LINE YES
1238119481.539634: sendbackup: support line: MESSAGE-XML NO
1238119481.539652: sendbackup: support line: RECORD YES
1238119481.539672: sendbackup: support line: INCLUDE-FILE YES
1238119481.539692: sendbackup: support line: INCLUDE-LIST YES
1238119481.539713: sendbackup: support line: INCLUDE-OPTIONAL YES
1238119481.539733: sendbackup: support line: EXCLUDE-FILE YES
1238119481.539753: sendbackup: support line: EXCLUDE-LIST YES
1238119481.539773: sendbackup: support line: EXCLUDE-OPTIONAL YES
1238119481.539793: sendbackup: support line: COLLECTION NO
1238119481.539813: sendbackup: support line: MULTI-ESTIMATE YES
1238119481.539832: sendbackup: support line: CALCSIZE YES
1238119481.543855: sendbackup: sendbackup: running 
"/usr/local/libexec/amanda/application/amgtar
1238119481.543920: sendbackup:  backup
1238119481.543931: sendbackup:  --message
1238119481.543941: sendbackup:  line
1238119481.543949: sendbackup:  --config
1238119481.543958: sendbackup:  grifserv
1238119481.543967: sendbackup:  --host
1238119481.543975: sendbackup:  grifserv
1238119481.543983: sendbackup:  --disk
1238119481.543991: sendbackup:  /griffyp/ivcp
1238119481.543999: sendbackup:  --device
1238119481.544007: sendbackup:  /griffyp/ivcp
1238119481.544016: sendbackup:  --level
1238119481.544024: sendbackup:  0
1238119481.544032: sendbackup:  --index
1238119481.544040: sendbackup:  line
1238119481.544048: sendbackup:  --record
1238119481.544056: sendbackup:  --atime-preserve
1238119481.544064: sendbackup:  no
1238119481.544072: sendbackup:  --directory
1238119481.544080: sendbackup:  
/griffyp/ivcp/.zfs/snapshot/amanda-_griffyp_ivcp-current
1238119481.544093: sendbackup: "
1238133199.034536: sendbackup: pipespawnv: stdoutfd is 172296
1238133199.035154: sendbackup: Spawning 
"/usr/local/libexec/amanda/application/amzfs-snapshot amzfs-snapshot 
POST-DLE-BACKUP --execute-where client --config grifserv --host grifserv --disk 
/griffyp/ivcp --device /griffyp/ivcp --level 0 --df-path /usr/sbin/df --pfexec 
YES --pfexec-path /usr/bin/pfexec --zfs-path /usr/sbin/zfs" in pipeline
1238133205.728112: sendbackup: pid 3433 finish time Fri Mar 27 01:53:25 2009


> more Amzfs_snapshot.20090327140120.debug 
1238176880.818329: Amzfs_snapshot: pid 6207 ruid 110 euid 110 version 2.6.1: sta
rt at Fri Mar 27 14:01:20 2009
1238176880.953774: Amzfs_snapshot: script: Amzfs_snapshot

1238176880.954118: Amzfs_snapshot: command: pre_dle_amcheck
1238176880.954388: Amzfs_snapshot: running: /usr/bin/pfexec /usr/sbin/zfs get -H
 -o value mountpoint /griffyp/climsgl
1238176885.514959: Amzfs_snapshot: running: /usr/sbin/df /griffyp/climsgl
1238176885.533505: Amzfs_snapshot: running: /usr/bin/pfexec /usr/sbin/zfs get -H
 -o value mountpoint griffyp/climsgl|
1238176887.908437: Amzfs_snapshot: running: /usr/bin/pfexec /usr/sbin/zfs snapsh
ot griffyp/climsgl@amanda-_griffyp_climsgl-check

> more Amzfs_snapshot.20090327140119.debug 
1238176879.124735: Amzfs_snapshot: pid 6201 ruid 110 euid 110 version 2.6.1: sta
rt at Fri Mar 27 14:01:19 2009
1238176879.254711: Amzfs_snapshot: script: Amzfs_snapshot

1238176879.255060: Amzfs_snapshot: command: post_dle_amcheck
1238176879.255328: Amzfs_snapshot: running: /usr/bin/pfexec /usr/sbin/zfs get -H
 -o value mountpoint /griffyp/csssoft
1238176879.578318: Amzfs_snapshot: running: /usr/sbin/df /griffyp/csssoft
1238176879.593073: Amzfs_snapshot: running: /usr/bin/pfexec /usr/sbin/zfs get -H
 -o value mountpoint griffyp/csssoft|
1238176879.671813: Amzfs_snapshot: running: /usr/bin/pfexec /usr/sbin/zfs destro
y griffyp/csssoft@amanda-_griffyp_csssoft-check|

> cat Amzfs_snapshot.20090327140117.debug 
1238176877.221257: Amzfs_snapshot: pid 6194 ruid 110 euid 110 version 2.6.1: 
start at Fri Mar 27 14:01:17 2009
1238176877.347518: Amzfs_snapshot: script: Amzfs_snapshot

1238176877.347853: Amzfs_snapshot: command: pre_dle_amcheck
1238176877.348131: Amzfs_snapshot: running: /usr/bin/pfexec /usr/sbin/zfs get 
-H -o value mountpoint /griffyp/csssoft
1238176877.397391: Amzfs_snapshot: running: /usr/sbin/df /griffyp/csssoft
1238176877.496817: Amzfs_snapshot: running: /usr/bin/pfexec /usr/sbin/zfs get 
-H -o value mountpoint griffyp/csssoft|
1238176877.614444: Amzfs_snapshot: running: /usr/bin/pfexec /usr/sbin/zfs 
snapshot griffyp/csssoft@amanda-_griffyp_csssoft-check


> Jean-Louis
> 
> Brian Cuttler wrote:
> >Ok, scratched my head a little and thought about was what different
> >between testing in-preproduction and now, we changed the system name
> >and we ADDED quotas to the zfs file systems!!
> >
> >Checking # df output I realized that the file system with the error
> >was the clostest to capacity. It was sitting at about 92% of its
> >quota limit, with the next nearest capacity (size issues asside) at
> >89% capacity.
> >
> >I increased the disk quota so it fell below 90% and tried amdump 
> >again. We failed, I believe, on etimeout, which was set to only
> >300 or 600 seconds. We failed again at 2600 seconds, I uppped it
> >to 4200 and left it for the nightly run to test.
> >
> >We produced the following error
> >
> >FAILURE DUMP SUMMARY:
> >grifserv /griffyp/hiu lev 0  FAILED [too many dumper retry:
> > "[request failed: timeout waiting for REP]"]
> >
> >Currently my timeouts are set
> >etimeout 4200           # number of seconds per filesystem for estimates.
> >dtimeout 1800           # number of idle seconds before a dump is aborted.
> >ctimeout 30             # maximum number of seconds that amcheck waits
> >                        # for each client host
> >
> >The particular file system (underlie a single samba share) is 61 Gig.
> >griffyp/hiu             70G    61G   8.9G    88%    /griffyp/hiu
> >
> >Don't know if I misconfigured something, had not expected issues when
> >moving from testing to production. Did not experience this issue on the
> >older/slower server, but that was a UFS file system running DUMP rather
> >than tar.
> >
> >Any suggestions much apreciated, I really want to get this file system
> >back under backups.
> >
> >If there are no ideas, I'll try to do a straight tar of the system
> >tonight and skip the snapshot for it.
> >
> >If there is a limit for quota'd zfs file systems to produce snapshots,
> >how do I both produce snapshots and keep a lid on usage ??
> >
> >                                                     thanks,
> >
> >                                                     Brian
> >
> >
> >
> >On Thu, Mar 26, 2009 at 03:09:16PM -0400, Brian Cuttler wrote:
> >  
> >>I really am confused...
> >>
> >>When we set up amanda I configured amanda to have priv to
> >>perform snapshots. That is, as root I issued this command.
> >>
> >>/usr/sbin/usermod -P "ZFS File System Management,ZFS Storage Management" 
> >>amanda
> >>
> >>We are able to perform 9 out of 10 ZFS dumps each night on this
> >>system, one fails with an apparently planner error, but I think
> >>it goes a little deeper.
> >>
> >>I had expected that as user amanda I could manually create a
> >>snapshot, but I can't, not on any zfs mount point, even though
> >>we seem to be able to do so when we perfrom nightly dumps.
> >>
> >>    
> >>>id
> >>>      
> >>uid=110(amanda) gid=3(sys)
> >>
> >>    
> >>>zfs snapshot griffyp/hiu@amanda
> >>>      
> >>cannot create snapshot 'griffyp/hiu@amanda': permission denied
> >>
> >>I'm clearly missing a step in the methodology.
> >>
> >>Only fileysstem /griffyp/hiu fails to backup, griffyp being the
> >>name of the amanda pool.
> >>
> >>Can anyone help me ?
> >>
> >>                                            thanks,
> >>
> >>                                            Brian
> >>
> >>    
> >>>df -kl
> >>>      
> >>Filesystem            kbytes    used   avail capacity  Mounted on
> >>/dev/dsk/c1t0d0s0    8262413 7067795 1111994    87%    /
> >>/devices                   0       0       0     0%    /devices
> >>ctfs                       0       0       0     0%    /system/contract
> >>proc                       0       0       0     0%    /proc
> >>mnttab                     0       0       0     0%    /etc/mnttab
> >>swap                 8660832    1632 8659200     1%    /etc/svc/volatile
> >>objfs                      0       0       0     0%    /system/object
> >>sharefs                    0       0       0     0%    /etc/dfs/sharetab
> >>/platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap1.so.1
> >>                     8262413 7067795 1111994    87%    
> >>                     /platform/sun4u-us3/lib/libc_psr.so.1
> >>/platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1
> >>                     8262413 7067795 1111994    87%    
> >>                     /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1
> >>fd                         0       0       0     0%    /dev/fd
> >>/dev/dsk/c1t0d0s6    8262413 1636317 6543472    21%    /var
> >>swap                 2097152    1624 2095528     1%    /tmp
> >>swap                 8659264      64 8659200     1%    /var/run
> >>griffyp              173182464     475 44829887     1%    /griffyp
> >>griffyp/amanda       173182464      74 44829887     1%    /griffyp/amanda
> >>griffyp/climsgl      1048576  245576  802999    24%    /griffyp/climsgl
> >>griffyp/connlab      5242880 4574973  667906    88%    /griffyp/connlab
> >>griffyp/csssoft      1048576      38 1048537     1%    /griffyp/csssoft
> >>griffyp/dew          2097152 1324984  772167    64%    /griffyp/dew
> >>griffyp/encphrev     1048576    1197 1047379     1%    /griffyp/encphrev
> >>griffyp/export       5242880 4433533  809347    85%    /griffyp/export
> >>griffyp/grifadmin    2097152 1548594  548558    74%    /griffyp/grifadmin
> >>griffyp/hiu          68157440 64051634 4105805    94%    /griffyp/hiu
> >>griffyp/hiu2         16777216 14825811 1951405    89%    /griffyp/hiu2
> >>griffyp/ivcp         41943040 37323002 4620038    89%    /griffyp/ivcp
> >>griffyp/virologypt   1048576   16222 1032353     2%    /griffyp/virologypt
> >>
> >>
> >>
> >>
> >>
> >>
> >>On Tue, Mar 24, 2009 at 11:16:52AM -0400, Brian Cuttler wrote:
> >>    
> >>>I'm sure this worked before I put the system into production.
> >>>
> >>>Amanda 2.6.1, Solaris 10/x86, ZFS with snapshots.
> >>>
> >>>I don't know how I have have no DLE to backup, unless there
> >>>is some failure in producing the snapshot.
> >>>
> >>>Now that the system is in production we are running samba with
> >>>about a dozen shares, all off of the same zfs pool. They are
> >>>treated as separate file systems so that I'm able to quota each
> >>>of the shares and provide some continuity of service by preventing
> >>>one share from, say, filling a spindle, which we experienced when
> >>>we use UFS on the drives.
> >>>
> >>>The HIU share is by far the largest of the shares.
> >>>This is a snapshot creation timing issue ?
> >>>How do I compensate for it ?
> >>>
> >>>
> >>>      
> >>>>more driver.20090324105803.debug
> >>>>        
> >>>1237906683.028244: driver: pid 4415 ruid 110 euid 110 version 2.6.1: 
> >>>start at Tu
> >>>e Mar 24 10:58:03 2009
> >>>1237906683.046209: driver: pid 4415 ruid 110 euid 110 version 2.6.1: 
> >>>rename at T
> >>>ue Mar 24 10:58:03 2009
> >>>1237906683.046323: driver: critical (fatal): Did not get DATE line from 
> >>>planner
> >>>
> >>>
> >>>
> >>>      
> >>>>more planner.20090324105006.debug 
> >>>>        
> >>>1237906206.393923: planner: pid 3172 ruid 110 euid 110 version 2.6.1: 
> >>>start at T
> >>>ue Mar 24 10:50:06 2009
> >>>1237906206.411351: planner: pid 3172 ruid 110 euid 110 version 2.6.1: 
> >>>rename at Tue Mar 24 10:50:06 2009
> >>>1237906206.412674: planner: critical (fatal): no DLE to backup
> >>>
> >>>extracted from # df
> >>>
> >>>griffyp/export       5242880 4433505  809374    85%    /griffyp/export
> >>>griffyp/grifadmin    2097152 1531780  565372    74%    /griffyp/grifadmin
> >>>griffyp/hiu          68157440 63972746 4184694    94%    /griffyp/hiu
> >>>griffyp/hiu2         16777216 14825408 1951808    89%    /griffyp/hiu2
> >>>griffyp/ivcp         41943040 39100290 2842750    94%    /griffyp/ivcp
> >>>
> >>>
> >>>----- Forwarded message from Amanda/Griffy <amanda AT wadsworth DOT org> 
> >>>-----
> >>>
> >>>FAILURE DUMP SUMMARY:
> >>>   grifserv /griffyp/hiu lev 0  FAILED [too many dumper retry: "[request 
> >>>   failed: timeout waiting for REP]"]
> >>>
> >>>NOTES:
> >>>  planner: Full dump of grifserv:/griffyp/hiu promoted from 3 days ahead.
> >>>  taper: tape GRIFSERV03 kb 38749958 fm 11 [OK]
> >>>
> >>>
> >>>DUMP SUMMARY:
> >>>                                       DUMPER STATS               TAPER 
> >>>                                       STATS HOSTNAME     DISK        L 
> >>> ORIG-MB  OUT-MB  COMP%  MMM:SS   KB/s 
> >>>MMM:SS   KB/s
> >>>-------------------------- ------------------------------------- 
> >>>-------------
> >>>grifserv     -fyp/export 1    4150    2682   64.6   48:26  945.2   2:44 
> >>>16717.8
> >>>grifserv     -/grifadmin 1    1423    1184   83.2   22:03  916.8   1:09 
> >>>17667.0
> >>>grifserv     -riffyp/hiu 0 FAILED 
> >>>--------------------------------------------
> >>>grifserv     -iffyp/hiu2 1   13040    7825   60.0  107:40 1240.3   8:04 
> >>>16558.1
> >>>grifserv     /var        1     100       7    7.2    0:38  195.4   0:00 
> >>>52710.7
> >>>
> >>>(brought to you by Amanda version 2.6.1)
> >>>
> >>>                                           thank you,
> >>>
> >>>                                           Brian
> >>>---
> >>>   Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
> >>>   Computer Systems Support        (v) 518 486-1697
> >>>   Wadsworth Center                (f) 518 473-6384
> >>>   NYS Department of Health        Help Desk 518 473-0773
> >>>
> >>>      
> >>---
> >>   Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
> >>   Computer Systems Support        (v) 518 486-1697
> >>   Wadsworth Center                (f) 518 473-6384
> >>   NYS Department of Health        Help Desk 518 473-0773
> >>
> >>    
> >---
> >   Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
> >   Computer Systems Support        (v) 518 486-1697
> >   Wadsworth Center                (f) 518 473-6384
> >   NYS Department of Health        Help Desk 518 473-0773
> >
> >
> >
> >IMPORTANT NOTICE: This e-mail and any attachments may contain
> >confidential or sensitive information which is, or may be, legally
> >privileged or otherwise protected by law from further disclosure.  It
> >is intended only for the addressee.  If you received this in error or
> >from someone who was not authorized to send it to you, please do not
> >distribute, copy or use it or any attachments.  Please notify the
> >sender immediately by reply e-mail and delete this from your
> >system. Thank you for your cooperation.
> >
> >
> >  
> 
---
   Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773



IMPORTANT NOTICE: This e-mail and any attachments may contain
confidential or sensitive information which is, or may be, legally
privileged or otherwise protected by law from further disclosure.  It
is intended only for the addressee.  If you received this in error or
from someone who was not authorized to send it to you, please do not
distribute, copy or use it or any attachments.  Please notify the
sender immediately by reply e-mail and delete this from your
system. Thank you for your cooperation.



<Prev in Thread] Current Thread [Next in Thread>