OK, scratched my head a little and thought about what was different
between testing in pre-production and now: we changed the system name
and we ADDED quotas to the ZFS file systems!!
Checking # df output I realized that the file system with the error
was the closest to capacity. It was sitting at about 92% of its
quota limit, with the next nearest (size issues aside) at
89% capacity.
I increased the disk quota so it fell below 90% and tried amdump
again. We failed, I believe, on etimeout, which was set to only
300 or 600 seconds. We failed again at 2600 seconds, so I upped it
to 4200 and left it for the nightly run to test.
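For the record, here is the quick headroom check I'm using against each
dataset's quota; it's just arithmetic on the kbyte columns from `df -kl`
(the numbers below are griffyp/hiu's from the df output later in this
thread), and the 90% threshold is my own rule of thumb, not a documented
ZFS limit:

```shell
# Rough check of how close a dataset sits to its ZFS quota,
# using the kbyte figures reported by `df -kl`.
used=64051634       # kbytes used on griffyp/hiu (substitute your own)
quota=68157440      # kbytes total under the quota
pct=$(awk -v u="$used" -v q="$quota" 'BEGIN { printf "%.0f", u * 100 / q }')
echo "griffyp/hiu is at ${pct}% of quota"
```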
We produced the following error:
FAILURE DUMP SUMMARY:
grifserv /griffyp/hiu lev 0 FAILED [too many dumper retry:
"[request failed: timeout waiting for REP]"]
Currently my timeouts are set as follows:
etimeout 4200 # number of seconds per filesystem for estimates.
dtimeout 1800 # number of idle seconds before a dump is aborted.
ctimeout 30 # maximum number of seconds that amcheck waits
# for each client host
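One thing I may try before raising etimeout any further: the "timeout
waiting for REP" usually means the estimate phase ran past the limit, and
if the 2.6.x `estimate` dumptype option is available in my build, a
cheaper estimate method for the big DLE might sidestep the problem
entirely. A sketch (the dumptype name is made up):

```
define dumptype zfs-tar-fast-est {
    global
    program "GNUTAR"
    estimate server    # size from past runs instead of walking the tree
}
```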
The particular file system (underlying a single Samba share) is 61 GB:
griffyp/hiu 70G 61G 8.9G 88% /griffyp/hiu
I don't know if I misconfigured something; I had not expected issues
when moving from testing to production. I did not experience this issue
on the older/slower server, but that was a UFS file system backed up
with DUMP rather than tar.
Any suggestions much appreciated, I really want to get this file system
back under backups.
If there are no ideas, I'll try a straight tar of the file system
tonight and skip the snapshot for it.
If there is something about quota'd ZFS file systems that limits
snapshot creation, how do I both produce snapshots and keep a lid on usage?
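On that last question, one possibility: if the Solaris 10 update on this
box is recent enough to have the ZFS refquota property, that may be the
answer, since refquota caps only live (referenced) data and snapshot
blocks don't count against it. A sketch, with sizes made up:

```
zfs set refquota=64G griffyp/hiu   # cap what users can actually write
zfs set quota=70G griffyp/hiu      # optional hard cap including snapshots
```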
thanks,
Brian
On Thu, Mar 26, 2009 at 03:09:16PM -0400, Brian Cuttler wrote:
>
> I really am confused...
>
> When we set up Amanda I configured the amanda user with the privileges
> to perform snapshots. That is, as root I issued this command:
>
> /usr/sbin/usermod -P "ZFS File System Management,ZFS Storage Management"
> amanda
>
> We are able to perform 9 out of 10 ZFS dumps each night on this
> system, one fails with an apparent planner error, but I think
> it goes a little deeper.
>
> I had expected that as user amanda I could manually create a
> snapshot, but I can't, not on any zfs mount point, even though
> we seem to be able to do so when we perform nightly dumps.
>
> > id
> uid=110(amanda) gid=3(sys)
>
> > zfs snapshot griffyp/hiu@amanda
> cannot create snapshot 'griffyp/hiu@amanda': permission denied
>
> I'm clearly missing a step in the methodology.
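A guess at the missing step: Solaris RBAC execution profiles only take
effect through pfexec or a profile shell, which would explain why a bare
`zfs snapshot` from a normal login shell is denied even though the
profile is assigned. Two things worth trying (both sketches, and the
second assumes the release supports ZFS delegated administration):

```shell
# 1) Exercise the RBAC profile explicitly (profiles apply only via
#    pfexec or a profile shell):
pfexec zfs snapshot griffyp/hiu@amanda

# 2) Or, as root, delegate the rights in ZFS itself, no RBAC needed:
zfs allow amanda snapshot,mount,destroy griffyp
```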
>
> Only file system /griffyp/hiu fails to back up, griffyp being the
> name of the amanda pool.
>
> Can anyone help me ?
>
> thanks,
>
> Brian
>
> > df -kl
> Filesystem kbytes used avail capacity Mounted on
> /dev/dsk/c1t0d0s0 8262413 7067795 1111994 87% /
> /devices 0 0 0 0% /devices
> ctfs 0 0 0 0% /system/contract
> proc 0 0 0 0% /proc
> mnttab 0 0 0 0% /etc/mnttab
> swap 8660832 1632 8659200 1% /etc/svc/volatile
> objfs 0 0 0 0% /system/object
> sharefs 0 0 0 0% /etc/dfs/sharetab
> /platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap1.so.1
> 8262413 7067795 1111994 87%
> /platform/sun4u-us3/lib/libc_psr.so.1
> /platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1
> 8262413 7067795 1111994 87%
> /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1
> fd 0 0 0 0% /dev/fd
> /dev/dsk/c1t0d0s6 8262413 1636317 6543472 21% /var
> swap 2097152 1624 2095528 1% /tmp
> swap 8659264 64 8659200 1% /var/run
> griffyp 173182464 475 44829887 1% /griffyp
> griffyp/amanda 173182464 74 44829887 1% /griffyp/amanda
> griffyp/climsgl 1048576 245576 802999 24% /griffyp/climsgl
> griffyp/connlab 5242880 4574973 667906 88% /griffyp/connlab
> griffyp/csssoft 1048576 38 1048537 1% /griffyp/csssoft
> griffyp/dew 2097152 1324984 772167 64% /griffyp/dew
> griffyp/encphrev 1048576 1197 1047379 1% /griffyp/encphrev
> griffyp/export 5242880 4433533 809347 85% /griffyp/export
> griffyp/grifadmin 2097152 1548594 548558 74% /griffyp/grifadmin
> griffyp/hiu 68157440 64051634 4105805 94% /griffyp/hiu
> griffyp/hiu2 16777216 14825811 1951405 89% /griffyp/hiu2
> griffyp/ivcp 41943040 37323002 4620038 89% /griffyp/ivcp
> griffyp/virologypt 1048576 16222 1032353 2% /griffyp/virologypt
>
>
>
>
>
>
> On Tue, Mar 24, 2009 at 11:16:52AM -0400, Brian Cuttler wrote:
> > I'm sure this worked before I put the system into production.
> >
> > Amanda 2.6.1, Solaris 10/x86, ZFS with snapshots.
> >
> > I don't know how I can have no DLE to backup, unless there
> > is some failure in producing the snapshot.
> >
> > Now that the system is in production we are running samba with
> > about a dozen shares, all off of the same zfs pool. They are
> > treated as separate file systems so that I'm able to quota each
> > of the shares and provide some continuity of service by preventing
> > one share from, say, filling a spindle, which we experienced when
> > we used UFS on the drives.
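(For anyone following along, the per-share layout described above amounts
to something like this, one dataset per Samba share with its own quota so
no single share can fill the pool; the names and sizes here are taken
from the df output:)

```
zfs create -o quota=16G griffyp/hiu2
zfs create -o quota=5G  griffyp/connlab
```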
> >
> > The HIU share is by far the largest of the shares.
> > Is this a snapshot creation timing issue?
> > How do I compensate for it ?
> >
> >
> > > more driver.20090324105803.debug
> > 1237906683.028244: driver: pid 4415 ruid 110 euid 110 version 2.6.1: start at Tue Mar 24 10:58:03 2009
> > 1237906683.046209: driver: pid 4415 ruid 110 euid 110 version 2.6.1: rename at Tue Mar 24 10:58:03 2009
> > 1237906683.046323: driver: critical (fatal): Did not get DATE line from planner
> >
> >
> >
> > > more planner.20090324105006.debug
> > 1237906206.393923: planner: pid 3172 ruid 110 euid 110 version 2.6.1: start at Tue Mar 24 10:50:06 2009
> > 1237906206.411351: planner: pid 3172 ruid 110 euid 110 version 2.6.1: rename at Tue Mar 24 10:50:06 2009
> > 1237906206.412674: planner: critical (fatal): no DLE to backup
> >
> > extracted from # df
> >
> > griffyp/export 5242880 4433505 809374 85% /griffyp/export
> > griffyp/grifadmin 2097152 1531780 565372 74% /griffyp/grifadmin
> > griffyp/hiu 68157440 63972746 4184694 94% /griffyp/hiu
> > griffyp/hiu2 16777216 14825408 1951808 89% /griffyp/hiu2
> > griffyp/ivcp 41943040 39100290 2842750 94% /griffyp/ivcp
> >
> >
> > ----- Forwarded message from Amanda/Griffy <amanda AT wadsworth DOT org>
> > -----
> >
> > FAILURE DUMP SUMMARY:
> > grifserv /griffyp/hiu lev 0 FAILED [too many dumper retry: "[request
> > failed: timeout waiting for REP]"]
> >
> > NOTES:
> > planner: Full dump of grifserv:/griffyp/hiu promoted from 3 days ahead.
> > taper: tape GRIFSERV03 kb 38749958 fm 11 [OK]
> >
> >
> > DUMP SUMMARY:
> >                                       DUMPER STATS               TAPER STATS
> > HOSTNAME     DISK        L  ORIG-MB  OUT-MB  COMP%  MMM:SS    KB/s  MMM:SS     KB/s
> > -------------------------- ------------------------------------- -------------
> > grifserv     -fyp/export  1     4150    2682   64.6   48:26   945.2    2:44  16717.8
> > grifserv     -/grifadmin  1     1423    1184   83.2   22:03   916.8    1:09  17667.0
> > grifserv     -riffyp/hiu  0   FAILED --------------------------------------------
> > grifserv     -iffyp/hiu2  1    13040    7825   60.0  107:40  1240.3    8:04  16558.1
> > grifserv     /var         1      100       7    7.2    0:38   195.4    0:00  52710.7
> >
> > (brought to you by Amanda version 2.6.1)
> >
> > thank you,
> >
> > Brian
> > ---
> > Brian R Cuttler brian.cuttler AT wadsworth DOT org
> > Computer Systems Support (v) 518 486-1697
> > Wadsworth Center (f) 518 473-6384
> > NYS Department of Health Help Desk 518 473-0773
> >