Amanda-Users

Re: amanda client issue, dumps failing

2009-08-12 15:05:23
Subject: Re: amanda client issue, dumps failing
From: Brian Cuttler <brian AT wadsworth DOT org>
To: Jean-Louis Martineau <martineau AT zmanda DOT com>, amanda-users AT amanda DOT org, Chris Knight <knight AT wadsworth DOT org>
Date: Wed, 12 Aug 2009 14:47:12 -0400
On Wed, Aug 12, 2009 at 02:31:17PM -0400, Jean-Louis Martineau wrote:
> Brian,
> 
> gtar read the complete directory hierarchy before sending data, it can 
> take a long time if you have a lot of files.

Ok, will keep an eye on it. Looking at the sendfile log we are down
to at least ./spool/imap/user/gling, but if it's only headers
at this point we could be in for a long run.
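
For reference, the estimate change suggested further down in this thread
would be set in the dumptype on the Amanda server, roughly as sketched
below. This is an illustration only: the dumptype name mailserv-gtar and
the etimeout value are made up for the example, and whether a 2.4.4 client
supports calcsize has not been checked here. Per the amanda.conf man page,
"estimate server" derives the estimate from previous runs instead of
asking the client, which is why it avoids the hours-long gtar estimate pass.

```
# amanda.conf on the server -- illustrative sketch, not a tested config

# Global: seconds the planner waits for estimates (value is an example only)
etimeout 1800

# "mailserv-gtar" is a hypothetical dumptype name
define dumptype mailserv-gtar {
    program "GNUTAR"
    comment "mail spool via gtar, estimate taken from server history"
    estimate server        # alternative: estimate calcsize
}

# disklist entry using that dumptype
# mailserv /usr1 mailserv-gtar
```

The trade-off is accuracy: a history-based estimate can be off when disk
usage changes a lot between runs.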

Snapshots... move to ZFS, let amanda halt the mail system while
it performs the snapshot (which is very fast on ZFS) and then
restart mail... So, step 1) update mailserv to Solaris 10, 2)
convert the file system, 3)... all without having good backups
to start.

Appreciate your helping us out of yet another situation we
created ourselves.

> Jean-Louis
> 
> Brian Cuttler wrote:
> >cool thanks.
> >
> >it's working, but it's crawling.
> >
> >20 minutes and little actual data moved. Which I realized
> >is not an amanda issue.
> >
> >mailserv:/usr1 0     82499m dumping        0m (  0.00%) (13:52:17)
> >
> >
> >On Wed, Aug 12, 2009 at 01:38:21PM -0400, Jean-Louis Martineau wrote:
> >  
> >>Brian,
> >>
> >>Try "estimate calcsize" or "estimate server"
> >>
> >>Jean-Louis
> >>
> >>Brian Cuttler wrote:
> >>    
> >>>Amanda users,
> >>>
> >>>For the issue below, we have seen several hundred thousand emails
> >>>move through the system each day. UFSdump is failing because
> >>>it seems too many files come and go; it queries to "continue" but can't
> >>>get a reply (I don't know of a way anyway).
> >>>
> >>>We tried to switch the problem DLE to gtar - but the estimate phase
> >>>seems to take hours to run. I haven't set etimeout high enough to get
> >>>an estimate yet, and this will push actual dumps back by hours.
> >>>
> >>>Is there a workaround I can employ, either to get quicker estimates
> >>>(ok to assume level 0 is the usage of the partition) or get ufsdump
> >>>to work ?
> >>>
> >>>I've recommended we do something to quiesce the system, but our mail god
> >>>hasn't seemed to take any interest in that suggestion. Nor do we 
> >>>currently have a mechanism to snapshot or replicate (rsync, break a
> >>>mirror, etc) the partition.
> >>>
> >>>                                           thanks,
> >>>
> >>>                                           Brian
> >>>
> >>>----- Forwarded message from Brian Cuttler <brian AT wadsworth DOT org> -----
> >>>
> >>>Date: Tue, 11 Aug 2009 11:37:35 -0400
> >>>From: Brian Cuttler <brian AT wadsworth DOT org>
> >>>To: daver <daver AT wadsworth DOT org>, amanda-users AT amanda DOT org,
> >>>   Chris Knight <knight AT wadsworth DOT org>
> >>>Cc: Ivan Auger <ivan.auger AT wadsworth DOT org>
> >>>Subject: Re: amanda probelm
> >>>In-Reply-To: <4A818AA0.5070300 AT wadsworth DOT org>
> >>>User-Agent: Mutt/1.4.1i
> >>>
> >>>
> >>>Reviewing the issue.
> >>>
> >>>Server, Solaris 10x86, Amanda 2.6.1 (with patches)
> >>>Client, Solaris 9, Amanda  2.4.4
> >>>
> >>>The problem performing level 0 dumps is that there are a large
> >>>number of files in flux -- it's the mailhost system -- so ufsdump
> >>>eventually asks for help, to continue or quit.
> >>>
> >>>There is no help in non-interactive mode, and I don't know if there
> >>>is a mechanism to get amanda to respond to ufsdump's query. So the
> >>>level 0 of /usr1 usually fails.
> >>>
> >>>Warnings - Dave's suggestion, that the history of the error be more
> >>>explicit, is a good one. Can # amdump report the last successful level 0,
> >>>i.e. the due date, if the current level zero fails? Put it in the notes 
> >>>section or something?
> >>>
> >>>Non-solutions - a snapshot of an open file is an open file. All things
> >>>being equal, you will get as many open files in a snapshot as in a
> >>>live system. This will not resolve the problem.
> >>>
> >>>Workarounds
> >>>   1) Quiesce the mail server for a period of time, force a level 0
> >>>      of /usr1 during that interval.
> >>>
> >>>   2) Quiesce mail delivery long enough to snapshot/rsync or break
> >>>      a mirror. Back up the placid copy.
> >>>
> >>>   3) Q: will a TAR of the DLE get a backup when ufsdump cannot?
> >>>
> >>>   4) "It's not really a problem."
> >>>      By and large each individual message will be backed up "most"
> >>>      of the time; each message is its own file and if it's not on
> >>>      the level 0 it's on almost all of the level 1 and 2 dumps.
> >>>      What we will lose are the index files, which would probably
> >>>      require a rebuild after a restore anyway, so they are not
> >>>      that important to get onto tape.
> >>>
> >>>
> >>>On Tue, Aug 11, 2009 at 11:13:36AM -0400, daver wrote:
> >>> 
> >>>      
> >>>>brian checking over all the overdue files systems on curie, I find only 
> >>>>mailserv:/usr1 is truly overdue
> >>>>
> >>>>As you mentioned., this may be due to the system being too active.  
> >>>>there are NO warnings, in this regard, with the exception of a "Can't 
> >>>>switch to degraded mode for unknown reason" in amdump
> >>>>
> >>>>as we just about never read these amanda files and use the email 
> >>>>generated by the system to notify us of problems, this would seem to be 
> >>>>a significant problem with Amanda.  I agree that the developers should 
> >>>>be contacted in this regard.
> >>>>
> >>>>as to getting /usr1 backed up. if amanda can't do it, perhaps we need 
> >>>>to consider an alternative. like tar
> >>>>
> >>>>
> >>>>
> >>>>   
> >>>>        
> >>>---
> >>>  Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
> >>>  Computer Systems Support        (v) 518 486-1697
> >>>  Wadsworth Center                (f) 518 473-6384
> >>>  NYS Department of Health        Help Desk 518 473-0773
> >>>
> >>>
> >>>----- End forwarded message -----
> >>>
> >>>
> >>>
> >>>IMPORTANT NOTICE: This e-mail and any attachments may contain
> >>>confidential or sensitive information which is, or may be, legally
> >>>privileged or otherwise protected by law from further disclosure.  It
> >>>is intended only for the addressee.  If you received this in error or
> >>>from someone who was not authorized to send it to you, please do not
> >>>distribute, copy or use it or any attachments.  Please notify the
> >>>sender immediately by reply e-mail and delete this from your
> >>>system. Thank you for your cooperation.
> >>>
> >>>
> >>> 
> >>>      
> >
> >
> >  
> 
---
   Brian R Cuttler                 brian.cuttler AT wadsworth DOT org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773





