Amanda-Users

Re: amanda client issue, dumps failing

2009-08-12 14:35:13
Subject: Re: amanda client issue, dumps failing
From: Chris Hoogendyk <hoogendyk AT bio.umass DOT edu>
To: Brian Cuttler <brian AT wadsworth DOT org>
Date: Wed, 12 Aug 2009 14:21:36 -0400
Yes, /var/mail does tend to be a problem. My mail.log can get to a quarter million lines a day, but that's not the same as getting that number of messages. Typically, my level 1 backups aren't much different from the level 0, because basically everything changes every day. I get by with fssnap. That isn't the same as quiescing the system, but it limits the window of exposure to the time it takes to create the snapshot. The thing is, with an active mail system you need a place for the backing store that is nearly as large as the mail partition, because things are going to change while you are backing up, so the original blocks will have to get copied to the backing store.

If your mail admin isn't cooperative, ask your mail admin, or your mail admin's boss, whether they care about having backups. Ask them if it is alright not to have a backup. You could even ask them if there are any legal requirements relating to discovery, etc. (NYS Dept. of Health? I'll bet there are!) That might get you in deeper than you want, but hey, it should get some attention.

For details on how I do it (on Solaris 9), see http://wiki.zmanda.com/index.php/Backup_client#Chris_Hoogendyk.27s_Example.

If they would allow it, you could set up a sudo entry for the amanda user to stop sendmail, take a snapshot, and start sendmail again (or whatever mail software you are using). It would be sort of like what I have for stopping xntpd to snapshot the root partition, but you would special case it for the /var/mail partition. Presumably, they have the /etc/init.d/sendmail or similar script properly set up so that you won't be in danger of screwing things up. Presumably, they would work with you to get it going.
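
The stop, snapshot, restart sequence described above could be sketched roughly as follows. This is only an illustration, not the script from the wiki page: the function name, the `run` parameter, and the example paths are assumptions, and it presumes the amanda user has sudo rights for exactly these commands. The point it shows is that the mail daemon gets restarted even if the snapshot fails.

```python
import subprocess


def snapshot_with_mail_stopped(mount_point, backing_store,
                               run=subprocess.check_call,
                               mail_init="/etc/init.d/sendmail"):
    """Stop the mail daemon, take a UFS snapshot, and always restart the daemon.

    mount_point   -- file system to snapshot, e.g. "/var/mail" (assumed path)
    backing_store -- location for the fssnap backing store (assumed path)
    run           -- callable that executes one command given as a list;
                     replaceable for dry runs
    mail_init     -- init script for the mail software (assumed to be sendmail)
    """
    run(["sudo", mail_init, "stop"])
    try:
        # Create the snapshot; "bs=" is the short form of "backing-store=".
        run(["sudo", "fssnap", "-F", "ufs",
             "-o", "bs=" + backing_store, mount_point])
    finally:
        # Restart mail whether or not the snapshot succeeded.
        run(["sudo", mail_init, "start"])
```

Passing something like `run=print` gives a dry run that only prints the commands, which may be a useful way to show the mail admin exactly what would be executed.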


---------------

Chris Hoogendyk

-
  O__  ---- Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogendyk AT bio.umass DOT edu>

---------------
Erdös 4




Brian Cuttler wrote:
Amanda users,

For the issue below, we see several hundred thousand emails move
through the system each day. UFSdump is failing because too many
files seem to come and go; it queries whether to "continue" but can't
get a reply (I don't know of a way to give one anyway).

We tried to switch the problem DLE to gtar - but the estimate phase
seems to take hours to run. I haven't set etimeout high enough to get
an estimate yet, and this will push actual dumps back by hours.

Is there a workaround I can employ, either to get quicker estimates
(ok to assume level 0 is the usage of the partition) or get ufsdump
to work ?
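
The "assume level 0 is the usage of the partition" idea amounts to reading the file system's used space instead of walking every file. A minimal sketch of that arithmetic follows; it is not an Amanda feature or setting, and the function name `used_bytes` is made up for illustration.

```python
import os


def used_bytes(mount_point):
    """Return the space currently in use on the file system holding mount_point."""
    st = os.statvfs(mount_point)
    # Total blocks minus free blocks, in units of the fragment size.
    return (st.f_blocks - st.f_bfree) * st.f_frsize
```

This returns almost immediately regardless of how many files the partition holds, which is the property wanted here; the trade-off is that it says nothing useful about incremental levels.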

I've recommended we do something to quiesce the system, but our mail god
hasn't seemed to take any interest in that suggestion. Nor do we
currently have a mechanism to snapshot or replicate (rsync, break a
mirror, etc) the partition.

                                                thanks,

                                                Brian

----- Forwarded message from Brian Cuttler <brian AT wadsworth DOT org> -----

Date: Tue, 11 Aug 2009 11:37:35 -0400
From: Brian Cuttler <brian AT wadsworth DOT org>
To: daver <daver AT wadsworth DOT org>, amanda-users AT amanda DOT org,
        Chris Knight <knight AT wadsworth DOT org>
Cc: Ivan Auger <ivan.auger AT wadsworth DOT org>
Subject: Re: amanda probelm
In-Reply-To: <4A818AA0.5070300 AT wadsworth DOT org>
User-Agent: Mutt/1.4.1i


Reviewing the issue.

Server, Solaris 10x86, Amanda 2.6.1 (with patches)
Client, Solaris 9, Amanda  2.4.4

The problem performing level 0 dumps is that there are a large
number of files in flux -- it's the mailhost system -- so ufsdump
eventually asks for help, to continue or quit.

There is no help in non-interactive mode, and I don't know if there
is a mechanism to get amanda to respond to ufsdump's query. So the
level 0 of /usr1 usually fails.

Warnings - Dave's suggestion, that the history of the error be more
explicit, is a good one. Can amdump report the last successful level 0,
i.e. due date, if the current level 0 fails? Put it in the notes
section or something?

Non-solutions - a snapshot of an open file is an open file. All things
being equal, you will get as many open files in a snapshot as in a
live system. This will not resolve the problem.

Workarounds
    1) Quiesce the mail server for a period of time, force a level 0
       of /usr1 during that interval.

    2) Quiesce mail delivery long enough to snapshot/rsync or break
       a mirror. Back up the placid copy.

    3) Q: Will a TAR of the DLE get a backup when ufsdump cannot?

    4) "It's not really a problem."
       By and large each individual message will be backed up "most"
       of the time; each message is its own file, and if it's not on
       the level 0 it's on almost all of the level 1 and 2 dumps.
       What we will lose are the index files, which would probably
       require a rebuild after a restore anyway, so they are not
       that important to get onto tape.


On Tue, Aug 11, 2009 at 11:13:36AM -0400, daver wrote:
brian, checking over all the overdue file systems on curie, I find only mailserv:/usr1 is truly overdue

As you mentioned, this may be due to the system being too active. there are NO warnings in this regard, with the exception of a "Can't switch to degraded mode for unknown reason" in amdump

as we just about never read these amanda files and use the email generated by the system to notify us of problems, this would seem to be a significant problem with Amanda. I agree that the developers should be contacted in this regard.

as to getting /usr1 backed up. if amanda can't do it, perhaps we need to consider an alternative. like tar


