Re: amanda client issue, dumps failing

Yes, /var/mail does tend to be a problem. My mail.log can get to a quarter million lines a day, but that's not the same as getting that number of messages. Typically, my level 1 backups aren't much different from the level 0, because basically everything changes every day. I get by with fssnap. That isn't the same as quiescing the system, but it reduces exposure to issues to the time it takes to do the snapshot. The thing is, with an active mail system you need a place for the backing store that is nearly as large as the mail partition, because things are going to change while you are backing up, so they will have to get copied.
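
To make the backing store sizing point concrete, here is a rough sketch in Python of the kind of pre-flight check one could run before taking the snapshot. The function names and the 0.9 threshold for "nearly as large" are my own assumptions for illustration, not anything fssnap or Amanda provides.

```python
import shutil


def backing_store_is_adequate(mail_used_bytes, store_free_bytes, fraction=0.9):
    """Return True if the free space available for the backing store is at
    least `fraction` of the space used on the mail partition.

    The 0.9 default is an assumed reading of "nearly as large".
    """
    return store_free_bytes >= fraction * mail_used_bytes


def check_paths(mail_path, store_path, fraction=0.9):
    """Compare real filesystems: used space under mail_path against free
    space on the filesystem that would hold the backing store."""
    mail = shutil.disk_usage(mail_path)
    store = shutil.disk_usage(store_path)
    return backing_store_is_adequate(mail.used, store.free, fraction)
```

You might call it as check_paths("/var/mail", "/snap"), where /snap is a hypothetical location for the backing store, and skip the snapshot (or complain loudly) when it returns False.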

If your mail admin isn't cooperative, ask your mail admin, or your mail admin's boss, whether they care about having backups. Ask them if it is all right not to have a backup. You could even ask them if there are any legal requirements relating to discovery, etc. (NYS Dept. of Health? I'll bet there are!) That might get you in deeper than you want, but hey, it should get some attention.

For details on how I do it (on Solaris 9), see http://wiki.zmanda.com/index.php/Backup_client#Chris_Hoogendyk.27s_Example.

If they would allow it, you could set up a sudo entry for the amanda user to stop sendmail, take a snapshot, and start sendmail again (or whatever mail software you are using). It would be sort of like what I have for stopping xntpd to snapshot the root partition, but you would special case it for the /var/mail partition. Presumably, they have the /etc/init.d/sendmail or similar script properly set up so that you won't be in danger of screwing things up. Presumably, they would work with you to get it going.
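
The stop, snapshot, start sequence could be sketched roughly like this in Python. This is not the script from the wiki page; the function name, the backing store path /snap/mail.bs, and the injectable run parameter are assumptions for illustration, and the fssnap options should be checked against the man page on your system. The one thing it is meant to show is that sendmail gets restarted even if the snapshot fails.

```python
import subprocess


def snapshot_with_mail_stopped(run=subprocess.check_call,
                               init_script="/etc/init.d/sendmail",
                               filesystem="/var/mail",
                               backing_store="/snap/mail.bs"):
    """Stop the mail service, take a UFS snapshot, and always restart mail.

    `run` executes one command given as a list of arguments; it is a
    parameter so the sequence can be exercised without touching a real
    system. The paths are hypothetical defaults.
    """
    run([init_script, "stop"])
    try:
        # Create the snapshot while nothing is being delivered.
        run(["fssnap", "-F", "ufs", "-o", "bs=" + backing_store, filesystem])
    finally:
        # Restart mail whether or not the snapshot succeeded.
        run([init_script, "start"])
```

Whatever wraps this would be what the sudo entry for the amanda user points at, so the amanda user never needs general root access.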



---------------

Chris Hoogendyk

-
  O__  ---- Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center

~~~~~~~~~~ - University of Massachusetts, Amherst

<hoogendyk AT bio.umass DOT edu>

---------------

Erdös 4




Brian Cuttler wrote:

Amanda users,

For the issue below, we see several hundred thousand emails
move through the system each day. ufsdump is failing because
it seems too many files come and go; it asks whether to "continue" but
can't get a reply (I don't know of a way to answer it anyway).

We tried to switch the problem DLE to gtar - but the estimate phase
seems to take hours to run. I haven't set etimeout high enough to get
an estimate yet, and this will push actual dumps back by hours.

Is there a workaround I can employ, either to get quicker estimates
(ok to assume level 0 is the usage of the partition) or get ufsdump
to work ?

I've recommended we do something to quiesce the system, but our mail god
hasn't seemed to take any interest in that suggestion. Nor do we
currently have a mechanism to snapshot or replicate (rsync, break a
mirror, etc) the partition.

                                                thanks,

                                                Brian

----- Forwarded message from Brian Cuttler <brian AT wadsworth DOT org> -----

Date: Tue, 11 Aug 2009 11:37:35 -0400
From: Brian Cuttler <brian AT wadsworth DOT org>
To: daver <daver AT wadsworth DOT org>, amanda-users AT amanda DOT org,
        Chris Knight <knight AT wadsworth DOT org>
Cc: Ivan Auger <ivan.auger AT wadsworth DOT org>
Subject: Re: amanda probelm
In-Reply-To: <4A818AA0.5070300 AT wadsworth DOT org>
User-Agent: Mutt/1.4.1i

Reviewing the issue.

Server, Solaris 10x86, Amanda 2.6.1 (with patches)
Client, Solaris 9, Amanda  2.4.4

The problem performing level 0 dumps is that there are a large
number of files in flux -- it's the mailhost system -- so ufsdump
eventually asks for help, to continue or quit.

There is no help in non-interactive mode, and I don't know if there
is a mechanism to get amanda to respond to ufsdump's query. So the
level 0 of /usr1 usually fails.

Warnings - Dave's suggestion, that the history of the error be more
explicit, is a good one. Can amdump report the last successful level 0,
i.e. the due date, if the current level zero fails? Put it in the notes
section or something?


Non-solutions - a snapshot of an open file is an open file. All things
being equal, you will get as many open files in a snapshot as in a
live system. This will not resolve the problem.

Work arounds
    1) Quiesce the mail server for a period of time, force a level 0
       of /usr1 during that interval.

    2) Quiesce mail delivery long enough to snapshot/rsync or break
       a mirror. Back up the placid copy.

    3) Q: will a TAR of the DLE get a backup when ufsdump can not ?

    4) "It's not really a problem."
       By and large each individual message will be backed up "most"
       of the time; each message is its own file, and if it's not on
       the level 0 it's on almost all of the level 1 and 2 dumps.
       What we will lose are the index files, which would probably
       require a rebuild after a restore anyway, so they are not
       that important to get onto tape.


On Tue, Aug 11, 2009 at 11:13:36AM -0400, daver wrote:

Brian, checking over all the overdue file systems on curie, I find only mailserv:/usr1 is truly overdue.
As you mentioned, this may be due to the system being too active. There are NO warnings in this regard, with the exception of a "Can't switch to degraded mode for unknown reason" in amdump.
As we just about never read these amanda files and use the email generated by the system to notify us of problems, this would seem to be a significant problem with Amanda. I agree that the developers should be contacted in this regard.
As to getting /usr1 backed up: if amanda can't do it, perhaps we need to consider an alternative, like tar.