Amanda-Users

Re: putting error warnings into seperate mail

2006-05-10 11:31:07
Subject: Re: putting error warnings into seperate mail
From: Paul Bijnens <paul.bijnens AT xplanation DOT com>
To: listrcv <listrcv AT condor-werke DOT com>
Date: Wed, 10 May 2006 17:25:46 +0200
On 2006-05-10 14:48, listrcv wrote:
Paul Bijnens wrote:

These events can go unnoticed too easily when glancing at the daily mail from amreport, so in such cases I would like to have another mail sent, besides the report, to different addresses.


In my reports those errors spring into sight very easily:
the first few lines are completely different layout / stars /
uppercase letters.  Quite a good imitation of Amanda screaming for
assistance.

Hm, how much they stick out probably very much depends on the MUA, screen resolution etc. you are using. I can provide a screenshot to show, if you wanna look.

The first few lines of a normal report:

:  These dumps were to tape Daily-05.
:  The next tape Amanda expects to use is: Daily-06.
:
:
:  STATISTICS:
:  [...]


The first few lines of a run with tape problems:

:  These dumps were to tape DAILY58.
:  *** A TAPE ERROR OCCURRED: [[writing file: No space left on device]].
:  Some dumps may have been left in the holding disk.
:  Run amflush to flush them to tape.
:  The next tape Amanda expects to use is: DAILY59.
:
:  FAILURE AND STRANGE DUMP SUMMARY:
:    chandon    /space/projects lev 0 FAILED [out of tape]
:
:
:  STATISTICS:
:  [...]


The first few lines of a run with some hosts failing:

:  These dumps were to tape Daily-04.
:  The next tape Amanda expects to use is: Daily-05.
:
:  FAILURE AND STRANGE DUMP SUMMARY:
:    vasov      /space lev 0 FAILED [Estimate timeout from vasov]
:    vasov      /var lev 0 FAILED [Estimate timeout from vasov]
:    vasov      / lev 0 FAILED [Estimate timeout from vasov]
:    katastrov  /export/home2 lev 1 STRANGE
:
:
:  STATISTICS:
:  [...]


When I see a "FAILURE AND STRANGE" section, it means that this
report needs more attention.  Otherwise I just move it to the normal
Amanda reports folder.

That is also why I expand my list of patterns to suppress the section
for "normal" warnings.

It's only now and then that I take a look at the NOTES section.  The
rest I look at only when needing more details ("I'm gonna reinstall
host xxx, is there a decent backup of it?")



The warning in amreport is at a place I usually pay no attention to because I don't care which tape will be used next. That's what an automatic tape changer and amanda are for ;)

The warning is at the top!  Well yes, the second line...
The first thing that my eyes see when opening the mail.



Instead, I'm looking at the statistics immediately and then scroll down. Moreover, there are days when I don't get to look at the amreport mail at all because I'm too busy. Another thing is that I may be on vacation and my work mate who takes over during that time might not see it in time or not at all.

My experience is that even red blinking text flashing "ERROR" and a
sound attachment doing "wiewieiwieiwie" isn't enough for some people
to note the error.


I'm using a separate mail account for system reports and filter them by machine; he uses filtering to put system messages aside within his account. The extra warning I would like to set up would go into my main account and also into his, so that it _will_ be noticed.

Both times the changer failed, I only noticed by chance because I happened to look at the small display that is on the changer. My workmate won't happen to look there by chance because his office is not next to the server room like mine is. Obviously, that's not the way it should be, and something more reliable is needed.

Is this amanda 2.5.0 ?  There was a bug in amcheck that did not set
the flag to failure for these things.  And you would not get a
mail from amcheck in that case, only when run manually you would see
the error it printed.
Fixed only very recently.


But if it isn't loud enough, then maybe just a script that checks
for files in the holdingdisk after the run, and warns you of the
needed flush?

Good idea, that would be something to start with :) If the changer or tape device fails, there will be files left there.

But this will not catch when Amanda fails to do
anything at all...
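
The holding-disk check suggested above could be sketched roughly as follows.
This is only an illustration, not something shipped with Amanda: the function
names, the sender address and the use of a local SMTP server are all
assumptions, and it simply treats any regular file found below the holding
disk directory after the run as a sign that a flush is needed.  As noted
above, it cannot detect a run that never started.

```python
import os
import smtplib
import sys
from email.message import EmailMessage


def leftover_files(holding_dir):
    """Return a sorted list of all regular files found below holding_dir."""
    found = []
    for root, _dirs, files in os.walk(holding_dir):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


def build_warning(paths, sender, recipients):
    """Build a short warning mail listing the leftover files."""
    msg = EmailMessage()
    msg["Subject"] = "AMANDA WARNING: %d file(s) left in holding disk" % len(paths)
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        "Dumps were left in the holding disk; amflush may be needed:\n\n"
        + "\n".join(paths)
        + "\n"
    )
    return msg


def main(argv):
    # argv[1] is the holding disk directory, the rest are mail addresses.
    holding_dir, recipients = argv[1], argv[2:]
    paths = leftover_files(holding_dir)
    if paths:
        # Assumption: a mail server accepting local submissions on localhost.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(build_warning(paths, "amanda@localhost", recipients))


if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```

Run from cron some time after amdump has finished, with the holding disk
directory and the extra addresses as arguments, it stays silent when the
holding disk is empty.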

hmmmm

Keeping the "FAILURE AND STRANGE..." section small, i.e. handling
those errors, instead of allowing minor errors in there, helps to
spot real errors too.
I changed my patterns to categorize as normal the messages saying that
certain files "changed as we read it" etc.   It would help if I didn't
need to recompile the Amanda binaries for this.

The 'Failure and Strange' section is something different. MOTT, I notice it when scrolling through the report.

You mean the "DETAILS".  I mean the "SUMMARY" at the top.



What if error messages were generally separated from the report? Amanda could send several (or two) mails, to different addresses if set up to do that, or with subjects that allow filtering/redirecting.

I think that is the "FAILURE AND STRANGE DUMP SUMMARY" (+ DETAILS) section
that you are after.

How about a procmail filter that inspects the body for such a section
and accordingly labels the message?
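
As a rough sketch of that idea, a small Python filter could do the body
inspection and relabel the Subject, so that procmail (or any other mail
filter) only has to match on the label.  The marker strings are taken from
the report excerpts earlier in this mail; the "[AMANDA-ERROR]" tag and the
function names are made up for the example.

```python
import email
import sys
from email import policy

# Lines that only show up in reports needing attention (see the examples above).
ALERT_MARKERS = (
    "FAILURE AND STRANGE DUMP SUMMARY:",
    "*** A TAPE ERROR OCCURRED",
)


def needs_attention(body):
    """Return True if the report body contains one of the alert markers."""
    return any(marker in body for marker in ALERT_MARKERS)


def label_message(raw_text, tag="[AMANDA-ERROR]"):
    """Parse a mail and prefix its Subject with tag if the body looks bad."""
    msg = email.message_from_string(raw_text, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    if needs_attention(body):
        subject = msg["Subject"] or ""
        # Subject may appear only once, so drop the old header first.
        del msg["Subject"]
        msg["Subject"] = "%s %s" % (tag, subject)
    return msg


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: pass the path of a saved report mail; the result goes to stdout.
    with open(sys.argv[1]) as handle:
        sys.stdout.write(label_message(handle.read()).as_string())
```

A report without those markers passes through with its Subject untouched, so
the normal reports can still be filed in the usual folder.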



Maybe that would even make more sense than to put errors into the report. The report is a report in the first place, error messages are error messages in the first place. Abusing a report to send error messages is worse than abusing error messages to send reports because error messages in a report are much more likely to go unnoticed than a report in an error message --- you get the idea :)

The report is mostly irrelevant for daily usage, when everything works as it should. It's only the errors I'm interested in and that require attention. Amanda is about the automation of backups ... That genuinely involves not having to pay attention to them.



--
Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************