Amanda-Users

Re: amreport: ERROR unexpected log line: ...

2008-06-26 11:08:59
Subject: Re: amreport: ERROR unexpected log line: ...
From: Jean-Louis Martineau <martineau AT zmanda DOT com>
To: Jukka Salmi <j+amanda AT 2008.salmi DOT ch>
Date: Thu, 26 Jun 2008 10:12:11 -0400
Amanda lock the file while writing to it.
What is the output of: amadmin vv version | grep LOCKING
Which filesystem is used, is it NFS mounted?

Jean-Louis

Jukka Salmi wrote:
Jukka Salmi --> amanda-users (2008-06-25 12:37:15 +0200):
Hello,

since I upgraded an Amanda installation from 2.4.4p4 to 2.5.2p1, backup
reports always contain lines like these:

FAILURE AND STRANGE DUMP SUMMARY:
  amreport: ERROR unexpected log line: 20080625 2 [sec 45.227 kb 41655 kps 
921.7]
  amreport: ERROR unexpected log line: 20080625 0 [sec 35.747 kb 67281 kps 
1883.0]

The log file which causes these warnings contains amongst others the
following two lines:

grouper.salmi.ch /var/spool/imap 20080625 2 [sec 45.227 kb 41655 kps 921.7]
grouper.salmi.ch /home 20080625 0 [sec 35.747 kb 67281 kps 1883.0]

(grouper is the hostname of the host running amdump.)


Any hints about what could be wrong here?

I just had a closer look at this problem and it seems to be caused by
a race condition between programs started by Amanda's driver program:
all of these programs (dumper, chunker, taper) and the driver program
itself write to the same log file. Since these programs run
simultaneously, their calls to log_add() may result in interleaved
writes. And this is exactly what happens here almost daily: e.g.
yesterday's logfile contains

SUCCESS dumper grouper.salmi.ch /home 20080625 0 [...]
SUCCESS chunker STATS driver estimate grouper.salmi.ch /home 20080625 0 [...]
grouper.salmi.ch /home 20080625 0 [...]
SUCCESS taper grouper.salmi.ch /home 20080625 0 [...]

The chunker and the driver wrote to the log file at the same time...

To fix this problem correctly, the writing of the log file should be
synchronised. As a hack, the two calls to fullwrite() in log_add()
could be replaced by a single call; this would probably cause the
problem to occur less often. I'd certainly prefer the correct
solution...

Any comments?


Regards, Jukka