Amanda-Users

Re: Backup issues with OpenBSD 4.5 machines

2009-09-01 14:07:50
Subject: Re: Backup issues with OpenBSD 4.5 machines
From: Michael Burk <burkml AT gmail DOT com>
To: Jean-Louis Martineau <martineau AT zmanda DOT com>, "Dustin J. Mitchell" <dustin AT zmanda DOT com>
Date: Tue, 1 Sep 2009 11:41:38 -0600
Since the logs indicate an error with the "index tee", I thought I'd try turning off the index generation. The backup still failed, but with a different error:

1251826429.341455: sendbackup: pid 26726 ruid 150 euid 150 version 2.6.2alpha: start at Tue Sep  1 11:33:49 2009
1251826429.342070: sendbackup: Version 2.6.2alpha
1251826429.364640: sendbackup: pid 26726 ruid 150 euid 150 version 2.6.2alpha: rename at Tue Sep  1 11:33:49 2009
1251826429.366628: sendbackup:   Parsed request as: program `DUMP'
1251826429.366665: sendbackup:                      disk `/'
1251826429.366690: sendbackup:                      device `/'
1251826429.366714: sendbackup:                      level 0
1251826429.366737: sendbackup:                      since NODATE
1251826429.366761: sendbackup:                      options `'
1251826429.367758: sendbackup: start: zirconium.example.com:/ lev 0
1251826429.369745: sendbackup: dumping device '/dev/rsd1a' with 'ffs'
1251826429.370181: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 - /dev/rsd1a" in pipeline
1251826429.375384: sendbackup: Started backup
1251826429.395855: sendbackup:  90:  normal(|):   DUMP: Date of this level 0 dump: Tue Sep  1 11:33:49 2009
1251826429.460067: sendbackup:  90:  normal(|):   DUMP: Date of last level 0 dump: the epoch
1251826429.461111: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd1a (/) to standard output
1251826429.538792: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I) [regular files]
1251826430.811052: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II) [directories]
1251826430.812848: sendbackup:  90:  normal(|):   DUMP: estimated 47600 tape blocks.
1251826430.814960: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at: Tue Sep  1 11:33:50 2009
1251826430.820737: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III) [directories]
1251826430.829936: sendbackup:  90:  normal(|):   DUMP: End of tape detected
1251826430.833563: sendbackup:  90:  normal(|):   DUMP: Volume 1 completed at: Tue Sep  1 11:33:50 2009
1251826430.836947: sendbackup:  90:  normal(|):   DUMP: Change Volumes: Mount volume #2
1251826430.838051: sendbackup:  86: strange(?):   DUMP: fopen on /dev/tty fails: Device not configured
1251826430.839081: sendbackup:  90:  normal(|):   DUMP: The ENTIRE dump is aborted.
1251826430.839642: sendbackup: critical (fatal): error [dump (1011) /sbin/dump returned 3]

Hope that helps.

-- Michael


On Tue, Sep 1, 2009 at 11:31 AM, Michael Burk <burkml AT gmail DOT com> wrote:
I applied the 3-line patch to the 0831 snapshot and ran a full backup on both machines, with 4 file systems each. All 8 completed successfully with no "strange" messages.

Next, I commented out the 3 new lines and tried the backup again on one of the machines. This time all 4 file systems failed; e.g.:

1251825776.552104: sendbackup: pid 24374 ruid 150 euid 150 version 2.6.2alpha: start at Tue Sep  1 11:22:56 2009
1251825776.552688: sendbackup: Version 2.6.2alpha
1251825776.575326: sendbackup: pid 24374 ruid 150 euid 150 version 2.6.2alpha: rename at Tue Sep  1 11:22:56 2009
1251825776.577315: sendbackup:   Parsed request as: program `DUMP'
1251825776.577352: sendbackup:                      disk `/'
1251825776.577377: sendbackup:                      device `/'
1251825776.577400: sendbackup:                      level 0
1251825776.577424: sendbackup:                      since NODATE
1251825776.577446: sendbackup:                      options `'
1251825776.578210: sendbackup: start: zirconium.example.com:/ lev 0
1251825776.579426: sendbackup: dumping device '/dev/rsd1a' with 'ffs'
1251825776.582918: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 - /dev/rsd1a" in pipeline
1251825776.587089: sendbackup: Started backup
1251825776.619734: sendbackup:  90:  normal(|):   DUMP: Date of this level 0 dump: Tue Sep  1 11:22:56 2009
1251825776.632529: sendbackup: Started index creator: "/sbin/restore -tvf - 2>&1 | sed -e '

s/^leaf[        ]*[0-9]*[       ]*\.//
t
/^dir[  ]/ {
s/^dir[         ]*[0-9]*[       ]*\.//
s%$%/%
t
}
d
'"
1251825776.721646: sendbackup:  90:  normal(|):   DUMP: Date of last level 0 dump: the epoch
1251825776.722686: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd1a (/) to standard output
1251825776.779507: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I) [regular files]
1251825778.010107: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II) [directories]
1251825778.012085: sendbackup:  90:  normal(|):   DUMP: estimated 47600 tape blocks.
1251825778.013367: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at: Tue Sep  1 11:22:58 2009
1251825778.020808: sendbackup: critical (fatal): index tee cannot write [Resource temporarily unavailable]
1251825778.020918: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III) [directories]
1251825778.022805: sendbackup: 115: strange(?): sendbackup: index tee cannot write [Resource temporarily unavailable]
1251825778.042344: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251825778.047559: sendbackup:  90:  normal(|):   DUMP: The ENTIRE dump is aborted.
1251825778.048234: sendbackup: critical (fatal): error [dump (21317) /sbin/dump returned 3]

So it seems reliable that those 3 lines fix the problem somehow.
Anything else you want to try before I ask for help on the OpenBSD list?

Thanks,
Michael


On Tue, Sep 1, 2009 at 9:32 AM, Michael Burk <burkml AT gmail DOT com> wrote:
I checked the errata for OpenBSD 4.5, but saw nothing that looked related.
I applied the patch to the 0831 snapshot and am building it now. After we find the minimal patch, as Jean-Louis said, I'll post on the OpenBSD-misc list to see if anyone has an explanation.

Thanks guys for working on this - I know it's a frustrating one.

-- Michael


On Tue, Sep 1, 2009 at 8:33 AM, Dustin J. Mitchell <dustin AT zmanda DOT com> wrote:
On Tue, Sep 1, 2009 at 7:45 AM, Jean-Louis
Martineau<martineau AT zmanda DOT com> wrote:
> We need to find a minimal patch that fix the problem.
> Cat you try the attached patch?

This is starting to look like a kernel bug -- is there an associated
OpenBSD bug or something that we could reference in comments in the
code to explain this strange formulation?  Also, we'll probably need
to do this everywhere we fork and plumb a new process -- in
pipespawn.c at least.

Dustin

--
Open Source Storage Engineer
http://www.zmanda.com