Amanda-Users

Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 17:27:12
Subject: Re: Backup issues with OpenBSD 4.5 machines
From: Jean-Louis Martineau <martineau AT zmanda DOT com>
To: Michael Burk <burkml AT gmail DOT com>
Date: Mon, 31 Aug 2009 16:41:00 -0400
Do you have the patch I sent to stan on this system?

The patch check before and after the write if the pipe is in O_NONBLOCK or not and give an error if it is.

I'm totally lost since it is in blocking mode and you get EAGAIN, which is impossible!!!!

Jean-louis

Michael Burk wrote:
Here are the two sendbackup.*.debug files for the / fs.

First:

1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version 2.6.2alpha: start at Sun Aug 30 22:31:42 2009
1251693102.345326: sendbackup: Version 2.6.2alpha
1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version 2.6.2alpha: rename at Sun Aug 30 22:31:42 2009
1251693102.382860: sendbackup:   Parsed request as: program `DUMP'
1251693102.382940: sendbackup:                      disk `/'
1251693102.382991: sendbackup:                      device `/'
1251693102.383039: sendbackup:                      level 0
1251693102.383086: sendbackup:                      since NODATE
1251693102.383134: sendbackup:                      options `'
1251693102.388833: sendbackup: start: bromine.example.com:/ lev 0
1251693102.391475: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693102.398325: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 - /dev/rsd0a" in pipeline
1251693102.403120: sendbackup: Started backup
1251693102.461663: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Sun Aug 30 22:31:42 2009 1251693102.468669: sendbackup: 90: normal(|): DUMP: Date of last level 0 dump: the epoch 1251693102.487536: sendbackup: Started index creator: "/sbin/restore -tvf - 2>&1 | sed -e '
s/^leaf[        ]*[0-9]*[       ]*\.//
t
/^dir[  ]/ {
s/^dir[         ]*[0-9]*[       ]*\.//
s%$%/%
t
}
d
'"
1251693102.519939: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0a (/) to standard output 1251693102.588086: sendbackup: 90: normal(|): DUMP: mapping (Pass I) [regular files] 1251693104.583392: sendbackup: 90: normal(|): DUMP: mapping (Pass II) [directories] 1251693104.585824: sendbackup: 90: normal(|): DUMP: estimated 35134 tape blocks. 1251693104.595314: sendbackup: 90: normal(|): DUMP: Volume 1 started at: Sun Aug 30 22:31:44 2009 1251693104.615444: sendbackup: 90: normal(|): DUMP: dumping (Pass III) [directories] 1251693104.919691: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) [regular files] 1251693105.800795: sendbackup: critical (fatal): index tee cannot write [Resource temporarily unavailable] 1251693105.803486: sendbackup: 115: strange(?): sendbackup: index tee cannot write [Resource temporarily unavailable]
1251693105.827339: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251693105.832068: sendbackup: 90: normal(|): DUMP: The ENTIRE dump is aborted. 1251693105.839675: sendbackup: critical (fatal): error [dump (7257) /sbin/dump returned 3]

Second:

1251693123.840560: sendbackup: pid 32144 ruid 150 euid 150 version 2.6.2alpha: start at Sun Aug 30 22:32:03 2009
1251693123.841613: sendbackup: Version 2.6.2alpha
1251693123.892773: sendbackup: pid 32144 ruid 150 euid 150 version 2.6.2alpha: rename at Sun Aug 30 22:32:03 2009
1251693123.896360: sendbackup:   Parsed request as: program `DUMP'
1251693123.896437: sendbackup:                      disk `/'
1251693123.896487: sendbackup:                      device `/'
1251693123.896535: sendbackup:                      level 0
1251693123.896584: sendbackup:                      since NODATE
1251693123.896632: sendbackup:                      options `'
1251693123.902332: sendbackup: start: bromine.example.com:/ lev 0
1251693123.905012: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693123.911778: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 - /dev/rsd0a" in pipeline
1251693123.916860: sendbackup: Started backup
1251693123.992261: sendbackup: Started index creator: "/sbin/restore -tvf - 2>&1 | sed -e '
s/^leaf[        ]*[0-9]*[       ]*\.//
t
/^dir[  ]/ {
s/^dir[         ]*[0-9]*[       ]*\.//
s%$%/%
t
}
d
'"
1251693123.995743: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Sun Aug 30 22:32:03 2009 1251693124.009013: sendbackup: 90: normal(|): DUMP: Date of last level 0 dump: the epoch 1251693124.038977: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0a (/) to standard output 1251693124.143454: sendbackup: 90: normal(|): DUMP: mapping (Pass I) [regular files] 1251693126.173916: sendbackup: 90: normal(|): DUMP: mapping (Pass II) [directories] 1251693126.176287: sendbackup: 90: normal(|): DUMP: estimated 35134 tape blocks. 1251693126.185677: sendbackup: 90: normal(|): DUMP: Volume 1 started at: Sun Aug 30 22:32:06 2009 1251693126.205305: sendbackup: 90: normal(|): DUMP: dumping (Pass III) [directories] 1251693126.505055: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) [regular files] 1251693126.797271: sendbackup: critical (fatal): index tee cannot write [Resource temporarily unavailable] 1251693126.799929: sendbackup: 115: strange(?): sendbackup: index tee cannot write [Resource temporarily unavailable]
1251693126.821488: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251693126.825377: sendbackup: 90: normal(|): DUMP: The ENTIRE dump is aborted. 1251693126.833573: sendbackup: critical (fatal): error [dump (22791) /sbin/dump returned 3]

Thanks,
Michael

On Mon, Aug 31, 2009 at 12:49 PM, Jean-Louis Martineau <martineau AT zmanda DOT com <mailto:martineau AT zmanda DOT com>> wrote:

    You get the error in the index pipe instead of the data path. The
    backup is correct but your index is empty. That's why you get a
    STRANGE result instead of a failure.

    Can you post a sendbackup.*.debug for a dle that failed?

    Jean-Louis

    Michael Burk wrote:

        Hello,

        I applied the patches to the 20090827 snapshot. I tried it on
        two OpenBSD 4.5 sparc64 systems, forcing both to do full
        backups. One system seemed to work on all 3 file systems, the
        other failed on all 4 file systems with the same errors as
        before (exactly like what Stan reported). I'm using bsdtcp auth.

        The system that worked, however, did give a "strange" result
        in the report on one filesystem. Here's the corresponding
        "sendbackup" debug file:

        1251692700.529842: sendbackup: pid 30459 ruid 150 euid 150
        version 2.6.2alpha: start at Sun Aug 30 2
        2:25:00 2009
        1251692700.530411: sendbackup: Version 2.6.2alpha
        1251692700.553135: sendbackup: pid 30459 ruid 150 euid 150
        version 2.6.2alpha: rename at Sun Aug 30
        22:25:00 2009
        1251692700.555394: sendbackup:   Parsed request as: program `DUMP'
        1251692700.555435: sendbackup:                      disk `/usr'
        1251692700.555461: sendbackup:                      device `/usr'
        1251692700.555484: sendbackup:                      level 0
        1251692700.555507: sendbackup:                      since NODATE
        1251692700.555530: sendbackup:                      options `'
        1251692700.556750: sendbackup: start:
        zirconium.example.com:/usr lev 0
        1251692700.559495: sendbackup: dumping device '/dev/rsd1d'
        with 'ffs'
        1251692700.566094: sendbackup: Spawning "/sbin/dump dump 0usf
        1048576 - /dev/rsd1d" in pipeline
        1251692700.571185: sendbackup: Started backup
        1251692700.572425: sendbackup: fd 3 is set with O_NONBLOCK
        before dup2
        1251692700.597388: sendbackup: Started index creator:
        "/sbin/restore -tvf - 2>&1 | sed -e '
        s/^leaf[        ]*[0-9]*[       ]*\.//
        t
        /^dir[  ]/ {
        s/^dir[         ]*[0-9]*[       ]*\.//
        s%$%/%
        t
        }
        d
        '"
        1251692700.611526: sendbackup:  90:  normal(|):   DUMP: Date
        of this level 0 dump: Sun Aug 30 22:25:
        00 2009
        1251692700.669703: sendbackup:  90:  normal(|):   DUMP: Date
        of last level 0 dump: the epoch
        1251692700.670724: sendbackup:  90:  normal(|):   DUMP:
        Dumping /dev/rsd1d (/usr) to standard output
        1251692700.763600: sendbackup:  90:  normal(|):   DUMP:
        mapping (Pass I) [regular files]
        1251692705.618345: sendbackup:  90:  normal(|):   DUMP:
        mapping (Pass II) [directories]
        1251692705.620040: sendbackup:  90:  normal(|):   DUMP:
        estimated 1330085 tape blocks.
        1251692705.623976: sendbackup:  90:  normal(|):   DUMP: Volume
        1 started at: Sun Aug 30 22:25:05 2009
        1251692705.650687: sendbackup:  90:  normal(|):   DUMP:
        dumping (Pass III) [directories]
        1251692745.051547: sendbackup:  90:  normal(|):   DUMP:
        dumping (Pass IV) [regular files]
        1251692747.915767: sendbackup: 115: strange(?): sed: stdout:
        Resource temporarily unavailable
        1251692954.633083: sendbackup:  43:    size(|):   DUMP:
        1437829 tape blocks
        1251692954.637414: sendbackup:  90:  normal(|):   DUMP: Date
        of this level 0 dump: Sun Aug 30 22:25:00 2009
        1251692954.641414: sendbackup:  90:  normal(|):   DUMP: Volume
        1 completed at: Sun Aug 30 22:29:14 2009
        1251692954.641858: sendbackup: Index pipe exited with status 1
        1251692954.642719: sendbackup:  90:  normal(|):   DUMP: Volume
        1 took 0:04:09
        1251692954.643918: sendbackup:  90:  normal(|):   DUMP: Volume
        1 transfer rate: 5774 KB/s
        1251692954.645095: sendbackup:  90:  normal(|):   DUMP: Date
        this dump completed:  Sun Aug 30 22:29:14 2009
        1251692954.646239: sendbackup:  90:  normal(|):   DUMP:
        Average transfer rate: 5774 KB/s
        1251692954.647430: sendbackup:  90:  normal(|):   DUMP: level
        0 dump on Sun Aug 30 22:25:00 2009
        1251692954.653480: sendbackup:  90:  normal(|):   DUMP: DUMP
        IS DONE
        1251692954.653841: sendbackup: Parsed backup messages
        1251692954.654032: sendbackup: pid 30459 finish time Sun Aug
        30 22:29:14 2009

        -- Michael

        On Fri, Aug 28, 2009 at 5:22 AM, stan <stanb AT panix DOT com
        <mailto:stanb AT panix DOT com> <mailto:stanb AT panix DOT com
        <mailto:stanb AT panix DOT com>>> wrote:

           On Thu, Aug 27, 2009 at 03:35:47PM -0600, Michael Burk wrote:
           > Cool - can you send me the patch? Or is it already in the
        0826
           snapshot?
           >
           > FYI - I got 2.5.0p2 working, but could never get 2.5.1p3
           working, even with
           > the same config. No idea why.
           > I started looking at the source code last night in the 0825
           snapshot. I'll
           > discontinue that search if there's a patch.
           >
           This is interesting. I sent Jean-Louis Martineau an email,
        saying
           teaks for
           fixing things, and he replied that the patch he sent me
        just added
           debugging,
           it was not intended to fix anything.

           Bit, using this snapshot 20090813, and the patch, which I will
           send to you.
           I have it working on my 3 OpenBSD 4.5 machines that I have
           upgraded so far.
           I ran several test yesterday, and a full fledged backup run
        last
           night,
           and I have not seen any issues since installing this code.

           I am going to attach the patch to this email.

           I would greatly appreciate it, if you would apply this patch to
           the above
           snapshot, test, and tell me what your results are.

           I am very puzzled at the moment.

           BTW, is your failure the same as mine? That is amcheck
        passes, and
           most
           times you get a PARTIAL backup?

           BTW, I am using bsdtcp "auth" and client compression now, but I
           have made
           this work with this code, with bsd auth, and no
        compression, so I
           don't
           think that has any affect on this issue.

           Thanks for the help!

           --
           One of the main causes of the fall of the roman empire was
        that,
           lacking
           zero, they had no way to indicate successful termination of
        their C
           programs.