Amanda-Users

Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 17:51:53
Subject: Re: Backup issues with OpenBSD 4.5 machines
From: Michael Burk <burkml AT gmail DOT com>
To: Jean-Louis Martineau <martineau AT zmanda DOT com>
Date: Mon, 31 Aug 2009 15:11:44 -0600
Hi Jean-Louis,

I thought I had applied the patches on this machine also, but it turns out I didn't (sorry about that). I applied the patches and ran a new dump. This time all 4 file systems succeeded, though /usr got the same "strange" message as on the other machine:

1251752282.483677: sendbackup: pid 15574 ruid 150 euid 150 version 2.6.2alpha: start at Mon Aug 31 14:58:02 2009
1251752282.484788: sendbackup: Version 2.6.2alpha
1251752282.523952: sendbackup: pid 15574 ruid 150 euid 150 version 2.6.2alpha: rename at Mon Aug 31 14:58:02 2009
1251752282.527795: sendbackup:   Parsed request as: program `DUMP'
1251752282.527880: sendbackup:                      disk `/usr'
1251752282.527931: sendbackup:                      device `/usr'
1251752282.527978: sendbackup:                      level 0
1251752282.528025: sendbackup:                      since NODATE
1251752282.528072: sendbackup:                      options `'
1251752282.533704: sendbackup: start: bromine.example.com:/usr lev 0
1251752282.537190: sendbackup: dumping device '/dev/rsd0d' with 'ffs'
1251752282.543664: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 - /dev/rsd0d" in pipeline
1251752282.548532: sendbackup: Started backup
1251752282.561558: sendbackup: fd 3 is set with O_NONBLOCK before dup2
1251752282.625055: sendbackup: Started index creator: "/sbin/restore -tvf - 2>&1 | sed -e '
s/^leaf[        ]*[0-9]*[       ]*\.//
t
/^dir[  ]/ {
s/^dir[         ]*[0-9]*[       ]*\.//
s%$%/%
t
}
d
'"
1251752282.627853: sendbackup:  90:  normal(|):   DUMP: Date of this level 0 dump: Mon Aug 31 14:58:02 2009
1251752282.641040: sendbackup:  90:  normal(|):   DUMP: Date of last level 0 dump: the epoch
1251752282.754862: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd0d (/usr) to standard output
1251752282.792706: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I) [regular files]
1251752286.777085: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II) [directories]
1251752286.779538: sendbackup:  90:  normal(|):   DUMP: estimated 653358 tape blocks.
1251752286.789190: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at: Mon Aug 31 14:58:06 2009
1251752286.827782: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III) [directories]
1251752290.646004: sendbackup:  90:  normal(|):   DUMP: dumping (Pass IV) [regular files]
1251752290.822019: sendbackup: 115: strange(?): sed: stdout: Resource temporarily unavailable
1251752424.852658: sendbackup:  43:    size(|):   DUMP: 664333 tape blocks
1251752424.854957: sendbackup:  90:  normal(|):   DUMP: Date of this level 0 dump: Mon Aug 31 14:58:02 2009
1251752424.857083: sendbackup:  90:  normal(|):   DUMP: Volume 1 completed at: Mon Aug 31 15:00:24 2009
1251752424.859064: sendbackup:  90:  normal(|):   DUMP: Volume 1 took 0:02:18
1251752424.860994: sendbackup:  90:  normal(|):   DUMP: Volume 1 transfer rate: 4814 KB/s
1251752424.863042: sendbackup:  90:  normal(|):   DUMP: Date this dump completed:  Mon Aug 31 15:00:24 2009
1251752424.864993: sendbackup:  90:  normal(|):   DUMP: Average transfer rate: 4814 KB/s
1251752424.869138: sendbackup:  90:  normal(|):   DUMP: level 0 dump on Mon Aug 31 14:58:02 2009
1251752424.878026: sendbackup:  90:  normal(|):   DUMP: DUMP IS DONE
1251752424.888324: sendbackup: Index pipe exited with status 1
1251752424.906986: sendbackup: Parsed backup messages
1251752424.907361: sendbackup: pid 15574 finish time Mon Aug 31 15:00:24 2009

So now I've seen the same behavior that Stan noted: applying the debug patch seems to "fix" the dump.

Also, in case you didn't see it before, here is a snippet from an email thread Stan started on an OpenBSD list:
> >  I am trying to get Amanda of a recent vintage working on 4.5. The developrs
> >  don't understand how writing to a blocking pipe can return EAGAIN.
> >  Should the write just block?
>
> Yes, it should, and a quick, about-to-go-to-bed readthrough of the
> pipe write path shows that it shouldn't; is Amanda doing fcntls to
> the fd? If so, check those to be absolutely sure that they're not
> doing something hinkey.

Thanks,
Michael

On Mon, Aug 31, 2009 at 2:41 PM, Jean-Louis Martineau <martineau AT zmanda DOT com> wrote:
Do you have the patch I sent to stan on this system?

The patch check before and after the write if the pipe is in O_NONBLOCK or not and give an error if it is.

I'm totally lost since it is in blocking mode and you get EAGAIN, which is impossible!!!!

Jean-louis

Michael Burk wrote:
Here are the two sendbackup.*.debug files for the / fs.

First:

1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version 2.6.2alpha: start at Sun Aug 30 22:31:42 2009
1251693102.345326: sendbackup: Version 2.6.2alpha
1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version 2.6.2alpha: rename at Sun Aug 30 22:31:42 2009
1251693102.382860: sendbackup:   Parsed request as: program `DUMP'
1251693102.382940: sendbackup:                      disk `/'
1251693102.382991: sendbackup:                      device `/'
1251693102.383039: sendbackup:                      level 0
1251693102.383086: sendbackup:                      since NODATE
1251693102.383134: sendbackup:                      options `'
1251693102.388833: sendbackup: start: bromine.example.com:/ lev 0
1251693102.391475: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693102.398325: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 - /dev/rsd0a" in pipeline
1251693102.403120: sendbackup: Started backup
1251693102.461663: sendbackup:  90:  normal(|):   DUMP: Date of this level 0 dump: Sun Aug 30 22:31:42 2009
1251693102.468669: sendbackup:  90:  normal(|):   DUMP: Date of last level 0 dump: the epoch
1251693102.487536: sendbackup: Started index creator: "/sbin/restore -tvf - 2>&1 | sed -e '
s/^leaf[        ]*[0-9]*[       ]*\.//
t
/^dir[  ]/ {
s/^dir[         ]*[0-9]*[       ]*\.//
s%$%/%
t
}
d
'"
1251693102.519939: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd0a (/) to standard output
1251693102.588086: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I) [regular files]
1251693104.583392: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II) [directories]
1251693104.585824: sendbackup:  90:  normal(|):   DUMP: estimated 35134 tape blocks.
1251693104.595314: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at: Sun Aug 30 22:31:44 2009
1251693104.615444: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III) [directories]
1251693104.919691: sendbackup:  90:  normal(|):   DUMP: dumping (Pass IV) [regular files]
1251693105.800795: sendbackup: critical (fatal): index tee cannot write [Resource temporarily unavailable]
1251693105.803486: sendbackup: 115: strange(?): sendbackup: index tee cannot write [Resource temporarily unavailable]
1251693105.827339: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251693105.832068: sendbackup:  90:  normal(|):   DUMP: The ENTIRE dump is aborted.
1251693105.839675: sendbackup: critical (fatal): error [dump (7257) /sbin/dump returned 3]

Second:

1251693123.840560: sendbackup: pid 32144 ruid 150 euid 150 version 2.6.2alpha: start at Sun Aug 30 22:32:03 2009
1251693123.841613: sendbackup: Version 2.6.2alpha
1251693123.892773: sendbackup: pid 32144 ruid 150 euid 150 version 2.6.2alpha: rename at Sun Aug 30 22:32:03 2009
1251693123.896360: sendbackup:   Parsed request as: program `DUMP'
1251693123.896437: sendbackup:                      disk `/'
1251693123.896487: sendbackup:                      device `/'
1251693123.896535: sendbackup:                      level 0
1251693123.896584: sendbackup:                      since NODATE
1251693123.896632: sendbackup:                      options `'
1251693123.902332: sendbackup: start: bromine.example.com:/ lev 0
1251693123.905012: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693123.911778: sendbackup: Spawning "/sbin/dump dump 0usf 1048576 - /dev/rsd0a" in pipeline
1251693123.916860: sendbackup: Started backup
1251693123.992261: sendbackup: Started index creator: "/sbin/restore -tvf - 2>&1 | sed -e '
s/^leaf[        ]*[0-9]*[       ]*\.//
t
/^dir[  ]/ {
s/^dir[         ]*[0-9]*[       ]*\.//
s%$%/%
t
}
d
'"
1251693123.995743: sendbackup:  90:  normal(|):   DUMP: Date of this level 0 dump: Sun Aug 30 22:32:03 2009
1251693124.009013: sendbackup:  90:  normal(|):   DUMP: Date of last level 0 dump: the epoch
1251693124.038977: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd0a (/) to standard output
1251693124.143454: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I) [regular files]
1251693126.173916: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II) [directories]
1251693126.176287: sendbackup:  90:  normal(|):   DUMP: estimated 35134 tape blocks.
1251693126.185677: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at: Sun Aug 30 22:32:06 2009
1251693126.205305: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III) [directories]
1251693126.505055: sendbackup:  90:  normal(|):   DUMP: dumping (Pass IV) [regular files]
1251693126.797271: sendbackup: critical (fatal): index tee cannot write [Resource temporarily unavailable]
1251693126.799929: sendbackup: 115: strange(?): sendbackup: index tee cannot write [Resource temporarily unavailable]
1251693126.821488: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251693126.825377: sendbackup:  90:  normal(|):   DUMP: The ENTIRE dump is aborted.
1251693126.833573: sendbackup: critical (fatal): error [dump (22791) /sbin/dump returned 3]

Thanks,
Michael

On Mon, Aug 31, 2009 at 12:49 PM, Jean-Louis Martineau <martineau AT zmanda DOT com <mailto:martineau AT zmanda DOT com>> wrote:

   You get the error in the index pipe instead of the data path. The
   backup is correct but your index is empty. That's why you get a
   STRANGE result instead of a failure.

   Can you post a sendbackup.*.debug for a dle that failed?

   Jean-Louis

   Michael Burk wrote:

       Hello,

       I applied the patches to the 20090827 snapshot. I tried it on
       two OpenBSD 4.5 sparc64 systems, forcing both to do full
       backups. One system seemed to work on all 3 file systems, the
       other failed on all 4 file systems with the same errors as
       before (exactly like what Stan reported). I'm using bsdtcp auth.

       The system that worked, however, did give a "strange" result
       in the report on one filesystem. Here's the corresponding
       "sendbackup" debug file:

       1251692700.529842: sendbackup: pid 30459 ruid 150 euid 150
       version 2.6.2alpha: start at Sun Aug 30 2
       2:25:00 2009
       1251692700.530411: sendbackup: Version 2.6.2alpha
       1251692700.553135: sendbackup: pid 30459 ruid 150 euid 150
       version 2.6.2alpha: rename at Sun Aug 30
       22:25:00 2009
       1251692700.555394: sendbackup:   Parsed request as: program `DUMP'
       1251692700.555435: sendbackup:                      disk `/usr'
       1251692700.555461: sendbackup:                      device `/usr'
       1251692700.555484: sendbackup:                      level 0
       1251692700.555507: sendbackup:                      since NODATE
       1251692700.555530: sendbackup:                      options `'
       1251692700.556750: sendbackup: start:
       zirconium.example.com:/usr lev 0
       1251692700.559495: sendbackup: dumping device '/dev/rsd1d'
       with 'ffs'
       1251692700.566094: sendbackup: Spawning "/sbin/dump dump 0usf
       1048576 - /dev/rsd1d" in pipeline
       1251692700.571185: sendbackup: Started backup
       1251692700.572425: sendbackup: fd 3 is set with O_NONBLOCK
       before dup2
       1251692700.597388: sendbackup: Started index creator:
       "/sbin/restore -tvf - 2>&1 | sed -e '
       s/^leaf[        ]*[0-9]*[       ]*\.//
       t
       /^dir[  ]/ {
       s/^dir[         ]*[0-9]*[       ]*\.//
       s%$%/%
       t
       }
       d
       '"
       1251692700.611526: sendbackup:  90:  normal(|):   DUMP: Date
       of this level 0 dump: Sun Aug 30 22:25:
       00 2009
       1251692700.669703: sendbackup:  90:  normal(|):   DUMP: Date
       of last level 0 dump: the epoch
       1251692700.670724: sendbackup:  90:  normal(|):   DUMP:
       Dumping /dev/rsd1d (/usr) to standard output
       1251692700.763600: sendbackup:  90:  normal(|):   DUMP:
       mapping (Pass I) [regular files]
       1251692705.618345: sendbackup:  90:  normal(|):   DUMP:
       mapping (Pass II) [directories]
       1251692705.620040: sendbackup:  90:  normal(|):   DUMP:
       estimated 1330085 tape blocks.
       1251692705.623976: sendbackup:  90:  normal(|):   DUMP: Volume
       1 started at: Sun Aug 30 22:25:05 2009
       1251692705.650687: sendbackup:  90:  normal(|):   DUMP:
       dumping (Pass III) [directories]
       1251692745.051547: sendbackup:  90:  normal(|):   DUMP:
       dumping (Pass IV) [regular files]
       1251692747.915767: sendbackup: 115: strange(?): sed: stdout:
       Resource temporarily unavailable
       1251692954.633083: sendbackup:  43:    size(|):   DUMP:
       1437829 tape blocks
       1251692954.637414: sendbackup:  90:  normal(|):   DUMP: Date
       of this level 0 dump: Sun Aug 30 22:25:00 2009
       1251692954.641414: sendbackup:  90:  normal(|):   DUMP: Volume
       1 completed at: Sun Aug 30 22:29:14 2009
       1251692954.641858: sendbackup: Index pipe exited with status 1
       1251692954.642719: sendbackup:  90:  normal(|):   DUMP: Volume
       1 took 0:04:09
       1251692954.643918: sendbackup:  90:  normal(|):   DUMP: Volume
       1 transfer rate: 5774 KB/s
       1251692954.645095: sendbackup:  90:  normal(|):   DUMP: Date
       this dump completed:  Sun Aug 30 22:29:14 2009
       1251692954.646239: sendbackup:  90:  normal(|):   DUMP:
       Average transfer rate: 5774 KB/s
       1251692954.647430: sendbackup:  90:  normal(|):   DUMP: level
       0 dump on Sun Aug 30 22:25:00 2009
       1251692954.653480: sendbackup:  90:  normal(|):   DUMP: DUMP
       IS DONE
       1251692954.653841: sendbackup: Parsed backup messages
       1251692954.654032: sendbackup: pid 30459 finish time Sun Aug
       30 22:29:14 2009

       -- Michael

       On Fri, Aug 28, 2009 at 5:22 AM, stan <stanb AT panix DOT com
       <mailto:stanb AT panix DOT com> <mailto:stanb AT panix DOT com

       <mailto:stanb AT panix DOT com>>> wrote:

          On Thu, Aug 27, 2009 at 03:35:47PM -0600, Michael Burk wrote:
          > Cool - can you send me the patch? Or is it already in the
       0826
          snapshot?
          >
          > FYI - I got 2.5.0p2 working, but could never get 2.5.1p3
          working, even with
          > the same config. No idea why.
          > I started looking at the source code last night in the 0825
          snapshot. I'll
          > discontinue that search if there's a patch.
          >
          This is interesting. I sent Jean-Louis Martineau an email,
       saying
          teaks for
          fixing things, and he replied that the patch he sent me
       just added
          debugging,
          it was not intended to fix anything.

          Bit, using this snapshot 20090813, and the patch, which I will
          send to you.
          I have it working on my 3 OpenBSD 4.5 machines that I have
          upgraded so far.
          I ran several test yesterday, and a full fledged backup run
       last
          night,
          and I have not seen any issues since installing this code.

          I am going to attach the patch to this email.

          I would greatly appreciate it, if you would apply this patch to
          the above
          snapshot, test, and tell me what your results are.

          I am very puzzled at the moment.

          BTW, is your failure the same as mine? That is amcheck
       passes, and
          most
          times you get a PARTIAL backup?

          BTW, I am using bsdtcp "auth" and client compression now, but I
          have made
          this work with this code, with bsd auth, and no
       compression, so I
          don't
          think that has any affect on this issue.

          Thanks for the help!

          --
          One of the main causes of the fall of the roman empire was
       that,
          lacking
          zero, they had no way to indicate successful termination of
       their C
          programs.






<Prev in Thread] Current Thread [Next in Thread>