amanda doesn't do fcntl on these file descriptor.
But my patch do: fcntl(fd, F_GETFL, 0) to check if it have O_NONBLOCK
set, the patch doesn't change it.
Can you try the attached patch, it do the same trick for the index file
descriptor (remove previous patch before applying).
Jean-Louis
Michael Burk wrote:
Hi Jean-Louis,
I thought I had applied the patches on this machine also, but it turns
out I didn't (sorry about that). I applied the patches and ran a new
dump. This time all 4 file systems succeeded, though /usr got the same
"strange" message as on the other machine:
1251752282.483677: sendbackup: pid 15574 ruid 150 euid 150 version
2.6.2alpha: start at Mon Aug 31 14:58:02 2009
1251752282.484788: sendbackup: Version 2.6.2alpha
1251752282.523952: sendbackup: pid 15574 ruid 150 euid 150 version
2.6.2alpha: rename at Mon Aug 31 14:58:02 2009
1251752282.527795: sendbackup: Parsed request as: program `DUMP'
1251752282.527880: sendbackup: disk `/usr'
1251752282.527931: sendbackup: device `/usr'
1251752282.527978: sendbackup: level 0
1251752282.528025: sendbackup: since NODATE
1251752282.528072: sendbackup: options `'
1251752282.533704: sendbackup: start: bromine.example.com:/usr lev 0
1251752282.537190: sendbackup: dumping device '/dev/rsd0d' with 'ffs'
1251752282.543664: sendbackup: Spawning "/sbin/dump dump 0usf 1048576
- /dev/rsd0d" in pipeline
1251752282.548532: sendbackup: Started backup
1251752282.561558: sendbackup: fd 3 is set with O_NONBLOCK before dup2
1251752282.625055: sendbackup: Started index creator: "/sbin/restore
-tvf - 2>&1 | sed -e '
s/^leaf[ ]*[0-9]*[ ]*\.//
t
/^dir[ ]/ {
s/^dir[ ]*[0-9]*[ ]*\.//
s%$%/%
t
}
d
'"
1251752282.627853: sendbackup: 90: normal(|): DUMP: Date of this
level 0 dump: Mon Aug 31 14:58:02 2009
1251752282.641040: sendbackup: 90: normal(|): DUMP: Date of last
level 0 dump: the epoch
1251752282.754862: sendbackup: 90: normal(|): DUMP: Dumping
/dev/rsd0d (/usr) to standard output
1251752282.792706: sendbackup: 90: normal(|): DUMP: mapping (Pass
I) [regular files]
1251752286.777085: sendbackup: 90: normal(|): DUMP: mapping (Pass
II) [directories]
1251752286.779538: sendbackup: 90: normal(|): DUMP: estimated
653358 tape blocks.
1251752286.789190: sendbackup: 90: normal(|): DUMP: Volume 1
started at: Mon Aug 31 14:58:06 2009
1251752286.827782: sendbackup: 90: normal(|): DUMP: dumping (Pass
III) [directories]
1251752290.646004: sendbackup: 90: normal(|): DUMP: dumping (Pass
IV) [regular files]
1251752290.822019: sendbackup: 115: strange(?): sed: stdout: Resource
temporarily unavailable
1251752424.852658: sendbackup: 43: size(|): DUMP: 664333 tape blocks
1251752424.854957: sendbackup: 90: normal(|): DUMP: Date of this
level 0 dump: Mon Aug 31 14:58:02 2009
1251752424.857083: sendbackup: 90: normal(|): DUMP: Volume 1
completed at: Mon Aug 31 15:00:24 2009
1251752424.859064: sendbackup: 90: normal(|): DUMP: Volume 1 took
0:02:18
1251752424.860994: sendbackup: 90: normal(|): DUMP: Volume 1
transfer rate: 4814 KB/s
1251752424.863042: sendbackup: 90: normal(|): DUMP: Date this dump
completed: Mon Aug 31 15:00:24 2009
1251752424.864993: sendbackup: 90: normal(|): DUMP: Average
transfer rate: 4814 KB/s
1251752424.869138: sendbackup: 90: normal(|): DUMP: level 0 dump
on Mon Aug 31 14:58:02 2009
1251752424.878026: sendbackup: 90: normal(|): DUMP: DUMP IS DONE
1251752424.888324: sendbackup: Index pipe exited with status 1
1251752424.906986: sendbackup: Parsed backup messages
1251752424.907361: sendbackup: pid 15574 finish time Mon Aug 31
15:00:24 2009
So now I've seen the same behavior that Stan noted: applying the debug
patch seems to "fix" the dump.
Also, in case you didn't see it before, here is a snippet from an
email thread Stan started on an OpenBSD list:
> > I am trying to get Amanda of a recent vintage working on 4.5. The
developrs
> > don't understand how writing to a blocking pipe can return EAGAIN.
> > Should the write just block?
>
> Yes, it should, and a quick, about-to-go-to-bed readthrough of the
> pipe write path shows that it shouldn't; is Amanda doing fcntls to
> the fd? If so, check those to be absolutely sure that they're not
> doing something hinkey.
Thanks,
Michael
On Mon, Aug 31, 2009 at 2:41 PM, Jean-Louis Martineau
<martineau AT zmanda DOT com <mailto:martineau AT zmanda DOT com>> wrote:
Do you have the patch I sent to stan on this system?
The patch check before and after the write if the pipe is in
O_NONBLOCK or not and give an error if it is.
I'm totally lost since it is in blocking mode and you get EAGAIN,
which is impossible!!!!
Jean-louis
Michael Burk wrote:
Here are the two sendbackup.*.debug files for the / fs.
First:
1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150
version 2.6.2alpha: start at Sun Aug 30 22:31:42 2009
1251693102.345326: sendbackup: Version 2.6.2alpha
1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150
version 2.6.2alpha: rename at Sun Aug 30 22:31:42 2009
1251693102.382860: sendbackup: Parsed request as: program `DUMP'
1251693102.382940: sendbackup: disk `/'
1251693102.382991: sendbackup: device `/'
1251693102.383039: sendbackup: level 0
1251693102.383086: sendbackup: since NODATE
1251693102.383134: sendbackup: options `'
1251693102.388833: sendbackup: start: bromine.example.com:/ lev 0
1251693102.391475: sendbackup: dumping device '/dev/rsd0a'
with 'ffs'
1251693102.398325: sendbackup: Spawning "/sbin/dump dump 0usf
1048576 - /dev/rsd0a" in pipeline
1251693102.403120: sendbackup: Started backup
1251693102.461663: sendbackup: 90: normal(|): DUMP: Date
of this level 0 dump: Sun Aug 30 22:31:42 2009
1251693102.468669: sendbackup: 90: normal(|): DUMP: Date
of last level 0 dump: the epoch
1251693102.487536: sendbackup: Started index creator:
"/sbin/restore -tvf - 2>&1 | sed -e '
s/^leaf[ ]*[0-9]*[ ]*\.//
t
/^dir[ ]/ {
s/^dir[ ]*[0-9]*[ ]*\.//
s%$%/%
t
}
d
'"
1251693102.519939: sendbackup: 90: normal(|): DUMP:
Dumping /dev/rsd0a (/) to standard output
1251693102.588086: sendbackup: 90: normal(|): DUMP:
mapping (Pass I) [regular files]
1251693104.583392: sendbackup: 90: normal(|): DUMP:
mapping (Pass II) [directories]
1251693104.585824: sendbackup: 90: normal(|): DUMP:
estimated 35134 tape blocks.
1251693104.595314: sendbackup: 90: normal(|): DUMP: Volume
1 started at: Sun Aug 30 22:31:44 2009
1251693104.615444: sendbackup: 90: normal(|): DUMP:
dumping (Pass III) [directories]
1251693104.919691: sendbackup: 90: normal(|): DUMP:
dumping (Pass IV) [regular files]
1251693105.800795: sendbackup: critical (fatal): index tee
cannot write [Resource temporarily unavailable]
1251693105.803486: sendbackup: 115: strange(?): sendbackup:
index tee cannot write [Resource temporarily unavailable]
1251693105.827339: sendbackup: 90: normal(|): DUMP: Broken
pipe
1251693105.832068: sendbackup: 90: normal(|): DUMP: The
ENTIRE dump is aborted.
1251693105.839675: sendbackup: critical (fatal): error [dump
(7257) /sbin/dump returned 3]
Second:
1251693123.840560: sendbackup: pid 32144 ruid 150 euid 150
version 2.6.2alpha: start at Sun Aug 30 22:32:03 2009
1251693123.841613: sendbackup: Version 2.6.2alpha
1251693123.892773: sendbackup: pid 32144 ruid 150 euid 150
version 2.6.2alpha: rename at Sun Aug 30 22:32:03 2009
1251693123.896360: sendbackup: Parsed request as: program `DUMP'
1251693123.896437: sendbackup: disk `/'
1251693123.896487: sendbackup: device `/'
1251693123.896535: sendbackup: level 0
1251693123.896584: sendbackup: since NODATE
1251693123.896632: sendbackup: options `'
1251693123.902332: sendbackup: start: bromine.example.com:/ lev 0
1251693123.905012: sendbackup: dumping device '/dev/rsd0a'
with 'ffs'
1251693123.911778: sendbackup: Spawning "/sbin/dump dump 0usf
1048576 - /dev/rsd0a" in pipeline
1251693123.916860: sendbackup: Started backup
1251693123.992261: sendbackup: Started index creator:
"/sbin/restore -tvf - 2>&1 | sed -e '
s/^leaf[ ]*[0-9]*[ ]*\.//
t
/^dir[ ]/ {
s/^dir[ ]*[0-9]*[ ]*\.//
s%$%/%
t
}
d
'"
1251693123.995743: sendbackup: 90: normal(|): DUMP: Date
of this level 0 dump: Sun Aug 30 22:32:03 2009
1251693124.009013: sendbackup: 90: normal(|): DUMP: Date
of last level 0 dump: the epoch
1251693124.038977: sendbackup: 90: normal(|): DUMP:
Dumping /dev/rsd0a (/) to standard output
1251693124.143454: sendbackup: 90: normal(|): DUMP:
mapping (Pass I) [regular files]
1251693126.173916: sendbackup: 90: normal(|): DUMP:
mapping (Pass II) [directories]
1251693126.176287: sendbackup: 90: normal(|): DUMP:
estimated 35134 tape blocks.
1251693126.185677: sendbackup: 90: normal(|): DUMP: Volume
1 started at: Sun Aug 30 22:32:06 2009
1251693126.205305: sendbackup: 90: normal(|): DUMP:
dumping (Pass III) [directories]
1251693126.505055: sendbackup: 90: normal(|): DUMP:
dumping (Pass IV) [regular files]
1251693126.797271: sendbackup: critical (fatal): index tee
cannot write [Resource temporarily unavailable]
1251693126.799929: sendbackup: 115: strange(?): sendbackup:
index tee cannot write [Resource temporarily unavailable]
1251693126.821488: sendbackup: 90: normal(|): DUMP: Broken
pipe
1251693126.825377: sendbackup: 90: normal(|): DUMP: The
ENTIRE dump is aborted.
1251693126.833573: sendbackup: critical (fatal): error [dump
(22791) /sbin/dump returned 3]
Thanks,
Michael
On Mon, Aug 31, 2009 at 12:49 PM, Jean-Louis Martineau
<martineau AT zmanda DOT com <mailto:martineau AT zmanda DOT com>
<mailto:martineau AT zmanda DOT com <mailto:martineau AT zmanda DOT
com>>>
wrote:
You get the error in the index pipe instead of the data
path. The
backup is correct but your index is empty. That's why you get a
STRANGE result instead of a failure.
Can you post a sendbackup.*.debug for a dle that failed?
Jean-Louis
Michael Burk wrote:
Hello,
I applied the patches to the 20090827 snapshot. I tried
it on
two OpenBSD 4.5 sparc64 systems, forcing both to do full
backups. One system seemed to work on all 3 file
systems, the
other failed on all 4 file systems with the same errors as
before (exactly like what Stan reported). I'm using
bsdtcp auth.
The system that worked, however, did give a "strange"
result
in the report on one filesystem. Here's the corresponding
"sendbackup" debug file:
1251692700.529842: sendbackup: pid 30459 ruid 150 euid 150
version 2.6.2alpha: start at Sun Aug 30 2
2:25:00 2009
1251692700.530411: sendbackup: Version 2.6.2alpha
1251692700.553135: sendbackup: pid 30459 ruid 150 euid 150
version 2.6.2alpha: rename at Sun Aug 30
22:25:00 2009
1251692700.555394: sendbackup: Parsed request as:
program `DUMP'
1251692700.555435: sendbackup:
disk `/usr'
1251692700.555461: sendbackup:
device `/usr'
1251692700.555484: sendbackup: level 0
1251692700.555507: sendbackup:
since NODATE
1251692700.555530: sendbackup:
options `'
1251692700.556750: sendbackup: start:
zirconium.example.com:/usr lev 0
1251692700.559495: sendbackup: dumping device '/dev/rsd1d'
with 'ffs'
1251692700.566094: sendbackup: Spawning "/sbin/dump
dump 0usf
1048576 - /dev/rsd1d" in pipeline
1251692700.571185: sendbackup: Started backup
1251692700.572425: sendbackup: fd 3 is set with O_NONBLOCK
before dup2
1251692700.597388: sendbackup: Started index creator:
"/sbin/restore -tvf - 2>&1 | sed -e '
s/^leaf[ ]*[0-9]*[ ]*\.//
t
/^dir[ ]/ {
s/^dir[ ]*[0-9]*[ ]*\.//
s%$%/%
t
}
d
'"
1251692700.611526: sendbackup: 90: normal(|): DUMP:
Date
of this level 0 dump: Sun Aug 30 22:25:
00 2009
1251692700.669703: sendbackup: 90: normal(|): DUMP:
Date
of last level 0 dump: the epoch
1251692700.670724: sendbackup: 90: normal(|): DUMP:
Dumping /dev/rsd1d (/usr) to standard output
1251692700.763600: sendbackup: 90: normal(|): DUMP:
mapping (Pass I) [regular files]
1251692705.618345: sendbackup: 90: normal(|): DUMP:
mapping (Pass II) [directories]
1251692705.620040: sendbackup: 90: normal(|): DUMP:
estimated 1330085 tape blocks.
1251692705.623976: sendbackup: 90: normal(|): DUMP:
Volume
1 started at: Sun Aug 30 22:25:05 2009
1251692705.650687: sendbackup: 90: normal(|): DUMP:
dumping (Pass III) [directories]
1251692745.051547: sendbackup: 90: normal(|): DUMP:
dumping (Pass IV) [regular files]
1251692747.915767: sendbackup: 115: strange(?): sed:
stdout:
Resource temporarily unavailable
1251692954.633083: sendbackup: 43: size(|): DUMP:
1437829 tape blocks
1251692954.637414: sendbackup: 90: normal(|): DUMP:
Date
of this level 0 dump: Sun Aug 30 22:25:00 2009
1251692954.641414: sendbackup: 90: normal(|): DUMP:
Volume
1 completed at: Sun Aug 30 22:29:14 2009
1251692954.641858: sendbackup: Index pipe exited with
status 1
1251692954.642719: sendbackup: 90: normal(|): DUMP:
Volume
1 took 0:04:09
1251692954.643918: sendbackup: 90: normal(|): DUMP:
Volume
1 transfer rate: 5774 KB/s
1251692954.645095: sendbackup: 90: normal(|): DUMP:
Date
this dump completed: Sun Aug 30 22:29:14 2009
1251692954.646239: sendbackup: 90: normal(|): DUMP:
Average transfer rate: 5774 KB/s
1251692954.647430: sendbackup: 90: normal(|): DUMP:
level
0 dump on Sun Aug 30 22:25:00 2009
1251692954.653480: sendbackup: 90: normal(|): DUMP:
DUMP
IS DONE
1251692954.653841: sendbackup: Parsed backup messages
1251692954.654032: sendbackup: pid 30459 finish time
Sun Aug
30 22:29:14 2009
-- Michael
On Fri, Aug 28, 2009 at 5:22 AM, stan <stanb AT panix DOT com
<mailto:stanb AT panix DOT com>
<mailto:stanb AT panix DOT com <mailto:stanb AT panix DOT com>>
<mailto:stanb AT panix DOT com <mailto:stanb AT panix DOT com>
<mailto:stanb AT panix DOT com <mailto:stanb AT panix DOT
com>>>> wrote:
On Thu, Aug 27, 2009 at 03:35:47PM -0600, Michael
Burk wrote:
> Cool - can you send me the patch? Or is it already
in the
0826
snapshot?
>
> FYI - I got 2.5.0p2 working, but could never get
2.5.1p3
working, even with
> the same config. No idea why.
> I started looking at the source code last night in
the 0825
snapshot. I'll
> discontinue that search if there's a patch.
>
This is interesting. I sent Jean-Louis Martineau an
email,
saying
teaks for
fixing things, and he replied that the patch he sent me
just added
debugging,
it was not intended to fix anything.
Bit, using this snapshot 20090813, and the patch,
which I will
send to you.
I have it working on my 3 OpenBSD 4.5 machines that
I have
upgraded so far.
I ran several test yesterday, and a full fledged
backup run
last
night,
and I have not seen any issues since installing this
code.
I am going to attach the patch to this email.
I would greatly appreciate it, if you would apply
this patch to
the above
snapshot, test, and tell me what your results are.
I am very puzzled at the moment.
BTW, is your failure the same as mine? That is amcheck
passes, and
most
times you get a PARTIAL backup?
BTW, I am using bsdtcp "auth" and client compression
now, but I
have made
this work with this code, with bsd auth, and no
compression, so I
don't
think that has any affect on this issue.
Thanks for the help!
--
One of the main causes of the fall of the roman
empire was
that,
lacking
zero, they had no way to indicate successful
termination of
their C
programs.
Index: client-src/sendbackup.c
===================================================================
--- client-src/sendbackup.c (revision 2126)
+++ client-src/sendbackup.c (working copy)
@@ -122,6 +122,7 @@
char *err_extra = NULL;
char *s;
int i;
+ int r;
int ch;
GSList *errlist;
FILE *mesgstream;
@@ -406,8 +407,21 @@
mesgfd = DATA_FD_OFFSET + 2;
indexfd = DATA_FD_OFFSET + 4;
}
- if (!dle->create_index)
- indexfd = -1;
+ r = fcntl(datafd, F_GETFL, 0);
+ if (r == -1)
+ dbprintf("fd datafd (%d) error: %s\n", datafd, strerror(errno));
+ else if (r & O_NONBLOCK)
+ dbprintf(_("fd datafd(%d) is set with O_NONBLOCK before dup2\n"),
datafd);
+ r = fcntl(mesgfd, F_GETFL, 0);
+ if (r == -1)
+ dbprintf("fd mesgfd (%d) error: %s\n", datafd, strerror(errno));
+ else if (r & O_NONBLOCK)
+ dbprintf(_("fd mesgfd(%d) is set with O_NONBLOCK before dup2\n"),
datafd);
+ r = fcntl(indexfd, F_GETFL, 0);
+ if (r == -1)
+ dbprintf("fd indexfd (%d) error: %s\n", datafd, strerror(errno));
+ else if (r & O_NONBLOCK)
+ dbprintf(_("fd indexfd(%d) is set with O_NONBLOCK before dup2\n"),
datafd);
if (dle->auth && amandad_auth) {
if(strcasecmp(dle->auth, amandad_auth) != 0) {
@@ -417,6 +431,9 @@
}
}
+ if (!dle->create_index)
+ indexfd = -1;
+
if (dle->kencrypt) {
g_printf("KENCRYPT\n");
}
@@ -1219,6 +1236,7 @@
int pipefd[2];
FILE *pipe_fp;
int exitcode;
+ int r;
if (!createindex)
return;
@@ -1252,9 +1270,44 @@
save_fd(&mesg, 4);
save_fd(&input, 4);
dup2(pipefd[0], 0);
+ r = fcntl(1, F_GETFL, 0);
+ if (r == -1)
+ dbprintf("fd 1 before dup2 error: %s\n", strerror(errno));
+ else if (r & O_NONBLOCK)
+ dbprintf(_("fd 1 is set with O_NONBLOCK before dup2\n"));
dup2(index, 1);
+ r = fcntl(2, F_GETFL, 0);
+ if (r == -1)
+ dbprintf("fd 2 before dup2 error: %s\n", strerror(errno));
+ else if (r & O_NONBLOCK)
+ dbprintf(_("fd 2 is set with O_NONBLOCK before dup2\n"));
dup2(mesg, 2);
+ r = fcntl(3, F_GETFL, 0);
+ if (r == -1)
+ dbprintf("fd 3 before dup2 error: %s\n", strerror(errno));
+ else if (r & O_NONBLOCK)
+ dbprintf(_("fd 3 is set with O_NONBLOCK before dup2\n"));
+ r = fcntl(input, F_GETFL, 0);
+ if (r == -1)
+ error("fd input (%d) error: %s", input, strerror(errno));
+ else if (r & O_NONBLOCK)
+ error(_("fd input (%d) is set with O_NONBLOCK before dup2"), input);
dup2(input, 3);
+ r = fcntl(1, F_GETFL, 0);
+ if (r == -1)
+ error("fd 1 after dup2 error: %s", strerror(errno));
+ else if (r & O_NONBLOCK)
+ error(_("fd 1 is set with O_NONBLOCK after dup2"));
+ r = fcntl(2, F_GETFL, 0);
+ if (r == -1)
+ error("fd 2 after dup2 error: %s", strerror(errno));
+ else if (r & O_NONBLOCK)
+ error(_("fd 2 is set with O_NONBLOCK after dup2"));
+ r = fcntl(3, F_GETFL, 0);
+ if (r == -1)
+ error("fd 3 after dup2 error: %s", strerror(errno));
+ else if (r & O_NONBLOCK)
+ error(_("fd 3 is set with O_NONBLOCK after dup2"));
for(index = 4; index < (int)FD_SETSIZE; index++) {
if (index != dbfd()) {
close(index);
@@ -1306,7 +1359,16 @@
occurs */
ptr = buffer;
bytes_written = 0;
+
+ r = fcntl(3, F_GETFL, 0);
+ if (r & O_NONBLOCK) {
+ error(_("fd 3 is set with O_NONBLOCK before the write"));
+ }
just_written = full_write(3, ptr, bytes_read);
+ r = fcntl(3, F_GETFL, 0);
+ if (r & O_NONBLOCK) {
+ error(_("fd 3 is set with O_NONBLOCK after the write"));
+ }
if (just_written < (size_t)bytes_read) {
error(_("index tee cannot write [%s]"), strerror(errno));
/*NOTREACHED*/
Index: client-src/sendbackup-dump.c
===================================================================
--- client-src/sendbackup-dump.c (revision 2126)
+++ client-src/sendbackup-dump.c (working copy)
@@ -144,6 +144,7 @@
char *encryptopt = skip_argument;
char *qdisk;
char *config;
+ int r;
g_snprintf(level_str, SIZEOF(level_str), "%d",
GPOINTER_TO_INT(dle->level->data));
@@ -154,6 +155,11 @@
get_pname(), host, qdisk, GPOINTER_TO_INT(dle->level->data));
amfree(qdisk);
+ r = fcntl(dataf, F_GETFL, 0);
+ if (r == -1)
+ error("fd dataf (%d) error: %s", dataf, strerror(errno));
+ else if (r & O_NONBLOCK)
+ error(_("fd dataf (%d) is set with O_NONBLOCK before dup2"), dataf);
/* apply client-side encryption here */
if (dle->encrypt == ENCRYPT_CUST ) {
encpid = pipespawn(dle->clnt_encrypt, STDIN_PIPE, 0,
@@ -250,6 +256,11 @@
NULL);
info_tapeheader(dle);
+ r = fcntl(dumpout, F_GETFL, 0);
+ if (r == -1)
+ error("fd dumpout (%d) error: %s", dumpout, strerror(errno));
+ else if (r & O_NONBLOCK)
+ error(_("fd dumpout (%d) is set with O_NONBLOCK before dup2"),
dataf);
start_index(dle->create_index, dumpout, mesgf, indexf, indexcmd);
dumpkeys = stralloc(level_str);
|