Amanda-Users

Error redirecting stderr to fd 52

2009-08-24 10:46:24
Subject: Error redirecting stderr to fd 52
From: John Hein <jhein AT timing DOT com>
To: amanda-users AT amanda DOT org
Date: Mon, 24 Aug 2009 07:05:48 -0600
On a 2.6.1b1 client ...

1251090802.506210: sendbackup: pid 61161 ruid 5001 euid 5001 version 2.6.1b1: 
start at Sun Aug 23 23:13:22 2009
1251090802.506278: sendbackup: Version 2.6.1b1
1251090802.511032: sendbackup: pid 61161 ruid 5001 euid 5001 version 2.6.1b1: 
rename at Sun Aug 23 23:13:22 2009
1251090802.511055: sendbackup:   sendbackup req: <GNUTAR /data  1 
2009:8:21:6:55:50 OPTIONS 
|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;>
1251090802.511102: sendbackup:   Parsed request as: program `GNUTAR'
1251090802.511109: sendbackup:                      disk `/data'
1251090802.511113: sendbackup:                      device `/data'
1251090802.511118: sendbackup:                      level 1
1251090802.511123: sendbackup:                      since 2009:8:21:6:55:50
1251090802.511128: sendbackup:                      options 
`|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;'
1251090802.511203: sendbackup: Error redirecting stderr to fd 52: Bad file 
descriptor
1251090802.511215: sendbackup: pid 61161 finish time Sun Aug 23 23:13:22 2009


Has anyone ever seen that?

This is on a client with about a dozen DLEs with possibly 3
dumping in parallel at a time.  The DLE in question is
not small - certainly not so small to complete in 5 ms.

This DLE sometimes works.  Sometimes a different one fails the same
way.

It looks like the mesgfd in client-src/sendbackup.c is getting
closed before dup2(2) runs.  Perhaps a race.

amandad log (below) shows no obvious trouble other than it is continuing to do
work after the child sendbackup has failed (as seen in the log output above).

The two security_stream_close messages seem to be different than the
log messages associated with DLEs that worked.  The working ones have
three security_stream_close messages.  But, oddly, the DLE that worked
(and was small) right _before_ the failed DLE did _not_ have any
security_stream_close messages.  Possibly a clue.  In fact, it
seems all the failures are happening right after a small (< 10 MB),
and thus quick, dump.  Could just be a coincidence.

1251090802.228997: amandad: dgram_recv(dgram=0x280c2a04, timeout=0, 
fromaddr=0x280d29f0)
1251090802.229037: amandad: (sockaddr_in *)0x280d29f0 = { 2, 703, 
206.168.13.161 }
1251090802.229055: amandad: security_handleinit(handle=0x8052600, 
driver=0x280bc520 (BSD))
1251090802.235787: amandad: accept recv REQ pkt:
<<<<<
SERVICE sendbackup
OPTIONS features=ffffffff9ffeffffffff00;hostname=bunny;config=test;
GNUTAR /data  1 2009:8:21:6:55:50 OPTIONS 
|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;
>>>>>
1251090802.237043: amandad: creating new service: sendbackup
OPTIONS features=ffffffff9ffeffffffff00;hostname=bunny;config=test;
GNUTAR /data  1 2009:8:21:6:55:50 OPTIONS 
|;auth=bsd;compress-fast;index;exclude-list=/site/etc/amanda/exclude-gtar;

1251090802.237710: amandad: sending ACK pkt:
<<<<<
>>>>>
1251090802.237764: amandad: dgram_send_addr(addr=0x8052620, dgram=0x280c2a04)
1251090802.237772: amandad: (sockaddr_in *)0x8052620 = { 2, 703, 206.168.13.161 
}
1251090802.237779: amandad: dgram_send_addr: 0x280c2a04->socket = 0
1251090802.511364: amandad: security_streaminit(stream=0x81dd000, 
driver=0x280bc520 (BSD))
1251090802.511719: amandad: stream_server opening socket with family 2 
(requested family was 2)
1251090802.511736: amandad: try_socksize: send buffer size is 65536
1251090802.511743: amandad: try_socksize: receive buffer size is 65536
1251090802.512604: amandad: bind_portrange2: Try  port 6108: Available - Success
1251090802.512617: amandad: stream_server: waiting for connection: 0.0.0.0.6108
1251090802.512643: amandad: security_streaminit(stream=0x81e6000, 
driver=0x280bc520 (BSD))
1251090802.512658: amandad: stream_server opening socket with family 2 
(requested family was 2)
1251090802.512669: amandad: try_socksize: send buffer size is 65536
1251090802.512677: amandad: try_socksize: receive buffer size is 65536
1251090802.513496: amandad: bind_portrange2: Try  port 6108: Available - 
Address already in use
1251090802.514300: amandad: bind_portrange2: Try  port 6109: Available - Success
1251090802.514311: amandad: stream_server: waiting for connection: 0.0.0.0.6109
1251090802.514319: amandad: security_streaminit(stream=0x81ef000, 
driver=0x280bc520 (BSD))
1251090802.514333: amandad: stream_server opening socket with family 2 
(requested family was 2)
1251090802.514344: amandad: try_socksize: send buffer size is 65536
1251090802.514351: amandad: try_socksize: receive buffer size is 65536
1251090802.515189: amandad: bind_portrange2: Try  port 6108: Available - 
Address already in use
1251090802.515991: amandad: bind_portrange2: Try  port 6109: Available - 
Address already in use
1251090802.516772: amandad: bind_portrange2: Skip port 6110: Owned by softcm.
1251090802.517540: amandad: bind_portrange2: Skip port 6111: Owned by spc.
1251090802.518336: amandad: bind_portrange2: Try  port 6112: Available - Success
1251090802.518348: amandad: stream_server: waiting for connection: 0.0.0.0.6112
1251090802.518355: amandad: sending REP pkt:
<<<<<
CONNECT DATA 6108 MESG 6109 INDEX 6112
OPTIONS features=ffffffff9ffeffffffff7f;
>>>>>
1251090802.518364: amandad: dgram_send_addr(addr=0x8052620, dgram=0x280c2a04)
1251090802.518370: amandad: (sockaddr_in *)0x8052620 = { 2, 703, 206.168.13.161 
}
1251090802.518376: amandad: dgram_send_addr: 0x280c2a04->socket = 0
1251090802.518672: amandad: dgram_recv(dgram=0x280c2a04, timeout=0, 
fromaddr=0x280d29f0)
1251090802.518689: amandad: (sockaddr_in *)0x280d29f0 = { 2, 703, 
206.168.13.161 }
1251090802.518740: amandad: received ACK pkt:
<<<<<
>>>>>
1251090802.760096: amandad: stream_accept: connection from 206.168.13.161.11002
1251090802.760106: amandad: try_socksize: send buffer size is 65536
1251090802.760113: amandad: try_socksize: receive buffer size is 65536
1251090802.772110: amandad: stream_accept: connection from 206.168.13.161.11002
1251090802.772119: amandad: try_socksize: send buffer size is 65536
1251090802.772126: amandad: try_socksize: receive buffer size is 65536
1251090802.783837: amandad: stream_accept: connection from 206.168.13.161.11002
1251090802.783846: amandad: try_socksize: send buffer size is 65536
1251090802.783854: amandad: try_socksize: receive buffer size is 65536
1251090802.783864: amandad: security_close(handle=0x8052600, driver=0x280bc520 
(BSD))
1251090802.783885: amandad: security_stream_close(0x81e6000)
1251090802.783905: amandad: security_stream_close(0x81ef000)


<Prev in Thread] Current Thread [Next in Thread>