Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2014/06/04 22:23:02 UTC
[jira] [Created] (MESOS-1455) Segfault in libprocess during Process linking.
Benjamin Mahler created MESOS-1455:
--------------------------------------
Summary: Segfault in libprocess during Process linking.
Key: MESOS-1455
URL: https://issues.apache.org/jira/browse/MESOS-1455
Project: Mesos
Issue Type: Bug
Components: libprocess
Affects Versions: 0.19.0
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler
Priority: Blocker
Fix For: 0.19.0
Here is a backtrace:
{noformat}
======= Backtrace: =========
/lib64/libc.so.6[0x7f916acc274f]
/lib64/libc.so.6(cfree+0x4b)[0x7f916acc6a4b]
/usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(_ZN7process17receiving_connectEP7ev_loopP5ev_ioi+0xc5)[0x7f9146a64d55]
/usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(ev_invoke_pending+0x55)[0x7f9146b65105]
/usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(ev_run+0x937)[0x7f9146b680b7]
/usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(_ZN7process5serveEPv+0xb)[0x7f9146a4c1cb]
/lib64/libpthread.so.0[0x7f916b3c283d]
/lib64/libc.so.6(clone+0x6d)[0x7f916ad2626d]
{noformat}
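For readers less used to mangled C++ names, the two libmesos frames above can be decoded with the compiler's demangler. The sketch below assumes a GCC or Clang toolchain, where `abi::__cxa_demangle` is available from `<cxxabi.h>`; the `demangle` helper is a hypothetical name and not part of libprocess. Under that assumption, `_ZN7process17receiving_connectEP7ev_loopP5ev_ioi` should decode to `process::receiving_connect(ev_loop*, ev_io*, int)` and `_ZN7process5serveEPv` to `process::serve(void*)`, i.e. the `free()` that aborts is called from `receiving_connect`.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

#include <cxxabi.h>

// Hypothetical helper (not part of libprocess): returns the demangled form
// of an Itanium-ABI symbol name, or the input unchanged if demangling fails.
std::string demangle(const char* mangled)
{
  assert(mangled != nullptr);

  int status = 0;
  char* result = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);

  if (status != 0 || result == nullptr) {
    return mangled;
  }

  std::string name(result);
  std::free(result); // The buffer returned by __cxa_demangle is malloc'ed.
  return name;
}
```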
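The bug was introduced when we added support for pure language bindings communicating with libprocess. SocketManager::link now stores a Socket* in watcher->data, but receiving_connect still casts watcher->data to a DataDecoder* and deletes it when the connect fails: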
{code: title=see XXX comments}
@@ -1930,13 +1991,13 @@ void SocketManager::link(ProcessBase* process, const UPID& to)
   persists[node] = s;
-  // Allocate and initialize the decoder and watcher (we really
-  // only "receive" on this socket so that we can react when it
-  // gets closed and generate appropriate lost events).
-  DataDecoder* decoder = new DataDecoder(sockets[s]);
-
+  // Allocate and initialize a watcher for reading data from this
+  // socket. Note that we don't expect to receive anything other
+  // than HTTP '202 Accepted' responses which we anyway ignore.
+  // We do, however, want to react when it gets closed so we can
+  // generate appropriate lost events (since this is a 'link').
   ev_io* watcher = new ev_io();
-  watcher->data = decoder;
+  watcher->data = new Socket(sockets[s]); // XXX receiving_connect expects watcher->data to be a DataDecoder* !!!
   // Try and connect to the node using this socket.
   sockaddr_in addr;
   memset(&addr, 0, sizeof(addr));
   addr.sin_family = PF_INET;
   addr.sin_port = htons(to.port);
   addr.sin_addr.s_addr = to.ip;
   if (connect(s, (sockaddr*) &addr, sizeof(addr)) < 0) {
     if (errno != EINPROGRESS) {
       PLOG(FATAL) << "Failed to link, connect";
     }
     // Wait for socket to be connected.
     ev_io_init(watcher, receiving_connect, s, EV_WRITE); // XXX: watcher->data is a Socket*, not a DataDecoder*!
   } else {
     ev_io_init(watcher, ignore_data, s, EV_READ);
   }
{code}
{code: title=receiving_connect expects DataDecoder*}
void receiving_connect(struct ev_loop* loop, ev_io* watcher, int revents)
{
  int s = watcher->fd;
  // Now check that a successful connection was made.
  int opt;
  socklen_t optlen = sizeof(opt);
  if (getsockopt(s, SOL_SOCKET, SO_ERROR, &opt, &optlen) < 0 || opt != 0) {
    // Connect failure.
    VLOG(1) << "Socket error while connecting";
    socket_manager->close(s);
    DataDecoder* decoder = (DataDecoder*) watcher->data; // XXX A Socket* in the case above !!
    delete decoder;
    ev_io_stop(loop, watcher);
    delete watcher;
  } else {
    // We're connected! Now let's do some receiving.
    ev_io_stop(loop, watcher);
    ev_io_init(watcher, ignore_data, s, EV_READ);
    ev_io_start(loop, watcher);
  }
}
{code}
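One possible way to avoid this kind of mismatch is to make each connect-failure path delete watcher->data as the type that was actually stored there. The following is only a minimal sketch under stated assumptions, not the change committed for this issue: `Watcher`, `DataDecoder`, and `Socket` are hypothetical stand-ins (so the example compiles without libev or libprocess), and `cleanup`, `receiving_connect_failed`, and `link_connect_failed` are illustrative names.

```cpp
#include <cassert>

// Hypothetical stand-in for ev_io: only the fields used in this sketch.
struct Watcher
{
  int fd;
  void* data;
};

// Hypothetical stand-ins for the libprocess types; the counters only
// exist so that the sketch can show which destructor ran.
struct DataDecoder
{
  static int destroyed;
  ~DataDecoder() { ++destroyed; }
};

struct Socket
{
  static int destroyed;
  ~Socket() { ++destroyed; }
};

int DataDecoder::destroyed = 0;
int Socket::destroyed = 0;

// Deletes watcher->data as a T and then the watcher itself. The caller
// is responsible for choosing the T that was stored in watcher->data.
template <typename T>
void cleanup(Watcher* watcher)
{
  assert(watcher != nullptr);
  delete static_cast<T*>(watcher->data);
  delete watcher;
}

// Failure path for watchers whose data is a DataDecoder*.
void receiving_connect_failed(Watcher* watcher)
{
  cleanup<DataDecoder>(watcher);
}

// Failure path for watchers created while linking, whose data is a Socket*.
void link_connect_failed(Watcher* watcher)
{
  cleanup<Socket>(watcher);
}
```

In this sketch, the linking code would register the `Socket*` variant for its connect callback, so the pointer is never reinterpreted as a `DataDecoder*`. For example, `link_connect_failed(new Watcher{3, new Socket()})` runs only the `Socket` destructor.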