Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2014/06/04 22:29:02 UTC
[jira] [Commented] (MESOS-1455) Segfault in libprocess during Process linking.
[ https://issues.apache.org/jira/browse/MESOS-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018111#comment-14018111 ]
Benjamin Mahler commented on MESOS-1455:
----------------------------------------
https://reviews.apache.org/r/22247/
> Segfault in libprocess during Process linking.
> ----------------------------------------------
>
> Key: MESOS-1455
> URL: https://issues.apache.org/jira/browse/MESOS-1455
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Affects Versions: 0.19.0
> Reporter: Benjamin Mahler
> Assignee: Benjamin Mahler
> Priority: Blocker
> Fix For: 0.19.0
>
>
> Here is a backtrace:
> {noformat}
> ======= Backtrace: =========
> /lib64/libc.so.6[0x7f916acc274f]
> /lib64/libc.so.6(cfree+0x4b)[0x7f916acc6a4b]
> /usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(_ZN7process17receiving_connectEP7ev_loopP5ev_ioi+0xc5)[0x7f9146a64d55]
> /usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(ev_invoke_pending+0x55)[0x7f9146b65105]
> /usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(ev_run+0x937)[0x7f9146b680b7]
> /usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(_ZN7process5serveEPv+0xb)[0x7f9146a4c1cb]
> /lib64/libpthread.so.0[0x7f916b3c283d]
> /lib64/libc.so.6(clone+0x6d)[0x7f916ad2626d]
> {noformat}
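Two of the libmesos frames in the backtrace carry mangled C++ symbols. The sketch below decodes them with the Itanium C++ ABI demangler, abi::__cxa_demangle from <cxxabi.h>. It assumes a GCC or Clang toolchain, and the helper name demangle is our own, not part of Mesos.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles an Itanium C++ ABI symbol (as emitted by GCC/Clang).
// Plain C symbols such as "ev_run" are not valid mangled names, so the
// demangler reports failure and the input is returned unchanged.
std::string demangle(const char* mangled) {
  int status = 0;
  // With a null output buffer, __cxa_demangle returns a malloc()'d string.
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return mangled;
  }
  std::string result(out);
  std::free(out);  // Caller owns the buffer.
  return result;
}
```

Decoded, the top libmesos frame is process::receiving_connect(ev_loop*, ev_io*, int). The frame at process::serve(void*) is the libprocess event-loop thread. So the crash is a free() issued from inside the libev connect callback.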
> The bug was introduced when we added support for pure language bindings that communicate with libprocess:
> {code:title=see XXX comments}
> @@ -1930,13 +1991,13 @@ void SocketManager::link(ProcessBase* process, const UPID& to)
> persists[node] = s;
> - // Allocate and initialize the decoder and watcher (we really
> - // only "receive" on this socket so that we can react when it
> - // gets closed and generate appropriate lost events).
> - DataDecoder* decoder = new DataDecoder(sockets[s]);
> -
> + // Allocate and initialize a watcher for reading data from this
> + // socket. Note that we don't expect to receive anything other
> + // than HTTP '202 Accepted' responses which we anyway ignore.
> + // We do, however, want to react when it gets closed so we can
> + // generate appropriate lost events (since this is a 'link').
> ev_io* watcher = new ev_io();
> - watcher->data = decoder;
> + watcher->data = new Socket(sockets[s]); // XXX receiving_connect expects watcher->data to be a Decoder* !!!
> // Try and connect to the node using this socket.
> sockaddr_in addr;
> memset(&addr, 0, sizeof(addr));
> addr.sin_family = PF_INET;
> addr.sin_port = htons(to.port);
> addr.sin_addr.s_addr = to.ip;
> if (connect(s, (sockaddr*) &addr, sizeof(addr)) < 0) {
> if (errno != EINPROGRESS) {
> PLOG(FATAL) << "Failed to link, connect";
> }
> // Wait for socket to be connected.
> ev_io_init(watcher, receiving_connect, s, EV_WRITE); // XXX: watcher->data is a Socket*, not a Decoder*!
> } else {
> ev_io_init(watcher, ignore_data, s, EV_READ);
> }
> {code}
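The XXX comments point at a common hazard of untyped callback data. libev's watcher data field is a void*, so nothing stops link() from storing a Socket* that receiving_connect later casts to DataDecoder* and deletes. That runs the wrong destructor, which is undefined behavior and most likely what surfaces as the crash in cfree.

One general way to make such a slot self-describing is to store a deleter, instantiated for the payload's real type, next to the payload. The following is a standalone model, not libprocess code and not necessarily what the fix in r/22247 does. Socket, DataDecoder, Watcher, attach and release are all hypothetical stand-ins.

```cpp
#include <cassert>

// Hypothetical stand-ins for libprocess's Socket and DataDecoder; they only
// count live instances so that cleanup can be checked.
struct Socket {
  static int live;
  Socket() { ++live; }
  ~Socket() { --live; }
};
struct DataDecoder {
  static int live;
  DataDecoder() { ++live; }
  ~DataDecoder() { --live; }
};
int Socket::live = 0;
int DataDecoder::live = 0;

// Hypothetical watcher: the untyped payload travels with a deleter that
// knows its real type, so a callback never has to guess what it holds.
struct Watcher {
  void* data = nullptr;
  void (*destroy)(void*) = nullptr;
};

template <typename T>
void attach(Watcher* watcher, T* payload) {
  watcher->data = payload;
  // A captureless lambda converts to a plain function pointer.
  watcher->destroy = [](void* p) { delete static_cast<T*>(p); };
}

// Stand-in for the failure branch of a connect callback: it releases
// whatever was attached instead of assuming a DataDecoder*.
void release(Watcher* watcher) {
  if (watcher->destroy != nullptr) {
    watcher->destroy(watcher->data);
  }
  watcher->data = nullptr;
  watcher->destroy = nullptr;
}
```

With this shape, the link path (attach with a Socket*) and a decoder path (attach with a DataDecoder*) can share one failure handler, and each payload is destroyed through its own type.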
> {code:title=receiving_connect expects Decoder*}
> void receiving_connect(struct ev_loop* loop, ev_io* watcher, int revents)
> {
> int s = watcher->fd;
> // Now check that a successful connection was made.
> int opt;
> socklen_t optlen = sizeof(opt);
> if (getsockopt(s, SOL_SOCKET, SO_ERROR, &opt, &optlen) < 0 || opt != 0) {
> // Connect failure.
> VLOG(1) << "Socket error while connecting";
> socket_manager->close(s);
> DataDecoder* decoder = (DataDecoder*) watcher->data; // XXX A Socket* in the case above !!
> delete decoder;
> ev_io_stop(loop, watcher);
> delete watcher;
> } else {
> // We're connected! Now let's do some receiving.
> ev_io_stop(loop, watcher);
> ev_io_init(watcher, ignore_data, s, EV_READ);
> ev_io_start(loop, watcher);
> }
> }
> {code}
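link() and receiving_connect together follow the standard non-blocking connect idiom:

1. connect() returns EINPROGRESS.
2. The caller waits for the socket to become writable.
3. The outcome is read with getsockopt(SO_ERROR).

The following standalone sketch uses poll() in place of libev's event loop to keep it self-contained. The function name connect_nonblocking is hypothetical, and unlike the code above it takes the IP in host byte order.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Attempts a non-blocking TCP connect to ip:port (both in host byte order).
// Returns 0 on success, otherwise an errno value (ETIMEDOUT if the socket
// did not become writable within timeout_ms).
int connect_nonblocking(std::uint32_t ip, std::uint16_t port, int timeout_ms) {
  int s = socket(AF_INET, SOCK_STREAM, 0);
  if (s < 0) return errno;

  int flags = fcntl(s, F_GETFL, 0);
  if (flags < 0 || fcntl(s, F_SETFL, flags | O_NONBLOCK) < 0) {
    int err = errno;
    close(s);
    return err;
  }

  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  addr.sin_addr.s_addr = htonl(ip);

  int result = 0;
  if (connect(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
    if (errno != EINPROGRESS) {
      result = errno;  // Immediate failure, e.g. ECONNREFUSED.
    } else {
      // Connection in progress: wait for writability, the same readiness
      // that link() registers receiving_connect for (EV_WRITE).
      pollfd pfd;
      pfd.fd = s;
      pfd.events = POLLOUT;
      pfd.revents = 0;
      int n = poll(&pfd, 1, timeout_ms);
      if (n == 0) {
        result = ETIMEDOUT;
      } else if (n < 0) {
        result = errno;
      } else {
        // Writability only says the attempt finished; SO_ERROR says how.
        int opt = 0;
        socklen_t optlen = sizeof(opt);
        result = getsockopt(s, SOL_SOCKET, SO_ERROR, &opt, &optlen) < 0 ? errno : opt;
      }
    }
  }
  close(s);
  return result;
}
```

In receiving_connect, the mistyped delete of watcher->data sits only in the failure branch of this SO_ERROR check. The segfault would therefore surface when a link's connect attempt fails, for example when the remote end refuses the connection.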
--
This message was sent by Atlassian JIRA
(v6.2#6252)