Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2014/06/04 22:29:02 UTC

[jira] [Commented] (MESOS-1455) Segfault in libprocess during Process linking.

    [ https://issues.apache.org/jira/browse/MESOS-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018111#comment-14018111 ] 

Benjamin Mahler commented on MESOS-1455:
----------------------------------------

https://reviews.apache.org/r/22247/

> Segfault in libprocess during Process linking.
> ----------------------------------------------
>
>                 Key: MESOS-1455
>                 URL: https://issues.apache.org/jira/browse/MESOS-1455
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 0.19.0
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> Here is a backtrace:
> {noformat}
> ======= Backtrace: =========
> /lib64/libc.so.6[0x7f916acc274f]
> /lib64/libc.so.6(cfree+0x4b)[0x7f916acc6a4b]
> /usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(_ZN7process17receiving_connectEP7ev_loopP5ev_ioi+0xc5)[0x7f9146a64d55]
> /usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(ev_invoke_pending+0x55)[0x7f9146b65105]
> /usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(ev_run+0x937)[0x7f9146b680b7]
> /usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(_ZN7process5serveEPv+0xb)[0x7f9146a4c1cb]
> /lib64/libpthread.so.0[0x7f916b3c283d]
> /lib64/libc.so.6(clone+0x6d)[0x7f916ad2626d]
> {noformat}
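> The bug was introduced when we added support for pure language bindings communicating with libprocess. SocketManager::link now stores a Socket* in watcher->data but still registers receiving_connect, which deletes it as a DataDecoder* when the connect fails: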
> {code: title=see XXX comments}
> @@ -1930,13 +1991,13 @@ void SocketManager::link(ProcessBase* process, const UPID& to)
>        persists[node] = s;
> -      // Allocate and initialize the decoder and watcher (we really
> -      // only "receive" on this socket so that we can react when it
> -      // gets closed and generate appropriate lost events).
> -      DataDecoder* decoder = new DataDecoder(sockets[s]);
> -
> +      // Allocate and initialize a watcher for reading data from this
> +      // socket. Note that we don't expect to receive anything other
> +      // than HTTP '202 Accepted' responses which we anyway ignore.
> +      // We do, however, want to react when it gets closed so we can
> +      // generate appropriate lost events (since this is a 'link').
>        ev_io* watcher = new ev_io();
> -      watcher->data = decoder;
> +      watcher->data = new Socket(sockets[s]); // XXX receiving_connect expects watcher->data to be a DataDecoder* !!!
>       // Try and connect to the node using this socket.
>       sockaddr_in addr;
>       memset(&addr, 0, sizeof(addr));
>       addr.sin_family = PF_INET;
>       addr.sin_port = htons(to.port);
>       addr.sin_addr.s_addr = to.ip;
>       if (connect(s, (sockaddr*) &addr, sizeof(addr)) < 0) {
>         if (errno != EINPROGRESS) {
>           PLOG(FATAL) << "Failed to link, connect";
>         }
>         // Wait for socket to be connected.
>         ev_io_init(watcher, receiving_connect, s, EV_WRITE); // XXX: watcher->data is a Socket*, not a DataDecoder*!
>       } else {
>         ev_io_init(watcher, ignore_data, s, EV_READ);
>       }
> {code}
> {code: title=receiving_connect expects DataDecoder*}
> void receiving_connect(struct ev_loop* loop, ev_io* watcher, int revents)
> {
>   int s = watcher->fd;
>   // Now check that a successful connection was made.
>   int opt;
>   socklen_t optlen = sizeof(opt);
>   if (getsockopt(s, SOL_SOCKET, SO_ERROR, &opt, &optlen) < 0 || opt != 0) {
>     // Connect failure.
>     VLOG(1) << "Socket error while connecting";
>     socket_manager->close(s);
>     DataDecoder* decoder = (DataDecoder*) watcher->data; // XXX A Socket* in the case above !!
>     delete decoder;
>     ev_io_stop(loop, watcher);
>     delete watcher;
>   } else {
>     // We're connected! Now let's do some receiving.
>     ev_io_stop(loop, watcher);
>     ev_io_init(watcher, ignore_data, s, EV_READ);
>     ev_io_start(loop, watcher);
>   }
> }
> {code}
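
The mismatch above comes from the untyped watcher->data slot: nothing ties the type stored by link() to the type the callback casts to. A minimal sketch of one way to keep the two in step follows. It is not the patch in the review linked above, and every name in it (Watcher, attach, release, on_link_connect_failed, on_receive_connect_failed, the destructor counters) is a hypothetical stand-in rather than a libprocess or libev API.

```cpp
#include <cassert>

// Counters so the effect of each destructor can be observed.
static int sockets_destroyed = 0;
static int decoders_destroyed = 0;

// Hypothetical stand-ins for the libprocess types; only the
// destructors matter for this illustration.
struct Socket {
  ~Socket() { ++sockets_destroyed; }
};

struct DataDecoder {
  ~DataDecoder() { ++decoders_destroyed; }
};

// Hypothetical stand-in for ev_io: a file descriptor plus an untyped
// user-data pointer, which is what makes the mismatch possible.
struct Watcher {
  int fd;
  void* data;
};

// Store a typed pointer in the untyped slot.
template <typename T>
void attach(Watcher* watcher, T* value) {
  watcher->data = value;
}

// Delete the stored pointer as a T. The caller must name the same T
// that was passed to attach(); that pairing is the whole point.
template <typename T>
void release(Watcher* watcher) {
  delete static_cast<T*>(watcher->data);
  watcher->data = nullptr;
}

// Failure path for watchers created by a link: they own a Socket.
void on_link_connect_failed(Watcher* watcher) {
  release<Socket>(watcher);
  delete watcher;
}

// Failure path for watchers that own a DataDecoder.
void on_receive_connect_failed(Watcher* watcher) {
  release<DataDecoder>(watcher);
  delete watcher;
}
```

With this arrangement a link would pair attach(watcher, new Socket()) with on_link_connect_failed, so the failure path deletes a Socket and never interprets it as a DataDecoder. The casts are still unchecked; the sketch only makes the pairing explicit at each call site.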



--
This message was sent by Atlassian JIRA
(v6.2#6252)