Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2014/06/04 22:23:02 UTC

[jira] [Created] (MESOS-1455) Segfault in libprocess during Process linking.

Benjamin Mahler created MESOS-1455:
--------------------------------------

             Summary: Segfault in libprocess during Process linking.
                 Key: MESOS-1455
                 URL: https://issues.apache.org/jira/browse/MESOS-1455
             Project: Mesos
          Issue Type: Bug
          Components: libprocess
    Affects Versions: 0.19.0
            Reporter: Benjamin Mahler
            Assignee: Benjamin Mahler
            Priority: Blocker
             Fix For: 0.19.0


Here is a backtrace:
{noformat}
======= Backtrace: =========
/lib64/libc.so.6[0x7f916acc274f]
/lib64/libc.so.6(cfree+0x4b)[0x7f916acc6a4b]
/usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(_ZN7process17receiving_connectEP7ev_loopP5ev_ioi+0xc5)[0x7f9146a64d55]
/usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(ev_invoke_pending+0x55)[0x7f9146b65105]
/usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(ev_run+0x937)[0x7f9146b680b7]
/usr/local/lib64/libmesos-0.19.0-tw6_rc1.so(_ZN7process5serveEPv+0xb)[0x7f9146a4c1cb]
/lib64/libpthread.so.0[0x7f916b3c283d]
/lib64/libc.so.6(clone+0x6d)[0x7f916ad2626d]
{noformat}

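The bug was introduced when we added support for pure language bindings communicating with libprocess. SocketManager::link now stores a Socket* in watcher->data, but the connect-failure path in receiving_connect still casts watcher->data to a DataDecoder* and deletes it, which is where the free() in the backtrace crashes: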
{code: title=see XXX comments}
@@ -1930,13 +1991,13 @@ void SocketManager::link(ProcessBase* process, const UPID& to)

       persists[node] = s;

-      // Allocate and initialize the decoder and watcher (we really
-      // only "receive" on this socket so that we can react when it
-      // gets closed and generate appropriate lost events).
-      DataDecoder* decoder = new DataDecoder(sockets[s]);
-
+      // Allocate and initialize a watcher for reading data from this
+      // socket. Note that we don't expect to receive anything other
+      // than HTTP '202 Accepted' responses which we anyway ignore.
+      // We do, however, want to react when it gets closed so we can
+      // generate appropriate lost events (since this is a 'link').
       ev_io* watcher = new ev_io();
-      watcher->data = decoder;
+      watcher->data = new Socket(sockets[s]); // XXX receiving_connect expects watcher->data to be a Decoder* !!!

      // Try and connect to the node using this socket.
      sockaddr_in addr;
      memset(&addr, 0, sizeof(addr));
      addr.sin_family = PF_INET;
      addr.sin_port = htons(to.port);
      addr.sin_addr.s_addr = to.ip;

      if (connect(s, (sockaddr*) &addr, sizeof(addr)) < 0) {
        if (errno != EINPROGRESS) {
          PLOG(FATAL) << "Failed to link, connect";
        }

        // Wait for socket to be connected.
        ev_io_init(watcher, receiving_connect, s, EV_WRITE); // XXX: watcher->data is a Socket*, not a Decoder*!
      } else {
        ev_io_init(watcher, ignore_data, s, EV_READ);
      }
{code}

{code: title=receiving_connect expects Decoder*}
void receiving_connect(struct ev_loop* loop, ev_io* watcher, int revents)
{
  int s = watcher->fd;

  // Now check that a successful connection was made.
  int opt;
  socklen_t optlen = sizeof(opt);

  if (getsockopt(s, SOL_SOCKET, SO_ERROR, &opt, &optlen) < 0 || opt != 0) {
    // Connect failure.
    VLOG(1) << "Socket error while connecting";
    socket_manager->close(s);
    DataDecoder* decoder = (DataDecoder*) watcher->data; // XXX A Socket* in the case above !!
    delete decoder;
    ev_io_stop(loop, watcher);
    delete watcher;
  } else {
    // We're connected! Now let's do some receiving.
    ev_io_stop(loop, watcher);
    ev_io_init(watcher, ignore_data, s, EV_READ);
    ev_io_start(loop, watcher);
  }
}
{code}
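
One way to avoid this kind of mismatch is to give each payload type its own failure path, so that whoever stores into watcher->data also names the type that will be deleted. The following is only a minimal sketch under stated assumptions, not the actual fix: Watcher, Socket and DataDecoder are simplified hypothetical stand-ins for ev_io and the libprocess classes, and release, link_connect_failed and receive_connect_failed are hypothetical helper names that do not exist in libprocess.

```cpp
#include <cassert>

// Hypothetical stand-in for ev_io: only the untyped payload pointer matters here.
struct Watcher {
  void* data = nullptr;
};

// Hypothetical stand-in for the libprocess Socket wrapper.
struct Socket {
  explicit Socket(int fd) : fd(fd) { ++live; }
  ~Socket() { --live; }
  int fd;
  static int live;  // Counts live instances so cleanup can be checked.
};
int Socket::live = 0;

// Hypothetical stand-in for the libprocess DataDecoder.
struct DataDecoder {
  explicit DataDecoder(int fd) : fd(fd) { ++live; }
  ~DataDecoder() { --live; }
  int fd;
  static int live;  // Counts live instances so cleanup can be checked.
};
int DataDecoder::live = 0;

// Deletes watcher->data as a T. The template argument makes the expected
// payload type explicit at each call site instead of hiding it in a
// C-style cast inside a shared callback.
template <typename T>
void release(Watcher* watcher)
{
  delete static_cast<T*>(watcher->data);
  watcher->data = nullptr;
}

// Failure path for watchers set up by a 'link': the payload is a Socket.
void link_connect_failed(Watcher* watcher)
{
  release<Socket>(watcher);
  delete watcher;
}

// Failure path for watchers whose payload is a DataDecoder.
void receive_connect_failed(Watcher* watcher)
{
  release<DataDecoder>(watcher);
  delete watcher;
}
```

With this split, the code that stores a Socket in watcher->data would register the callback that ends in link_connect_failed, and the code that stores a DataDecoder would register the one that ends in receive_connect_failed. The cast is still unchecked, so the pairing has to be kept consistent by hand, but the type each callback frees is at least stated next to the delete.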



--
This message was sent by Atlassian JIRA
(v6.2#6252)