You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mesos.apache.org by bm...@apache.org on 2017/09/02 03:30:08 UTC

[1/5] mesos git commit: Clarified some issues in the LibeventSSLSocket event_callback logic.

Repository: mesos
Updated Branches:
  refs/heads/master 1ae308c2f -> 899950df2


Clarified some issues in the LibeventSSLSocket event_callback logic.

This updates the comments to include my findings after having looked
into using BEV_EVENT_READING / BEV_EVENT_WRITING to handle read and
write path errors separately. Unfortunately, we cannot do this using
libevent 2.0.x (see the comments).

This also clarifies the "dirty" SSL shutdown case and the case where
futher sends are performed on a shut down socket.

Review: https://reviews.apache.org/r/62051


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/7f0c48f7
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/7f0c48f7
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/7f0c48f7

Branch: refs/heads/master
Commit: 7f0c48f759832c5fed1f4ea4cce667881cad9a49
Parents: eb8c2c7
Author: Benjamin Mahler <bm...@apache.org>
Authored: Fri Sep 1 17:27:19 2017 -0700
Committer: Benjamin Mahler <bm...@apache.org>
Committed: Fri Sep 1 20:20:58 2017 -0700

----------------------------------------------------------------------
 3rdparty/libprocess/src/libevent_ssl_socket.cpp | 51 ++++++++++++--------
 1 file changed, 32 insertions(+), 19 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/7f0c48f7/3rdparty/libprocess/src/libevent_ssl_socket.cpp
----------------------------------------------------------------------
diff --git a/3rdparty/libprocess/src/libevent_ssl_socket.cpp b/3rdparty/libprocess/src/libevent_ssl_socket.cpp
index c5d7bf4..5767ce0 100644
--- a/3rdparty/libprocess/src/libevent_ssl_socket.cpp
+++ b/3rdparty/libprocess/src/libevent_ssl_socket.cpp
@@ -337,19 +337,34 @@ void LibeventSSLSocketImpl::event_callback(short events)
 {
   CHECK(__in_event_loop__);
 
+  // TODO(bmahler): Libevent's invariant is that `events` contains:
+  //
+  //   (1) one of BEV_EVENT_READING or BEV_EVENT_WRITING to
+  //       indicate whether the event was on the read or write path.
+  //
+  //   (2) one of BEV_EVENT_EOF, BEV_EVENT_ERROR, BEV_EVENT_TIMEOUT,
+  //       BEV_EVENT_CONNECTED.
+  //
+  // (1) allows us to handle read and write errors separately.
+  // HOWEVER, for SSL bufferevents in 2.0.x, libevent never seems
+  // to tell us about BEV_EVENT_READING or BEV_EVENT_WRITING,
+  // which forces us to write incorrect logic here by treating all
+  // events as affecting both reads and writes.
+  //
+  // This has been fixed in 2.1.x:
+  //   2.1 "What's New":
+  //     https://github.com/libevent/libevent/blob/release-2.1.8-stable/whatsnew-2.1.txt#L333-L335 // NOLINT
+  //   Commit:
+  //     https://github.com/libevent/libevent/commit/f7eb69ace
+  //
+  // We should require 2.1.x so that we can correctly distinguish
+  // between the read and write errors, and not have two code paths
+  // depending on the libevent version, see MESOS-5999, MESOS-6770.
+
   Owned<RecvRequest> current_recv_request;
   Owned<SendRequest> current_send_request;
   Owned<ConnectRequest> current_connect_request;
 
-  // In all of the following conditions, we're interested in swapping
-  // the value of the requests with null (if they are already null,
-  // then there's no harm).
-  //
-  // TODO(bmahler): If we receive an EOF because the receiving
-  // side only shutdown writes on its socket, we can technically
-  // still send data on the socket!
-  //   See: http://www.unixguide.net/network/socketfaq/2.6.shtml
-  //   Related JIRA: MESOS-5999
   if (events & BEV_EVENT_EOF ||
       events & BEV_EVENT_CONNECTED ||
       events & BEV_EVENT_ERROR) {
@@ -360,21 +375,19 @@ void LibeventSSLSocketImpl::event_callback(short events)
     }
   }
 
-  // If a request below is null, then no such request is in progress,
-  // either because it was never created, it has already been
-  // completed, or it has been discarded.
-
-  // The case below where `EVUTIL_SOCKET_ERROR() == 0` will catch
-  // unclean shutdowns of the socket.
+  // First handle EOF, we also look for `BEV_EVENT_ERROR` with
+  // `EVUTIL_SOCKET_ERROR() == 0` since this occurs as a result
+  // of a "dirty" SSL shutdown (i.e. TCP close before SSL close)
+  // or when this socket has been shut down and further sends
+  // are performed.
   //
-  // TODO(greggomann): We should make use of the `BEV_EVENT_READING`
-  // and `BEV_EVENT_WRITING` flags to handle read and write errors
-  // differently. Related JIRA: MESOS-6770
+  // TODO(bmahler): We don't expose "dirty" SSL shutdowns as
+  // recv errors, but perhaps we should?
   if (events & BEV_EVENT_EOF ||
      (events & BEV_EVENT_ERROR && EVUTIL_SOCKET_ERROR() == 0)) {
-    // At end of file, close the connection.
     if (current_recv_request.get() != nullptr) {
       received_eof = true;
+
       // Drain any remaining data from the bufferevent or complete the
       // promise with 0 to signify EOF. Because we set `received_eof`,
       // subsequent calls to `recv` will return 0 if there is no data


[3/5] mesos git commit: Fixed an OOM due to a send loop for SSL sockets.

Posted by bm...@apache.org.
Fixed an OOM due to a send loop for SSL sockets.

Per MESOS-7934, the LibeventSSLSocket incorrectly returns 0 to the
sender when an EOF, or "dirty" SSL shutdown (i.e. TCP close before
SSL close), or a send is performed on a socket after it has been
shut down. Not only is this incorrect due to the caller re-sending
the same data again, in the case that the socket has been shut down,
the caller of send will enter an infinite loop of retrying the send
which will rapidly lead to an OOM in libprocess.

The fix here is to fail the send instead. Note that with libevent
2.0.x the 'events' will not contain BEV_EVENT_READING or
BEV_EVENT_WRITING for SSL buffevents. With libevent 2.1.x, we can
update our logic to deal with the read and write side events
separately.

https://github.com/libevent/libevent/commit/f7eb69ace

Comments are added in a follow up change to explain this for
posterity, and MESOS-7930 tracks the additional tech debt that
needs to be addressed for SSL socket support.

Review: https://reviews.apache.org/r/62049


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/9d1b812d
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/9d1b812d
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/9d1b812d

Branch: refs/heads/master
Commit: 9d1b812d92592f5d0ff8e000d022837de89ac9ba
Parents: 1ae308c
Author: Benjamin Mahler <bm...@apache.org>
Authored: Fri Sep 1 17:17:59 2017 -0700
Committer: Benjamin Mahler <bm...@apache.org>
Committed: Fri Sep 1 20:20:58 2017 -0700

----------------------------------------------------------------------
 3rdparty/libprocess/src/libevent_ssl_socket.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/9d1b812d/3rdparty/libprocess/src/libevent_ssl_socket.cpp
----------------------------------------------------------------------
diff --git a/3rdparty/libprocess/src/libevent_ssl_socket.cpp b/3rdparty/libprocess/src/libevent_ssl_socket.cpp
index 0fa7565..c5d7bf4 100644
--- a/3rdparty/libprocess/src/libevent_ssl_socket.cpp
+++ b/3rdparty/libprocess/src/libevent_ssl_socket.cpp
@@ -394,7 +394,7 @@ void LibeventSSLSocketImpl::event_callback(short events)
     }
 
     if (current_send_request.get() != nullptr) {
-      current_send_request->promise.set(0);
+      current_send_request->promise.fail("Failed send: connection closed");
     }
 
     if (current_connect_request.get() != nullptr) {


[2/5] mesos git commit: Added a test to reproduce the OOM issue in MESOS-7934.

Posted by bm...@apache.org.
Added a test to reproduce the OOM issue in MESOS-7934.

Review: https://reviews.apache.org/r/62050


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/eb8c2c7e
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/eb8c2c7e
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/eb8c2c7e

Branch: refs/heads/master
Commit: eb8c2c7e418dc19bde456517ab44ed78b4406719
Parents: 9d1b812
Author: Benjamin Mahler <bm...@apache.org>
Authored: Fri Sep 1 17:26:18 2017 -0700
Committer: Benjamin Mahler <bm...@apache.org>
Committed: Fri Sep 1 20:20:58 2017 -0700

----------------------------------------------------------------------
 3rdparty/libprocess/src/tests/ssl_tests.cpp | 32 ++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/eb8c2c7e/3rdparty/libprocess/src/tests/ssl_tests.cpp
----------------------------------------------------------------------
diff --git a/3rdparty/libprocess/src/tests/ssl_tests.cpp b/3rdparty/libprocess/src/tests/ssl_tests.cpp
index 6affda8..b1a0ea6 100644
--- a/3rdparty/libprocess/src/tests/ssl_tests.cpp
+++ b/3rdparty/libprocess/src/tests/ssl_tests.cpp
@@ -890,4 +890,36 @@ TEST_F(SSLTest, SilentSocket)
   EXPECT_EQ(data, response->body);
 }
 
+
+// This test was added due to an OOM issue: MESOS-7934.
+TEST_F(SSLTest, ShutdownThenSend)
+{
+  Clock::pause();
+
+  Try<Socket> server = setup_server({
+      {"LIBPROCESS_SSL_ENABLED", "true"},
+      {"LIBPROCESS_SSL_KEY_FILE", key_path().string()},
+      {"LIBPROCESS_SSL_CERT_FILE", certificate_path().string()}});
+
+  ASSERT_SOME(server);
+  ASSERT_SOME(server.get().address());
+  ASSERT_SOME(server.get().address().get().hostname());
+
+  Future<Socket> socket = server.get().accept();
+
+  Clock::settle();
+  EXPECT_TRUE(socket.isPending());
+
+  Try<Socket> client = Socket::create(SocketImpl::Kind::SSL);
+  ASSERT_SOME(client);
+  AWAIT_ASSERT_READY(client->connect(server->address().get()));
+
+  AWAIT_ASSERT_READY(socket);
+
+  EXPECT_SOME(Socket(socket.get()).shutdown());
+
+  // This send should fail now that the socket is shut down.
+  AWAIT_FAILED(Socket(socket.get()).send("Hello World"));
+}
+
 #endif // USE_SSL_SOCKET


[4/5] mesos git commit: Added MESOS-7934 to the 1.2.3 CHANGELOG.

Posted by bm...@apache.org.
Added MESOS-7934 to the 1.2.3 CHANGELOG.


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/42b68033
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/42b68033
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/42b68033

Branch: refs/heads/master
Commit: 42b68033479a39955303c8ac5d13e6a0f04107c6
Parents: 7f0c48f
Author: Benjamin Mahler <bm...@apache.org>
Authored: Fri Sep 1 20:22:56 2017 -0700
Committer: Benjamin Mahler <bm...@apache.org>
Committed: Fri Sep 1 20:22:56 2017 -0700

----------------------------------------------------------------------
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/42b68033/CHANGELOG
----------------------------------------------------------------------
diff --git a/CHANGELOG b/CHANGELOG
index 0885360..0afa67d 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -414,6 +414,7 @@ All Issues:
   * [MESOS-7872] - Scheduler hang when registration fails.
   * [MESOS-7909] - Ordering dependency between 'linux/capabilities' and 'docker/runtime' isolator.
   * [MESOS-7926] - Abnormal termination of default executor can cause MesosContainerizer::destroy to fail.
+  * [MESOS-7934] - OOM due to LibeventSSLSocket send incorrectly returning 0 after shutdown.
 
 
 Release Notes - Mesos - Version 1.2.2


[5/5] mesos git commit: Added MESOS-7934 to the 1.3.2 CHANGELOG.

Posted by bm...@apache.org.
Added MESOS-7934 to the 1.3.2 CHANGELOG.


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/899950df
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/899950df
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/899950df

Branch: refs/heads/master
Commit: 899950df2ae6cfc48f0c845176605367cd52c845
Parents: 42b6803
Author: Benjamin Mahler <bm...@apache.org>
Authored: Fri Sep 1 20:24:27 2017 -0700
Committer: Benjamin Mahler <bm...@apache.org>
Committed: Fri Sep 1 20:24:27 2017 -0700

----------------------------------------------------------------------
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/899950df/CHANGELOG
----------------------------------------------------------------------
diff --git a/CHANGELOG b/CHANGELOG
index 0afa67d..afa1ed8 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -95,6 +95,7 @@ All Issues:
   * [MESOS-7909] - Ordering dependency between 'linux/capabilities' and 'docker/runtime' isolator.
   * [MESOS-7912] - Master WebUI not working in Chrome.
   * [MESOS-7926] - Abnormal termination of default executor can cause MesosContainerizer::destroy to fail.
+  * [MESOS-7934] - OOM due to LibeventSSLSocket send incorrectly returning 0 after shutdown.
 
 
 Release Notes - Mesos - Version 1.3.1