Posted to commits@mesos.apache.org by qi...@apache.org on 2018/10/30 22:48:52 UTC

[mesos] branch 1.4.x updated (3071ff7 -> 9c6c65e)

This is an automated email from the ASF dual-hosted git repository.

qianzhang pushed a change to branch 1.4.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


    from 3071ff7  Added MESOS-9231 to the 1.4.3 CHANGELOG and updated upgrades.md.
     new 1d61e4f  Fixed an early fd close in the cgroups event notifier.
     new ebabcd1  Ensured failed / discarded cgroups OOM notification is logged.
     new 9c6c65e  Added MESOS-9334 to the 1.4.3 CHANGELOG.

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG                                          |  1 +
 src/linux/cgroups.cpp                              | 41 ++++++++++++++--------
 .../mesos/isolators/cgroups/subsystems/memory.cpp  |  2 +-
 3 files changed, 29 insertions(+), 15 deletions(-)


[mesos] 01/03: Fixed an early fd close in the cgroups event notifier.

Posted by qi...@apache.org.

qianzhang pushed a commit to branch 1.4.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 1d61e4f2fff6534abe935d5ed087122c485aea88
Author: Benjamin Mahler <bm...@apache.org>
AuthorDate: Fri Oct 26 10:50:54 2018 +0800

    Fixed an early fd close in the cgroups event notifier.
    
    The cgroups event notifier was closing the eventfd while an
    `io::read()` operation might still be in progress. This can lead to
    bugs where the fd number gets re-used and the stale `io::read()`
    reads from the newly opened file.
    
    Review: https://reviews.apache.org/r/69123/
---
 src/linux/cgroups.cpp | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/src/linux/cgroups.cpp b/src/linux/cgroups.cpp
index ab594e0..7693ce4 100644
--- a/src/linux/cgroups.cpp
+++ b/src/linux/cgroups.cpp
@@ -1239,7 +1239,7 @@ public:
       // uint64_t) from the event file, it indicates that an event has
       // occurred.
       reading = io::read(eventfd.get(), &data, sizeof(data));
-      reading.onAny(defer(self(), &Listener::_listen));
+      reading->onAny(defer(self(), &Listener::_listen, lambda::_1));
     }
 
     return promise.get()->future();
@@ -1261,14 +1261,23 @@ protected:
   virtual void finalize()
   {
     // Discard the nonblocking read.
-    reading.discard();
+    if (reading.isSome()) {
+      reading->discard();
+    }
 
-    // Unregister the eventfd if needed.
+    // Unregister the eventfd if needed. If there's a pending read,
+    // we must wait for it to finish.
     if (eventfd.isSome()) {
-      Try<Nothing> unregister = unregisterNotifier(eventfd.get());
-      if (unregister.isError()) {
-        LOG(ERROR) << "Failed to unregister eventfd: " << unregister.error();
-      }
+      int fd = eventfd.get();
+
+      reading.getOrElse(Future<size_t>(0))
+        .onAny([fd]() {
+          Try<Nothing> unregister = unregisterNotifier(fd);
+          if (unregister.isError()) {
+            LOG(ERROR) << "Failed to unregister eventfd '" << fd << "'"
+                       << ": " << unregister.error();
+          }
+      });
     }
 
     // TODO(chzhcn): Fail our promise only after 'reading' has
@@ -1281,11 +1290,15 @@ protected:
 private:
   // This function is called when the nonblocking read on the eventfd has
   // result, either because the event has happened, or an error has occurred.
-  void _listen()
+  void _listen(Future<size_t> read)
   {
     CHECK_SOME(promise);
+    CHECK_SOME(reading);
+
+    // Reset to none since we're no longer reading.
+    reading = None();
 
-    if (reading.isReady() && reading.get() == sizeof(data)) {
+    if (read.isReady() && read.get() == sizeof(data)) {
       promise.get()->set(data);
 
       // After fulfilling the promise, reset to get ready for the next one.
@@ -1293,14 +1306,14 @@ private:
       return;
     }
 
-    if (reading.isDiscarded()) {
+    if (read.isDiscarded()) {
       error = Error("Reading eventfd stopped unexpectedly");
-    } else if (reading.isFailed()) {
-      error = Error("Failed to read eventfd: " + reading.failure());
+    } else if (read.isFailed()) {
+      error = Error("Failed to read eventfd: " + read.failure());
     } else {
       error = Error("Read less than expected. Expect " +
                     stringify(sizeof(data)) + " bytes; actual " +
-                    stringify(reading.get()) + " bytes");
+                    stringify(read.get()) + " bytes");
     }
 
     // Inform failure and not listen again.
@@ -1313,7 +1326,7 @@ private:
   const Option<string> args;
 
   Option<Owned<Promise<uint64_t>>> promise;
-  Future<size_t> reading;
+  Option<Future<size_t>> reading;
   Option<Error> error;
   Option<int> eventfd;
   uint64_t data;                // The data read from the eventfd last time.
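
The hazard described in the commit message above can be reproduced outside
Mesos. The following is a minimal sketch, assuming a single-threaded POSIX
process; `readThroughStaleFd` is a hypothetical helper written for this
illustration and uses plain pipes rather than an eventfd or libprocess. It
relies on the POSIX rule that a newly allocated descriptor gets the lowest
available number, so the number freed by the early close is typically handed
out again right away.

```cpp
// Illustrative sketch only; not taken from the Mesos source.
#include <unistd.h>

#include <cassert>
#include <string>

// Hypothetical helper: closes the read end of one pipe early, opens a
// second pipe, then reads through the old descriptor number. Returns
// whatever the stale number yields ("" if the number was not re-used).
std::string readThroughStaleFd()
{
  int first[2];
  if (::pipe(first) != 0) {
    return "";
  }

  const int stale = first[0]; // The number a pending read still holds.
  ::close(first[0]);          // Early close: the number is free again.

  int second[2];
  if (::pipe(second) != 0) {
    ::close(first[1]);
    return "";
  }

  // Data intended only for readers of the second pipe.
  const std::string message = "other";
  const ssize_t written =
    ::write(second[1], message.data(), message.size());

  std::string result;
  if (written == static_cast<ssize_t>(message.size()) &&
      second[0] == stale) {
    char buffer[16];
    const ssize_t n = ::read(stale, buffer, sizeof(buffer));
    if (n > 0) {
      result.assign(buffer, static_cast<size_t>(n));
    }
  }

  ::close(first[1]);
  ::close(second[0]);
  ::close(second[1]);
  return result;
}
```

When the number is re-used, the read through the stale descriptor returns
bytes that were written for an unrelated reader, which is the class of bug
the patch avoids by delaying the unregister until the read has finished.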

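The shape of the fix in `finalize()` (release the descriptor only once any
pending read has completed) can be modelled without libprocess. This is a
sketch under stated assumptions: `PendingRead` and `Notifier` are
hypothetical stand-ins invented for illustration, and "releasing" just
records the descriptor number instead of calling a real unregister or
close.

```cpp
// Illustrative model only; `PendingRead` and `Notifier` are hypothetical
// stand-ins, not libprocess or Mesos types.
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// A tiny stand-in for a future: callbacks run once the read completes.
class PendingRead
{
public:
  void onAny(std::function<void()> callback)
  {
    if (done) {
      callback();
    } else {
      callbacks.push_back(std::move(callback));
    }
  }

  void complete()
  {
    done = true;
    for (auto& callback : callbacks) {
      callback();
    }
    callbacks.clear();
  }

private:
  bool done = false;
  std::vector<std::function<void()>> callbacks;
};

struct Notifier
{
  std::vector<int>* released; // Records released fds instead of closing.
  int fd;
  PendingRead* reading;       // nullptr when no read is outstanding.

  void finalize()
  {
    // Capture by value so the callback does not depend on `this`.
    const int captured = fd;
    std::vector<int>* log = released;
    auto release = [captured, log]() { log->push_back(captured); };

    if (reading != nullptr) {
      reading->onAny(release); // Wait for the read to finish first.
    } else {
      release();               // Nothing pending: release right away.
    }
  }
};
```

Capturing the descriptor by value mirrors the `[fd]` capture in the patch:
the callback may run after the owning object has gone away, so it should
not reach back into it.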

[mesos] 02/03: Ensured failed / discarded cgroups OOM notification is logged.

Posted by qi...@apache.org.

qianzhang pushed a commit to branch 1.4.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit ebabcd1811d5b8cecb2488f1aee6b5929cfbe6fa
Author: Qian Zhang <zh...@gmail.com>
AuthorDate: Fri Oct 26 10:57:20 2018 +0800

    Ensured failed / discarded cgroups OOM notification is logged.
    
    Failed or discarded OOM notifications in the cgroups memory
    subsystem were not being logged, due to the continuation being
    accidentally set up using `onReady` rather than `onAny`.
    
    Review: https://reviews.apache.org/r/69188
---
 src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp b/src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
index dd2d24c..719d5a5 100644
--- a/src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
+++ b/src/slave/containerizer/mesos/isolators/cgroups/subsystems/memory.cpp
@@ -479,7 +479,7 @@ void MemorySubsystemProcess::oomListen(
   LOG(INFO) << "Started listening for OOM events for container "
             << containerId;
 
-  info->oomNotifier.onReady(
+  info->oomNotifier.onAny(
       defer(PID<MemorySubsystemProcess>(this),
             &MemorySubsystemProcess::oomWaited,
             containerId,
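
Why the one-word change from `onReady` to `onAny` matters can be shown with
a small model. This is a sketch, assuming a simplified future with only
four states; `MiniFuture`, `State`, and `observe` are hypothetical names
made up for this illustration and are not the libprocess API (whose
callbacks also receive the value or the future itself).

```cpp
// Illustrative model only; not the libprocess Future implementation.
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class State { PENDING, READY, FAILED, DISCARDED };

class MiniFuture
{
public:
  void onReady(std::function<void()> f) { ready.push_back(std::move(f)); }
  void onAny(std::function<void(State)> f) { any.push_back(std::move(f)); }

  // Moves the future out of PENDING exactly once and fires callbacks.
  void transition(State next)
  {
    if (state != State::PENDING || next == State::PENDING) {
      return;
    }
    state = next;

    if (state == State::READY) {
      for (auto& f : ready) {
        f(); // Only successful completion reaches these.
      }
    }
    for (auto& f : any) {
      f(state); // Every terminal state reaches these.
    }
  }

private:
  State state = State::PENDING;
  std::vector<std::function<void()>> ready;
  std::vector<std::function<void(State)>> any;
};

// Returns the names of the callbacks that fired for a given outcome.
std::vector<std::string> observe(State outcome)
{
  std::vector<std::string> fired;
  MiniFuture future;
  future.onReady([&fired]() { fired.push_back("onReady"); });
  future.onAny([&fired](State) { fired.push_back("onAny"); });
  future.transition(outcome);
  return fired;
}
```

In this model a failed or discarded future triggers only the `onAny`
callback, so a continuation that is supposed to log failures has to be
registered there, as the patch does for `oomWaited`.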


[mesos] 03/03: Added MESOS-9334 to the 1.4.3 CHANGELOG.

Posted by qi...@apache.org.

qianzhang pushed a commit to branch 1.4.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 9c6c65e501b2bf31fc761fb7b98b55b6b95512db
Author: Qian Zhang <zh...@gmail.com>
AuthorDate: Tue Oct 30 15:43:55 2018 -0700

    Added MESOS-9334 to the 1.4.3 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 731eb17..9183418 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -18,6 +18,7 @@ Release Notes - Mesos - Version 1.4.3 (WIP)
   * [MESOS-9231] - `docker inspect` may return an unexpected result to Docker executor due to a race condition.
   * [MESOS-9279] - Docker Containerizer 'usage' call might be expensive if mount table is big.
   * [MESOS-9283] - Docker containerizer actor can get backlogged with large number of containers.
+  * [MESOS-9334] - Container stuck at ISOLATING state due to libevent poll never returns.
 
 
 Release Notes - Mesos - Version 1.4.2