Posted to commits@mesos.apache.org by qi...@apache.org on 2018/09/05 21:36:51 UTC

[mesos] branch 1.7.x updated (a2e2f97 -> 09ee686)

This is an automated email from the ASF dual-hosted git repository.

qianzhang pushed a change to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git.


    from a2e2f97  Added MESOS-9189 to the 1.7.0 CHANGELOG.
     new 54a4c1e  Made command check always waits before removing the nested container.
     new 6499aaf  Made checker library retry to remove the previous check container.
     new 09ee686  Added MESOS-8568 to the 1.7.0 CHANGELOG.

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGELOG                      |  1 +
 src/checks/checker_process.cpp | 25 ++++++++++++++++++++-----
 2 files changed, 21 insertions(+), 5 deletions(-)


[mesos] 02/03: Made checker library retry to remove the previous check container.

Posted by qi...@apache.org.

qianzhang pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 6499aaf34890690e0ea3606b19fdb83d608bca35
Author: Qian Zhang <zh...@gmail.com>
AuthorDate: Wed Aug 29 11:22:41 2018 +0800

    Made checker library retry to remove the previous check container.
    
    Previously, when the checker library failed to remove the previous
    check container, it discarded the promise and still launched a new
    check container, which caused two problems:
      1. The discarded promise was used to launch the new check container,
         so even if the new check container launched successfully, its
         check result could never be processed because the promise had
         already been discarded.
      2. The previous check container never got a chance to be removed,
         which is a leak, i.e., its runtime directory and sandbox
         directory were never removed.
    
    With this patch, when the checker library fails to remove the previous
    check container, it does not launch a new check container and instead
    tries to remove the previous check container again.
    
    Review: https://reviews.apache.org/r/68555
---
 src/checks/checker_process.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/checks/checker_process.cpp b/src/checks/checker_process.cpp
index 21af9b6..c214bd1 100644
--- a/src/checks/checker_process.cpp
+++ b/src/checks/checker_process.cpp
@@ -658,10 +658,10 @@ Future<int> CheckerProcess::nestedCommandCheck(
                        << " the " << name << " for task '" << taskId << "'";
 
           promise->discard();
+        } else {
+          previousCheckContainerId = None();
+          _nestedCommandCheck(promise, cmd, nested);
         }
-
-        previousCheckContainerId = None();
-        _nestedCommandCheck(promise, cmd, nested);
       }));
   } else {
     _nestedCommandCheck(promise, cmd, nested);
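
The control flow after this patch can be sketched in isolation. The following is a minimal, synchronous model and not the Mesos implementation: `CheckerSketch`, `runCheck`, `launches`, `discarded` and the `removeContainer` callback are hypothetical names introduced for illustration, and the real code is asynchronous and built on libprocess futures. It only shows that a failed removal abandons the current round while keeping the container ID, so that a later round retries the removal before launching anything new.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for the checker's state; not the Mesos class.
struct CheckerSketch {
  std::optional<std::string> previousCheckContainerId;
  int launches = 0;   // Number of check containers launched.
  int discarded = 0;  // Number of check rounds that were abandoned.

  // `removeContainer` is a hypothetical callback modelling the
  // REMOVE_NESTED_CONTAINER call; it returns true on success.
  void runCheck(
      const std::function<bool(const std::string&)>& removeContainer,
      const std::string& newContainerId) {
    if (previousCheckContainerId.has_value()) {
      if (!removeContainer(*previousCheckContainerId)) {
        // Removal failed: abandon this round and keep the ID so that
        // the next round retries the removal.
        ++discarded;
        return;
      }

      previousCheckContainerId.reset();
    }

    // Only reached when no previous check container is left behind.
    assert(!previousCheckContainerId.has_value());
    ++launches;
    previousCheckContainerId = newContainerId;
  }
};
```

With a removal callback that fails once and then succeeds, the first call to `runCheck` leaves `launches` at 0 and keeps the old ID; the second call removes the old container and launches a new one.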


[mesos] 01/03: Made command check always waits before removing the nested container.

Posted by qi...@apache.org.

qianzhang pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 54a4c1e6306339a59884438b3fd9752704b77362
Author: Qian Zhang <zh...@gmail.com>
AuthorDate: Thu Aug 23 17:44:53 2018 +0800

    Made command check always waits before removing the nested container.
    
    Review: https://reviews.apache.org/r/68495
---
 src/checks/checker_process.cpp | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/checks/checker_process.cpp b/src/checks/checker_process.cpp
index 77a76f4..21af9b6 100644
--- a/src/checks/checker_process.cpp
+++ b/src/checks/checker_process.cpp
@@ -795,7 +795,19 @@ void CheckerProcess::___nestedCommandCheck(
                  << launchResponse.body << ") while launching " << name
                  << " for task '" << taskId << "'";
 
-    promise->discard();
+    // We'll try to remove the container created for the check at the
+    // beginning of the next check. In order to prevent a failure, the
+    // promise should only be completed once we're sure that the
+    // container has terminated.
+    waitNestedContainer(checkContainerId, nested)
+      .onAny([promise](const Future<Option<int>>&) {
+        // We assume that once `WaitNestedContainer` returns,
+        // irrespective of whether the response contains a failure, the
+        // container will be in a terminal state, and that it will be
+        // possible to remove it.
+        promise->discard();
+    });
+
     return;
   }
 
@@ -881,7 +893,10 @@ void CheckerProcess::nestedCommandCheckFailure(
     //
     // This will allow us to recover from a blip. The executor will
     // pause the checker when it detects that the agent is not
-    // available.
+    // available. Here we do not need to wait the check container since
+    // the agent may have been unavailable, and when the agent is back,
+    // it will destroy the check container as orphan container, and we
+    // will eventually remove it in `nestedCommandCheck()`.
     LOG(WARNING) << "Connection to the agent to launch " << name
                  << " for task '" << taskId << "' failed: " << failure;
 
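
The ordering introduced by the first hunk above (wait for the check container, then discard the promise) can be illustrated with a small callback-based sketch. This is an assumption-laden model rather than the Mesos code: `onLaunchFailure`, `Log` and the `waitContainer` callback are hypothetical names, and the sketch runs synchronously, whereas the real `waitNestedContainer` returns a libprocess future.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical event log used only to make the ordering observable.
using Log = std::vector<std::string>;

// `waitContainer` is a hypothetical model of WAIT_NESTED_CONTAINER: it
// invokes the continuation once the wait call has returned, passing
// whether the wait itself succeeded.
void onLaunchFailure(
    const std::function<void(std::function<void(bool)>)>& waitContainer,
    Log& log) {
  // `log` is captured by reference, so it must outlive the continuation.
  waitContainer([&log](bool /* waitSucceeded */) {
    // Regardless of the wait result, the container is assumed to be in
    // a terminal state here, so the check round can be abandoned.
    log.push_back("discard");
  });
}
```

The point of the design is that the discard happens inside the continuation, so anything that runs after the discard (such as the next check's removal attempt) can assume the wait has already returned.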


[mesos] 03/03: Added MESOS-8568 to the 1.7.0 CHANGELOG.

Posted by qi...@apache.org.

qianzhang pushed a commit to branch 1.7.x
in repository https://gitbox.apache.org/repos/asf/mesos.git

commit 09ee686b24cb87153cd135f1a1432a8ee7deff01
Author: Qian Zhang <zh...@gmail.com>
AuthorDate: Wed Sep 5 14:19:34 2018 -0700

    Added MESOS-8568 to the 1.7.0 CHANGELOG.
---
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG b/CHANGELOG
index 2ded89e..3b55352 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -172,6 +172,7 @@ All Resolved Issues:
   * [MESOS-8429] - Clean up endpoint socket if the container daemon is destroyed while waiting.
   * [MESOS-8499] - Change docker health check image to the new nanoserver one
   * [MESOS-8567] - Test UriDiskProfileTest.FetchFromHTTP is flaky.
+  * [MESOS-8568] - Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
   * [MESOS-8613] - Test `MasterAllocatorTest/*.TaskFinished` is flaky.
   * [MESOS-8626] - The 'allocatable' check in the allocator is problematic with multi-role frameworks
   * [MESOS-8686] - Mesos build failed with /permissive- + MSVC on windows