Posted to commits@mesos.apache.org by al...@apache.org on 2017/12/22 13:48:34 UTC

[1/6] mesos git commit: Promoted log level to warning for disconnected events in exec.cpp.

Repository: mesos
Updated Branches:
  refs/heads/1.4.x a6a8b1ca3 -> 6199905ec


Promoted log level to warning for disconnected events in exec.cpp.

When the executor library receives messages while being disconnected,
it might indicate an out-of-order message delivery or lost messages.
This should be logged at the warning level to simplify triaging.
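
As a rough illustration of why the level matters: with glog-style logging,
a `VLOG(1)` line appears only when verbose logging is enabled, while a
`LOG(WARNING)` line appears under default settings. The sketch below models
that difference with a hypothetical stand-in logger (`SketchLogger` and its
members are illustrative names, not part of glog or Mesos).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a glog-style logger; not the glog or Mesos API.
struct SketchLogger {
  int verbosity = 0;               // Plays the role of glog's `--v` flag.
  std::vector<std::string> lines;  // Captured output, kept for inspection.

  // Models VLOG(level): emitted only if the verbosity is high enough.
  void verbose(int level, const std::string& message) {
    if (level <= verbosity) {
      lines.push_back("I " + message);
    }
  }

  // Models LOG(WARNING): emitted regardless of the verbosity setting.
  void warning(const std::string& message) {
    lines.push_back("W " + message);
  }
};
```

With the default verbosity of 0, a call such as `logger.verbose(1, ...)`
records nothing, whereas `logger.warning(...)` always records a line, which
is the behaviour this commit relies on for easier triaging.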

Review: https://reviews.apache.org/r/64032/


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/2e7a772f
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/2e7a772f
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/2e7a772f

Branch: refs/heads/1.4.x
Commit: 2e7a772f171498874a4e5a56e3066fd5e95e2bec
Parents: a6a8b1c
Author: Alexander Rukletsov <ru...@gmail.com>
Authored: Fri Dec 22 12:09:58 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:29:23 2017 +0100

----------------------------------------------------------------------
 src/exec/exec.cpp | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/2e7a772f/src/exec/exec.cpp
----------------------------------------------------------------------
diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index 65c4575..ea0b118 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -209,8 +209,7 @@ public:
 protected:
   virtual void initialize()
   {
-    VLOG(1) << "Executor started at: " << self()
-            << " with pid " << getpid();
+    VLOG(1) << "Executor started at: " << self() << " with pid " << getpid();
 
     link(slave);
 
@@ -318,8 +317,8 @@ protected:
     }
 
     if (!connected) {
-      VLOG(1) << "Ignoring run task message for task " << task.task_id()
-              << " because the driver is disconnected!";
+      LOG(WARNING) << "Ignoring run task message for task " << task.task_id()
+                   << " because the driver is disconnected!";
       return;
     }
 
@@ -378,10 +377,10 @@ protected:
     }
 
     if (!connected) {
-      VLOG(1) << "Ignoring status update acknowledgement "
-              << uuid_.get() << " for task " << taskId
-              << " of framework " << frameworkId
-              << " because the driver is disconnected!";
+      LOG(WARNING) << "Ignoring status update acknowledgement "
+                   << uuid_.get() << " for task " << taskId
+                   << " of framework " << frameworkId
+                   << " because the driver is disconnected!";
       return;
     }
 
@@ -408,8 +407,8 @@ protected:
     }
 
     if (!connected) {
-      VLOG(1) << "Ignoring framework message because "
-              << "the driver is disconnected!";
+      LOG(WARNING) << "Ignoring framework message because"
+                   << " the driver is disconnected!";
       return;
     }
 


[2/6] mesos git commit: Ensured command executor always honors shutdown request.

Posted by al...@apache.org.
Ensured command executor always honors shutdown request.

Review: https://reviews.apache.org/r/64069/


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/1bc8f5dc
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/1bc8f5dc
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/1bc8f5dc

Branch: refs/heads/1.4.x
Commit: 1bc8f5dcf761a5c54fa1777a76ba4b1f77b9c521
Parents: 2e7a772
Author: Alexander Rukletsov <ru...@gmail.com>
Authored: Fri Dec 22 12:10:15 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:29:38 2017 +0100

----------------------------------------------------------------------
 src/launcher/executor.cpp | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/1bc8f5dc/src/launcher/executor.cpp
----------------------------------------------------------------------
diff --git a/src/launcher/executor.cpp b/src/launcher/executor.cpp
index 12b0326..e5d5595 100644
--- a/src/launcher/executor.cpp
+++ b/src/launcher/executor.cpp
@@ -760,6 +760,8 @@ protected:
     if (launched) {
       CHECK_SOME(taskId);
       kill(taskId.get(), gracePeriod);
+    } else {
+      terminate(self());
     }
   }
 


[4/6] mesos git commit: Terminated driver-based executors if kill arrives before launch task.

Posted by al...@apache.org.
Terminated driver-based executors if kill arrives before launch task.

`ExecutorRegisteredMessage` or `RunTaskMessage` may not be delivered
to a driver-based executor. Since these messages are not retried,
without this patch an executor never starts a task and remains idle,
ignoring kill task requests. This patch ensures that all built-in
driver-based executors eventually shut down if a kill task request
arrives before the task has been started.
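
The intended behaviour can be sketched with a toy model. This is only an
illustration under simplifying assumptions: `SketchExecutor` and
`KillOutcome` are hypothetical names, and the real executors terminate via
a failed `CHECK` rather than by returning a value.

```cpp
#include <cassert>

// What the toy executor decides to do when asked to kill a task.
enum class KillOutcome {
  TERMINATE_EXECUTOR,  // No task was ever launched: shut the executor down.
  KILL_TASK            // A task is running: kill it as usual.
};

// Hypothetical toy model of a driver-based executor; not Mesos code.
class SketchExecutor {
public:
  // Corresponds to a delivered `RunTaskMessage`.
  void launchTask() { launched_ = true; }

  // Corresponds to a kill task request. If the launch message was lost,
  // the executor terminates instead of idling forever.
  KillOutcome killTask() const {
    if (!launched_) {
      return KillOutcome::TERMINATE_EXECUTOR;
    }
    return KillOutcome::KILL_TASK;
  }

private:
  bool launched_ = false;
};
```

A kill request on a fresh `SketchExecutor` yields `TERMINATE_EXECUTOR`;
after `launchTask()` the same request yields `KILL_TASK`.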

Review: https://reviews.apache.org/r/64033/


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/9d8502cc
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/9d8502cc
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/9d8502cc

Branch: refs/heads/1.4.x
Commit: 9d8502cc4b916eaaa9c7aaa458d5fef46931a37d
Parents: 09aaf33
Author: Alexander Rukletsov <ru...@gmail.com>
Authored: Fri Dec 22 12:10:35 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:29:50 2017 +0100

----------------------------------------------------------------------
 src/docker/executor.cpp   |  6 ++++++
 src/exec/exec.cpp         | 11 +++++++++++
 src/launcher/executor.cpp |  6 ++++++
 3 files changed, 23 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/9d8502cc/src/docker/executor.cpp
----------------------------------------------------------------------
diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index 5c430dc..7a50e66 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -359,6 +359,12 @@ private:
       return;
     }
 
+    // Terminate if a kill task request is received before the task is launched.
+    // This can happen, for example, if `RunTaskMessage` has not been delivered.
+    // See MESOS-8297.
+    CHECK_SOME(run) << "Terminating because kill task message has been"
+                    << " received before the task has been launched";
+
     // TODO(alexr): If a kill is in progress, consider adjusting
     // the grace period if a new one is provided.
 

http://git-wip-us.apache.org/repos/asf/mesos/blob/9d8502cc/src/exec/exec.cpp
----------------------------------------------------------------------
diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index ea0b118..33a460d 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -347,6 +347,17 @@ protected:
       return;
     }
 
+    // A kill task request is received when the driver is not connected. This
+    // can happen, for example, if `ExecutorRegisteredMessage` has not been
+    // delivered. We do not shutdown the driver because there might be other
+    // still running tasks and the executor might eventually reconnect, e.g.,
+    // after the agent failover. We do not ignore the message because the
+    // actual executor may still want to react, e.g., commit suicide.
+    if (!connected) {
+      LOG(WARNING) << "Executor received kill task message for task " << taskId
+                   << " while disconnected from the agent!";
+    }
+
     VLOG(1) << "Executor asked to kill task '" << taskId << "'";
 
     Stopwatch stopwatch;

http://git-wip-us.apache.org/repos/asf/mesos/blob/9d8502cc/src/launcher/executor.cpp
----------------------------------------------------------------------
diff --git a/src/launcher/executor.cpp b/src/launcher/executor.cpp
index e5d5595..b518030 100644
--- a/src/launcher/executor.cpp
+++ b/src/launcher/executor.cpp
@@ -772,6 +772,12 @@ private:
       return;
     }
 
+    // Terminate if a kill task request is received before the task is launched.
+    // This can happen, for example, if `RunTaskMessage` has not been delivered.
+    // See MESOS-8297.
+    CHECK(launched) << "Terminating because kill task message has been"
+                    << " received before the task has been launched";
+
     // If the task is being killed but has not terminated yet and
     // we receive another kill request. Check if we need to adjust
     // the remaining grace period.


[5/6] mesos git commit: Fixed 1.4.x CHANGELOG.

Posted by al...@apache.org.
Fixed 1.4.x CHANGELOG.


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/2677606b
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/2677606b
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/2677606b

Branch: refs/heads/1.4.x
Commit: 2677606b9b6ddc67638d000de247a80c3ae0222c
Parents: 9d8502c
Author: Alexander Rukletsov <al...@apache.org>
Authored: Fri Dec 22 12:52:01 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:52:01 2017 +0100

----------------------------------------------------------------------
 CHANGELOG | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/2677606b/CHANGELOG
----------------------------------------------------------------------
diff --git a/CHANGELOG b/CHANGELOG
index df1410b..bdab625 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,17 +1,10 @@
 Release Notes - Mesos - Version 1.4.2 (WIP)
--------------------------------------
+-------------------------------------------
 * This is a bug fix release.
 
 ** Bug
  * [MESOS-7975] - The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed.
  * [MESOS-8159] - ns::clone uses an async signal unsafe stack.
-
-
-Release Notes - Mesos - Version 1.4.2
--------------------------------------------
-* This is a bug fix release.
-
-** Bug
  * [MESOS-8237] - Strip (Offer|Resource).allocation_info for non-MULTI_ROLE schedulers.
 
 


[3/6] mesos git commit: Ensured executor adapter propagates error and shutdown messages.

Posted by al...@apache.org.
Ensured executor adapter propagates error and shutdown messages.

Prior to this patch, if an error, kill, or shutdown occurred during
subscription / registration with the agent, it was not propagated back
to the executor if the v0_v1 executor adapter was used. This happened
because the adapter did not call the `connected` callback until after
successful registration and hence the executor did not even try to
send the `SUBSCRIBE` call, without which the adapter did not send any
events to the executor.

The fix is to call the `connected` callback if an error occurred or a
shutdown / kill event arrived before the executor had subscribed.
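
A minimal sketch of this idea, assuming a much simplified adapter:
`SketchAdapter` and `ensureConnected` are hypothetical names, event
queueing is omitted, and the re-registration path (which in the actual
patch signals a disconnect followed by a connect) is not modelled.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical, simplified adapter; not the actual V0ToV1AdapterProcess.
class SketchAdapter {
public:
  explicit SketchAdapter(std::function<void()> onConnected)
    : onConnected_(std::move(onConnected)) {}

  // Each v0 callback first makes sure the `connected` callback has fired,
  // so that the executor subscribes and can then receive events.
  void registered() { ensureConnected(); }
  void killTask() { ensureConnected(); }
  void shutdown() { ensureConnected(); }
  void error() { ensureConnected(); }

  bool connected() const { return connected_; }

private:
  // Invokes the `connected` callback at most once.
  void ensureConnected() {
    if (!connected_) {
      onConnected_();
      connected_ = true;
    }
  }

  std::function<void()> onConnected_;
  bool connected_ = false;
};
```

If `shutdown()` arrives before `registered()`, the callback still fires
exactly once, and later calls do not fire it again.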

Review: https://reviews.apache.org/r/64070/


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/09aaf339
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/09aaf339
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/09aaf339

Branch: refs/heads/1.4.x
Commit: 09aaf3390e0eb2fa7e96f92605943057774ac624
Parents: 1bc8f5d
Author: Alexander Rukletsov <ru...@gmail.com>
Authored: Fri Dec 22 12:10:28 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:29:44 2017 +0100

----------------------------------------------------------------------
 src/executor/v0_v1executor.cpp | 41 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/09aaf339/src/executor/v0_v1executor.cpp
----------------------------------------------------------------------
diff --git a/src/executor/v0_v1executor.cpp b/src/executor/v0_v1executor.cpp
index 61d5919..086cfc7 100644
--- a/src/executor/v0_v1executor.cpp
+++ b/src/executor/v0_v1executor.cpp
@@ -52,6 +52,7 @@ public:
       const function<void(const queue<Event>&)>& received)
     : ProcessBase(process::ID::generate("v0-to-v1-adapter")),
       callbacks {connected, disconnected, received},
+      connected(false),
       subscribeCall(false) {}
 
   virtual ~V0ToV1AdapterProcess() = default;
@@ -61,7 +62,10 @@ public:
       const mesos::FrameworkInfo& _frameworkInfo,
       const mesos::SlaveInfo& slaveInfo)
   {
-    callbacks.connected();
+    if (!connected) {
+      callbacks.connected();
+      connected = true;
+    }
 
     // We need these copies to populate the fields in `Event::Subscribed` upon
     // receiving a `reregistered()` callback later.
@@ -92,6 +96,7 @@ public:
     // disconnection from the agent.
     callbacks.disconnected();
     callbacks.connected();
+    connected = true;
 
     Event event;
     event.set_type(Event::SUBSCRIBED);
@@ -111,6 +116,17 @@ public:
 
   void killTask(const mesos::TaskID& taskId)
   {
+    // Logically an executor cannot receive any response from an agent if it
+    // is not connected. Since we have received `killTask`, we assume we are
+    // connected and trigger the `connected` callback to enable event delivery.
+    // This satisfies the invariant of the v1 interface that an executor can
+    // receive an event only after successfully connecting with the agent.
+    if (!connected) {
+      LOG(INFO) << "Implicitly connecting the executor to kill a task";
+      callbacks.connected();
+      connected = true;
+    }
+
     Event event;
     event.set_type(Event::KILL);
 
@@ -147,6 +163,17 @@ public:
 
   void shutdown()
   {
+    // Logically an executor cannot receive any response from an agent if it
+    // is not connected. Since we have received `shutdown`, we assume we are
+    // connected and trigger the `connected` callback to enable event delivery.
+    // This satisfies the invariant of the v1 interface that an executor can
+    // receive an event only after successfully connecting with the agent.
+    if (!connected) {
+      LOG(INFO) << "Implicitly connecting the executor to shut it down";
+      callbacks.connected();
+      connected = true;
+    }
+
     Event event;
     event.set_type(Event::SHUTDOWN);
 
@@ -155,6 +182,17 @@ public:
 
   void error(const string& message)
   {
+    // Logically an executor cannot receive any response from an agent if it
+    // is not connected. Since we have received `error`, we assume we are
+    // connected and trigger the `connected` callback to enable event delivery.
+    // This satisfies the invariant of the v1 interface that an executor can
+    // receive an event only after successfully connecting with the agent.
+    if (!connected) {
+      LOG(INFO) << "Implicitly connecting the executor to send an error";
+      callbacks.connected();
+      connected = true;
+    }
+
     Event event;
     event.set_type(Event::ERROR);
 
@@ -232,6 +270,7 @@ private:
   };
 
   Callbacks callbacks;
+  bool connected;
   bool subscribeCall;
   queue<Event> pending;
   Option<mesos::ExecutorInfo> executorInfo;


[6/6] mesos git commit: Added MESOS-8297 to the 1.4.2 CHANGELOG.

Posted by al...@apache.org.
Added MESOS-8297 to the 1.4.2 CHANGELOG.


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/6199905e
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/6199905e
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/6199905e

Branch: refs/heads/1.4.x
Commit: 6199905ec080aa3e73b81e090f1b11d9b3803788
Parents: 2677606
Author: Alexander Rukletsov <al...@apache.org>
Authored: Fri Dec 22 12:53:29 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:53:29 2017 +0100

----------------------------------------------------------------------
 CHANGELOG | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/6199905e/CHANGELOG
----------------------------------------------------------------------
diff --git a/CHANGELOG b/CHANGELOG
index bdab625..e49a66b 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -6,6 +6,7 @@ Release Notes - Mesos - Version 1.4.2 (WIP)
  * [MESOS-7975] - The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed.
  * [MESOS-8159] - ns::clone uses an async signal unsafe stack.
  * [MESOS-8237] - Strip (Offer|Resource).allocation_info for non-MULTI_ROLE schedulers.
+ * [MESOS-8297] - Built-in driver-based executors ignore kill task if the task has not been launched.
 
 
 Release Notes - Mesos - Version 1.4.1