Posted to commits@mesos.apache.org by al...@apache.org on 2017/12/22 13:48:34 UTC
[1/6] mesos git commit: Promoted log level to warning for disconnected events in exec.cpp.
Repository: mesos
Updated Branches:
refs/heads/1.4.x a6a8b1ca3 -> 6199905ec
Promoted log level to warning for disconnected events in exec.cpp.
When the executor library receives messages while disconnected, this
might indicate out-of-order message delivery or lost messages. Such
events should be logged at the warning level to simplify triaging.
Review: https://reviews.apache.org/r/64032/
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/2e7a772f
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/2e7a772f
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/2e7a772f
Branch: refs/heads/1.4.x
Commit: 2e7a772f171498874a4e5a56e3066fd5e95e2bec
Parents: a6a8b1c
Author: Alexander Rukletsov <ru...@gmail.com>
Authored: Fri Dec 22 12:09:58 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:29:23 2017 +0100
----------------------------------------------------------------------
src/exec/exec.cpp | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/mesos/blob/2e7a772f/src/exec/exec.cpp
----------------------------------------------------------------------
diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index 65c4575..ea0b118 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -209,8 +209,7 @@ public:
protected:
virtual void initialize()
{
- VLOG(1) << "Executor started at: " << self()
- << " with pid " << getpid();
+ VLOG(1) << "Executor started at: " << self() << " with pid " << getpid();
link(slave);
@@ -318,8 +317,8 @@ protected:
}
if (!connected) {
- VLOG(1) << "Ignoring run task message for task " << task.task_id()
- << " because the driver is disconnected!";
+ LOG(WARNING) << "Ignoring run task message for task " << task.task_id()
+ << " because the driver is disconnected!";
return;
}
@@ -378,10 +377,10 @@ protected:
}
if (!connected) {
- VLOG(1) << "Ignoring status update acknowledgement "
- << uuid_.get() << " for task " << taskId
- << " of framework " << frameworkId
- << " because the driver is disconnected!";
+ LOG(WARNING) << "Ignoring status update acknowledgement "
+ << uuid_.get() << " for task " << taskId
+ << " of framework " << frameworkId
+ << " because the driver is disconnected!";
return;
}
@@ -408,8 +407,8 @@ protected:
}
if (!connected) {
- VLOG(1) << "Ignoring framework message because "
- << "the driver is disconnected!";
+ LOG(WARNING) << "Ignoring framework message because"
+ << " the driver is disconnected!";
return;
}
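To illustrate why the promotion matters, here is a toy model in C++ that does not use glog itself. With glog's default verbosity (`--v=0`), a `VLOG(1)` message is dropped, whereas `LOG(WARNING)` is emitted. `ToyLogger` and `handleRunTask` are hypothetical names invented for this sketch; only the message text comes from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of glog's filtering (not glog itself): VLOG(n) is emitted only
// when the verbosity flag (--v) is at least n; LOG(WARNING) is emitted at
// default settings.
struct ToyLogger {
  int verbosity = 0;                 // glog's default --v is 0.
  std::vector<std::string> emitted;  // Lines that would reach the log.

  void vlog(int level, const std::string& message) {
    assert(level >= 0);  // VLOG levels are non-negative.
    if (verbosity >= level) {
      emitted.push_back("I " + message);
    }
  }

  void warning(const std::string& message) {
    emitted.push_back("W " + message);
  }
};

// Hypothetical handler shaped like the run task path in exec.cpp.
// Returns true if the message is processed, false if it is ignored.
bool handleRunTask(bool connected, const std::string& taskId,
                   bool promoted, ToyLogger& log) {
  if (!connected) {
    const std::string message = "Ignoring run task message for task " +
                                taskId + " because the driver is disconnected!";
    if (promoted) {
      log.warning(message);  // After the patch: LOG(WARNING).
    } else {
      log.vlog(1, message);  // Before the patch: VLOG(1).
    }
    return false;
  }
  return true;
}
```

At default verbosity, the pre-patch path leaves no trace in the log, while the post-patch path records one `W` line per ignored message.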
[2/6] mesos git commit: Ensured command executor always honors shutdown request.
Ensured command executor always honors shutdown request.
Review: https://reviews.apache.org/r/64069/
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/1bc8f5dc
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/1bc8f5dc
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/1bc8f5dc
Branch: refs/heads/1.4.x
Commit: 1bc8f5dcf761a5c54fa1777a76ba4b1f77b9c521
Parents: 2e7a772
Author: Alexander Rukletsov <ru...@gmail.com>
Authored: Fri Dec 22 12:10:15 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:29:38 2017 +0100
----------------------------------------------------------------------
src/launcher/executor.cpp | 2 ++
1 file changed, 2 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/mesos/blob/1bc8f5dc/src/launcher/executor.cpp
----------------------------------------------------------------------
diff --git a/src/launcher/executor.cpp b/src/launcher/executor.cpp
index 12b0326..e5d5595 100644
--- a/src/launcher/executor.cpp
+++ b/src/launcher/executor.cpp
@@ -760,6 +760,8 @@ protected:
if (launched) {
CHECK_SOME(taskId);
kill(taskId.get(), gracePeriod);
+ } else {
+ terminate(self());
}
}
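A minimal sketch of the decision the patched shutdown path makes, assuming a simplified executor that has only a `launched` flag and an optional task ID. `ToyCommandExecutor` and `ShutdownAction` are hypothetical; the real code calls `kill(taskId.get(), gracePeriod)` or `terminate(self())` instead of returning an enum. Judging from the diff, before the patch the not-launched case had no branch at all, so a shutdown request left the executor running.

```cpp
#include <cassert>
#include <optional>
#include <string>

// What the toy executor does in response to a shutdown request.
enum class ShutdownAction { KillTask, TerminateSelf };

// Hypothetical model of the patched shutdown path in src/launcher/executor.cpp.
struct ToyCommandExecutor {
  bool launched = false;
  std::optional<std::string> taskId;

  ShutdownAction shutdown() const {
    if (launched) {
      assert(taskId.has_value());       // Mirrors CHECK_SOME(taskId).
      return ShutdownAction::KillTask;  // Real code: kill(taskId.get(), gracePeriod).
    }
    // Branch added by the patch: there is no task to kill, so stop the executor.
    return ShutdownAction::TerminateSelf;  // Real code: terminate(self()).
  }
};
```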
[4/6] mesos git commit: Terminated driver-based executors if kill arrives before launch task.
Terminated driver-based executors if kill arrives before launch task.
`ExecutorRegisteredMessage` or `RunTaskMessage` may not be delivered
to a driver-based executor. Since these messages are not retried,
without this patch such an executor never starts its task and remains
idle, ignoring kill task requests. This patch ensures that all built-in
driver-based executors eventually shut down if a kill task request
arrives before the task has been started.
Review: https://reviews.apache.org/r/64033/
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/9d8502cc
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/9d8502cc
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/9d8502cc
Branch: refs/heads/1.4.x
Commit: 9d8502cc4b916eaaa9c7aaa458d5fef46931a37d
Parents: 09aaf33
Author: Alexander Rukletsov <ru...@gmail.com>
Authored: Fri Dec 22 12:10:35 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:29:50 2017 +0100
----------------------------------------------------------------------
src/docker/executor.cpp | 6 ++++++
src/exec/exec.cpp | 11 +++++++++++
src/launcher/executor.cpp | 6 ++++++
3 files changed, 23 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/mesos/blob/9d8502cc/src/docker/executor.cpp
----------------------------------------------------------------------
diff --git a/src/docker/executor.cpp b/src/docker/executor.cpp
index 5c430dc..7a50e66 100644
--- a/src/docker/executor.cpp
+++ b/src/docker/executor.cpp
@@ -359,6 +359,12 @@ private:
return;
}
+ // Terminate if a kill task request is received before the task is launched.
+ // This can happen, for example, if `RunTaskMessage` has not been delivered.
+ // See MESOS-8297.
+ CHECK_SOME(run) << "Terminating because kill task message has been"
+ << " received before the task has been launched";
+
// TODO(alexr): If a kill is in progress, consider adjusting
// the grace period if a new one is provided.
http://git-wip-us.apache.org/repos/asf/mesos/blob/9d8502cc/src/exec/exec.cpp
----------------------------------------------------------------------
diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index ea0b118..33a460d 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -347,6 +347,17 @@ protected:
return;
}
+ // A kill task request is received when the driver is not connected. This
+ // can happen, for example, if `ExecutorRegisteredMessage` has not been
+ // delivered. We do not shutdown the driver because there might be other
+ // still running tasks and the executor might eventually reconnect, e.g.,
+ // after the agent failover. We do not ignore the message because the
+ // actual executor may still want to react, e.g., commit suicide.
+ if (!connected) {
+ LOG(WARNING) << "Executor received kill task message for task " << taskId
+ << " while disconnected from the agent!";
+ }
+
VLOG(1) << "Executor asked to kill task '" << taskId << "'";
Stopwatch stopwatch;
http://git-wip-us.apache.org/repos/asf/mesos/blob/9d8502cc/src/launcher/executor.cpp
----------------------------------------------------------------------
diff --git a/src/launcher/executor.cpp b/src/launcher/executor.cpp
index e5d5595..b518030 100644
--- a/src/launcher/executor.cpp
+++ b/src/launcher/executor.cpp
@@ -772,6 +772,12 @@ private:
return;
}
+ // Terminate if a kill task request is received before the task is launched.
+ // This can happen, for example, if `RunTaskMessage` has not been delivered.
+ // See MESOS-8297.
+ CHECK(launched) << "Terminating because kill task message has been"
+ << " received before the task has been launched";
+
// If the task is being killed but has not terminated yet and
// we receive another kill request. Check if we need to adjust
// the remaining grace period.
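The following hypothetical sketch models the guard added to the command and docker executors. A kill request that arrives before the task has been launched fails the `CHECK`, which logs a fatal message and aborts the process, so the executor exits instead of idling forever. In the sketch an enum value stands in for that abort; `ToyDriverBasedExecutor` and `KillOutcome` are invented names, and the handling of repeated kills (grace-period adjustment in the real code) is reduced to a single outcome.

```cpp
#include <cassert>

// Possible reactions to a kill task request in this toy model.
enum class KillOutcome {
  AbortExecutor,  // Stands in for a failed CHECK(launched): log FATAL, abort.
  StartKill,      // First kill request for a launched task.
  AlreadyKilling  // Repeated kill; the real code may adjust the grace period.
};

// Hypothetical, simplified model of a built-in driver-based executor.
struct ToyDriverBasedExecutor {
  bool launched = false;
  bool killing = false;

  KillOutcome killTask() {
    assert(!killing || launched);  // Only a launched task can be mid-kill.

    // The guard added by the patch: a kill before launch can happen, e.g.,
    // if `RunTaskMessage` was not delivered (MESOS-8297), so exit rather
    // than wait for a task that will never start.
    if (!launched) {
      return KillOutcome::AbortExecutor;
    }
    if (killing) {
      return KillOutcome::AlreadyKilling;
    }
    killing = true;
    return KillOutcome::StartKill;
  }
};
```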
[5/6] mesos git commit: Fixed 1.4.x CHANGELOG.
Fixed 1.4.x CHANGELOG.
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/2677606b
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/2677606b
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/2677606b
Branch: refs/heads/1.4.x
Commit: 2677606b9b6ddc67638d000de247a80c3ae0222c
Parents: 9d8502c
Author: Alexander Rukletsov <al...@apache.org>
Authored: Fri Dec 22 12:52:01 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:52:01 2017 +0100
----------------------------------------------------------------------
CHANGELOG | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/mesos/blob/2677606b/CHANGELOG
----------------------------------------------------------------------
diff --git a/CHANGELOG b/CHANGELOG
index df1410b..bdab625 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,17 +1,10 @@
Release Notes - Mesos - Version 1.4.2 (WIP)
--------------------------------------
+-------------------------------------------
* This is a bug fix release.
** Bug
* [MESOS-7975] - The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed.
* [MESOS-8159] - ns::clone uses an async signal unsafe stack.
-
-
-Release Notes - Mesos - Version 1.4.2
--------------------------------------------
-* This is a bug fix release.
-
-** Bug
* [MESOS-8237] - Strip (Offer|Resource).allocation_info for non-MULTI_ROLE schedulers.
[3/6] mesos git commit: Ensured executor adapter propagates error and shutdown messages.
Ensured executor adapter propagates error and shutdown messages.
Prior to this patch, if an error, kill, or shutdown occurred during
subscription / registration with the agent, it was not propagated back
to the executor when the v0_v1 executor adapter was used. This happened
because the adapter did not call the `connected` callback until after
successful registration; hence the executor did not even try to send
the `SUBSCRIBE` call, and without that call the adapter did not send
any events to the executor.
The fix is to call the `connected` callback if an error occurred or a
shutdown / kill event arrived before the executor had subscribed.
Review: https://reviews.apache.org/r/64070/
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/09aaf339
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/09aaf339
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/09aaf339
Branch: refs/heads/1.4.x
Commit: 09aaf3390e0eb2fa7e96f92605943057774ac624
Parents: 1bc8f5d
Author: Alexander Rukletsov <ru...@gmail.com>
Authored: Fri Dec 22 12:10:28 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:29:44 2017 +0100
----------------------------------------------------------------------
src/executor/v0_v1executor.cpp | 41 ++++++++++++++++++++++++++++++++++++-
1 file changed, 40 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/mesos/blob/09aaf339/src/executor/v0_v1executor.cpp
----------------------------------------------------------------------
diff --git a/src/executor/v0_v1executor.cpp b/src/executor/v0_v1executor.cpp
index 61d5919..086cfc7 100644
--- a/src/executor/v0_v1executor.cpp
+++ b/src/executor/v0_v1executor.cpp
@@ -52,6 +52,7 @@ public:
const function<void(const queue<Event>&)>& received)
: ProcessBase(process::ID::generate("v0-to-v1-adapter")),
callbacks {connected, disconnected, received},
+ connected(false),
subscribeCall(false) {}
virtual ~V0ToV1AdapterProcess() = default;
@@ -61,7 +62,10 @@ public:
const mesos::FrameworkInfo& _frameworkInfo,
const mesos::SlaveInfo& slaveInfo)
{
- callbacks.connected();
+ if (!connected) {
+ callbacks.connected();
+ connected = true;
+ }
// We need these copies to populate the fields in `Event::Subscribed` upon
// receiving a `reregistered()` callback later.
@@ -92,6 +96,7 @@ public:
// disconnection from the agent.
callbacks.disconnected();
callbacks.connected();
+ connected = true;
Event event;
event.set_type(Event::SUBSCRIBED);
@@ -111,6 +116,17 @@ public:
void killTask(const mesos::TaskID& taskId)
{
+ // Logically an executor cannot receive any response from an agent if it
+ // is not connected. Since we have received `killTask`, we assume we are
+ // connected and trigger the `connected` callback to enable event delivery.
+ // This satisfies the invariant of the v1 interface that an executor can
+ // receive an event only after successfully connecting with the agent.
+ if (!connected) {
+ LOG(INFO) << "Implicitly connecting the executor to kill a task";
+ callbacks.connected();
+ connected = true;
+ }
+
Event event;
event.set_type(Event::KILL);
@@ -147,6 +163,17 @@ public:
void shutdown()
{
+ // Logically an executor cannot receive any response from an agent if it
+ // is not connected. Since we have received `shutdown`, we assume we are
+ // connected and trigger the `connected` callback to enable event delivery.
+ // This satisfies the invariant of the v1 interface that an executor can
+ // receive an event only after successfully connecting with the agent.
+ if (!connected) {
+ LOG(INFO) << "Implicitly connecting the executor to shut it down";
+ callbacks.connected();
+ connected = true;
+ }
+
Event event;
event.set_type(Event::SHUTDOWN);
@@ -155,6 +182,17 @@ public:
void error(const string& message)
{
+ // Logically an executor cannot receive any response from an agent if it
+ // is not connected. Since we have received `error`, we assume we are
+ // connected and trigger the `connected` callback to enable event delivery.
+ // This satisfies the invariant of the v1 interface that an executor can
+ // receive an event only after successfully connecting with the agent.
+ if (!connected) {
+ LOG(INFO) << "Implicitly connecting the executor to send an error";
+ callbacks.connected();
+ connected = true;
+ }
+
Event event;
event.set_type(Event::ERROR);
@@ -232,6 +270,7 @@ private:
};
Callbacks callbacks;
+ bool connected;
bool subscribeCall;
queue<Event> pending;
Option<mesos::ExecutorInfo> executorInfo;
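A simplified sketch of the adapter's new invariant, assuming plain `std::function` callbacks instead of the real `Callbacks` struct: whichever of registration, kill, shutdown, or error arrives first triggers the `connected` callback exactly once, and every event is delivered only after it. The real adapter additionally queues events until the executor sends `SUBSCRIBE`, and its re-registration path calls `disconnected` and `connected` again; this sketch omits both. `ToyV0ToV1Adapter`, `ToyEventType`, and `toString` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Simplified stand-ins for the v1 executor event types (hypothetical).
enum class ToyEventType { Kill, Shutdown, Error };

inline std::string toString(ToyEventType type) {
  switch (type) {
    case ToyEventType::Kill: return "kill";
    case ToyEventType::Shutdown: return "shutdown";
    case ToyEventType::Error: return "error";
  }
  return "unknown";
}

// Hypothetical, simplified model of the patched v0-to-v1 adapter.
class ToyV0ToV1Adapter {
 public:
  ToyV0ToV1Adapter(std::function<void()> connectedCallback,
                   std::function<void(ToyEventType)> receivedCallback)
      : onConnected(std::move(connectedCallback)),
        onReceived(std::move(receivedCallback)) {}

  // Corresponds to the v0 `registered()` callback.
  void registered() { ensureConnected(); }

  // Each of these may arrive before registration has completed.
  void killTask() { deliver(ToyEventType::Kill); }
  void shutdown() { deliver(ToyEventType::Shutdown); }
  void error() { deliver(ToyEventType::Error); }

 private:
  // Fires the `connected` callback at most once.
  void ensureConnected() {
    if (!connected) {
      onConnected();
      connected = true;
    }
  }

  // Invariant: an event is delivered only after `connected` has fired.
  void deliver(ToyEventType type) {
    ensureConnected();
    assert(connected);
    onReceived(type);
  }

  std::function<void()> onConnected;
  std::function<void(ToyEventType)> onReceived;
  bool connected = false;
};
```

For example, calling `shutdown()` before `registered()` first fires `connected` and then delivers the shutdown event; the later `registered()` does not fire `connected` a second time.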
[6/6] mesos git commit: Added MESOS-8297 to the 1.4.2 CHANGELOG.
Added MESOS-8297 to the 1.4.2 CHANGELOG.
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/6199905e
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/6199905e
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/6199905e
Branch: refs/heads/1.4.x
Commit: 6199905ec080aa3e73b81e090f1b11d9b3803788
Parents: 2677606
Author: Alexander Rukletsov <al...@apache.org>
Authored: Fri Dec 22 12:53:29 2017 +0100
Committer: Alexander Rukletsov <al...@apache.org>
Committed: Fri Dec 22 12:53:29 2017 +0100
----------------------------------------------------------------------
CHANGELOG | 1 +
1 file changed, 1 insertion(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/mesos/blob/6199905e/CHANGELOG
----------------------------------------------------------------------
diff --git a/CHANGELOG b/CHANGELOG
index bdab625..e49a66b 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -6,6 +6,7 @@ Release Notes - Mesos - Version 1.4.2 (WIP)
* [MESOS-7975] - The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed.
* [MESOS-8159] - ns::clone uses an async signal unsafe stack.
* [MESOS-8237] - Strip (Offer|Resource).allocation_info for non-MULTI_ROLE schedulers.
+ * [MESOS-8297] - Built-in driver-based executors ignore kill task if the task has not been launched.
Release Notes - Mesos - Version 1.4.1