You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mesos.apache.org by vi...@apache.org on 2017/01/24 22:58:47 UTC
mesos git commit: Fixed bug allowing IOSwitchboard::connect() after
container destruction.
Repository: mesos
Updated Branches:
refs/heads/master 26923a1de -> 7f83b6e34
Fixed bug allowing IOSwitchboard::connect() after container destruction.
Review: https://reviews.apache.org/r/55810/
Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/7f83b6e3
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/7f83b6e3
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/7f83b6e3
Branch: refs/heads/master
Commit: 7f83b6e34a2adef79a427e54f7a2dd4d5afbc1f5
Parents: 26923a1
Author: Kevin Klues <kl...@gmail.com>
Authored: Tue Jan 24 14:58:33 2017 -0800
Committer: Vinod Kone <vi...@gmail.com>
Committed: Tue Jan 24 14:58:33 2017 -0800
----------------------------------------------------------------------
src/slave/containerizer/mesos/io/switchboard.cpp | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/mesos/blob/7f83b6e3/src/slave/containerizer/mesos/io/switchboard.cpp
----------------------------------------------------------------------
diff --git a/src/slave/containerizer/mesos/io/switchboard.cpp b/src/slave/containerizer/mesos/io/switchboard.cpp
index 4b134f8..1b8f490 100644
--- a/src/slave/containerizer/mesos/io/switchboard.cpp
+++ b/src/slave/containerizer/mesos/io/switchboard.cpp
@@ -695,7 +695,7 @@ Future<http::Connection> IOSwitchboard::connect(
})
.then(defer(self(), [=]() -> Future<http::Connection> {
if (!infos.contains(containerId)) {
- return Failure("Container has or is being destroyed");
+ return Failure("I/O switchboard has shutdown");
}
// TODO(jieyu): We might still get a connection refused error
@@ -758,8 +758,6 @@ Future<Nothing> IOSwitchboard::cleanup(
Option<pid_t> pid = infos[containerId]->pid;
Future<Option<int>> status = infos[containerId]->status;
- infos.erase(containerId);
-
// If we have a pid, then we attempt to send it a SIGTERM to have it
// shutdown gracefully. This is best effort, as it's likely that the
// switchboard has already shutdown in the common case.
@@ -794,6 +792,19 @@ Future<Nothing> IOSwitchboard::cleanup(
// DISCARDED cases as well.
return await(list<Future<Option<int>>>{status}).then(
defer(self(), [this, containerId]() -> Future<Nothing> {
+ // We only remove the 'containerId from our info struct once
+ // we are sure that the I/O switchboard has shutdown. If we
+ // removed it any earlier, attempts to connect to the I/O
+ // switchboard would fail.
+ //
+ // NOTE: One caveat of this approach is that this lambda will
+ // be invoked multiple times if `cleanup()` is called multiple
+ // times before the first instance of it is triggered. This is
+ // OK for now because the logic below has no side effects. If
+ // the logic below gets more complicated, we may need to
+ // revisit this approach.
+ infos.erase(containerId);
+
// Best effort removal of the unix domain socket file created for
// this container's `IOSwitchboardServer`. If it hasn't been
// checkpointed yet, or the socket file itself hasn't been created,
Re: mesos git commit: Fixed bug allowing IOSwitchboard::connect()
after container destruction.
Posted by Jie Yu <yu...@gmail.com>.
If 'cleanup' is invoked twice, that might cause signal being sent twice?
- Jie
On Tue, Jan 24, 2017 at 2:58 PM, <vi...@apache.org> wrote:
> Repository: mesos
> Updated Branches:
> refs/heads/master 26923a1de -> 7f83b6e34
>
>
> Fixed bug allowing IOSwitchboard::connect() after container destruction.
>
> Review: https://reviews.apache.org/r/55810/
>
>
> Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
> Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/7f83b6e3
> Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/7f83b6e3
> Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/7f83b6e3
>
> Branch: refs/heads/master
> Commit: 7f83b6e34a2adef79a427e54f7a2dd4d5afbc1f5
> Parents: 26923a1
> Author: Kevin Klues <kl...@gmail.com>
> Authored: Tue Jan 24 14:58:33 2017 -0800
> Committer: Vinod Kone <vi...@gmail.com>
> Committed: Tue Jan 24 14:58:33 2017 -0800
>
> ----------------------------------------------------------------------
> src/slave/containerizer/mesos/io/switchboard.cpp | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
> ----------------------------------------------------------------------
>
>
> http://git-wip-us.apache.org/repos/asf/mesos/blob/7f83b6e3/
> src/slave/containerizer/mesos/io/switchboard.cpp
> ----------------------------------------------------------------------
> diff --git a/src/slave/containerizer/mesos/io/switchboard.cpp
> b/src/slave/containerizer/mesos/io/switchboard.cpp
> index 4b134f8..1b8f490 100644
> --- a/src/slave/containerizer/mesos/io/switchboard.cpp
> +++ b/src/slave/containerizer/mesos/io/switchboard.cpp
> @@ -695,7 +695,7 @@ Future<http::Connection> IOSwitchboard::connect(
> })
> .then(defer(self(), [=]() -> Future<http::Connection> {
> if (!infos.contains(containerId)) {
> - return Failure("Container has or is being destroyed");
> + return Failure("I/O switchboard has shutdown");
> }
>
> // TODO(jieyu): We might still get a connection refused error
> @@ -758,8 +758,6 @@ Future<Nothing> IOSwitchboard::cleanup(
> Option<pid_t> pid = infos[containerId]->pid;
> Future<Option<int>> status = infos[containerId]->status;
>
> - infos.erase(containerId);
> -
> // If we have a pid, then we attempt to send it a SIGTERM to have it
> // shutdown gracefully. This is best effort, as it's likely that the
> // switchboard has already shutdown in the common case.
> @@ -794,6 +792,19 @@ Future<Nothing> IOSwitchboard::cleanup(
> // DISCARDED cases as well.
> return await(list<Future<Option<int>>>{status}).then(
> defer(self(), [this, containerId]() -> Future<Nothing> {
> + // We only remove the 'containerId from our info struct once
> + // we are sure that the I/O switchboard has shutdown. If we
> + // removed it any earlier, attempts to connect to the I/O
> + // switchboard would fail.
> + //
> + // NOTE: One caveat of this approach is that this lambda will
> + // be invoked multiple times if `cleanup()` is called multiple
> + // times before the first instance of it is triggered. This is
> + // OK for now because the logic below has no side effects. If
> + // the logic below gets more complicated, we may need to
> + // revisit this approach.
> + infos.erase(containerId);
> +
> // Best effort removal of the unix domain socket file created for
> // this container's `IOSwitchboardServer`. If it hasn't been
> // checkpointed yet, or the socket file itself hasn't been
> created,
>
>