You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@mesos.apache.org by Megha Sharma <ms...@apple.com> on 2017/05/01 23:53:10 UTC

Review Request 58898: Send task kill for non-Partition Aware frameworks.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/
-----------------------------------------------------------

Review request for mesos, Neil Conway and Jiang Yan Xu.


Bugs: MESOS-7215
    https://issues.apache.org/jira/browse/MESOS-7215


Repository: mesos


Description
-------

Mesos is now sending ShutdownFrameworkMessages to the agent for all
non-partition-aware frameworks (including the ones that are still
registered). This is problematic. The offer from this agent can
still go to the same framework which can then launch new tasks.
The agent then receives tasks of the same framework and ignores
them because it thinks the framework is shutting down. The framework
is not shutting down of course, so from the master and the scheduler's
perspective the task is pending in STAGING forever until the next agent
reregistration, which could happen much later. This commit fixes
the problem by sending a task kill instead of ShutdownFrameworkMessage,
when the agent re-registers after being unreachable, for non-partition
aware framewworks.


Diffs
-----

  src/master/master.cpp 31a7a2fcf905c0c35e80692a69c290d4094deded 
  src/tests/partition_tests.cpp 4ff428564d1fa6cb96e6f8ec8edc331da88a3eb6 


Diff: https://reviews.apache.org/r/58898/diff/1/


Testing
-------

make check


Thanks,

Megha Sharma

Re: Review Request 58898: Send task kill for non-Partition Aware frameworks.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review173624
-----------------------------------------------------------



Patch looks great!

Reviews applied: [58898]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On May 1, 2017, 11:58 p.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58898/
> -----------------------------------------------------------
> 
> (Updated May 1, 2017, 11:58 p.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Mesos is now sending ShutdownFrameworkMessages to the agent for all
> non-partition-aware frameworks (including the ones that are still
> registered). This is problematic. The offer from this agent can
> still go to the same framework which can then launch new tasks.
> The agent then receives tasks of the same framework and ignores
> them because it thinks the framework is shutting down. The framework
> is not shutting down of course, so from the master and the scheduler's
> perspective the task is pending in STAGING forever until the next agent
> reregistration, which could happen much later. This commit fixes
> the problem by sending a task kill instead of ShutdownFrameworkMessage,
> when the agent re-registers after being unreachable, for non-partition
> aware framewworks.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 31a7a2fcf905c0c35e80692a69c290d4094deded 
>   src/tests/partition_tests.cpp 4ff428564d1fa6cb96e6f8ec8edc331da88a3eb6 
> 
> 
> Diff: https://reviews.apache.org/r/58898/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

Re: Review Request 58898: Do not kill non partition aware tasks.

Posted by Megha Sharma <ms...@apple.com>.


> On May 10, 2017, 11:24 p.m., Neil Conway wrote:
> > My apologies for the delay in reviewing this.
> > 
> > High-level comments:
> > 
> > (a) Can we improve the description of the problem in the commit summary? It took me quite a while to figure out what is actually going on here. My understanding is:
> > 
> > (1) Agent re-registers
> > (2) Master sends `ShutdownFrameworkMessage` for non-PA frameworks on the agent
> > (3) Master offers agent resources to framework
> > (4) Framework launches new task on agent _before the agent has finished shutting down the framework_
> > (5) Agent ignores launch task because it believes the framework is still terminating.
> > 
> > This is a race condition, because if the agent finished shutting down the framework (in #4) before the launch task was received, there would not be a problem. Is my understanding correct?
> > 
> > (2) I'd prefer a new unit test that constructs exactly this scenario, rather than changing existing unit tests.
> > 
> > (3) The new behavior is that the framework will receive `TASK_KILLED` for any non-PA tasks running on a partitioned agent that re-registers. This does not seem ideal, because `TASK_KILLED` _normally_ corresponds to a framework-initiated `killTask` operation.
> > 
> > (4) Thinking out loud, what if a custom executor ignores the `killTask` request? When shutting down a framework, it seems we forcibly destroy the container (via `containerizer->destroy()`), if the executor doesn't exit promptly upon receiving the framework shutdown request. AFAIK we don't have similar logic for `killTask` requests.
> > 
> > 3 + 4 suggests that maybe we want a different way to terminate the task on the agent -- let's discuss.
> 
> Megha Sharma wrote:
>     Summarizing the 2 approaches we talked about.
>     
>     Approach 1: ShutdownFrameworkMessage
>     
>     1. Upon agent re-registration, master will add tasks even for non-PA frameworks on this agent. This is needed by the master to do correct resource accounting and not offer resources already in use on this agent. We need to mutate the TaskState on the Task before adding them to the master's data structures since the TaskState might be non-terminal when the agent sends these tasks with ReregisterSlaveMessage. And the master has already sent TASK_LOST for these tasks to the frameworks so we need to set the TaskState to TASK_LOST so that any future reconciliations with the framework doesn't have this task transitioning from TASK_LOST to TASK_RUNNNG/TASK_STARTING. This is to avoid unnecessary confusion about task state as observed by the framework but indeed this could have happened with non-strict registry as well where the framework can actually receive a non terminal task state update after receiving a TASK_LOST for the same task in the past.
>     
>     2. When the agent re-registers, the master will continue to send a ShutdownFrameworkMessage to the agent to kill the tasks pertaining to non-PA frameworks on the agent as it does today. An additional optional field will be added to the ShutdownFrameworkMessage to indicate whether or not the shutdown was initiated internally.
>     
>     3. During framework shutdown the state of the framework is set to Framework::TERMINATING which prevents it from launching new tasks. Here, since the framework is not really terminating so in order to allow it to launch new tasks, the agent will not set the state to terminating if the ShutdownFrameworkMessage is generated internally.
>     
>     4. The framework shutdown today doesn't generate any status updates which needs to change. The status updates will be sent if the framework shutdown is triggered internally, this is needed to remove the tasks of non-PA frameworks that got added when the agent re-registered (1).
>     
>     Approach 2: Do not shutdown non-PA framework when agent re-registers and let the frameworks make the decision on what needs to be done when they receive non-terminal status updates for tasks for which they have already received a TASK_LOST. This hopefully won't break any frameworks since it could have happened in the past with non-strict registry as well and frameworks should be resilient enough to handle this scenario.
>     
>     Let me know if I have missed anything. Personally, I am in favor of approach 1 as it seems less catastrophic for the frameworks. What do you think?
> 
> Megha Sharma wrote:
>     The proposed solution to fix the race between new task launches and shutdown framework on the agent, was to not kill the non-partition aware tasks when an unreachable agent re-registers with the master. So now as far as Mesos internals are concerned, the tasks from non-partition aware frameworks are to be treated the same way as partition aware tasks and one way to do it cleanly in Mesos was to transition such non-partition aware tasks to TASK_UNREACHABLE state instead of TASK_LOST when the agent becomes unreachable. Internally such tasks will be in Framework#unreachableTasks cache instead of Framework#completedTasks and their state would be TASK_UNREACHABLE but to be backwards compatible we will do some transformations when the data is being exposed to the users so there is no difference in how these tasks are presented before and after this patch. I will post the patch soon but do let me know what do you think about the approach.
> 
> Megha Sharma wrote:
>     Since we are now adding the tasks from non-partition aware frameworks to unreachableTasks cache instead of completedTasks when the agent goes unreachable so we needed to do some transformations when the completed_tasks and unreachable_tasks metric are being presented to the user. To achieve this we needed an additional cache nonParitionAwareTasks to remember the taskIDs for non-partition aware tasks which have been added to unreachableTasks so they can be filtered out when the unreachable_tasks are shown and added when we present the completed_tasks (code snippet below).
>     ```    
>         writer->field("completed_tasks", [this](JSON::ArrayWriter* writer) {
>           foreach (const Owned<Task>& task, framework_->completedTasks) {
>             // Skip unauthorized tasks.
>             if (!authorizeTask_->accept(*task.get(), framework_->info)) {
>               continue;
>             }
>     
>             writer->element(*task.get());
>           }
>           foreach (const TaskID& taskID, framework_->nonParitionAwareTasks) {
>             if (framework_->unreachableTasks.contains(taskID)) {
>               Task task =
>                 framework_->transformNonPartitionAwareTask(
>                     *framework_->unreachableTasks.at(taskID).get());
>               // Skip unauthorized tasks.
>               if (!authorizeTask_->accept(task, framework_->info)) {
>                 continue;
>               }
>               writer->element(task);
>             }
>           }
>         });
>     ```

I have created a new Review Request https://reviews.apache.org/r/61473. We can continue discussing the approach here.


- Megha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review174194
-----------------------------------------------------------


On Aug. 6, 2017, 4:35 a.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58898/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2017, 4:35 a.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Master will not kill the tasks for non-Partition aware frameworks
> when an unreachable agent re-registers with the master.
> Master used to send a ShutdownFrameworkMessages to the agent
> to kill the tasks from non partition aware frameworks including the
> ones that are still registered which was problematic because the offer
> from this agent could still go to the same framework which could then
> launch new tasks. The agent would then receive tasks of the same
> framework and ignore them because it thinks the framework is shutting
> down. The framework is not shutting down of course, so from the master
> and the scheduler’s perspective the task is pending in STAGING forever
> until the next agent reregistration, which could happen much later.
> This commit fixes the problem by not shutting down the non-partition
> aware frameworks on such an agent.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
>   src/master/master.cpp 7f38a5e21884546d4b4c866ca5918db779af8f99 
>   src/tests/partition_tests.cpp 62a84f797201ccd18b71490949e3130d2b9c3668 
> 
> 
> Diff: https://reviews.apache.org/r/58898/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

Re: Review Request 58898: Do not kill non partition aware tasks.

Posted by Megha Sharma <ms...@apple.com>.


> On May 10, 2017, 11:24 p.m., Neil Conway wrote:
> > My apologies for the delay in reviewing this.
> > 
> > High-level comments:
> > 
> > (a) Can we improve the description of the problem in the commit summary? It took me quite a while to figure out what is actually going on here. My understanding is:
> > 
> > (1) Agent re-registers
> > (2) Master sends `ShutdownFrameworkMessage` for non-PA frameworks on the agent
> > (3) Master offers agent resources to framework
> > (4) Framework launches new task on agent _before the agent has finished shutting down the framework_
> > (5) Agent ignores launch task because it believes the framework is still terminating.
> > 
> > This is a race condition, because if the agent finished shutting down the framework (in #4) before the launch task was received, there would not be a problem. Is my understanding correct?
> > 
> > (2) I'd prefer a new unit test that constructs exactly this scenario, rather than changing existing unit tests.
> > 
> > (3) The new behavior is that the framework will receive `TASK_KILLED` for any non-PA tasks running on a partitioned agent that re-registers. This does not seem ideal, because `TASK_KILLED` _normally_ corresponds to a framework-initiated `killTask` operation.
> > 
> > (4) Thinking out loud, what if a custom executor ignores the `killTask` request? When shutting down a framework, it seems we forcibly destroy the container (via `containerizer->destroy()`), if the executor doesn't exit promptly upon receiving the framework shutdown request. AFAIK we don't have similar logic for `killTask` requests.
> > 
> > 3 + 4 suggests that maybe we want a different way to terminate the task on the agent -- let's discuss.
> 
> Megha Sharma wrote:
>     Summarizing the 2 approaches we talked about.
>     
>     Approach 1: ShutdownFrameworkMessage
>     
>     1. Upon agent re-registration, master will add tasks even for non-PA frameworks on this agent. This is needed by the master to do correct resource accounting and not offer resources already in use on this agent. We need to mutate the TaskState on the Task before adding them to the master's data structures since the TaskState might be non-terminal when the agent sends these tasks with ReregisterSlaveMessage. And the master has already sent TASK_LOST for these tasks to the frameworks so we need to set the TaskState to TASK_LOST so that any future reconciliations with the framework doesn't have this task transitioning from TASK_LOST to TASK_RUNNNG/TASK_STARTING. This is to avoid unnecessary confusion about task state as observed by the framework but indeed this could have happened with non-strict registry as well where the framework can actually receive a non terminal task state update after receiving a TASK_LOST for the same task in the past.
>     
>     2. When the agent re-registers, the master will continue to send a ShutdownFrameworkMessage to the agent to kill the tasks pertaining to non-PA frameworks on the agent as it does today. An additional optional field will be added to the ShutdownFrameworkMessage to indicate whether or not the shutdown was initiated internally.
>     
>     3. During framework shutdown the state of the framework is set to Framework::TERMINATING which prevents it from launching new tasks. Here, since the framework is not really terminating so in order to allow it to launch new tasks, the agent will not set the state to terminating if the ShutdownFrameworkMessage is generated internally.
>     
>     4. The framework shutdown today doesn't generate any status updates which needs to change. The status updates will be sent if the framework shutdown is triggered internally, this is needed to remove the tasks of non-PA frameworks that got added when the agent re-registered (1).
>     
>     Approach 2: Do not shutdown non-PA framework when agent re-registers and let the frameworks make the decision on what needs to be done when they receive non-terminal status updates for tasks for which they have already received a TASK_LOST. This hopefully won't break any frameworks since it could have happened in the past with non-strict registry as well and frameworks should be resilient enough to handle this scenario.
>     
>     Let me know if I have missed anything. Personally, I am in favor of approach 1 as it seems less catastrophic for the frameworks. What do you think?
> 
> Megha Sharma wrote:
>     The proposed solution to fix the race between new task launches and shutdown framework on the agent, was to not kill the non-partition aware tasks when an unreachable agent re-registers with the master. So now as far as Mesos internals are concerned, the tasks from non-partition aware frameworks are to be treated the same way as partition aware tasks and one way to do it cleanly in Mesos was to transition such non-partition aware tasks to TASK_UNREACHABLE state instead of TASK_LOST when the agent becomes unreachable. Internally such tasks will be in Framework#unreachableTasks cache instead of Framework#completedTasks and their state would be TASK_UNREACHABLE but to be backwards compatible we will do some transformations when the data is being exposed to the users so there is no difference in how these tasks are presented before and after this patch. I will post the patch soon but do let me know what do you think about the approach.

Since we are now adding the tasks from non-partition aware frameworks to unreachableTasks cache instead of completedTasks when the agent goes unreachable so we needed to do some transformations when the completed_tasks and unreachable_tasks metric are being presented to the user. To achieve this we needed an additional cache nonParitionAwareTasks to remember the taskIDs for non-partition aware tasks which have been added to unreachableTasks so they can be filtered out when the unreachable_tasks are shown and added when we present the completed_tasks (code snippet below).
```    
    writer->field("completed_tasks", [this](JSON::ArrayWriter* writer) {
      foreach (const Owned<Task>& task, framework_->completedTasks) {
        // Skip unauthorized tasks.
        if (!authorizeTask_->accept(*task.get(), framework_->info)) {
          continue;
        }

        writer->element(*task.get());
      }
      foreach (const TaskID& taskID, framework_->nonParitionAwareTasks) {
        if (framework_->unreachableTasks.contains(taskID)) {
          Task task =
            framework_->transformNonPartitionAwareTask(
                *framework_->unreachableTasks.at(taskID).get());
          // Skip unauthorized tasks.
          if (!authorizeTask_->accept(task, framework_->info)) {
            continue;
          }
          writer->element(task);
        }
      }
    });
```


- Megha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review174194
-----------------------------------------------------------


On Aug. 6, 2017, 4:35 a.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58898/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2017, 4:35 a.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Master will not kill the tasks for non-Partition aware frameworks
> when an unreachable agent re-registers with the master.
> Master used to send a ShutdownFrameworkMessages to the agent
> to kill the tasks from non partition aware frameworks including the
> ones that are still registered which was problematic because the offer
> from this agent could still go to the same framework which could then
> launch new tasks. The agent would then receive tasks of the same
> framework and ignore them because it thinks the framework is shutting
> down. The framework is not shutting down of course, so from the master
> and the scheduler’s perspective the task is pending in STAGING forever
> until the next agent reregistration, which could happen much later.
> This commit fixes the problem by not shutting down the non-partition
> aware frameworks on such an agent.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
>   src/master/master.cpp 7f38a5e21884546d4b4c866ca5918db779af8f99 
>   src/tests/partition_tests.cpp 62a84f797201ccd18b71490949e3130d2b9c3668 
> 
> 
> Diff: https://reviews.apache.org/r/58898/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

Re: Review Request 58898: Do not kill non partition aware tasks.

Posted by Megha Sharma <ms...@apple.com>.


> On May 10, 2017, 11:24 p.m., Neil Conway wrote:
> > src/master/master.cpp
> > Lines 6034 (patched)
> > <https://reviews.apache.org/r/58898/diff/2/?file=1704920#file1704920line6034>
> >
> >     `tasksToKill`

Code changed, not relevant anymore


> On May 10, 2017, 11:24 p.m., Neil Conway wrote:
> > src/master/master.cpp
> > Lines 6057 (patched)
> > <https://reviews.apache.org/r/58898/diff/2/?file=1704920#file1704920line6057>
> >
> >     "Keep"

Code changed, not relevant anymore


- Megha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review174194
-----------------------------------------------------------


On Aug. 6, 2017, 4:35 a.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58898/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2017, 4:35 a.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Master will not kill the tasks for non-Partition aware frameworks
> when an unreachable agent re-registers with the master.
> Master used to send a ShutdownFrameworkMessages to the agent
> to kill the tasks from non partition aware frameworks including the
> ones that are still registered which was problematic because the offer
> from this agent could still go to the same framework which could then
> launch new tasks. The agent would then receive tasks of the same
> framework and ignore them because it thinks the framework is shutting
> down. The framework is not shutting down of course, so from the master
> and the scheduler’s perspective the task is pending in STAGING forever
> until the next agent reregistration, which could happen much later.
> This commit fixes the problem by not shutting down the non-partition
> aware frameworks on such an agent.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
>   src/master/master.cpp 7f38a5e21884546d4b4c866ca5918db779af8f99 
>   src/tests/partition_tests.cpp 62a84f797201ccd18b71490949e3130d2b9c3668 
> 
> 
> Diff: https://reviews.apache.org/r/58898/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

Re: Review Request 58898: Send task kill for non-Partition Aware frameworks.

Posted by Megha Sharma <ms...@apple.com>.

> On May 10, 2017, 11:24 p.m., Neil Conway wrote:
> > My apologies for the delay in reviewing this.
> > 
> > High-level comments:
> > 
> > (a) Can we improve the description of the problem in the commit summary? It took me quite a while to figure out what is actually going on here. My understanding is:
> > 
> > (1) Agent re-registers
> > (2) Master sends `ShutdownFrameworkMessage` for non-PA frameworks on the agent
> > (3) Master offers agent resources to framework
> > (4) Framework launches new task on agent _before the agent has finished shutting down the framework_
> > (5) Agent ignores launch task because it believes the framework is still terminating.
> > 
> > This is a race condition, because if the agent finished shutting down the framework (in #4) before the launch task was received, there would not be a problem. Is my understanding correct?
> > 
> > (2) I'd prefer a new unit test that constructs exactly this scenario, rather than changing existing unit tests.
> > 
> > (3) The new behavior is that the framework will receive `TASK_KILLED` for any non-PA tasks running on a partitioned agent that re-registers. This does not seem ideal, because `TASK_KILLED` _normally_ corresponds to a framework-initiated `killTask` operation.
> > 
> > (4) Thinking out loud, what if a custom executor ignores the `killTask` request? When shutting down a framework, it seems we forcibly destroy the container (via `containerizer->destroy()`), if the executor doesn't exit promptly upon receiving the framework shutdown request. AFAIK we don't have similar logic for `killTask` requests.
> > 
> > 3 + 4 suggests that maybe we want a different way to terminate the task on the agent -- let's discuss.

Summarizing the 2 approaches we talked about.

Approach 1: ShutdownFrameworkMessage

1. Upon agent re-registration, master will add tasks even for non-PA frameworks on this agent. This is needed by the master to do correct resource accounting and not offer resources already in use on this agent. We need to mutate the TaskState on the Task before adding them to the master's data structures since the TaskState might be non-terminal when the agent sends these tasks with ReregisterSlaveMessage. And the master has already sent TASK_LOST for these tasks to the frameworks so we need to set the TaskState to TASK_LOST so that any future reconciliations with the framework doesn't have this task transitioning from TASK_LOST to TASK_RUNNNG/TASK_STARTING. This is to avoid unnecessary confusion about task state as observed by the framework but indeed this could have happened with non-strict registry as well where the framework can actually receive a non terminal task state update after receiving a TASK_LOST for the same task in the past.

2. When the agent re-registers, the master will continue to send a ShutdownFrameworkMessage to the agent to kill the tasks pertaining to non-PA frameworks on the agent as it does today. An additional optional field will be added to the ShutdownFrameworkMessage to indicate whether or not the shutdown was initiated internally.

3. During framework shutdown the state of the framework is set to Framework::TERMINATING which prevents it from launching new tasks. Here, since the framework is not really terminating so in order to allow it to launch new tasks, the agent will not set the state to terminating if the ShutdownFrameworkMessage is generated internally.

4. The framework shutdown today doesn't generate any status updates which needs to change. The status updates will be sent if the framework shutdown is triggered internally, this is needed to remove the tasks of non-PA frameworks that got added when the agent re-registered (1).

Approach 2: Do not shutdown non-PA framework when agent re-registers and let the frameworks make the decision on what needs to be done when they receive non-terminal status updates for tasks for which they have already received a TASK_LOST. This hopefully won't break any frameworks since it could have happened in the past with non-strict registry as well and frameworks should be resilient enough to handle this scenario.

Let me know if I have missed anything. Personally, I am in favor of approach 1 as it seems less catastrophic for the frameworks. What do you think?

- Megha

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review174194
-----------------------------------------------------------

On May 1, 2017, 11:58 p.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58898/
> -----------------------------------------------------------
> 
> (Updated May 1, 2017, 11:58 p.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Mesos is now sending ShutdownFrameworkMessages to the agent for all
> non-partition-aware frameworks (including the ones that are still
> registered). This is problematic. The offer from this agent can
> still go to the same framework which can then launch new tasks.
> The agent then receives tasks of the same framework and ignores
> them because it thinks the framework is shutting down. The framework
> is not shutting down of course, so from the master and the scheduler's
> perspective the task is pending in STAGING forever until the next agent
> reregistration, which could happen much later. This commit fixes
> the problem by sending a task kill instead of ShutdownFrameworkMessage,
> when the agent re-registers after being unreachable, for non-partition
> aware framewworks.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 31a7a2fcf905c0c35e80692a69c290d4094deded 
>   src/tests/partition_tests.cpp 4ff428564d1fa6cb96e6f8ec8edc331da88a3eb6 
> 
> 
> Diff: https://reviews.apache.org/r/58898/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

Re: Review Request 58898: Send task kill for non-Partition Aware frameworks.

Posted by Megha Sharma <ms...@apple.com>.


> On May 10, 2017, 11:24 p.m., Neil Conway wrote:
> > My apologies for the delay in reviewing this.
> > 
> > High-level comments:
> > 
> > (a) Can we improve the description of the problem in the commit summary? It took me quite a while to figure out what is actually going on here. My understanding is:
> > 
> > (1) Agent re-registers
> > (2) Master sends `ShutdownFrameworkMessage` for non-PA frameworks on the agent
> > (3) Master offers agent resources to framework
> > (4) Framework launches new task on agent _before the agent has finished shutting down the framework_
> > (5) Agent ignores launch task because it believes the framework is still terminating.
> > 
> > This is a race condition, because if the agent finished shutting down the framework (in #4) before the launch task was received, there would not be a problem. Is my understanding correct?
> > 
> > (2) I'd prefer a new unit test that constructs exactly this scenario, rather than changing existing unit tests.
> > 
> > (3) The new behavior is that the framework will receive `TASK_KILLED` for any non-PA tasks running on a partitioned agent that re-registers. This does not seem ideal, because `TASK_KILLED` _normally_ corresponds to a framework-initiated `killTask` operation.
> > 
> > (4) Thinking out loud, what if a custom executor ignores the `killTask` request? When shutting down a framework, it seems we forcibly destroy the container (via `containerizer->destroy()`), if the executor doesn't exit promptly upon receiving the framework shutdown request. AFAIK we don't have similar logic for `killTask` requests.
> > 
> > 3 + 4 suggests that maybe we want a different way to terminate the task on the agent -- let's discuss.
> 
> Megha Sharma wrote:
>     Summarizing the 2 approaches we talked about.
>     
>     Approach 1: ShutdownFrameworkMessage
>     
>     1. Upon agent re-registration, master will add tasks even for non-PA frameworks on this agent. This is needed by the master to do correct resource accounting and not offer resources already in use on this agent. We need to mutate the TaskState on the Task before adding them to the master's data structures since the TaskState might be non-terminal when the agent sends these tasks with ReregisterSlaveMessage. And the master has already sent TASK_LOST for these tasks to the frameworks so we need to set the TaskState to TASK_LOST so that any future reconciliations with the framework doesn't have this task transitioning from TASK_LOST to TASK_RUNNNG/TASK_STARTING. This is to avoid unnecessary confusion about task state as observed by the framework but indeed this could have happened with non-strict registry as well where the framework can actually receive a non terminal task state update after receiving a TASK_LOST for the same task in the past.
>     
>     2. When the agent re-registers, the master will continue to send a ShutdownFrameworkMessage to the agent to kill the tasks pertaining to non-PA frameworks on the agent as it does today. An additional optional field will be added to the ShutdownFrameworkMessage to indicate whether or not the shutdown was initiated internally.
>     
>     3. During framework shutdown the state of the framework is set to Framework::TERMINATING which prevents it from launching new tasks. Here, since the framework is not really terminating so in order to allow it to launch new tasks, the agent will not set the state to terminating if the ShutdownFrameworkMessage is generated internally.
>     
>     4. The framework shutdown today doesn't generate any status updates which needs to change. The status updates will be sent if the framework shutdown is triggered internally, this is needed to remove the tasks of non-PA frameworks that got added when the agent re-registered (1).
>     
>     Approach 2: Do not shutdown non-PA framework when agent re-registers and let the frameworks make the decision on what needs to be done when they receive non-terminal status updates for tasks for which they have already received a TASK_LOST. This hopefully won't break any frameworks since it could have happened in the past with non-strict registry as well and frameworks should be resilient enough to handle this scenario.
>     
>     Let me know if I have missed anything. Personally, I am in favor of approach 1 as it seems less catastrophic for the frameworks. What do you think?

The proposed solution to fix the race between new task launches and shutdown framework on the agent, was to not kill the non-partition aware tasks when an unreachable agent re-registers with the master. So now as far as Mesos internals are concerned, the tasks from non-partition aware frameworks are to be treated the same way as partition aware tasks and one way to do it cleanly in Mesos was to transition such non-partition aware tasks to TASK_UNREACHABLE state instead of TASK_LOST when the agent becomes unreachable. Internally such tasks will be in Framework#unreachableTasks cache instead of Framework#completedTasks and their state would be TASK_UNREACHABLE but to be backwards compatible we will do some transformations when the data is being exposed to the users so there is no difference in how these tasks are presented before and after this patch. I will post the patch soon but do let me know what do you think about the approach.


- Megha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review174194
-----------------------------------------------------------


On May 1, 2017, 11:58 p.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58898/
> -----------------------------------------------------------
> 
> (Updated May 1, 2017, 11:58 p.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Mesos is now sending ShutdownFrameworkMessages to the agent for all
> non-partition-aware frameworks (including the ones that are still
> registered). This is problematic. The offer from this agent can
> still go to the same framework which can then launch new tasks.
> The agent then receives tasks of the same framework and ignores
> them because it thinks the framework is shutting down. The framework
> is not shutting down of course, so from the master and the scheduler's
> perspective the task is pending in STAGING forever until the next agent
> reregistration, which could happen much later. This commit fixes
> the problem by sending a task kill instead of ShutdownFrameworkMessage,
> when the agent re-registers after being unreachable, for non-partition
> aware framewworks.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 31a7a2fcf905c0c35e80692a69c290d4094deded 
>   src/tests/partition_tests.cpp 4ff428564d1fa6cb96e6f8ec8edc331da88a3eb6 
> 
> 
> Diff: https://reviews.apache.org/r/58898/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

Re: Review Request 58898: Send task kill for non-Partition Aware frameworks.

Posted by Neil Conway <ne...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review174194
-----------------------------------------------------------



My apologies for the delay in reviewing this.

High-level comments:

(a) Can we improve the description of the problem in the commit summary? It took me quite a while to figure out what is actually going on here. My understanding is:

(1) Agent re-registers
(2) Master sends `ShutdownFrameworkMessage` for non-PA frameworks on the agent
(3) Master offers agent resources to framework
(4) Framework launches new task on agent _before the agent has finished shutting down the framework_
(5) Agent ignores launch task because it believes the framework is still terminating.

This is a race condition, because if the agent finished shutting down the framework (in #4) before the launch task was received, there would not be a problem. Is my understanding correct?

(2) I'd prefer a new unit test that constructs exactly this scenario, rather than changing existing unit tests.

(3) The new behavior is that the framework will receive `TASK_KILLED` for any non-PA tasks running on a partitioned agent that re-registers. This does not seem ideal, because `TASK_KILLED` _normally_ corresponds to a framework-initiated `killTask` operation.

(4) Thinking out loud, what if a custom executor ignores the `killTask` request? When shutting down a framework, it seems we forcibly destroy the container (via `containerizer->destroy()`), if the executor doesn't exit promptly upon receiving the framework shutdown request. AFAIK we don't have similar logic for `killTask` requests.

3 + 4 suggests that maybe we want a different way to terminate the task on the agent -- let's discuss.


src/master/master.cpp
Lines 6034 (patched)
<https://reviews.apache.org/r/58898/#comment247713>

    `tasksToKill`



src/master/master.cpp
Lines 6057 (patched)
<https://reviews.apache.org/r/58898/#comment247313>

    "Keep"



src/master/master.cpp
Lines 6058 (patched)
<https://reviews.apache.org/r/58898/#comment247312>

    "separate data structure"



src/master/master.cpp
Line 6097 (original), 6102 (patched)
<https://reviews.apache.org/r/58898/#comment247314>

    "launched by"



src/tests/partition_tests.cpp
Lines 693 (patched)
<https://reviews.apache.org/r/58898/#comment247742>

    Why did you move this to before the agent re-registers? The point of doing explicit reconciliation is to check the master's state after the agent re-registers.
    
    There's no harm in _also_ reconciling before the agent re-registers, but if you want to add that, let's do it in a separate review.



src/tests/partition_tests.cpp
Line 754 (original), 786 (patched)
<https://reviews.apache.org/r/58898/#comment247330>

    `EXPECT_FALSE`.



src/tests/partition_tests.cpp
Line 794 (original), 826 (patched)
<https://reviews.apache.org/r/58898/#comment247746>

    The point of this part of the test is to check that the agent's metrics are correct after re-registration (and before starting any additional tasks). I'd prefer to check the metrics, then start another task on the agent afterward.



src/tests/partition_tests.cpp
Line 2049 (original)
<https://reviews.apache.org/r/58898/#comment247745>

    Removing this results in two gmock warnings:
    
    ```
    GMOCK WARNING:
    Uninteresting mock function call - returning directly.
        Function call: killTask(0x7fbcf1c5fa20, @0x7fbcf1c54640 1)
    ```
    
    ```
    GMOCK WARNING:
    Uninteresting mock function call - returning directly.
        Function call: shutdown(0x7fbcf1c5fa20)
    ```


- Neil Conway


On May 1, 2017, 11:58 p.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58898/
> -----------------------------------------------------------
> 
> (Updated May 1, 2017, 11:58 p.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Mesos is now sending ShutdownFrameworkMessages to the agent for all
> non-partition-aware frameworks (including the ones that are still
> registered). This is problematic. The offer from this agent can
> still go to the same framework which can then launch new tasks.
> The agent then receives tasks of the same framework and ignores
> them because it thinks the framework is shutting down. The framework
> is not shutting down of course, so from the master and the scheduler's
> perspective the task is pending in STAGING forever until the next agent
> reregistration, which could happen much later. This commit fixes
> the problem by sending a task kill instead of ShutdownFrameworkMessage,
> when the agent re-registers after being unreachable, for non-partition
> aware framewworks.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 31a7a2fcf905c0c35e80692a69c290d4094deded 
>   src/tests/partition_tests.cpp 4ff428564d1fa6cb96e6f8ec8edc331da88a3eb6 
> 
> 
> Diff: https://reviews.apache.org/r/58898/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

Re: Review Request 58898: Do not kill non partition aware tasks.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review182300
-----------------------------------------------------------



Bad patch!

Reviews applied: [58898]

Failed command: python support/apply-reviews.py -n -r 58898

Error:
Traceback (most recent call last):
  File "support/apply-reviews.py", line 417, in <module>
    main()
  File "support/apply-reviews.py", line 412, in main
    reviewboard(options)
  File "support/apply-reviews.py", line 402, in reviewboard
    apply_review(options)
  File "support/apply-reviews.py", line 160, in apply_review
    commit_patch(options)
  File "support/apply-reviews.py", line 261, in commit_patch
    message.write(data['message'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 655: ordinal not in range(128)

Full log: http://mesos-winbot.westus.cloudapp.azure.com/logs/212/console

- Mesos Reviewbot Windows


On Aug. 6, 2017, 4:35 a.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58898/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2017, 4:35 a.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Master will not kill the tasks for non-Partition aware frameworks
> when an unreachable agent re-registers with the master.
> Master used to send a ShutdownFrameworkMessages to the agent
> to kill the tasks from non partition aware frameworks including the
> ones that are still registered which was problematic because the offer
> from this agent could still go to the same framework which could then
> launch new tasks. The agent would then receive tasks of the same
> framework and ignore them because it thinks the framework is shutting
> down. The framework is not shutting down of course, so from the master
> and the scheduler’s perspective the task is pending in STAGING forever
> until the next agent reregistration, which could happen much later.
> This commit fixes the problem by not shutting down the non-partition
> aware frameworks on such an agent.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
>   src/master/master.cpp 7f38a5e21884546d4b4c866ca5918db779af8f99 
>   src/tests/partition_tests.cpp 62a84f797201ccd18b71490949e3130d2b9c3668 
> 
> 
> Diff: https://reviews.apache.org/r/58898/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>

Re: Review Request 58898: Do not kill non partition aware tasks.

Posted by Megha Sharma <ms...@apple.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/
-----------------------------------------------------------

(Updated Aug. 6, 2017, 4:35 a.m.)


Review request for mesos, Neil Conway and Jiang Yan Xu.


Summary (updated)
-----------------

Do not kill non partition aware tasks.


Bugs: MESOS-7215
    https://issues.apache.org/jira/browse/MESOS-7215


Repository: mesos


Description (updated)
-------

Master will not kill the tasks for non-Partition aware frameworks
when an unreachable agent re-registers with the master.
Master used to send a ShutdownFrameworkMessages to the agent
to kill the tasks from non partition aware frameworks including the
ones that are still registered which was problematic because the offer
from this agent could still go to the same framework which could then
launch new tasks. The agent would then receive tasks of the same
framework and ignore them because it thinks the framework is shutting
down. The framework is not shutting down of course, so from the master
and the scheduler’s perspective the task is pending in STAGING forever
until the next agent reregistration, which could happen much later.
This commit fixes the problem by not shutting down the non-partition
aware frameworks on such an agent.


Diffs (updated)
-----

  src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
  src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
  src/master/master.cpp 7f38a5e21884546d4b4c866ca5918db779af8f99 
  src/tests/partition_tests.cpp 62a84f797201ccd18b71490949e3130d2b9c3668 


Diff: https://reviews.apache.org/r/58898/diff/3/

Changes: https://reviews.apache.org/r/58898/diff/2-3/


Testing
-------

make check


Thanks,

Megha Sharma

Re: Review Request 58898: Send task kill for non-Partition Aware frameworks.

Posted by Megha Sharma <ms...@apple.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/
-----------------------------------------------------------

(Updated May 1, 2017, 11:58 p.m.)


Review request for mesos, Neil Conway and Jiang Yan Xu.


Bugs: MESOS-7215
    https://issues.apache.org/jira/browse/MESOS-7215


Repository: mesos


Description
-------

Mesos is now sending ShutdownFrameworkMessages to the agent for all
non-partition-aware frameworks (including the ones that are still
registered). This is problematic. The offer from this agent can
still go to the same framework which can then launch new tasks.
The agent then receives tasks of the same framework and ignores
them because it thinks the framework is shutting down. The framework
is not shutting down of course, so from the master and the scheduler's
perspective the task is pending in STAGING forever until the next agent
reregistration, which could happen much later. This commit fixes
the problem by sending a task kill instead of ShutdownFrameworkMessage,
when the agent re-registers after being unreachable, for non-partition
aware framewworks.


Diffs (updated)
-----

  src/master/master.cpp 31a7a2fcf905c0c35e80692a69c290d4094deded 
  src/tests/partition_tests.cpp 4ff428564d1fa6cb96e6f8ec8edc331da88a3eb6 


Diff: https://reviews.apache.org/r/58898/diff/2/

Changes: https://reviews.apache.org/r/58898/diff/1-2/


Testing
-------

make check


Thanks,

Megha Sharma