You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2014/10/01 20:41:20 UTC

Re: Review Request 23912: Fix MESOS-947: Slave should properly handle a killTask() that arrives between runTask() and _runTask()

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23912/#review55092
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/23912/#comment95455>

    Indentation of "<<".



src/slave/slave.cpp
<https://reviews.apache.org/r/23912/#comment95469>

    See my comment for this section of code in earlier review. AFAICT, it's not addressed.



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment95457>

    s/whenToKill/_runTask/
    
    that way it's immediately apparent for someone reading line #1080 what this future represents.



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment95459>

    We format DoAll like this.
    
    DoAll(arg1,
          arg2,
          arg3)



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment95467>

    s/killed/killTask/



src/tests/slave_tests.cpp
<https://reviews.apache.org/r/23912/#comment95460>

    ditto. formatting.
    
    also, this expectation should be set after driver.launchTasks().


- Vinod Kone


On Sept. 18, 2014, 1:47 a.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23912/
> -----------------------------------------------------------
> 
> (Updated Sept. 18, 2014, 1:47 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Bugs: MESOS-947
>     https://issues.apache.org/jira/browse/MESOS-947
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Fixes MESOS-947 "Slave should properly handle a killTask() that arrives between runTask() and _runTask()".
> 
> Slave::killTask() did not check for task in question combination to be "pending" (i.e. Slave::runTask had happened, but Slave::_runTask had not yet) and then erroneously assumed that Slave::runTask() had not been executed. The task was then marked "LOST" instead of "KILLED". But Slave::runTask had already scheduled Slave::_runTask to follow. Now the entry for being "pending" is removed, and the task is marked "KILLED", and _runTask gets informed about this. It checks whether the task in question is currently "pending" and if it is not, then it infers that the task has been killed and does not erroneously try to complete launching it.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 4f3df5c49a8cf72fc7153158c9eb045196b6cf13 
>   src/slave/slave.cpp 28eb02852ddcc10efe589a8069dba9c895bc160e 
>   src/tests/mesos.hpp 957e2233cc11c438fd80d3b6d1907a1223093104 
>   src/tests/mesos.cpp 3dcb2acd5ad4ab5e3a7b4fe524ee077558112773 
>   src/tests/slave_tests.cpp 69be28f6e82b99e23424bd2be8294f715d8040d4 
> 
> Diff: https://reviews.apache.org/r/23912/diff/
> 
> 
> Testing
> -------
> 
> Wrote a unit test that reliably created the situation described in the ticket. Observed that TASK_LOST and the listed log output occurred. This pointed directly to the lines in killTask() where the problem is rooted. Ran the test after fixing, it succeeded. Checked the log. It looks like a "clean kill" now :-)
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>