You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Niklas Nielsen <ni...@qni.dk> on 2015/06/03 06:29:59 UTC

Re: Review Request 34720: Added kill executor correction to slave.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/
-----------------------------------------------------------

(Updated June 2, 2015, 9:29 p.m.)


Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.


Changes
-------

Rebased


Bugs: MESOS-2653
    https://issues.apache.org/jira/browse/MESOS-2653


Repository: mesos


Description
-------

See summary


Diffs (updated)
-----

  src/slave/slave.hpp 245ea062a56461d96ee3055be1c93ec508d1bec7 
  src/slave/slave.cpp 271cb03770cd08406054dfce35d0821475e49b05 

Diff: https://reviews.apache.org/r/34720/diff/


Testing
-------

make check


Thanks,

Niklas Nielsen


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Jie Yu <yu...@gmail.com>.

> On June 12, 2015, 6:43 p.m., Jie Yu wrote:
> > src/slave/slave.cpp, lines 4383-4400
> > <https://reviews.apache.org/r/34720/diff/4/?file=983235#file983235line4383>
> >
> >     First of all, we don't need to check if 'executor' is NULL or not here. If you looked at the callers, it's guaranteed that 'executor' will be non-NULL. So you need a CHECK here.
> >     
> >     Also, I thought about what the correct logic should be in this funtion (see below). Let me know your thoughts. (we have too many tech debts in this function, so we better fix this asap, instead of waiting further).
> >     
> >     ```
> >     TaskState state;
> >     TaskStatus::Reason reason;
> >     string message;
> >     
> >     // Determine the state and reason for the status update.
> >     if (termination.isReady() && termination.get().has_reason()) {
> >       // This is the case where the resource limit has been
> >       // reached and the container destroyed itself.
> >       state = TASK_FAILED;
> >       reason = termination.get().reason();
> >     } else if (executor->reason.isSome()) {
> >       // This is the case where the slave initiates the
> >       // termination of the container.
> >       
> >       // NOTE: For command executors, previously we send TASK_FAILED
> >       // and REASON_COMMAND_EXECUTOR_FAILED when the command executor
> >       // failed to launch (e.g., invalid executor path).
> >       if (executor->isCommandExecutor() &&
> >           executor->reason.get() == REASON_EXECUTOR_REGISTRATION_TIMEOUT) {
> >         state = TASK_FAILED;
> >         reason = REASON_COMMAND_EXECUTOR_FAILED;
> >       } else {
> >         state = TASK_LOST;
> >         reason = executor->reason.get();
> >       }
> >     } else {
> >       state = TASK_LOST;
> >       reason = TaskStatus::REASON_EXECUTOR_TERMINATED
> >     }
> >     
> >     // Determine the message for the status update.
> >     // TODO(nnielsen): Consider adding a message from the slave
> >     // as well, and concatenating the messages.
> >     if (termination.isReady()) {
> >       message = termination.get().message();
> >     } else {
> >       // This is case where the containerizer fails
> >       // to destroy the container.
> >       message = "Abnormal executor termination"; 
> >     }
> >     
> >     statusUpdate(...);
> >     ```

Forgot to mention that you need to set executor->reason accordingly when slave initiates a destroy of the container (e.g., containerizer->update fails, executor registration timeout, etc.).


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review87734
-----------------------------------------------------------


On June 11, 2015, 10:58 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 11, 2015, 10:58 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 0df1b55791963fb4159b7ea5318d09dde4f7d8c7 
>   src/slave/slave.cpp b523c2fce50e56f4f94d55a9488f49c53452e4d4 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Jie Yu <yu...@gmail.com>.

> On June 12, 2015, 6:43 p.m., Jie Yu wrote:
> > src/slave/slave.cpp, lines 4383-4400
> > <https://reviews.apache.org/r/34720/diff/4/?file=983235#file983235line4383>
> >
> >     First of all, we don't need to check if 'executor' is NULL or not here. If you looked at the callers, it's guaranteed that 'executor' will be non-NULL. So you need a CHECK here.
> >     
> >     Also, I thought about what the correct logic should be in this funtion (see below). Let me know your thoughts. (we have too many tech debts in this function, so we better fix this asap, instead of waiting further).
> >     
> >     ```
> >     TaskState state;
> >     TaskStatus::Reason reason;
> >     string message;
> >     
> >     // Determine the state and reason for the status update.
> >     if (termination.isReady() && termination.get().has_reason()) {
> >       // This is the case where the resource limit has been
> >       // reached and the container destroyed itself.
> >       state = TASK_FAILED;
> >       reason = termination.get().reason();
> >     } else if (executor->reason.isSome()) {
> >       // This is the case where the slave initiates the
> >       // termination of the container.
> >       
> >       // NOTE: For command executors, previously we send TASK_FAILED
> >       // and REASON_COMMAND_EXECUTOR_FAILED when the command executor
> >       // failed to launch (e.g., invalid executor path).
> >       if (executor->isCommandExecutor() &&
> >           executor->reason.get() == REASON_EXECUTOR_REGISTRATION_TIMEOUT) {
> >         state = TASK_FAILED;
> >         reason = REASON_COMMAND_EXECUTOR_FAILED;
> >       } else {
> >         state = TASK_LOST;
> >         reason = executor->reason.get();
> >       }
> >     } else {
> >       state = TASK_LOST;
> >       reason = TaskStatus::REASON_EXECUTOR_TERMINATED
> >     }
> >     
> >     // Determine the message for the status update.
> >     // TODO(nnielsen): Consider adding a message from the slave
> >     // as well, and concatenating the messages.
> >     if (termination.isReady()) {
> >       message = termination.get().message();
> >     } else {
> >       // This is case where the containerizer fails
> >       // to destroy the container.
> >       message = "Abnormal executor termination"; 
> >     }
> >     
> >     statusUpdate(...);
> >     ```
> 
> Jie Yu wrote:
>     Forgot to mention that you need to set executor->reason accordingly when slave initiates a destroy of the container (e.g., containerizer->update fails, executor registration timeout, etc.).
> 
> Vinod Kone wrote:
>     Changing the command task state from TASK_FAILED to TASK_LOST is an API change! It should've a deprecation cycle with a headsup given on the dev list. I would recommend doing:
>     
>     ```
>     // Determine the state and reason for the status update.
>     if (termination.isReady() && termination.get().has_reason()) {
>       // This is the case where the resource limit has been
>       // reached and the container destroyed itself.
>       state = TASK_FAILED;
>       reason = termination.get().reason();
>     } else  if (executor->isCommandExecutor()) {
>         // TODO(jieyu): .....
>         state = TASK_FAILED;
>         reason = REASON_COMMAND_EXECUTOR_FAILED;
>     } else if (executor->reason.isSome()) {
>       // This is the case where the slave initiates the
>       // termination of the container.
>         state = TASK_LOST;
>         reason = executor->reason.get();
>     } else {
>       state = TASK_LOST;
>       reason = TaskStatus::REASON_EXECUTOR_TERMINATED
>     }
>     
>     ```

OK, that's why exactly the reason why I want this to be fixed this asap (otherwise, we'll have to deal with another API breaking change).

Vinod, your code does not work for niq. For example, if a command executor is killed due to QOS violation, we will send TASK_FAILED and REASON_COMMAND_EXECUTOR_FAILED. That's not correct.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review87734
-----------------------------------------------------------


On June 11, 2015, 10:58 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 11, 2015, 10:58 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 0df1b55791963fb4159b7ea5318d09dde4f7d8c7 
>   src/slave/slave.cpp b523c2fce50e56f4f94d55a9488f49c53452e4d4 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Vinod Kone <vi...@gmail.com>.

> On June 12, 2015, 6:43 p.m., Jie Yu wrote:
> > src/slave/slave.cpp, lines 4383-4400
> > <https://reviews.apache.org/r/34720/diff/4/?file=983235#file983235line4383>
> >
> >     First of all, we don't need to check if 'executor' is NULL or not here. If you looked at the callers, it's guaranteed that 'executor' will be non-NULL. So you need a CHECK here.
> >     
> >     Also, I thought about what the correct logic should be in this funtion (see below). Let me know your thoughts. (we have too many tech debts in this function, so we better fix this asap, instead of waiting further).
> >     
> >     ```
> >     TaskState state;
> >     TaskStatus::Reason reason;
> >     string message;
> >     
> >     // Determine the state and reason for the status update.
> >     if (termination.isReady() && termination.get().has_reason()) {
> >       // This is the case where the resource limit has been
> >       // reached and the container destroyed itself.
> >       state = TASK_FAILED;
> >       reason = termination.get().reason();
> >     } else if (executor->reason.isSome()) {
> >       // This is the case where the slave initiates the
> >       // termination of the container.
> >       
> >       // NOTE: For command executors, previously we send TASK_FAILED
> >       // and REASON_COMMAND_EXECUTOR_FAILED when the command executor
> >       // failed to launch (e.g., invalid executor path).
> >       if (executor->isCommandExecutor() &&
> >           executor->reason.get() == REASON_EXECUTOR_REGISTRATION_TIMEOUT) {
> >         state = TASK_FAILED;
> >         reason = REASON_COMMAND_EXECUTOR_FAILED;
> >       } else {
> >         state = TASK_LOST;
> >         reason = executor->reason.get();
> >       }
> >     } else {
> >       state = TASK_LOST;
> >       reason = TaskStatus::REASON_EXECUTOR_TERMINATED
> >     }
> >     
> >     // Determine the message for the status update.
> >     // TODO(nnielsen): Consider adding a message from the slave
> >     // as well, and concatenating the messages.
> >     if (termination.isReady()) {
> >       message = termination.get().message();
> >     } else {
> >       // This is case where the containerizer fails
> >       // to destroy the container.
> >       message = "Abnormal executor termination"; 
> >     }
> >     
> >     statusUpdate(...);
> >     ```
> 
> Jie Yu wrote:
>     Forgot to mention that you need to set executor->reason accordingly when slave initiates a destroy of the container (e.g., containerizer->update fails, executor registration timeout, etc.).

Changing the command task state from TASK_FAILED to TASK_LOST is an API change! It should've a deprecation cycle with a headsup given on the dev list. I would recommend doing:

```
// Determine the state and reason for the status update.
if (termination.isReady() && termination.get().has_reason()) {
  // This is the case where the resource limit has been
  // reached and the container destroyed itself.
  state = TASK_FAILED;
  reason = termination.get().reason();
} else  if (executor->isCommandExecutor()) {
    // TODO(jieyu): .....
    state = TASK_FAILED;
    reason = REASON_COMMAND_EXECUTOR_FAILED;
} else if (executor->reason.isSome()) {
  // This is the case where the slave initiates the
  // termination of the container.
    state = TASK_LOST;
    reason = executor->reason.get();
} else {
  state = TASK_LOST;
  reason = TaskStatus::REASON_EXECUTOR_TERMINATED
}

```


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review87734
-----------------------------------------------------------


On June 11, 2015, 10:58 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 11, 2015, 10:58 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 0df1b55791963fb4159b7ea5318d09dde4f7d8c7 
>   src/slave/slave.cpp b523c2fce50e56f4f94d55a9488f49c53452e4d4 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review87734
-----------------------------------------------------------


Niq, see my detailed comments. cc @tnachen, @vinodkone, @idownes, @bmahler


src/slave/slave.cpp
<https://reviews.apache.org/r/34720/#comment140150>

    First of all, we don't need to check if 'executor' is NULL or not here. If you looked at the callers, it's guaranteed that 'executor' will be non-NULL. So you need a CHECK here.
    
    Also, I thought about what the correct logic should be in this funtion (see below). Let me know your thoughts. (we have too many tech debts in this function, so we better fix this asap, instead of waiting further).
    
    ```
    TaskState state;
    TaskStatus::Reason reason;
    string message;
    
    // Determine the state and reason for the status update.
    if (termination.isReady() && termination.get().has_reason()) {
      // This is the case where the resource limit has been
      // reached and the container destroyed itself.
      state = TASK_FAILED;
      reason = termination.get().reason();
    } else if (executor->reason.isSome()) {
      // This is the case where the slave initiates the
      // termination of the container.
      
      // NOTE: For command executors, previously we send TASK_FAILED
      // and REASON_COMMAND_EXECUTOR_FAILED when the command executor
      // failed to launch (e.g., invalid executor path).
      if (executor->isCommandExecutor() &&
          executor->reason.get() == REASON_EXECUTOR_REGISTRATION_TIMEOUT) {
        state = TASK_FAILED;
        reason = REASON_COMMAND_EXECUTOR_FAILED;
      } else {
        state = TASK_LOST;
        reason = executor->reason.get();
      }
    } else {
      state = TASK_LOST;
      reason = TaskStatus::REASON_EXECUTOR_TERMINATED
    }
    
    // Determine the message for the status update.
    // TODO(nnielsen): Consider adding a message from the slave
    // as well, and concatenating the messages.
    if (termination.isReady()) {
      message = termination.get().message();
    } else {
      // This is case where the containerizer fails
      // to destroy the container.
      message = "Abnormal executor termination"; 
    }
    
    statusUpdate(...);
    ```


- Jie Yu


On June 11, 2015, 10:58 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 11, 2015, 10:58 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 0df1b55791963fb4159b7ea5318d09dde4f7d8c7 
>   src/slave/slave.cpp b523c2fce50e56f4f94d55a9488f49c53452e4d4 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Jie Yu <yu...@gmail.com>.

> On June 16, 2015, 9:22 p.m., Jie Yu wrote:
> > LGTM. Some nits on logging.

IN fact, can you consistently use `QoS correction KILL` (instead of QoS KILL correction) in the logging?


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review88124
-----------------------------------------------------------


On June 16, 2015, 8:42 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 16, 2015, 8:42 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/flags.hpp 6c24e56d15881b0e3aeec3d4824842cf57121fc6 
>   src/slave/flags.cpp 93690cfa9dbf2658ce642829299f4adf08bb1062 
>   src/slave/slave.hpp dbed46d76cc7fbdd1a8d3ebcc2a1ff08b75da10f 
>   src/slave/slave.cpp 361433063b5cb38d8326f8247cf4fc6f4a18e5c9 
>   src/tests/mesos.hpp ecdf9109d2e46e8730754eeeb4978863679d56e7 
>   src/tests/mesos.cpp 509f9f205fdb1fa094e313b6f0da53000ffecbb3 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Niklas Nielsen <ni...@qni.dk>.

> On June 16, 2015, 2:22 p.m., Jie Yu wrote:
> > src/slave/slave.cpp, line 4210
> > <https://reviews.apache.org/r/34720/diff/6/?file=985948#file985948line4210>
> >
> >     Shouldn't be 'executorId' here? Also, can you put single quotes for executorId since it's specifed by the user (i.e., might contain spaces).

Great point! Thanks!


> On June 16, 2015, 2:22 p.m., Jie Yu wrote:
> > src/slave/slave.cpp, line 4138
> > <https://reviews.apache.org/r/34720/diff/6/?file=985948#file985948line4138>
> >
> >     I think we already have
> >     ```
> >     using mesos::slave::QoSController;
> >     ```
> >     
> >     So you can remove the namespace prefix. It probably will fit in one line.

Added 'using mesos::slave::QoSCorrection;'


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review88124
-----------------------------------------------------------


On June 16, 2015, 4:47 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 16, 2015, 4:47 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/flags.hpp 6c24e56d15881b0e3aeec3d4824842cf57121fc6 
>   src/slave/flags.cpp 93690cfa9dbf2658ce642829299f4adf08bb1062 
>   src/slave/slave.hpp dbed46d76cc7fbdd1a8d3ebcc2a1ff08b75da10f 
>   src/slave/slave.cpp 361433063b5cb38d8326f8247cf4fc6f4a18e5c9 
>   src/tests/mesos.hpp ecdf9109d2e46e8730754eeeb4978863679d56e7 
>   src/tests/mesos.cpp 509f9f205fdb1fa094e313b6f0da53000ffecbb3 
>   src/tests/oversubscription_tests.cpp 3481ad2eef43c3860642970b4c96494997de8552 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review88124
-----------------------------------------------------------

Ship it!


LGTM. Some nits on logging.


src/slave/slave.cpp (line 4137)
<https://reviews.apache.org/r/34720/#comment140520>

    I think we already have
    ```
    using mesos::slave::QoSController;
    ```
    
    So you can remove the namespace prefix. It probably will fit in one line.



src/slave/slave.cpp (line 4161)
<https://reviews.apache.org/r/34720/#comment140522>

    Ditto on removing mesos::slave:: prefix. Please do a sweep.



src/slave/slave.cpp (line 4172)
<https://reviews.apache.org/r/34720/#comment140526>

    s/kill/KILL/
    
    Here and everywhere else. Also, to be consistent, let's use the same logging format:
    
    ```
    Ignoring QoS KILL correction: framework id not specified.
    ```



src/slave/slave.cpp (lines 4182 - 4183)
<https://reviews.apache.org/r/34720/#comment140533>

    ```
    Ignoring QoS KILL correction on framework ...: executor id not specified.
    ```



src/slave/slave.cpp (line 4191)
<https://reviews.apache.org/r/34720/#comment140527>

    ```Ignoring QoS KILL correction on framework ...: framework cannot be found```



src/slave/slave.cpp (line 4202)
<https://reviews.apache.org/r/34720/#comment140528>

    Ignoring QoS KILL correction on framework ...: framework is terminating.



src/slave/slave.cpp (line 4209)
<https://reviews.apache.org/r/34720/#comment140525>

    Shouldn't be 'executorId' here? Also, can you put single quotes for executorId since it's specifed by the user (i.e., might contain spaces).



src/slave/slave.cpp (lines 4209 - 4211)
<https://reviews.apache.org/r/34720/#comment140529>

    To be consistent, here should be:
    ```
    Ignoring QoS KILL correction on executor ... of framework ...: executor cannot be found
    ```



src/slave/slave.cpp (lines 4218 - 4220)
<https://reviews.apache.org/r/34720/#comment140530>

    HOw about
    ```
    Applying QoS KILL correction on executor ... of framework ...
    ```



src/slave/slave.cpp (line 4219)
<https://reviews.apache.org/r/34720/#comment140524>

    YOu forgot the single quote here.



src/slave/slave.cpp (lines 4236 - 4238)
<https://reviews.apache.org/r/34720/#comment140531>

    Ditto on consistency:
    ```
    Ignoring QoS KILL correction on executor ... of framework ...: executor is XXX
    ```



src/slave/slave.cpp (line 4247)
<https://reviews.apache.org/r/34720/#comment140532>

    QoS correction type ... is not supported.


- Jie Yu


On June 16, 2015, 8:42 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 16, 2015, 8:42 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/flags.hpp 6c24e56d15881b0e3aeec3d4824842cf57121fc6 
>   src/slave/flags.cpp 93690cfa9dbf2658ce642829299f4adf08bb1062 
>   src/slave/slave.hpp dbed46d76cc7fbdd1a8d3ebcc2a1ff08b75da10f 
>   src/slave/slave.cpp 361433063b5cb38d8326f8247cf4fc6f4a18e5c9 
>   src/tests/mesos.hpp ecdf9109d2e46e8730754eeeb4978863679d56e7 
>   src/tests/mesos.cpp 509f9f205fdb1fa094e313b6f0da53000ffecbb3 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/
-----------------------------------------------------------

(Updated June 16, 2015, 4:47 p.m.)


Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.


Changes
-------

Addressed comments from Jie


Bugs: MESOS-2653
    https://issues.apache.org/jira/browse/MESOS-2653


Repository: mesos


Description
-------

See summary


Diffs (updated)
-----

  src/slave/flags.hpp 6c24e56d15881b0e3aeec3d4824842cf57121fc6 
  src/slave/flags.cpp 93690cfa9dbf2658ce642829299f4adf08bb1062 
  src/slave/slave.hpp dbed46d76cc7fbdd1a8d3ebcc2a1ff08b75da10f 
  src/slave/slave.cpp 361433063b5cb38d8326f8247cf4fc6f4a18e5c9 
  src/tests/mesos.hpp ecdf9109d2e46e8730754eeeb4978863679d56e7 
  src/tests/mesos.cpp 509f9f205fdb1fa094e313b6f0da53000ffecbb3 
  src/tests/oversubscription_tests.cpp 3481ad2eef43c3860642970b4c96494997de8552 

Diff: https://reviews.apache.org/r/34720/diff/


Testing
-------

make check


Thanks,

Niklas Nielsen


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/
-----------------------------------------------------------

(Updated June 16, 2015, 1:42 p.m.)


Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.


Bugs: MESOS-2653
    https://issues.apache.org/jira/browse/MESOS-2653


Repository: mesos


Description
-------

See summary


Diffs (updated)
-----

  src/slave/flags.hpp 6c24e56d15881b0e3aeec3d4824842cf57121fc6 
  src/slave/flags.cpp 93690cfa9dbf2658ce642829299f4adf08bb1062 
  src/slave/slave.hpp dbed46d76cc7fbdd1a8d3ebcc2a1ff08b75da10f 
  src/slave/slave.cpp 361433063b5cb38d8326f8247cf4fc6f4a18e5c9 
  src/tests/mesos.hpp ecdf9109d2e46e8730754eeeb4978863679d56e7 
  src/tests/mesos.cpp 509f9f205fdb1fa094e313b6f0da53000ffecbb3 

Diff: https://reviews.apache.org/r/34720/diff/


Testing
-------

make check


Thanks,

Niklas Nielsen


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Niklas Nielsen <ni...@qni.dk>.

> On June 12, 2015, 3:49 p.m., Jie Yu wrote:
> > src/slave/slave.cpp, lines 4203-4210
> > <https://reviews.apache.org/r/34720/diff/5/?file=983931#file983931line4203>
> >
> >     Can you add a TODO here saying that we may want to  check if containerId matches or not due to race (i.e., qos controller wants to kill container 1 of executor 1, but the executor you are checking here is for container 2 of executor 1).
> >     
> >     That means we need to add ContainerID in ResourceUsage/QoSCorrection as well

Added MESOS-2875 :)


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review87767
-----------------------------------------------------------


On June 16, 2015, 1:42 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 16, 2015, 1:42 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/flags.hpp 6c24e56d15881b0e3aeec3d4824842cf57121fc6 
>   src/slave/flags.cpp 93690cfa9dbf2658ce642829299f4adf08bb1062 
>   src/slave/slave.hpp dbed46d76cc7fbdd1a8d3ebcc2a1ff08b75da10f 
>   src/slave/slave.cpp 361433063b5cb38d8326f8247cf4fc6f4a18e5c9 
>   src/tests/mesos.hpp ecdf9109d2e46e8730754eeeb4978863679d56e7 
>   src/tests/mesos.cpp 509f9f205fdb1fa094e313b6f0da53000ffecbb3 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review87767
-----------------------------------------------------------



src/slave/slave.hpp
<https://reviews.apache.org/r/34720/#comment140194>

    The 'reason' is for the slave to encode the reason  behind a terminal status update for those pending/unterminated tasks when executor is terminated.



src/slave/slave.cpp
<https://reviews.apache.org/r/34720/#comment140186>

    Do you want to introduce a configurable minimal interval between two 'corrections()' pull? (like we did for resource estimator)?
    
    You probably want to follow the implementation of `fowardOversubscribed` and `_forwardOversubscribed`.



src/slave/slave.cpp
<https://reviews.apache.org/r/34720/#comment140192>

    " because the framework cannot be found"



src/slave/slave.cpp
<https://reviews.apache.org/r/34720/#comment140189>

    indent



src/slave/slave.cpp
<https://reviews.apache.org/r/34720/#comment140191>

    "is not running" is confusing. How about:
    
    "because the executor cannot be found"



src/slave/slave.cpp
<https://reviews.apache.org/r/34720/#comment140193>

    Can you add a TODO here saying that we may want to  check if containerId matches or not due to race (i.e., qos controller wants to kill container 1 of executor 1, but the executor you are checking here is for container 2 of executor 1).
    
    That means we need to add ContainerID in ResourceUsage/QoSCorrection as well



src/slave/slave.cpp
<https://reviews.apache.org/r/34720/#comment140190>

    "because the executor is " << executor->state



src/slave/slave.cpp
<https://reviews.apache.org/r/34720/#comment140195>

    overrides `executor->reason`



src/slave/slave.cpp
<https://reviews.apache.org/r/34720/#comment140196>

    infer ... from 'killed' field in 'termination'.


- Jie Yu


On June 12, 2015, 9:48 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 12, 2015, 9:48 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 0df1b55791963fb4159b7ea5318d09dde4f7d8c7 
>   src/slave/slave.cpp 9af69d8f0b28c9441c684886c52320378f9b2869 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/
-----------------------------------------------------------

(Updated June 12, 2015, 2:48 p.m.)


Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.


Bugs: MESOS-2653
    https://issues.apache.org/jira/browse/MESOS-2653


Repository: mesos


Description
-------

See summary


Diffs (updated)
-----

  src/slave/slave.hpp 0df1b55791963fb4159b7ea5318d09dde4f7d8c7 
  src/slave/slave.cpp 9af69d8f0b28c9441c684886c52320378f9b2869 

Diff: https://reviews.apache.org/r/34720/diff/


Testing
-------

make check


Thanks,

Niklas Nielsen


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/
-----------------------------------------------------------

(Updated June 11, 2015, 3:58 p.m.)


Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.


Bugs: MESOS-2653
    https://issues.apache.org/jira/browse/MESOS-2653


Repository: mesos


Description
-------

See summary


Diffs (updated)
-----

  src/slave/slave.hpp 0df1b55791963fb4159b7ea5318d09dde4f7d8c7 
  src/slave/slave.cpp b523c2fce50e56f4f94d55a9488f49c53452e4d4 

Diff: https://reviews.apache.org/r/34720/diff/


Testing
-------

make check


Thanks,

Niklas Nielsen


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Jie Yu <yu...@gmail.com>.

> On June 4, 2015, 5:58 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 4400-4413
> > <https://reviews.apache.org/r/34720/diff/3/?file=979257#file979257line4400>
> >
> >     This is getting a little hairy. As the TODO says we really ought bubble this up via the Termination protobuf. Have you looked into it?

+1

It really gets nasty here. I remembered that jaybuff said that they plan to fix that. Maybe you want to sync with them first.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review86669
-----------------------------------------------------------


On June 4, 2015, 5:43 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 4, 2015, 5:43 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37e85af90275dc0c88c12b39a5813e7d71abb1f3 
>   src/slave/slave.cpp 30e0d8ba2a2b12a0a7f1f7246b65360de30ef4ea 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Niklas Nielsen <ni...@qni.dk>.

> On June 4, 2015, 10:58 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 4400-4413
> > <https://reviews.apache.org/r/34720/diff/3/?file=979257#file979257line4400>
> >
> >     This is getting a little hairy. As the TODO says we really ought bubble this up via the Termination protobuf. Have you looked into it?
> 
> Jie Yu wrote:
>     +1
>     
>     It really gets nasty here. I remembered that jaybuff said that they plan to fix that. Maybe you want to sync with them first.
> 
> Niklas Nielsen wrote:
>     Mind adding a few references? Where is the Termination proto?
> 
> Jie Yu wrote:
>     Some references for the context:
>     https://issues.apache.org/jira/browse/MESOS-2035
>     https://reviews.apache.org/r/33249/ (see discussions in the review)
>     https://issues.apache.org/jira/browse/MESOS-2657

Can you be a bit more concrete about your concerns and suggested path forward? Are you suggesting to land 2035 first? Is it the qosKilled bool that concerns you?


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review86669
-----------------------------------------------------------


On June 4, 2015, 10:43 a.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 4, 2015, 10:43 a.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37e85af90275dc0c88c12b39a5813e7d71abb1f3 
>   src/slave/slave.cpp 30e0d8ba2a2b12a0a7f1f7246b65360de30ef4ea 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Niklas Nielsen <ni...@qni.dk>.

> On June 4, 2015, 10:58 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 4400-4413
> > <https://reviews.apache.org/r/34720/diff/3/?file=979257#file979257line4400>
> >
> >     This is getting a little hairy. As the TODO says we really ought bubble this up via the Termination protobuf. Have you looked into it?
> 
> Jie Yu wrote:
>     +1
>     
>     It really gets nasty here. I remembered that jaybuff said that they plan to fix that. Maybe you want to sync with them first.

Mind adding a few references? Where is the Termination proto?


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review86669
-----------------------------------------------------------


On June 4, 2015, 10:43 a.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 4, 2015, 10:43 a.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37e85af90275dc0c88c12b39a5813e7d71abb1f3 
>   src/slave/slave.cpp 30e0d8ba2a2b12a0a7f1f7246b65360de30ef4ea 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Jie Yu <yu...@gmail.com>.

> On June 4, 2015, 5:58 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 4400-4413
> > <https://reviews.apache.org/r/34720/diff/3/?file=979257#file979257line4400>
> >
> >     This is getting a little hairy. As the TODO says we really ought bubble this up via the Termination protobuf. Have you looked into it?
> 
> Jie Yu wrote:
>     +1
>     
>     It really gets nasty here. I remembered that jaybuff said that they plan to fix that. Maybe you want to sync with them first.
> 
> Niklas Nielsen wrote:
>     Mind adding a few references? Where is the Termination proto?
> 
> Jie Yu wrote:
>     Some references for the context:
>     https://issues.apache.org/jira/browse/MESOS-2035
>     https://reviews.apache.org/r/33249/ (see discussions in the review)
>     https://issues.apache.org/jira/browse/MESOS-2657
> 
> Niklas Nielsen wrote:
>     Can you be a bit more concrete about your concerns and suggested path forward? Are you suggesting to land 2035 first? Is it the qosKilled bool that concerns you?

My concern is that 'killed' field in Termination is overloaded. If it is true, it means either 1) user initiated killTask 2) killed due to container limit has been reached (e.g., memory, it's buggy already since we also have disk limit now). 3) killed due to QoS controller.

Therefore, relying on this field to decide what 'reason' and 'status' to send back is very unintuitive and hard to understand. Just feel that this is a tech debt we should fix asap.

I don't have a suggestion yet. I need to think about it.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review86669
-----------------------------------------------------------


On June 4, 2015, 5:43 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 4, 2015, 5:43 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37e85af90275dc0c88c12b39a5813e7d71abb1f3 
>   src/slave/slave.cpp 30e0d8ba2a2b12a0a7f1f7246b65360de30ef4ea 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Jie Yu <yu...@gmail.com>.

> On June 4, 2015, 5:58 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 4400-4413
> > <https://reviews.apache.org/r/34720/diff/3/?file=979257#file979257line4400>
> >
> >     This is getting a little hairy. As the TODO says we really ought bubble this up via the Termination protobuf. Have you looked into it?
> 
> Jie Yu wrote:
>     +1
>     
>     It really gets nasty here. I remembered that jaybuff said that they plan to fix that. Maybe you want to sync with them first.
> 
> Niklas Nielsen wrote:
>     Mind adding a few references? Where is the Termination proto?

Some references for the context:
https://issues.apache.org/jira/browse/MESOS-2035
https://reviews.apache.org/r/33249/ (see discussions in the review)
https://issues.apache.org/jira/browse/MESOS-2657


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review86669
-----------------------------------------------------------


On June 4, 2015, 5:43 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 4, 2015, 5:43 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37e85af90275dc0c88c12b39a5813e7d71abb1f3 
>   src/slave/slave.cpp 30e0d8ba2a2b12a0a7f1f7246b65360de30ef4ea 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Niklas Nielsen <ni...@qni.dk>.

> On June 4, 2015, 10:58 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 4400-4413
> > <https://reviews.apache.org/r/34720/diff/3/?file=979257#file979257line4400>
> >
> >     This is getting a little hairy. As the TODO says we really ought bubble this up via the Termination protobuf. Have you looked into it?
> 
> Jie Yu wrote:
>     +1
>     
>     It really gets nasty here. I remembered that jaybuff said that they plan to fix that. Maybe you want to sync with them first.
> 
> Niklas Nielsen wrote:
>     Mind adding a few references? Where is the Termination proto?
> 
> Jie Yu wrote:
>     Some references for the context:
>     https://issues.apache.org/jira/browse/MESOS-2035
>     https://reviews.apache.org/r/33249/ (see discussions in the review)
>     https://issues.apache.org/jira/browse/MESOS-2657
> 
> Niklas Nielsen wrote:
>     Can you be a bit more concrete about your concerns and suggested path forward? Are you suggesting to land 2035 first? Is it the qosKilled bool that concerns you?
> 
> Jie Yu wrote:
>     My concern is that 'killed' field in Termination is overloaded. If it is true, it means either 1) user initiated killTask 2) killed due to container limit has been reached (e.g., memory, it's buggy already since we also have disk limit now). 3) killed due to QoS controller.
>     
>     Therefore, relying on this field to decide what 'reason' and 'status' to send back is very unintuitive and hard to understand. Just feel that this is a tech debt we should fix asap.
>     
>     I don't have a suggestion yet. I need to think about it.

There are a few things we need to address:

1) What status should QoS killed tasks have? Thought TASK_LOST is better than TASK_FAILED.
   Adding new terminal statuses requires support in frameworks and my intuition was that we should try to exercise the retry logic in the frameworks with TASK_LOST.

2) How should the framework be able tell the difference between different events.
   In this case, I thought it deserved it's own reason (as those are fine grained).

Didn't put more thought into this as I mimicked the existing code, so I am open to any change you guys have in mind.
While I agree with all the above, it seems like a larger issue with the containerizer API (and not well suited to be addressed here alone?)

> Therefore, relying on this field to decide what 'reason' and 'status' to send back is very unintuitive and hard to understand. Just feel that this is a tech debt we should fix asap.

This patch issues a containerizer->destroy() and this is where it ends; in order to distinguish the cause of kill, it looks like we need to change the containerizer API?


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review86669
-----------------------------------------------------------


On June 4, 2015, 10:43 a.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 4, 2015, 10:43 a.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37e85af90275dc0c88c12b39a5813e7d71abb1f3 
>   src/slave/slave.cpp 30e0d8ba2a2b12a0a7f1f7246b65360de30ef4ea 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/#review86669
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/34720/#comment138734>

    This is getting a little hairy. As the TODO says we really ought bubble this up via the Termination protobuf. Have you looked into it?


- Vinod Kone


On June 4, 2015, 5:43 p.m., Niklas Nielsen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34720/
> -----------------------------------------------------------
> 
> (Updated June 4, 2015, 5:43 p.m.)
> 
> 
> Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.
> 
> 
> Bugs: MESOS-2653
>     https://issues.apache.org/jira/browse/MESOS-2653
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37e85af90275dc0c88c12b39a5813e7d71abb1f3 
>   src/slave/slave.cpp 30e0d8ba2a2b12a0a7f1f7246b65360de30ef4ea 
> 
> Diff: https://reviews.apache.org/r/34720/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Niklas Nielsen
> 
>


Re: Review Request 34720: Added kill executor correction to slave.

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34720/
-----------------------------------------------------------

(Updated June 4, 2015, 10:43 a.m.)


Review request for mesos, Bartek Plotka, Jie Yu, Joris Van Remoortere, and Vinod Kone.


Bugs: MESOS-2653
    https://issues.apache.org/jira/browse/MESOS-2653


Repository: mesos


Description
-------

See summary


Diffs (updated)
-----

  src/slave/slave.hpp 37e85af90275dc0c88c12b39a5813e7d71abb1f3 
  src/slave/slave.cpp 30e0d8ba2a2b12a0a7f1f7246b65360de30ef4ea 

Diff: https://reviews.apache.org/r/34720/diff/


Testing
-------

make check


Thanks,

Niklas Nielsen