You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Meng Zhu <mz...@mesosphere.io> on 2018/03/19 18:34:20 UTC

Review Request 66144: Enforced task launch order on the agent.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/
-----------------------------------------------------------

Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-8624
    https://issues.apache.org/jira/browse/MESOS-8624


Repository: mesos


Description
-------

Up until now, Mesos does not guarantee in-order
task launch on the agent. There are two asynchronous
steps (unschedule GC and task authorization) in the
agent task launch path. Depending on the CPU scheduling
order, a later task launch may finish these two steps earlier
than its predecessors and get to the launch executor stage
earlier, resulting in out-of-order task delivery.

This patch enforces the task delivery order by sequencing
task launch after the two asynchronous steps, specifically
right before entering `__run()`.


Diffs
-----

  src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
  src/slave/slave.cpp 915d4f1bbd642ca5d2587cfdd846262cb7e08788 


Diff: https://reviews.apache.org/r/66144/diff/1/


Testing
-------

make check


Thanks,

Meng Zhu


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/#review199711
-----------------------------------------------------------




src/slave/slave.hpp
Lines 1141-1143 (patched)
<https://reviews.apache.org/r/66144/#comment280123>

    Let's add some code to `Framework::destroyExecutor()` to GC these entries when we remove the executor struct.



src/slave/slave.cpp
Lines 2226-2227 (original), 2226-2227 (patched)
<https://reviews.apache.org/r/66144/#comment280124>

    Could you add a comment here explaining what's going on? Perhaps:
    
    "Use a sequence to ensure that task launch order is preserved."


- Greg Mann


On March 19, 2018, 6:34 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66144/
> -----------------------------------------------------------
> 
> (Updated March 19, 2018, 6:34 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8617 and MESOS-8624
>     https://issues.apache.org/jira/browse/MESOS-8617
>     https://issues.apache.org/jira/browse/MESOS-8624
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Up until now, Mesos does not guarantee in-order
> task launch on the agent. There are two asynchronous
> steps (unschedule GC and task authorization) in the
> agent task launch path. Depending on the CPU scheduling
> order, a later task launch may finish these two steps earlier
> than its predecessors and get to the launch executor stage
> earlier, resulting in out-of-order task delivery.
> 
> This patch enforces the task delivery order by sequencing
> task launch after the two asynchronous steps, specifically
> right before entering `__run()`.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
>   src/slave/slave.cpp 915d4f1bbd642ca5d2587cfdd846262cb7e08788 
> 
> 
> Diff: https://reviews.apache.org/r/66144/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Meng Zhu <mz...@mesosphere.io>.

> On March 28, 2018, 11:11 p.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Lines 2261-2268 (patched)
> > <https://reviews.apache.org/r/66144/diff/4/?file=1989782#file1989782line2261>
> >
> >     I think we want to `return;` at the end of this conditional block? And if you make this lambda accept a string parameter, you could include the Failure message in this logging.
> >     
> >     Also, it looks like the local variable `error` is not really needed?

We need to continue. https://github.com/apache/mesos/blob/7e58e14de74a5dd39b167d6698e79e29cdd24ac6/src/slave/slave.cpp#L2193-L2204

The failure message must have been printed early, let's print it again here.

Removed `error`.


- Meng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/#review200133
-----------------------------------------------------------


On March 29, 2018, 11:15 a.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66144/
> -----------------------------------------------------------
> 
> (Updated March 29, 2018, 11:15 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8617 and MESOS-8624
>     https://issues.apache.org/jira/browse/MESOS-8617
>     https://issues.apache.org/jira/browse/MESOS-8624
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Up until now, Mesos does not guarantee in-order
> task launch on the agent. There are two asynchronous
> steps (unschedule GC and task authorization) in the
> agent task launch path. Depending on the CPU scheduling
> order, a later task launch may finish these two steps earlier
> than its predecessors and get to the launch executor stage
> earlier, resulting in out-of-order task delivery.
> 
> This patch enforces the task delivery order by sequencing
> task launch after the two asynchronous steps, specifically
> right before entering `__run()`.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
>   src/slave/slave.cpp a4bd4ccd3fc59c3c0e462d9b480f5424b3e52d7a 
> 
> 
> Diff: https://reviews.apache.org/r/66144/diff/6/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/#review200133
-----------------------------------------------------------




src/slave/slave.cpp
Lines 2224 (patched)
<https://reviews.apache.org/r/66144/#comment280782>

    s/refer/referred/



src/slave/slave.cpp
Lines 2225 (patched)
<https://reviews.apache.org/r/66144/#comment280785>

    s/"seqFuture"/`seqFuture`/



src/slave/slave.cpp
Lines 2234-2235 (patched)
<https://reviews.apache.org/r/66144/#comment280783>

    Is unschedule GC the only failure which could lead to this?
    
    If not, then maybe say "due to unschedule GC or some other failure".



src/slave/slave.cpp
Lines 2241 (patched)
<https://reviews.apache.org/r/66144/#comment280786>

    s/tasks'/task's/



src/slave/slave.cpp
Lines 2244 (patched)
<https://reviews.apache.org/r/66144/#comment280787>

    s/sequence/the sequence/



src/slave/slave.cpp
Lines 2261-2268 (patched)
<https://reviews.apache.org/r/66144/#comment280814>

    I think we want to `return;` at the end of this conditional block? And if you make this lambda accept a string parameter, you could include the Failure message in this logging.
    
    Also, it looks like the local variable `error` is not really needed?


- Greg Mann


On March 28, 2018, 7:05 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66144/
> -----------------------------------------------------------
> 
> (Updated March 28, 2018, 7:05 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8617 and MESOS-8624
>     https://issues.apache.org/jira/browse/MESOS-8617
>     https://issues.apache.org/jira/browse/MESOS-8624
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Up until now, Mesos does not guarantee in-order
> task launch on the agent. There are two asynchronous
> steps (unschedule GC and task authorization) in the
> agent task launch path. Depending on the CPU scheduling
> order, a later task launch may finish these two steps earlier
> than its predecessors and get to the launch executor stage
> earlier, resulting in out-of-order task delivery.
> 
> This patch enforces the task delivery order by sequencing
> task launch after the two asynchronous steps, specifically
> right before entering `__run()`.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
>   src/slave/slave.cpp e76daabad0d2d68aa42d1da809d4a23459eaaacb 
> 
> 
> Diff: https://reviews.apache.org/r/66144/diff/4/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Meng Zhu <mz...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/
-----------------------------------------------------------

(Updated April 2, 2018, 5:36 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-8617 and MESOS-8624
    https://issues.apache.org/jira/browse/MESOS-8617
    https://issues.apache.org/jira/browse/MESOS-8624


Repository: mesos


Description
-------

Up until now, Mesos does not guarantee in-order
task launch on the agent. There are two asynchronous
steps (unschedule GC and task authorization) in the
agent task launch path. Depending on the CPU scheduling
order, a later task launch may finish these two steps earlier
than its predecessors and get to the launch executor stage
earlier, resulting in out-of-order task delivery.

This patch enforces the task delivery order by sequencing
task launch after the two asynchronous steps, specifically
right before entering `__run()`.


Diffs (updated)
-----

  src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
  src/slave/slave.cpp a4bd4ccd3fc59c3c0e462d9b480f5424b3e52d7a 


Diff: https://reviews.apache.org/r/66144/diff/8/

Changes: https://reviews.apache.org/r/66144/diff/7-8/


Testing
-------

make check


Thanks,

Meng Zhu


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Meng Zhu <mz...@mesosphere.io>.

> On April 2, 2018, 4:57 p.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Line 2233 (original), 2274 (patched)
> > <https://reviews.apache.org/r/66144/diff/7/?file=1991208#file1991208line2274>
> >
> >     We should defer this callback.

That would change the old behavior i.e. `sendExitedExecutorMessage` is sent synchronously along with other error handling code. https://github.com/apache/mesos/blob/594ee20c2453dad836313769aef9f8655cd75cd5/src/slave/slave.cpp#L2226-L2231

I found making the error handling asynchronous unnecessarily difficult to reason. e.g. making it asynchronous means that there is a brief moment that the first task has failed but the sequence is still alive--contradicting our comments. Tieing the sequence lifecycle and `exitedExecutorMessage` to task launch success atomically makes the code much easier to reason.

I am not sure if it would make a difference now, but we should stick to the old behavior unless there is compelling reason not to.


- Meng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/#review200329
-----------------------------------------------------------


On April 2, 2018, 5:36 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66144/
> -----------------------------------------------------------
> 
> (Updated April 2, 2018, 5:36 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8617 and MESOS-8624
>     https://issues.apache.org/jira/browse/MESOS-8617
>     https://issues.apache.org/jira/browse/MESOS-8624
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Up until now, Mesos does not guarantee in-order
> task launch on the agent. There are two asynchronous
> steps (unschedule GC and task authorization) in the
> agent task launch path. Depending on the CPU scheduling
> order, a later task launch may finish these two steps earlier
> than its predecessors and get to the launch executor stage
> earlier, resulting in out-of-order task delivery.
> 
> This patch enforces the task delivery order by sequencing
> task launch after the two asynchronous steps, specifically
> right before entering `__run()`.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
>   src/slave/slave.cpp a4bd4ccd3fc59c3c0e462d9b480f5424b3e52d7a 
> 
> 
> Diff: https://reviews.apache.org/r/66144/diff/8/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Meng Zhu <mz...@mesosphere.io>.

> On April 2, 2018, 4:57 p.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Line 2233 (original), 2274 (patched)
> > <https://reviews.apache.org/r/66144/diff/7/?file=1991208#file1991208line2274>
> >
> >     We should defer this callback.
> 
> Meng Zhu wrote:
>     That would change the old behavior i.e. `sendExitedExecutorMessage` is sent synchronously along with other error handling code. https://github.com/apache/mesos/blob/594ee20c2453dad836313769aef9f8655cd75cd5/src/slave/slave.cpp#L2226-L2231
>     
>     I found making the error handling asynchronous unnecessarily difficult to reason. e.g. making it asynchronous means that there is a brief moment that the first task has failed but the sequence is still alive--contradicting our comments. Tieing the sequence lifecycle and `exitedExecutorMessage` to task launch success atomically makes the code much easier to reason.
>     
>     I am not sure if it would make a difference now, but we should stick to the old behavior unless there is compelling reason not to.

OK, need to defer, as the error handler might be called in a different context other than the agent actor.


- Meng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/#review200329
-----------------------------------------------------------


On April 2, 2018, 5:36 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66144/
> -----------------------------------------------------------
> 
> (Updated April 2, 2018, 5:36 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8617 and MESOS-8624
>     https://issues.apache.org/jira/browse/MESOS-8617
>     https://issues.apache.org/jira/browse/MESOS-8624
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Up until now, Mesos does not guarantee in-order
> task launch on the agent. There are two asynchronous
> steps (unschedule GC and task authorization) in the
> agent task launch path. Depending on the CPU scheduling
> order, a later task launch may finish these two steps earlier
> than its predecessors and get to the launch executor stage
> earlier, resulting in out-of-order task delivery.
> 
> This patch enforces the task delivery order by sequencing
> task launch after the two asynchronous steps, specifically
> right before entering `__run()`.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
>   src/slave/slave.cpp a4bd4ccd3fc59c3c0e462d9b480f5424b3e52d7a 
> 
> 
> Diff: https://reviews.apache.org/r/66144/diff/8/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/#review200329
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/slave.cpp
Line 2233 (original), 2274 (patched)
<https://reviews.apache.org/r/66144/#comment281033>

    We should defer this callback.



src/slave/slave.cpp
Lines 2275 (patched)
<https://reviews.apache.org/r/66144/#comment281034>

    Ditto regarding naming of `framework_` here.


- Greg Mann


On April 2, 2018, 6:04 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66144/
> -----------------------------------------------------------
> 
> (Updated April 2, 2018, 6:04 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8617 and MESOS-8624
>     https://issues.apache.org/jira/browse/MESOS-8617
>     https://issues.apache.org/jira/browse/MESOS-8624
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Up until now, Mesos does not guarantee in-order
> task launch on the agent. There are two asynchronous
> steps (unschedule GC and task authorization) in the
> agent task launch path. Depending on the CPU scheduling
> order, a later task launch may finish these two steps earlier
> than its predecessors and get to the launch executor stage
> earlier, resulting in out-of-order task delivery.
> 
> This patch enforces the task delivery order by sequencing
> task launch after the two asynchronous steps, specifically
> right before entering `__run()`.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
>   src/slave/slave.cpp a4bd4ccd3fc59c3c0e462d9b480f5424b3e52d7a 
> 
> 
> Diff: https://reviews.apache.org/r/66144/diff/7/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Meng Zhu <mz...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/
-----------------------------------------------------------

(Updated April 2, 2018, 11:04 a.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
-------

Thanks for the review! Patch updated.


Bugs: MESOS-8617 and MESOS-8624
    https://issues.apache.org/jira/browse/MESOS-8617
    https://issues.apache.org/jira/browse/MESOS-8624


Repository: mesos


Description
-------

Up until now, Mesos does not guarantee in-order
task launch on the agent. There are two asynchronous
steps (unschedule GC and task authorization) in the
agent task launch path. Depending on the CPU scheduling
order, a later task launch may finish these two steps earlier
than its predecessors and get to the launch executor stage
earlier, resulting in out-of-order task delivery.

This patch enforces the task delivery order by sequencing
task launch after the two asynchronous steps, specifically
right before entering `__run()`.


Diffs (updated)
-----

  src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
  src/slave/slave.cpp a4bd4ccd3fc59c3c0e462d9b480f5424b3e52d7a 


Diff: https://reviews.apache.org/r/66144/diff/7/

Changes: https://reviews.apache.org/r/66144/diff/6-7/


Testing
-------

make check


Thanks,

Meng Zhu


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/#review200256
-----------------------------------------------------------




src/slave/slave.hpp
Lines 1147-1155 (patched)
<https://reviews.apache.org/r/66144/#comment280904>

    Before you mention the lifecycle, could you explain the purpose of these sequences?
    
    i.e.,
    
    ```
    // Sequences in this map are used to enforce the order of tasks launched on each executor.
    //
    // Note on the lifecycle of the sequence: ...
    ```



src/slave/slave.cpp
Lines 2227-2230 (patched)
<https://reviews.apache.org/r/66144/#comment280905>

    Is it true that "we do not want the discard event to propagate to other tasks"?
    I might recommend the following:
    
    "We use this sequence only to maintain the task launching order. If the sequence is deleted, we do not want the resulting discard event to propagate up the chain, which would prevent the previous `.then()` or `.repair()` callbacks from being invoked. Thus, we use `undiscardable` to protect each `taskLaunch`."



src/slave/slave.cpp
Lines 2233-2258 (patched)
<https://reviews.apache.org/r/66144/#comment280909>

    I would recommend moving this comment block above the `.onAny( ... )` call, since it is explaining the reason for the `.onAny` call and the conditions on `seqFuture`.
    
    Then, it would be good to add a comment above `taskLaunch.onReady( ... )` explaining why we are chaining onto `taskLaunch` here, rather than dispatching directly. Maybe something like:
    
    ```
    // We only want to execute the following callbacks once the work performed in the `taskLaunch` chain is complete. Thus, we add them onto the `taskLaunch` chain rather than dispatching directly.
    taskLaunch
      .onReady( ...
    ```



src/slave/slave.cpp
Lines 2242 (patched)
<https://reviews.apache.org/r/66144/#comment280906>

    s/task's/tasks'/



src/slave/slave.cpp
Lines 2251 (patched)
<https://reviews.apache.org/r/66144/#comment280907>

    s/tasks'/task's/



src/slave/slave.cpp
Line 2234 (original), 2279 (patched)
<https://reviews.apache.org/r/66144/#comment280910>

    s/expects new/expects a new/
    
    s/task launch/task/



src/slave/slave.cpp
Line 2236 (original), 2281 (patched)
<https://reviews.apache.org/r/66144/#comment280911>

    Use backticks instead of single quotes around `ExitedExecutorMessage`.



src/slave/slave.cpp
Lines 2286 (patched)
<https://reviews.apache.org/r/66144/#comment280903>

    If `framework_ == nullptr` is true, then dereferencing the pointer here will be bad. You could enclose this in a conditional check, `if (framework_ != nullptr)`.


- Greg Mann


On March 29, 2018, 6:15 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66144/
> -----------------------------------------------------------
> 
> (Updated March 29, 2018, 6:15 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8617 and MESOS-8624
>     https://issues.apache.org/jira/browse/MESOS-8617
>     https://issues.apache.org/jira/browse/MESOS-8624
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Up until now, Mesos does not guarantee in-order
> task launch on the agent. There are two asynchronous
> steps (unschedule GC and task authorization) in the
> agent task launch path. Depending on the CPU scheduling
> order, a later task launch may finish these two steps earlier
> than its predecessors and get to the launch executor stage
> earlier, resulting in out-of-order task delivery.
> 
> This patch enforces the task delivery order by sequencing
> task launch after the two asynchronous steps, specifically
> right before entering `__run()`.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
>   src/slave/slave.cpp a4bd4ccd3fc59c3c0e462d9b480f5424b3e52d7a 
> 
> 
> Diff: https://reviews.apache.org/r/66144/diff/6/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Meng Zhu <mz...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/
-----------------------------------------------------------

(Updated March 29, 2018, 11:15 a.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-8617 and MESOS-8624
    https://issues.apache.org/jira/browse/MESOS-8617
    https://issues.apache.org/jira/browse/MESOS-8624


Repository: mesos


Description
-------

Up until now, Mesos does not guarantee in-order
task launch on the agent. There are two asynchronous
steps (unschedule GC and task authorization) in the
agent task launch path. Depending on the CPU scheduling
order, a later task launch may finish these two steps earlier
than its predecessors and get to the launch executor stage
earlier, resulting in out-of-order task delivery.

This patch enforces the task delivery order by sequencing
task launch after the two asynchronous steps, specifically
right before entering `__run()`.


Diffs (updated)
-----

  src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
  src/slave/slave.cpp a4bd4ccd3fc59c3c0e462d9b480f5424b3e52d7a 


Diff: https://reviews.apache.org/r/66144/diff/5/

Changes: https://reviews.apache.org/r/66144/diff/4-5/


Testing
-------

make check


Thanks,

Meng Zhu


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Meng Zhu <mz...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/
-----------------------------------------------------------

(Updated March 28, 2018, 12:05 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-8617 and MESOS-8624
    https://issues.apache.org/jira/browse/MESOS-8617
    https://issues.apache.org/jira/browse/MESOS-8624


Repository: mesos


Description
-------

Up until now, Mesos does not guarantee in-order
task launch on the agent. There are two asynchronous
steps (unschedule GC and task authorization) in the
agent task launch path. Depending on the CPU scheduling
order, a later task launch may finish these two steps earlier
than its predecessors and get to the launch executor stage
earlier, resulting in out-of-order task delivery.

This patch enforces the task delivery order by sequencing
task launch after the two asynchronous steps, specifically
right before entering `__run()`.


Diffs (updated)
-----

  src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
  src/slave/slave.cpp e76daabad0d2d68aa42d1da809d4a23459eaaacb 


Diff: https://reviews.apache.org/r/66144/diff/4/

Changes: https://reviews.apache.org/r/66144/diff/3-4/


Testing
-------

make check


Thanks,

Meng Zhu


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Meng Zhu <mz...@mesosphere.io>.

> On March 23, 2018, 1:12 a.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Lines 2906-2910 (patched)
> > <https://reviews.apache.org/r/66144/diff/3/?file=1985632#file1985632line2910>
> >
> >     In this case, the executor struct exists - do we really want to erase the sequence here?

ah, thanks for catching this!


- Meng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/#review199857
-----------------------------------------------------------


On March 28, 2018, 12:05 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66144/
> -----------------------------------------------------------
> 
> (Updated March 28, 2018, 12:05 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8617 and MESOS-8624
>     https://issues.apache.org/jira/browse/MESOS-8617
>     https://issues.apache.org/jira/browse/MESOS-8624
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Up until now, Mesos does not guarantee in-order
> task launch on the agent. There are two asynchronous
> steps (unschedule GC and task authorization) in the
> agent task launch path. Depending on the CPU scheduling
> order, a later task launch may finish these two steps earlier
> than its predecessors and get to the launch executor stage
> earlier, resulting in out-of-order task delivery.
> 
> This patch enforces the task delivery order by sequencing
> task launch after the two asynchronous steps, specifically
> right before entering `__run()`.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
>   src/slave/slave.cpp e76daabad0d2d68aa42d1da809d4a23459eaaacb 
> 
> 
> Diff: https://reviews.apache.org/r/66144/diff/4/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/#review199857
-----------------------------------------------------------




src/slave/slave.hpp
Lines 1147-1152 (patched)
<https://reviews.apache.org/r/66144/#comment280335>

    Nice comment :)



src/slave/slave.cpp
Lines 2220-2222 (patched)
<https://reviews.apache.org/r/66144/#comment280343>

    See my comment in a prevous review regarding implicit capture of the framework pointer.



src/slave/slave.cpp
Lines 2231-2235 (patched)
<https://reviews.apache.org/r/66144/#comment280339>

    These comments are quite good!



src/slave/slave.cpp
Lines 2237 (patched)
<https://reviews.apache.org/r/66144/#comment280340>

    s/direct/directing/



src/slave/slave.cpp
Lines 2477 (patched)
<https://reviews.apache.org/r/66144/#comment280341>

    s/exist/exists/



src/slave/slave.cpp
Lines 2510 (patched)
<https://reviews.apache.org/r/66144/#comment280350>

    We already know that framework is not nullptr here, and below.



src/slave/slave.cpp
Lines 2906-2910 (patched)
<https://reviews.apache.org/r/66144/#comment280342>

    In this case, the executor struct exists - do we really want to erase the sequence here?


- Greg Mann


On March 22, 2018, 7:07 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66144/
> -----------------------------------------------------------
> 
> (Updated March 22, 2018, 7:07 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8617 and MESOS-8624
>     https://issues.apache.org/jira/browse/MESOS-8617
>     https://issues.apache.org/jira/browse/MESOS-8624
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Up until now, Mesos does not guarantee in-order
> task launch on the agent. There are two asynchronous
> steps (unschedule GC and task authorization) in the
> agent task launch path. Depending on the CPU scheduling
> order, a later task launch may finish these two steps earlier
> than its predecessors and get to the launch executor stage
> earlier, resulting in out-of-order task delivery.
> 
> This patch enforces the task delivery order by sequencing
> task launch after the two asynchronous steps, specifically
> right before entering `__run()`.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
>   src/slave/slave.cpp e76daabad0d2d68aa42d1da809d4a23459eaaacb 
> 
> 
> Diff: https://reviews.apache.org/r/66144/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>


Re: Review Request 66144: Enforced task launch order on the agent.

Posted by Meng Zhu <mz...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66144/
-----------------------------------------------------------

(Updated March 22, 2018, 12:07 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
-------

Added code to manage the lifecycle of sequences. Also added some verbose comments.


Bugs: MESOS-8617 and MESOS-8624
    https://issues.apache.org/jira/browse/MESOS-8617
    https://issues.apache.org/jira/browse/MESOS-8624


Repository: mesos


Description
-------

Up until now, Mesos does not guarantee in-order
task launch on the agent. There are two asynchronous
steps (unschedule GC and task authorization) in the
agent task launch path. Depending on the CPU scheduling
order, a later task launch may finish these two steps earlier
than its predecessors and get to the launch executor stage
earlier, resulting in out-of-order task delivery.

This patch enforces the task delivery order by sequencing
task launch after the two asynchronous steps, specifically
right before entering `__run()`.


Diffs (updated)
-----

  src/slave/slave.hpp 37f0361251524e63d02d251e8a03901812b8affb 
  src/slave/slave.cpp e76daabad0d2d68aa42d1da809d4a23459eaaacb 


Diff: https://reviews.apache.org/r/66144/diff/3/

Changes: https://reviews.apache.org/r/66144/diff/2-3/


Testing
-------

make check


Thanks,

Meng Zhu