Posted to reviews@mesos.apache.org by Chun-Hung Hsiao <ch...@apache.org> on 2018/06/20 05:33:04 UTC

Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67664/
-----------------------------------------------------------

Review request for mesos, Benjamin Bannier, Greg Mann, Jie Yu, and Jan Schlicht.


Bugs: MESOS-9010
    https://issues.apache.org/jira/browse/MESOS-9010


Repository: mesos


Description
-------

Since a resource provider and its operation status update manager run in
different actors, the `UPDATE_OPERATION_STATUS` call of a completed
operation may race with an `UPDATE_STATE` call.

To deal with this race, the agent should update the latest statuses of
all completed operations received in the `UPDATE_STATE` call, so that it
does not erroneously apply those operations again when the corresponding
`UPDATE_OPERATION_STATUS` calls arrive.
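To make the race concrete, here is a minimal C++ sketch under heavily simplified assumptions: the `Agent` struct, `run()`, the integer resource count, and the two-value `State` enum are hypothetical stand-ins, not Mesos code. It replays one completed operation with `UPDATE_STATE` arriving before or after `UPDATE_OPERATION_STATUS`, with the fix (recording terminal states from `UPDATE_STATE`) toggled on or off.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, heavily simplified model: the agent's resources are reduced
// to a count of converted units, and operation states to two values. The
// real agent works with `Resources` and protobuf `Operation` messages.
enum class State { PENDING, FINISHED };

struct Agent {
  int convertedTotal = 0;               // Agent's view of the provider total.
  std::map<std::string, State> latest;  // Operation ID -> latest known state.
  bool recordTerminalStates = true;     // Toggles the fix for comparison.

  // `UPDATE_STATE`: the provider's total already reflects every operation
  // it reports as terminal.
  void updateState(int providerTotal,
                   const std::map<std::string, State>& reported) {
    convertedTotal = providerTotal;
    if (!recordTerminalStates) {
      return;
    }
    for (const auto& [id, state] : reported) {
      if (state == State::FINISHED) {
        latest[id] = state;
      }
    }
  }

  // `UPDATE_OPERATION_STATUS`: applies the conversion only if the agent has
  // not already seen the operation become terminal.
  void updateOperationStatus(const std::string& id, State state) {
    assert(latest.count(id) == 1);  // Only tracked operations get updates.
    if (state == State::FINISHED && latest.at(id) != State::FINISHED) {
      convertedTotal += 1;
    }
    latest[id] = state;
  }
};

// Replays one completed operation ("op1") under either message ordering.
int run(bool recordTerminalStates, bool updateStateFirst) {
  Agent agent;
  agent.recordTerminalStates = recordTerminalStates;
  agent.latest["op1"] = State::PENDING;

  const std::map<std::string, State> reported = {{"op1", State::FINISHED}};
  const int providerTotal = 1;  // Already includes op1's conversion.

  if (updateStateFirst) {
    agent.updateState(providerTotal, reported);
    agent.updateOperationStatus("op1", State::FINISHED);
  } else {
    agent.updateOperationStatus("op1", State::FINISHED);
    agent.updateState(providerTotal, reported);
  }
  return agent.convertedTotal;
}
```

In this model, `run(false, true)` returns 2 because the operation's conversion is counted once in the provider's reported total and again by the status update handler; with the fix enabled, both orderings return 1.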


Diffs
-----

  src/slave/slave.cpp 8edd652f7f410dbadaf6c2ca3736349065e4340a 
  src/tests/slave_tests.cpp b46fb8efc524852f62428040ff958bd44e9efe9f 


Diff: https://reviews.apache.org/r/67664/diff/1/


Testing
-------

sudo make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67664/#review205052
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/slave.cpp
Lines 7738-7746 (patched)
<https://reviews.apache.org/r/67664/#comment287932>

    Let's explicitly call out that the operation result is already reflected in the reported total, e.g.
    
        // Handle operations known to both the agent and the resource provider.
        //
        // If an operation became terminal it is already reflected in the total
        // reported by the resource provider and should not be applied again in
        // e.g., the `UPDATE_OPERATION_STATUS` handler when a status update
        // arrives.  Set the terminal `latest_status` here to prevent resource
        // mutations elsewhere.
        //
        // NOTE: Here we only update the `latest_status` of a known operation
        // if it is not yet terminal; its `statuses` will be updated by the
        // operation status update handler.
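    A hypothetical sketch of the bookkeeping this suggested comment describes, assuming a simplified `Operation` record whose `latestStatus` field and `statuses` history stand in for the protobuf `latest_status` and `statuses` fields. The function names are illustrative, not the agent's actual helpers. Reconciliation during `UPDATE_STATE` touches only the latest status, while the status update handler appends to the history and decides whether to apply the operation.

```cpp
#include <vector>

// Hypothetical simplified states; the real agent uses `OperationState`
// protobuf values.
enum class State { PENDING, FINISHED, FAILED };

bool isTerminal(State state) {
  return state == State::FINISHED || state == State::FAILED;
}

// Hypothetical stand-in for the agent's copy of an `Operation` message.
struct Operation {
  State latestStatus = State::PENDING;
  std::vector<State> statuses;  // Status history, owned by the update handler.
};

// Called for an operation known to both the agent and the resource provider
// while handling `UPDATE_STATE`.
void reconcileKnownOperation(Operation& operation, State reported) {
  // A terminal operation is already reflected in the provider's reported
  // total. Recording the terminal latest status here keeps the
  // `UPDATE_OPERATION_STATUS` handler from applying it a second time.
  if (!isTerminal(operation.latestStatus)) {
    operation.latestStatus = reported;
  }
  // `statuses` is deliberately left alone; the status update handler
  // appends to it when the corresponding update arrives.
}

// Called while handling `UPDATE_OPERATION_STATUS`. Returns whether the
// operation's resource conversion should be applied.
bool handleOperationStatusUpdate(Operation& operation, State update) {
  const bool apply =
      isTerminal(update) && !isTerminal(operation.latestStatus);
  operation.latestStatus = update;
  operation.statuses.push_back(update);
  return apply;
}
```

    If `UPDATE_STATE` wins the race, the later status update still lands in `statuses` but returns `false`, so the conversion is not applied twice.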


- Benjamin Bannier




Re: Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67664/
-----------------------------------------------------------

(Updated July 6, 2018, 11:25 p.m.)


Review request for mesos, Benjamin Bannier, Greg Mann, Jie Yu, and Jan Schlicht.


Changes
-------

Addressed Jan's comment.


Bugs: MESOS-9010
    https://issues.apache.org/jira/browse/MESOS-9010


Repository: mesos


Description
-------

Since a resource provider and its operation status update manager run in
different actors, the `UPDATE_OPERATION_STATUS` call of a completed
operation may race with an `UPDATE_STATE` call.

To deal with this race, the agent should update the latest statuses of
all completed operations received in the `UPDATE_STATE` call, so that it
does not erroneously apply those operations again when the corresponding
`UPDATE_OPERATION_STATUS` calls arrive.


Diffs (updated)
-----

  src/slave/slave.hpp bf14d3569e677b2be6790ef774985df6937ebb29 
  src/slave/slave.cpp 06c2f5ffb6ac79b746c1db4a6762b9dc7e88c471 
  src/tests/slave_tests.cpp b46fb8efc524852f62428040ff958bd44e9efe9f 


Diff: https://reviews.apache.org/r/67664/diff/3/

Changes: https://reviews.apache.org/r/67664/diff/2-3/


Testing
-------

sudo make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.

Posted by Jan Schlicht <ja...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67664/#review205568
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/slave.cpp
Lines 8047-8048 (patched)
<https://reviews.apache.org/r/67664/#comment288451>

    Not yours, but let's remove this comment, as it's just stating what is done in the code below.


- Jan Schlicht




Re: Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67664/
-----------------------------------------------------------

(Updated June 21, 2018, 4:29 a.m.)


Review request for mesos, Benjamin Bannier, Greg Mann, Jie Yu, and Jan Schlicht.


Changes
-------

Addressed Benjamin's comments and moved the update logic to a function as Jie suggested.


Bugs: MESOS-9010
    https://issues.apache.org/jira/browse/MESOS-9010


Repository: mesos


Description
-------

Since a resource provider and its operation status update manager run in
different actors, the `UPDATE_OPERATION_STATUS` call of a completed
operation may race with an `UPDATE_STATE` call.

To deal with this race, the agent should update the latest statuses of
all completed operations received in the `UPDATE_STATE` call, so that it
does not erroneously apply those operations again when the corresponding
`UPDATE_OPERATION_STATUS` calls arrive.


Diffs (updated)
-----

  src/slave/slave.hpp bf14d3569e677b2be6790ef774985df6937ebb29 
  src/slave/slave.cpp 8edd652f7f410dbadaf6c2ca3736349065e4340a 
  src/tests/slave_tests.cpp b46fb8efc524852f62428040ff958bd44e9efe9f 


Diff: https://reviews.apache.org/r/67664/diff/2/

Changes: https://reviews.apache.org/r/67664/diff/1-2/


Testing
-------

sudo make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67664/#review205054
-----------------------------------------------------------




src/slave/slave.cpp
Lines 7750 (patched)
<https://reviews.apache.org/r/67664/#comment287933>

    We might even extend this condition to only trigger when the update is terminal itself.
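    The thread does not show the condition at patched line 7750. Assuming it is the "not terminal yet" check on the agent's known status, the extension might look like the following hypothetical C++ sketch, which contrasts the two guards side by side (function names are illustrative).

```cpp
// Hypothetical simplified states; the real agent uses `OperationState`
// protobuf values.
enum class State { PENDING, FINISHED, FAILED };

bool isTerminal(State state) {
  return state == State::FINISHED || state == State::FAILED;
}

// Assumed current guard: record the provider-reported status whenever the
// agent's own latest status is not terminal yet.
bool shouldRecordCurrent(State known, State /*reported*/) {
  return !isTerminal(known);
}

// Extended guard: additionally require the reported status itself to be
// terminal, so a non-terminal report never overwrites the agent's view.
bool shouldRecordExtended(State known, State reported) {
  return !isTerminal(known) && isTerminal(reported);
}
```

    With the extended guard, non-terminal transitions are left entirely to the status update handler, and `UPDATE_STATE` only intervenes for operations that are already reflected in the reported total.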


- Benjamin Bannier

