Posted to reviews@mesos.apache.org by Chun-Hung Hsiao <ch...@apache.org> on 2018/06/20 05:33:04 UTC
Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67664/
-----------------------------------------------------------
Review request for mesos, Benjamin Bannier, Greg Mann, Jie Yu, and Jan Schlicht.
Bugs: MESOS-9010
https://issues.apache.org/jira/browse/MESOS-9010
Repository: mesos
Description
-------
Since a resource provider and its operation status update manager run in
different actors, the `UPDATE_OPERATION_STATUS` call for a completed
operation may race with an `UPDATE_STATE` call. If `UPDATE_STATE` arrives
first, the total resources it reports already reflect the completed
operation.
To deal with this race, the agent should update the latest statuses of
all completed operations received in the `UPDATE_STATE` call, so that it
does not erroneously apply those operations again when the corresponding
`UPDATE_OPERATION_STATUS` calls arrive.
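The two possible arrival orders can be illustrated with a small, self-contained C++ model. Everything in it (`AgentModel`, `onUpdateState`, `onUpdateOperationStatus`, the single integer `total`, and the -10 disk delta) is a hypothetical simplification for illustration, not the Mesos agent code; it only assumes that the total reported in `UPDATE_STATE` already reflects every completed operation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified model of agent-side bookkeeping. None of these
// names are Mesos types; they only illustrate the ordering problem.
enum class State { PENDING, FINISHED };

struct Operation {
  State latestStatus = State::PENDING;
  int diskDelta = 0; // resource change caused by applying the operation
};

struct AgentModel {
  int total = 0;                               // agent's view of the RP total
  std::map<std::string, Operation> operations; // tracked operations by ID

  // UPDATE_STATE: the resource provider reports a total that already
  // reflects every completed operation, plus each operation's latest state.
  void onUpdateState(int reportedTotal,
                     const std::map<std::string, State>& reported) {
    total = reportedTotal;
    for (const auto& [id, state] : reported) {
      auto it = operations.find(id);
      if (it != operations.end() && it->second.latestStatus == State::PENDING) {
        // The fix: remember that this result is already part of `total`.
        it->second.latestStatus = state;
      }
    }
  }

  // UPDATE_OPERATION_STATUS: apply the operation only if the agent has not
  // already recorded it as finished.
  void onUpdateOperationStatus(const std::string& id, State state) {
    auto it = operations.find(id);
    assert(it != operations.end());
    Operation& op = it->second;
    if (state == State::FINISHED && op.latestStatus == State::PENDING) {
      total += op.diskDelta;
    }
    op.latestStatus = state;
  }
};

// Runs one interleaving of the two calls and returns the agent's final total.
int run(bool updateStateFirst) {
  AgentModel agent;
  agent.total = 100;
  agent.operations["op1"] = Operation{State::PENDING, -10};
  // The resource provider has already completed op1, so it reports 90.
  if (updateStateFirst) {
    agent.onUpdateState(90, {{"op1", State::FINISHED}});
    agent.onUpdateOperationStatus("op1", State::FINISHED);
  } else {
    agent.onUpdateOperationStatus("op1", State::FINISHED);
    agent.onUpdateState(90, {{"op1", State::FINISHED}});
  }
  return agent.total;
}
```

Both `run(true)` and `run(false)` end with a total of 90. If the `latestStatus` assignment in `onUpdateState` were removed, the `UPDATE_STATE`-first ordering would apply the -10 delta a second time and end at 80, which is the double application the description is about.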
Diffs
-----
src/slave/slave.cpp 8edd652f7f410dbadaf6c2ca3736349065e4340a
src/tests/slave_tests.cpp b46fb8efc524852f62428040ff958bd44e9efe9f
Diff: https://reviews.apache.org/r/67664/diff/1/
Testing
-------
sudo make check
Thanks,
Chun-Hung Hsiao
Re: Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.
Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
https://reviews.apache.org/r/67664/#review205052
-----------------------------------------------------------
Fix it, then Ship it!
src/slave/slave.cpp
Lines 7738-7746 (patched)
<https://reviews.apache.org/r/67664/#comment287932>
Let's explicitly call out that the operation result is already reflected in the reported total, e.g.
// Handle operations known to both the agent and the resource provider.
//
// If an operation became terminal it is already reflected in the total
// reported by the resource provider and should not be applied again in
// e.g., the `UPDATE_OPERATION_STATUS` handler when a status update
// arrives. Set the terminal `latest_status` here to prevent resource
// mutations elsewhere.
//
// NOTE: We only update the `latest_status` of a known operation here if
// it is not yet terminal; its `statuses` will be updated by the
// operation status update handler.
- Benjamin Bannier
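The update logic this suggested comment documents could be factored into a helper roughly as follows. This is a hedged sketch, not the code in this review: `updateKnownOperations`, `TrackedOperation`, and `OpState` are hypothetical names, and plain C++ containers stand in for the agent's `latest_status` and `statuses` bookkeeping.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the agent's operation bookkeeping.
enum class OpState { PENDING, FINISHED, FAILED };

bool isTerminal(OpState s) { return s != OpState::PENDING; }

struct TrackedOperation {
  OpState latestStatus = OpState::PENDING;
  std::vector<OpState> statuses; // appended only by the status update handler
};

// For operations known to both the agent and the resource provider, record
// a state reported in UPDATE_STATE so that a later UPDATE_OPERATION_STATUS
// handler can tell the operation is already reflected in the reported total.
void updateKnownOperations(
    std::unordered_map<std::string, TrackedOperation>& agentOps,
    const std::unordered_map<std::string, OpState>& reported) {
  for (const auto& [id, state] : reported) {
    auto it = agentOps.find(id);
    if (it == agentOps.end()) {
      continue; // unknown to the agent; not modeled in this sketch
    }
    // Only touch `latestStatus` while it is not yet terminal; `statuses`
    // is deliberately left for the operation status update handler.
    if (!isTerminal(it->second.latestStatus)) {
      it->second.latestStatus = state;
    }
  }
}
```

An operation that the agent already recorded as terminal keeps its status, and no entry is added to `statuses` here.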
On June 20, 2018, 7:33 a.m., Chun-Hung Hsiao wrote:
> Diff: https://reviews.apache.org/r/67664/diff/1/
Re: Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.
Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
https://reviews.apache.org/r/67664/
-----------------------------------------------------------
(Updated July 6, 2018, 11:25 p.m.)
Review request for mesos, Benjamin Bannier, Greg Mann, Jie Yu, and Jan Schlicht.
Changes
-------
Addressed Jan's comment.
Bugs: MESOS-9010
https://issues.apache.org/jira/browse/MESOS-9010
Repository: mesos
Diffs (updated)
-----
src/slave/slave.hpp bf14d3569e677b2be6790ef774985df6937ebb29
src/slave/slave.cpp 06c2f5ffb6ac79b746c1db4a6762b9dc7e88c471
src/tests/slave_tests.cpp b46fb8efc524852f62428040ff958bd44e9efe9f
Diff: https://reviews.apache.org/r/67664/diff/3/
Changes: https://reviews.apache.org/r/67664/diff/2-3/
Testing
-------
sudo make check
Thanks,
Chun-Hung Hsiao
Re: Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.
Posted by Jan Schlicht <ja...@mesosphere.io>.
-----------------------------------------------------------
https://reviews.apache.org/r/67664/#review205568
-----------------------------------------------------------
Fix it, then Ship it!
src/slave/slave.cpp
Lines 8047-8048 (patched)
<https://reviews.apache.org/r/67664/#comment288451>
Not yours, but let's remove this comment, as it's just stating what is done in the code below.
- Jan Schlicht
On June 21, 2018, 6:29 a.m., Chun-Hung Hsiao wrote:
> Diff: https://reviews.apache.org/r/67664/diff/2/
Re: Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.
Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
https://reviews.apache.org/r/67664/
-----------------------------------------------------------
(Updated June 21, 2018, 4:29 a.m.)
Review request for mesos, Benjamin Bannier, Greg Mann, Jie Yu, and Jan Schlicht.
Changes
-------
Addressed Benjamin's comments and moved the update logic into a function, as Jie suggested.
Bugs: MESOS-9010
https://issues.apache.org/jira/browse/MESOS-9010
Repository: mesos
Diffs (updated)
-----
src/slave/slave.hpp bf14d3569e677b2be6790ef774985df6937ebb29
src/slave/slave.cpp 8edd652f7f410dbadaf6c2ca3736349065e4340a
src/tests/slave_tests.cpp b46fb8efc524852f62428040ff958bd44e9efe9f
Diff: https://reviews.apache.org/r/67664/diff/2/
Changes: https://reviews.apache.org/r/67664/diff/1-2/
Testing
-------
sudo make check
Thanks,
Chun-Hung Hsiao
Re: Review Request 67664: Fixed a race between `UPDATE_STATE` and `UPDATE_OPERATION_STATUS`.
Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
https://reviews.apache.org/r/67664/#review205054
-----------------------------------------------------------
src/slave/slave.cpp
Lines 7750 (patched)
<https://reviews.apache.org/r/67664/#comment287933>
We could even narrow this condition so that it only triggers when the update itself is terminal.
- Benjamin Bannier
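One reading of this suggestion, written as a hypothetical predicate (`shouldRecordReportedStatus` and `OpState` are illustrative names, not Mesos APIs): the agent's latest status is overwritten from an `UPDATE_STATE` report only if the agent has not yet seen a terminal state and the reported state is itself terminal.

```cpp
#include <cassert>

// Hypothetical operation states; illustrative only.
enum class OpState { PENDING, FINISHED, FAILED };

bool isTerminal(OpState s) { return s != OpState::PENDING; }

// Narrower condition: record the reported state only when it is terminal
// and the agent's own record is not terminal yet.
bool shouldRecordReportedStatus(OpState agentLatest, OpState reported) {
  return !isTerminal(agentLatest) && isTerminal(reported);
}
```

With this check, a non-terminal status reported in `UPDATE_STATE` never overwrites the agent's record, so only a terminal report can suppress a later re-application of the operation.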
On June 20, 2018, 7:33 a.m., Chun-Hung Hsiao wrote:
> Diff: https://reviews.apache.org/r/67664/diff/1/