Posted to reviews@mesos.apache.org by Greg Mann <gr...@mesosphere.io> on 2019/03/07 00:10:57 UTC

Review Request 70147: WIP: Added a Sequence to the master to order updates to agent resources.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70147/
-----------------------------------------------------------

Review request for mesos, Benjamin Mahler, Gastón Kleiman, Joseph Wu, and Meng Zhu.


Bugs: MESOS-9460
    https://issues.apache.org/jira/browse/MESOS-9460


Repository: mesos


Description
-------

This patch adds a new `Sequence` data member to the master.
It is used to prevent interleavings of master/allocator
state updates that could leave the master and allocator
actors with inconsistent state.


Diffs
-----

  src/master/master.hpp 90e08149ece595147ca4a93da215385917a0f372 
  src/master/master.cpp b9db4ffd4ee8ea4a8e44a35d1afb6c1b8e03d74d 


Diff: https://reviews.apache.org/r/70147/diff/1/


Testing
-------

`bin/mesos-tests.sh --gtest_filter="*SpeculativeOperationRacesWithUpdateSlaveMessage*" --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 70147: WIP: Added a Sequence to the master to order updates to agent resources.

Posted by Greg Mann <gr...@mesosphere.io>.

(Updated March 7, 2019, 12:26 a.m.)


Review request for mesos, Benjamin Mahler, Gastón Kleiman, Joseph Wu, and Meng Zhu.


Bugs: MESOS-9460
    https://issues.apache.org/jira/browse/MESOS-9460


Repository: mesos


Description (updated)
-------

This patch adds a new `Sequence` data member to the master.
It is used to prevent interleavings of master/allocator
state updates that could leave the master and allocator
actors with inconsistent state.

For example, the following interleaving of events would
previously lead to inconsistent state between the master
and allocator:

1) Master receives a RESERVE operation for agent A via the
   operator API. This invokes `Master::apply()`, which
   calls `allocator->updateAvailable()` for agent A.
2) Master receives an `UpdateSlaveMessage` containing
   oversubscribed resources from agent A. The
   `Master::updateSlave()` handler invokes
   `allocator->updateSlave()`, which uses _stale_ resources
   from the `Slave` struct to update the allocator's view
   of agent A's resources. Once the allocator processes
   that event, its total for agent A no longer includes
   the reserved resources.
3) After the `allocator->updateAvailable()` call from step 1
   returns, `Master::_apply()` is invoked, which updates
   the `Slave` struct for agent A to include the reserved
   resources. The master's and allocator's views of agent
   A's total resources are now inconsistent.
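
The three steps above can be replayed in a small standalone model.
This is only a sketch under loud assumptions: `Resources`, `Outcome`,
and `replayRace()` are hypothetical names, resources are reduced to
three integers, and the allocator actor is modeled as a plain FIFO of
pending events. None of this is the real Mesos or libprocess code.

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Hypothetical, heavily simplified stand-in for an agent's total resources.
struct Resources {
  int unreserved;
  int reserved;
  int revocable;  // Oversubscribed resources.

  bool operator==(const Resources& that) const {
    return unreserved == that.unreserved &&
           reserved == that.reserved &&
           revocable == that.revocable;
  }
};

struct Outcome {
  Resources master;     // The master's view (the `Slave` struct in the text).
  Resources allocator;  // The allocator's view of the same agent.
};

// Replays steps 1-3 with the allocator modeled as a queue of events.
Outcome replayRace() {
  Resources master{8, 0, 0};
  Resources allocator{8, 0, 0};
  std::deque<std::function<void()>> allocatorQueue;

  const int reserve = 2;
  const int oversubscribed = 4;

  // Step 1: the RESERVE is dispatched to the allocator first. The master's
  // own view is only updated later, in step 3.
  allocatorQueue.push_back([&allocator, reserve]() {
    assert(allocator.unreserved >= reserve);
    allocator.unreserved -= reserve;
    allocator.reserved += reserve;
  });

  // Step 2: the agent update is built from the master's current view, which
  // is stale because it does not contain the reservation yet.
  master.revocable = oversubscribed;
  const Resources stale = master;
  allocatorQueue.push_back([&allocator, stale]() { allocator = stale; });

  // The allocator processes both events in the order they were dispatched.
  while (!allocatorQueue.empty()) {
    allocatorQueue.front()();
    allocatorQueue.pop_front();
  }

  // Step 3: the continuation of step 1 finally updates the master's view.
  master.unreserved -= reserve;
  master.reserved += reserve;

  return Outcome{master, allocator};
}
```

In this model the master ends with 2 reserved units while the
allocator ends with 0, which is the disagreement described in step 3.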


Diffs
-----

  src/master/master.hpp 90e08149ece595147ca4a93da215385917a0f372 
  src/master/master.cpp b9db4ffd4ee8ea4a8e44a35d1afb6c1b8e03d74d 


Diff: https://reviews.apache.org/r/70147/diff/1/


Testing
-------

`bin/mesos-tests.sh --gtest_filter="*SpeculativeOperationRacesWithUpdateSlaveMessage*" --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 70147: WIP: Added a Sequence to the master to order updates to agent resources.

Posted by Greg Mann <gr...@mesosphere.io>.

(Updated March 7, 2019, 12:25 a.m.)


Review request for mesos, Benjamin Mahler, Gastón Kleiman, Joseph Wu, and Meng Zhu.


Bugs: MESOS-9460
    https://issues.apache.org/jira/browse/MESOS-9460


Repository: mesos


Description (updated)
-------

This patch adds a new `Sequence` data member to the master.
It is used to prevent interleavings of master/allocator
state updates that could leave the master and allocator
actors with inconsistent state.

For example, the following interleaving of events would
previously lead to inconsistent state between the master
and allocator:

1) Master receives a RESERVE operation for agent A via the
   operator API. This invokes `Master::apply()`, which
   calls `allocator->updateAvailable()` for agent A.
2) Master receives an `UpdateSlaveMessage` containing
   oversubscribed resources from agent A. The handler
   `Master::updateSlave()` invokes
   `allocator->updateSlave()`, which uses _stale_ resources
   from the `Slave` struct to update the allocator's view
   of agent A's resources. Once the allocator processes
   that event, its total for agent A no longer includes
   the reserved resources.
3) After the `allocator->updateAvailable()` call from step 1
   returns, `Master::_apply()` is invoked, which updates
   the `Slave` struct for agent A to include the reserved
   resources. The master's and allocator's views of agent
   A's total resources are now inconsistent.
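
One way to picture how ordering the updates avoids this interleaving
is a queue in which each update starts only after the previous one has
fully completed, including its continuation. The sketch below assumes
a single thread and uses hypothetical names (`MiniSequence`,
`orderedUpdates()`); it does not use the libprocess `Sequence` class
and is not the code in this patch.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded stand-in for a sequence of updates. Each task
// receives a `done` callback and the next task starts only once it is called.
// Calling `done` more than once is a misuse this sketch does not guard against.
class MiniSequence {
public:
  using Done = std::function<void()>;
  using Task = std::function<void(Done)>;

  void add(Task task) {
    pending.push_back(std::move(task));
    if (!running) {
      startNext();
    }
  }

private:
  void startNext() {
    if (pending.empty()) {
      running = false;
      return;
    }
    running = true;
    Task task = std::move(pending.front());
    pending.pop_front();
    task([this]() { startNext(); });
  }

  std::deque<Task> pending;
  bool running = false;
};

// Records the order in which two racing updates take effect.
std::vector<std::string> orderedUpdates() {
  std::vector<std::string> log;
  MiniSequence sequence;
  MiniSequence::Done finishReserve;

  // First update: the RESERVE, which stays in flight until the master's
  // continuation has run.
  sequence.add([&](MiniSequence::Done done) {
    log.push_back("reserve: allocator update dispatched");
    finishReserve = done;
  });

  // Second update arrives while the first is still in flight, so it waits.
  sequence.add([&](MiniSequence::Done done) {
    log.push_back("oversubscription: update from fresh agent state");
    done();
  });

  assert(log.size() == 1);  // The second update has not started yet.

  // The continuation of the first update runs, then releases the sequence.
  log.push_back("reserve: agent struct updated");
  finishReserve();

  return log;
}
```

Here the oversubscription update is recorded only after the
reservation has been written to the agent struct, so it can no longer
read stale state.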


Diffs
-----

  src/master/master.hpp 90e08149ece595147ca4a93da215385917a0f372 
  src/master/master.cpp b9db4ffd4ee8ea4a8e44a35d1afb6c1b8e03d74d 


Diff: https://reviews.apache.org/r/70147/diff/1/


Testing
-------

`bin/mesos-tests.sh --gtest_filter="*SpeculativeOperationRacesWithUpdateSlaveMessage*" --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann