You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Joseph Wu <jo...@mesosphere.io> on 2019/02/13 02:12:17 UTC

Review Request 69960: Added the concept of "orphaned operations" to the master.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69960/
-----------------------------------------------------------

Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Bugs: MESOS-9542
    https://issues.apache.org/jira/browse/MESOS-9542


Repository: mesos


Description
-------

An orphaned operation is a non-terminal, non-speculative operation whose
originating framework has been torn down.  These operations will
consume resources until they are terminated, but will have no entry
in the allocator because their associated framework no longer exists.

To account for resources used by orphaned operations, the operation's
resources are removed from the agent's total resources upon being
orphaned.

This commit handles one of the two possible code paths which can
introduce orphaned operations.


Diffs
-----

  src/master/master.hpp ccd117f607747d49e5259d9ba6645fed61811adf 
  src/master/master.cpp 014e0e053cdf5c53a5ef8d63300205a121bed319 


Diff: https://reviews.apache.org/r/69960/diff/1/


Testing
-------

See last patch in chain.


Thanks,

Joseph Wu


Re: Review Request 69960: Added the concept of "orphaned operations" to the master.

Posted by Gastón Kleiman <ga...@mesosphere.io>.

> On Feb. 20, 2019, 1:22 p.m., Greg Mann wrote:
> > src/master/master.hpp
> > Lines 156 (patched)
> > <https://reviews.apache.org/r/69960/diff/3/?file=2125868#file2125868line156>
> >
> >     s/Marks a non-terminal, non-speculative/Marks a non-speculative/

I think this method will also be called in the case that the agent reregisters with a pending operation for a framework that might not be torn down but isn't known by the current master.

If I'm correct, then I think we should also mention that case here.


- Gastón


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69960/#review212972
-----------------------------------------------------------


On Feb. 19, 2019, 4:45 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69960/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2019, 4:45 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9542
>     https://issues.apache.org/jira/browse/MESOS-9542
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An orphaned operation is a non-terminal, non-speculative operation whose
> originating framework has been torn down.  These operations will
> consume resources until they are terminated, but will have no entry
> in the allocator because their associated framework no longer exists.
> 
> To account for resources used by orphaned operations, the operation's
> resources are removed from the agent's total resources upon being
> orphaned.
> 
> This commit handles one of the two possible code paths which can
> introduce orphaned operations.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp ccd117f607747d49e5259d9ba6645fed61811adf 
>   src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 
> 
> 
> Diff: https://reviews.apache.org/r/69960/diff/3/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 69960: Added the concept of "orphaned operations" to the master.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69960/#review212972
-----------------------------------------------------------


Fix it, then Ship it!





src/master/master.hpp
Lines 156 (patched)
<https://reviews.apache.org/r/69960/#comment298840>

    s/Marks a non-terminal, non-speculative/Marks a non-speculative/



src/master/master.hpp
Lines 257 (patched)
<https://reviews.apache.org/r/69960/#comment298841>

    Suggestion:
    
    s/Pending operations whose originating framework is unknown./Operations whose originating framework is unknown. These operations could be pending, or terminal with unacknowledged status updates./


- Greg Mann


On Feb. 20, 2019, 12:45 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69960/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2019, 12:45 a.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9542
>     https://issues.apache.org/jira/browse/MESOS-9542
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An orphaned operation is a non-terminal, non-speculative operation whose
> originating framework has been torn down.  These operations will
> consume resources until they are terminated, but will have no entry
> in the allocator because their associated framework no longer exists.
> 
> To account for resources used by orphaned operations, the operation's
> resources are removed from the agent's total resources upon being
> orphaned.
> 
> This commit handles one of the two possible code paths which can
> introduce orphaned operations.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp ccd117f607747d49e5259d9ba6645fed61811adf 
>   src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 
> 
> 
> Diff: https://reviews.apache.org/r/69960/diff/3/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 69960: Added the concept of "orphaned operations" to the master.

Posted by Joseph Wu <jo...@mesosphere.io>.

> On Feb. 20, 2019, 1:40 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 10694-10696 (patched)
> > <https://reviews.apache.org/r/69960/diff/3/?file=2125869#file2125869line10694>
> >
> >     What happens if `_allocate()` is executed on the allocator actor in between `removeFramework()` and `updateSlave()`?

Hmmm... this could be bad.  If `_allocate()` is called after removing the framework, the allocator will have de-allocated the framework, but still know about any resources used by pending operations.  The `_allocate()` would then call the `Master::offer` callback.  If the master deems the offer invalid (as is possible for many reasons), the master will call `Allocator::recoverResources()`.

But the allocator's dispatch queue would place the `recoverResources` call after the `updateSlave` calls from this block of code.  And if we attempt to recover non-existent resources, we CHECK fail.

To avoid this case, the two allocator calls here must be turned into a sort of critical section via `allocator->pause/resume`.  That will prevent any allocations from being interleaved.


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69960/#review212984
-----------------------------------------------------------


On Feb. 19, 2019, 4:45 p.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69960/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2019, 4:45 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9542
>     https://issues.apache.org/jira/browse/MESOS-9542
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An orphaned operation is a non-terminal, non-speculative operation whose
> originating framework has been torn down.  These operations will
> consume resources until they are terminated, but will have no entry
> in the allocator because their associated framework no longer exists.
> 
> To account for resources used by orphaned operations, the operation's
> resources are removed from the agent's total resources upon being
> orphaned.
> 
> This commit handles one of the two possible code paths which can
> introduce orphaned operations.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp ccd117f607747d49e5259d9ba6645fed61811adf 
>   src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 
> 
> 
> Diff: https://reviews.apache.org/r/69960/diff/3/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 69960: Added the concept of "orphaned operations" to the master.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69960/#review212984
-----------------------------------------------------------




src/master/master.cpp
Lines 10694-10696 (patched)
<https://reviews.apache.org/r/69960/#comment298845>

    What happens if `_allocate()` is executed on the allocator actor in between `removeFramework()` and `updateSlave()`?


- Greg Mann


On Feb. 20, 2019, 12:45 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69960/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2019, 12:45 a.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9542
>     https://issues.apache.org/jira/browse/MESOS-9542
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An orphaned operation is a non-terminal, non-speculative operation whose
> originating framework has been torn down.  These operations will
> consume resources until they are terminated, but will have no entry
> in the allocator because their associated framework no longer exists.
> 
> To account for resources used by orphaned operations, the operation's
> resources are removed from the agent's total resources upon being
> orphaned.
> 
> This commit handles one of the two possible code paths which can
> introduce orphaned operations.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp ccd117f607747d49e5259d9ba6645fed61811adf 
>   src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 
> 
> 
> Diff: https://reviews.apache.org/r/69960/diff/3/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 69960: Added the concept of "orphaned operations" to the master.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69960/#review213174
-----------------------------------------------------------


Ship it!




Ship It!

- Greg Mann


On Feb. 22, 2019, 12:06 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69960/
> -----------------------------------------------------------
> 
> (Updated Feb. 22, 2019, 12:06 a.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9542
>     https://issues.apache.org/jira/browse/MESOS-9542
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An orphaned operation is a non-speculative operation whose
> originating framework is unknown.  These operations will consume
> resources until they are terminated, but will have no entry
> in the allocator because their associated framework does not exist.
> 
> To account for resources used by orphaned operations, the operation's
> resources are removed from the agent's total resources upon being
> orphaned.
> 
> This commit handles one of the two possible code paths which can
> introduce orphaned operations.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp ccd117f607747d49e5259d9ba6645fed61811adf 
>   src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 
> 
> 
> Diff: https://reviews.apache.org/r/69960/diff/4/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 69960: Added the concept of "orphaned operations" to the master.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69960/
-----------------------------------------------------------

(Updated Feb. 21, 2019, 4:06 p.m.)


Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Changes
-------

* Reworded/reworked some comments based on various changes made to the chain over the last few revisions.
* When removing a framework, terminal operations are also marked as orphans now.
* Added a critical section to the allocator when removing a framework and its operations.
* Added a log message when orphaning an operation.


Bugs: MESOS-9542
    https://issues.apache.org/jira/browse/MESOS-9542


Repository: mesos


Description (updated)
-------

An orphaned operation is a non-speculative operation whose
originating framework is unknown.  These operations will consume
resources until they are terminated, but will have no entry
in the allocator because their associated framework does not exist.

To account for resources used by orphaned operations, the operation's
resources are removed from the agent's total resources upon being
orphaned.

This commit handles one of the two possible code paths which can
introduce orphaned operations.


Diffs (updated)
-----

  src/master/master.hpp ccd117f607747d49e5259d9ba6645fed61811adf 
  src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 


Diff: https://reviews.apache.org/r/69960/diff/4/

Changes: https://reviews.apache.org/r/69960/diff/3-4/


Testing
-------

See last patch in chain.


Thanks,

Joseph Wu


Re: Review Request 69960: Added the concept of "orphaned operations" to the master.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69960/
-----------------------------------------------------------

(Updated Feb. 19, 2019, 4:45 p.m.)


Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Changes
-------

Allowed orphan operations to be added in a terminal state.  This happens mainly when an operation is unacknowledged, but complete.
The rest of the chain already handles this case because the transition from non-terminal to terminal was allowed as an orphan.


Bugs: MESOS-9542
    https://issues.apache.org/jira/browse/MESOS-9542


Repository: mesos


Description
-------

An orphaned operation is a non-terminal, non-speculative operation whose
originating framework has been torn down.  These operations will
consume resources until they are terminated, but will have no entry
in the allocator because their associated framework no longer exists.

To account for resources used by orphaned operations, the operation's
resources are removed from the agent's total resources upon being
orphaned.

This commit handles one of the two possible code paths which can
introduce orphaned operations.


Diffs (updated)
-----

  src/master/master.hpp ccd117f607747d49e5259d9ba6645fed61811adf 
  src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 


Diff: https://reviews.apache.org/r/69960/diff/3/

Changes: https://reviews.apache.org/r/69960/diff/2-3/


Testing
-------

See last patch in chain.


Thanks,

Joseph Wu