You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Chun-Hung Hsiao <ch...@apache.org> on 2019/03/06 00:53:58 UTC

Review Request 70132: Do not implicitly refuse speculatively converted resources.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
-----------------------------------------------------------

Review request for mesos, Benjamin Mahler and Meng Zhu.


Bugs: MESOS-9616
    https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
-------

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly refused. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be refused.


Diffs
-----

  docs/scheduler-http-api.md 8384336bbecf2ca38a3cd203f9db28d931812d65 
  src/master/master.cpp 015da54583448a8d102d8e401e48bd228baf6dd6 
  src/tests/slave_tests.cpp 22a0295086ae4f4ec26df00a0e077eecfa27f1fb 


Diff: https://reviews.apache.org/r/70132/diff/1/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214901
-----------------------------------------------------------


Fix it, then Ship it!




Thank you Chun!

I think the description is still a bit tough of a read, here's a
suggestion to work off of that hopefully makes the issue clearer to
the uninitiated:

```
Currently, if a framework accepts an offer with a RESERVE operation,
the RESERVE resoures will be implicitly declined. This is counter to
what one would expect (that only the remaining resources in the offer
will be declined):

  Offer cpus:10 -> ACCEPT with RESERVE cpus(role):1
                   *Actual* implicit decline: cpus:9;cpus(role):1
                   *Expected* implicit decline: cpus:9

The same issue is present with the other transformational operations
(i.e. UNRESERVE, CREATE, ...).

This patch fixes the issue by computing the transformed "remaining"
resources in the offer and subtracting the converted resources from
the remaining resources. In the example above:

  Offered:   cpus:10
  Remaining: cpus:9;cpus(role):1
  Implicitly Declined = Remaining - (Remaining - Offered)
                      = cpus:9;cpus(role):1 - cpus(role):1
                      = cpus:9
```


src/master/master.cpp
Lines 5941-5950 (patched)
<https://reviews.apache.org/r/70132/#comment301196>

    I think some clear formulas would be more readable over the wall of text?
    
    ```
    // We now need to compute the amounts of resources to (1) decline with the
    // filter and (2) recover without a filter.
    //
    //   Implicity      (Offered Resources).apply(Operations)
    //    Declined  =     - LAUNCH[_GROUP] Resources
    //   Resources        - Converted Remaining Resources
    //
    //                      |
    //                      V
    //
    //   Implicity      _offeredResources
    //    Declined  =     - Converted Remaining Resources
    //   Resources
    //
    //                      |
    //                      V
    //
    //   Implicity      _offeredResources
    //    Declined  =     - (_offeredResources - offeredResources)
    //   Resources
    //
    ```



src/master/master.cpp
Lines 5944-5945 (patched)
<https://reviews.apache.org/r/70132/#comment301194>

    "We should" seems to imply there's no api contract or anything, probably this can just phrase it as what we're supposed to do and how we compute it?



src/master/master.cpp
Lines 5943-5944 (original), 5963-5964 (patched)
<https://reviews.apache.org/r/70132/#comment301195>

    Let's remove this TODO, I don't think it's accurate. If pausing is a workaround of anything, it's that we don't allow recovering both pieces of the offer in a single call, which I don't believe there's a ticket for


- Benjamin Mahler


On April 25, 2019, 10:14 p.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated April 25, 2019, 10:14 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/7/
> 
> 
> Testing
> -------
> 
> make check
> 
> More testing done in r/70537.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
-----------------------------------------------------------

(Updated May 1, 2019, 8:35 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
-------

Addressed BenM's comments.


Bugs: MESOS-9616
    https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description (updated)
-------

Currently if a framework accepts an offer with a `RESERVE` operation
without a task consuming the reserved resources, the resources will be
implicitly declined. This is counter to what one would expect (that only
the remaining resources in the offer will be declined):

  Offer `cpus:10` -> `ACCEPT` with `RESERVE cpus(role):1`
                     *Actual* implicit decline: `cpus:9;cpus(role):1`
                     *Expected* implicit decline: `cpus:9`

The same issue is present with other transformational operations (i.e.,
`UNRESERVE`, `CREATE` and `DESTROY`).

This patch fixes this issue by only implicitly declining the "remaining"
untransformed resources, computed as follows:

  Offered = `cpus:10`
  Remaining = `cpus:9;cpus(role):1`
  Implicitly declined = remaining - (remaining - offered)
                      = `cpus:9;cpus(role):1` - `cpus:(role):1`
                      = `cpus:9`


Diffs (updated)
-----

  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/8/

Changes: https://reviews.apache.org/r/70132/diff/7-8/


Testing
-------

make check

More testing done in r/70537.


Thanks,

Chun-Hung Hsiao


Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
-----------------------------------------------------------

(Updated April 25, 2019, 10:14 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
-------

Moved documentation to r/70547.


Bugs: MESOS-9616
    https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
-------

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly declined. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be declined.


Diffs (updated)
-----

  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/7/

Changes: https://reviews.apache.org/r/70132/diff/6-7/


Testing
-------

make check

More testing done in r/70537.


Thanks,

Chun-Hung Hsiao


Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
-----------------------------------------------------------

(Updated April 24, 2019, 2:02 a.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
-------

Addressed some of Benjamin's comments and added examples.


Bugs: MESOS-9616
    https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
-------

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly declined. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be declined.


Diffs (updated)
-----

  docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/6/

Changes: https://reviews.apache.org/r/70132/diff/5-6/


Testing (updated)
-------

make check

More testing done in r/70537.


Thanks,

Chun-Hung Hsiao


Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > src/tests/slave_tests.cpp
> > Lines 6499 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140651#file2140651line6499>
> >
> >     Since the changes in this patch are strongly related to behavior framework authors need to reason about I strongly feel that we must add a test for the expected behavior.
> 
> Chun-Hung Hsiao wrote:
>     I could add a unit test in a separated patch. This patch itself will be backported, after discussed with @bmahler.

Done in r/70537.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
-----------------------------------------------------------


On April 23, 2019, 1:15 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated April 23, 2019, 1:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -----
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140649#file2140649line132>
> >
> >     What do you think of getting rid of "implicitly declined" behavior for "cancelling operations"?
> >     
> >     It seems that behavior is more driven by the implementation than intuitive api behavior; it e.g., forces frameworks to reason differently about operations executed in isolation vs. executed together. It seems having the identical behavior for both cases would both be easier to explain and also program against. The behavior that seems to make most sense for me would be to only ever implictly decline "untouched resources", e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && UNRESERVE(cpus:2, role)` we would implicitly decline only `cpus:2`.
> 
> Chun-Hung Hsiao wrote:
>     It seems to me that "cancelling operations" as something that are both 1. very rare and 2. make little sense for frameworks, so I'm more like delivering a fix for common cases without making the alrealy-messy code path more complicated. WDYT? Also @bmahler what's your opinion on @bbannier's suggestion? IIRC you mentioned something like some are designed behaviors before, but I didn't know the context.
> 
> Benjamin Mahler wrote:
>     Thanks for bringing this up, it's certainly a bit bizarre of a use case. I think the more common case is UNRESERVE on its own, where it still seems a bit bizarre that the "untouched" resources are declined with the filter and the UNRESERVE resources are not filtered. That seems a bit arbitrary to me, but I'm not sure what to do about it without allowing the framework to be explicit about which part it wants to "decline and filter" when accepting, and this requires an interface change.
>     
>     Personally I would consider RESERVE+UNRESERVE to be "touching" those resources, but I don't think we should worry about it in this patch (I assume that wasn't your intent anyway, and you were more wanting to raise this topic for discussion?)
> 
> Benjamin Bannier wrote:
>     What I worry most is that this edge case makes explaining suggested framework behavior harder ("should any of the offer operations in a single accept call cancel each other out you will not get offered the resources again until the default offer filter timeout expires (the timeout isn't up to you here)" -> framework defensively revives after each accept call if it has more work to do). Instead we would like frameworks to focus on getting their offer handling and decline behavior correct and only ever revive in exceptional scenarios (e.g., even "_new_ work arrived").
>     
>     Since this patch tries to fix incorrect master behavior we should make sure to get the behavior somewhat right or else risk that frameworks implement suboptimal behavior which will be hard to unlearn. That being said, the fact that no framework author complained when this bug was introduced makes me worry that they either do not care about how fast offers arrive or already implement a overly pessimistc approach (e.g., revive whenever there is more work to do in their state machine).

The timeout is still controlled by the scheduler even in the case where operations cancel each other out.

We're trying to move to a world where frameworks are more expilict about what filters they want to set up. With this in mind, it seems to me that neither the current fix nor what you suggested fits into this goal well, because here we're *guessing* what the frameworks' intentions. Say if a framework reserves two CPUs and unreserves the two reserved CPUs and then reserves two CPUs again, there is no way to infer if the framework is trying to use just two CPUs or four CPUs. This could become really messy in terms of both semantics and implementation.

Since as you said we're not aware of any complain about this, I'd say let's keep the logic simple and determine the declined resources based on the end result of an `ACCEPT` call. Dropping this for now.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
-----------------------------------------------------------


On April 25, 2019, 10:14 p.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated April 25, 2019, 10:14 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/7/
> 
> 
> Testing
> -------
> 
> make check
> 
> More testing done in r/70537.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140649#file2140649line132>
> >
> >     What do you think of getting rid of "implicitly declined" behavior for "cancelling operations"?
> >     
> >     It seems that behavior is more driven by the implementation than intuitive api behavior; it e.g., forces frameworks to reason differently about operations executed in isolation vs. executed together. It seems having the identical behavior for both cases would both be easier to explain and also program against. The behavior that seems to make most sense for me would be to only ever implictly decline "untouched resources", e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && UNRESERVE(cpus:2, role)` we would implicitly decline only `cpus:2`.

It seems to me that "cancelling operations" as something that are both 1. very rare and 2. make little sense for frameworks, so I'm more like delivering a fix for common cases without making the alrealy-messy code path more complicated. WDYT? Also @bmahler what's your opinion on @bbannier's suggestion? IIRC you mentioned something like some are designed behaviors before, but I didn't know the context.


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > src/master/master.cpp
> > Lines 5963-5964 (original), 5983-5984 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140650#file2140650line5983>
> >
> >     Is this a workaround we need until MESOS-4553 gets resolved? If it is, let's add a `TODO`.

I don't know actually lol. I just copied it from https://github.com/apache/mesos/blob/45c9788618e7123f408a1dffcf6772a1285cd2e5/src/master/master.cpp#L10969-L10972, as @mzhu suggested that if there's an allocation in between there might be offer fragmentation. Is this a workaround for MESOS-4553?


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > src/tests/slave_tests.cpp
> > Lines 6499 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140651#file2140651line6499>
> >
> >     Since the changes in this patch are strongly related to behavior framework authors need to reason about I strongly feel that we must add a test for the expected behavior.

I could add a unit test in a separated patch. This patch itself will be backported, after discussed with @bmahler.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
-----------------------------------------------------------


On April 23, 2019, 1:15 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated April 23, 2019, 1:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -----
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

Posted by Benjamin Mahler <bm...@apache.org>.

> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140649#file2140649line132>
> >
> >     What do you think of getting rid of "implicitly declined" behavior for "cancelling operations"?
> >     
> >     It seems that behavior is more driven by the implementation than intuitive api behavior; it e.g., forces frameworks to reason differently about operations executed in isolation vs. executed together. It seems having the identical behavior for both cases would both be easier to explain and also program against. The behavior that seems to make most sense for me would be to only ever implictly decline "untouched resources", e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && UNRESERVE(cpus:2, role)` we would implicitly decline only `cpus:2`.
> 
> Chun-Hung Hsiao wrote:
>     It seems to me that "cancelling operations" as something that are both 1. very rare and 2. make little sense for frameworks, so I'm more like delivering a fix for common cases without making the alrealy-messy code path more complicated. WDYT? Also @bmahler what's your opinion on @bbannier's suggestion? IIRC you mentioned something like some are designed behaviors before, but I didn't know the context.

Thanks for bringing this up, it's certainly a bit bizarre of a use case. I think the more common case is UNRESERVE on its own, where it still seems a bit bizarre that the "untouched" resources are declined with the filter and the UNRESERVE resources are not filtered. That seems a bit arbitrary to me, but I'm not sure what to do about it without allowing the framework to be explicit about which part it wants to "decline and filter" when accepting, and this requires an interface change.

Personally I would consider RESERVE+UNRESERVE to be "touching" those resources, but I don't think we should worry about it in this patch (I assume that wasn't your intent anyway, and you were more wanting to raise this topic for discussion?)


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > src/master/master.cpp
> > Lines 5963-5964 (original), 5983-5984 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140650#file2140650line5983>
> >
> >     Is this a workaround we need until MESOS-4553 gets resolved? If it is, let's add a `TODO`.
> 
> Chun-Hung Hsiao wrote:
>     I don't know actually lol. I just copied it from https://github.com/apache/mesos/blob/45c9788618e7123f408a1dffcf6772a1285cd2e5/src/master/master.cpp#L10969-L10972, as @mzhu suggested that if there's an allocation in between there might be offer fragmentation. Is this a workaround for MESOS-4553?

Well, one could say all the interactions with the allocator around offers are a "workaround" until MESOS-4553 is done :)

I would say that the pause/resume here is more a workaround of the limited recoverResources interface (i.e. it doesn't let you specify a collection of resources and filters, so we need to perform two calls). The only issue with pause/resume is: https://issues.apache.org/jira/browse/MESOS-9734


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
-----------------------------------------------------------


On April 23, 2019, 1:15 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated April 23, 2019, 1:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -----
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On April 23, 2019, 12:47 p.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140649#file2140649line132>
> >
> >     What do you think of getting rid of "implicitly declined" behavior for "cancelling operations"?
> >     
> >     It seems that behavior is more driven by the implementation than intuitive api behavior; it e.g., forces frameworks to reason differently about operations executed in isolation vs. executed together. It seems having the identical behavior for both cases would both be easier to explain and also program against. The behavior that seems to make most sense for me would be to only ever implictly decline "untouched resources", e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && UNRESERVE(cpus:2, role)` we would implicitly decline only `cpus:2`.
> 
> Chun-Hung Hsiao wrote:
>     It seems to me that "cancelling operations" as something that are both 1. very rare and 2. make little sense for frameworks, so I'm more like delivering a fix for common cases without making the alrealy-messy code path more complicated. WDYT? Also @bmahler what's your opinion on @bbannier's suggestion? IIRC you mentioned something like some are designed behaviors before, but I didn't know the context.
> 
> Benjamin Mahler wrote:
>     Thanks for bringing this up, it's certainly a bit bizarre of a use case. I think the more common case is UNRESERVE on its own, where it still seems a bit bizarre that the "untouched" resources are declined with the filter and the UNRESERVE resources are not filtered. That seems a bit arbitrary to me, but I'm not sure what to do about it without allowing the framework to be explicit about which part it wants to "decline and filter" when accepting, and this requires an interface change.
>     
>     Personally I would consider RESERVE+UNRESERVE to be "touching" those resources, but I don't think we should worry about it in this patch (I assume that wasn't your intent anyway, and you were more wanting to raise this topic for discussion?)

What I worry most is that this edge case makes explaining suggested framework behavior harder ("should any of the offer operations in a single accept call cancel each other out you will not get offered the resources again until the default offer filter timeout expires (the timeout isn't up to you here)" -> framework defensively revives after each accept call if it has more work to do). Instead we would like frameworks to focus on getting their offer handling and decline behavior correct and only ever revive in exceptional scenarios (e.g., even "_new_ work arrived").

Since this patch tries to fix incorrect master behavior we should make sure to get the behavior somewhat right or else risk that frameworks implement suboptimal behavior which will be hard to unlearn. That being said, the fact that no framework author complained when this bug was introduced makes me worry that they either do not care about how fast offers arrive or already implement a overly pessimistc approach (e.g., revive whenever there is more work to do in their state machine).


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
-----------------------------------------------------------


On April 23, 2019, 3:15 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated April 23, 2019, 3:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -----
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
-----------------------------------------------------------




docs/scheduler-http-api.md
Line 132 (original), 132 (patched)
<https://reviews.apache.org/r/70132/#comment301063>

    What do you think of getting rid of "implicitly declined" behavior for "cancelling operations"?
    
    It seems that behavior is more driven by the implementation than intuitive api behavior; it e.g., forces frameworks to reason differently about operations executed in isolation vs. executed together. It seems having the identical behavior for both cases would both be easier to explain and also program against. The behavior that seems to make most sense for me would be to only ever implictly decline "untouched resources", e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && UNRESERVE(cpus:2, role)` we would implicitly decline only `cpus:2`.



src/master/master.cpp
Lines 5963-5964 (original), 5983-5984 (patched)
<https://reviews.apache.org/r/70132/#comment301062>

    Is this a workaround we need until MESOS-4553 gets resolved? If it is, let's add a `TODO`.



src/tests/slave_tests.cpp
Lines 6499 (patched)
<https://reviews.apache.org/r/70132/#comment301064>

    Since the changes in this patch are strongly related to behavior framework authors need to reason about I strongly feel that we must add a test for the expected behavior.


- Benjamin Bannier


On April 23, 2019, 3:15 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated April 23, 2019, 3:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -----
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
-----------------------------------------------------------

(Updated April 23, 2019, 1:15 a.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
-------

Updated variable names and comments to make it easier to understand.


Summary (updated)
-----------------

Do not implicitly decline speculatively converted resources.


Bugs: MESOS-9616
    https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description (updated)
-------

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly declined. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be declined.


Diffs (updated)
-----

  docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/5/

Changes: https://reviews.apache.org/r/70132/diff/4-5/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 70132: Do not implicitly refuse speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
-----------------------------------------------------------

(Updated April 9, 2019, 9:16 p.m.)


Review request for mesos, Benjamin Mahler and Meng Zhu.


Changes
-------

Renamed `_offeredResources` as BenM suggested offline.


Bugs: MESOS-9616
    https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
-------

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly refused. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be refused.


Diffs (updated)
-----

  docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp 528a25a837513f153de2a5e89897440144385633 


Diff: https://reviews.apache.org/r/70132/diff/4/

Changes: https://reviews.apache.org/r/70132/diff/3-4/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 70132: Do not implicitly refuse speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
-----------------------------------------------------------

(Updated March 8, 2019, 12:23 a.m.)


Review request for mesos, Benjamin Mahler and Meng Zhu.


Changes
-------

Addressed BenM's comments.


Bugs: MESOS-9616
    https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
-------

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly refused. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be refused.


Diffs (updated)
-----

  docs/scheduler-http-api.md 8384336bbecf2ca38a3cd203f9db28d931812d65 
  src/master/master.cpp b9db4ffd4ee8ea4a8e44a35d1afb6c1b8e03d74d 
  src/tests/slave_tests.cpp 5ee5609af0861e9aecf02a5eaefafe137bd9b843 


Diff: https://reviews.apache.org/r/70132/diff/3/

Changes: https://reviews.apache.org/r/70132/diff/2-3/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 70132: Do not implicitly refuse speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On March 7, 2019, 11 p.m., Benjamin Mahler wrote:
> > src/master/master.cpp
> > Lines 5919-5927 (original), 5924-5937 (patched)
> > <https://reviews.apache.org/r/70132/diff/2/?file=2129168#file2129168line5924>
> >
> >     I'm lost at what's happening here and why, can you add a guiding comment?

I added some more high-level comments explaining that we should implicitly decline unused but not converted resources. Is it now clear, or do you think I should add more explanations here?


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review213538
-----------------------------------------------------------


On March 8, 2019, 12:23 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated March 8, 2019, 12:23 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly refused. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be refused.
> 
> 
> Diffs
> -----
> 
>   docs/scheduler-http-api.md 8384336bbecf2ca38a3cd203f9db28d931812d65 
>   src/master/master.cpp b9db4ffd4ee8ea4a8e44a35d1afb6c1b8e03d74d 
>   src/tests/slave_tests.cpp 5ee5609af0861e9aecf02a5eaefafe137bd9b843 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70132: Do not implicitly refuse speculatively converted resources.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review213538
-----------------------------------------------------------



Thank you for finding this and fixing it!

Can you expand the description here about why these bugs happen? Was it that as new changes were done, the existing logic didn't behave as one would expect? Something else? I can't quite follow what the issue is and why, even though it was already explained to me and I understood it at one point, so someone fresh is probably going to be puzzled.


docs/scheduler-http-api.md
Line 132 (original), 132 (patched)
<https://reviews.apache.org/r/70132/#comment299488>

    Not yours, but we should really split up this very long line and break at 80 or so (thought we had a markdown linter..), it would have the nice additional effect of making the diff work in reviewboard



src/master/master.cpp
Lines 5918-5919 (patched)
<https://reviews.apache.org/r/70132/#comment299490>

    Some additional context here would be great, because the variable names seem unhelpful? The distinction between `_offeredResources` and `offeredResoures` isn't obvious and looking at the code above it seems as though `_offeredResources` is already representing "unused resources"? ("consumed resources" are subtracted from it)
    
    A guiding high level comment would help here, I'm left puzzled by how these two calculations are computing what their names describe.



src/master/master.cpp
Lines 5919-5927 (original), 5924-5937 (patched)
<https://reviews.apache.org/r/70132/#comment299491>

    I'm lost at what's happening here and why, can you add a guiding comment?



src/master/master.cpp
Line 10812 (original), 10824 (patched)
<https://reviews.apache.org/r/70132/#comment299489>

    Can you just commit this directly so that we don't pollute this patch with this unrelated typo fix?


- Benjamin Mahler


On March 6, 2019, 5:07 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated March 6, 2019, 5:07 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly refused. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be refused.
> 
> 
> Diffs
> -----
> 
>   docs/scheduler-http-api.md 8384336bbecf2ca38a3cd203f9db28d931812d65 
>   src/master/master.cpp 015da54583448a8d102d8e401e48bd228baf6dd6 
>   src/tests/slave_tests.cpp 22a0295086ae4f4ec26df00a0e077eecfa27f1fb 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70132: Do not implicitly refuse speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
-----------------------------------------------------------

(Updated March 6, 2019, 5:07 a.m.)


Review request for mesos, Benjamin Mahler and Meng Zhu.


Changes
-------

Addressed Meng's comments.


Bugs: MESOS-9616
    https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
-------

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly refused. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be refused.


Diffs (updated)
-----

  docs/scheduler-http-api.md 8384336bbecf2ca38a3cd203f9db28d931812d65 
  src/master/master.cpp 015da54583448a8d102d8e401e48bd228baf6dd6 
  src/tests/slave_tests.cpp 22a0295086ae4f4ec26df00a0e077eecfa27f1fb 


Diff: https://reviews.apache.org/r/70132/diff/2/

Changes: https://reviews.apache.org/r/70132/diff/1-2/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 70132: Do not implicitly refuse speculatively converted resources.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On March 6, 2019, 2:29 a.m., Meng Zhu wrote:
> > src/master/master.cpp
> > Lines 5921-5928 (original), 5928-5937 (patched)
> > <https://reviews.apache.org/r/70132/diff/1/?file=2129042#file2129042line5928>
> >
> >     hmm, I wonder if it is better to also not implicitly filter `unusedResources` if `speculativeResources` is not empty.
> >     
> >     If a framework wants to consume the updated resources e.g. volume, it probably also wants other resources such as cpu/mem (unless it can consume disk only resource). In that case, the above code would not improve the situation because the framework still needs to wait for the filtered cpu/mem. Also I am a little uncomfortable that, we now have *two* recoverResources calls which could increase agent resource fragmentation (if an offer is sent out in between).
> >     
> >     what do you think?

We have two problems here:

1. A framework may still wait for the unused resources which get filters.
This won't be a problem since the converted resources will enlarge the resource pool. As a result, the "filtered" unused resources will come back along with the converted ones in the next allocation.

2. `recoverResources(...converted..., None())` -> allocation -> `recoverResources(...unused..., filters)` could lead to resource fragmentation.
How about doing the following?
```
allocator->pause();
allocator->recoverResources(...converted..., None());
allocator->recoverResources(...unused..., filters);
allocator->resume();
```


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review213453
-----------------------------------------------------------


On March 6, 2019, 12:53 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated March 6, 2019, 12:53 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly refused. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be refused.
> 
> 
> Diffs
> -----
> 
>   docs/scheduler-http-api.md 8384336bbecf2ca38a3cd203f9db28d931812d65 
>   src/master/master.cpp 015da54583448a8d102d8e401e48bd228baf6dd6 
>   src/tests/slave_tests.cpp 22a0295086ae4f4ec26df00a0e077eecfa27f1fb 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70132: Do not implicitly refuse speculatively converted resources.

Posted by Meng Zhu <mz...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review213453
-----------------------------------------------------------




src/master/master.cpp
Lines 5918-5920 (patched)
<https://reviews.apache.org/r/70132/#comment299393>

    Not sure what value the comment has. Maybe you want to express its dependency on certain `Resources` subtraction semantics such as dropping negatives? Is so, either spit that out or I think you can just remove the comment.



src/master/master.cpp
Lines 5920 (patched)
<https://reviews.apache.org/r/70132/#comment299394>

    how about just name it `updatedResources` which is easier to grasp. In this particular context, it's just the diff after the operation update (as opppose to "unused" below) whether it is speculative or not does not matter.



src/master/master.cpp
Lines 5921-5928 (original), 5928-5937 (patched)
<https://reviews.apache.org/r/70132/#comment299395>

    hmm, I wonder if it is better to also not implicitly filter `unusedResources` if `speculativeResources` is not empty.
    
    If a framework wants to consume the updated resources e.g. volume, it probably also wants other resources such as cpu/mem (unless it can consume disk only resource). In that case, the above code would not improve the situation because the framework still needs to wait for the filtered cpu/mem. Also I am a little uncomfortable that, we now have *two* recoverResources calls which could increase agent resource fragmentation (if an offer is sent out in between).
    
    what do you think?


- Meng Zhu


On March 5, 2019, 4:53 p.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated March 5, 2019, 4:53 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly refused. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be refused.
> 
> 
> Diffs
> -----
> 
>   docs/scheduler-http-api.md 8384336bbecf2ca38a3cd203f9db28d931812d65 
>   src/master/master.cpp 015da54583448a8d102d8e401e48bd228baf6dd6 
>   src/tests/slave_tests.cpp 22a0295086ae4f4ec26df00a0e077eecfa27f1fb 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 70132: Do not implicitly refuse speculatively converted resources.

Posted by Gilbert Song <so...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review213450
-----------------------------------------------------------


Ship it!




Ship It!

- Gilbert Song


On March 5, 2019, 4:53 p.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> -----------------------------------------------------------
> 
> (Updated March 5, 2019, 4:53 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
>     https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly refused. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be refused.
> 
> 
> Diffs
> -----
> 
>   docs/scheduler-http-api.md 8384336bbecf2ca38a3cd203f9db28d931812d65 
>   src/master/master.cpp 015da54583448a8d102d8e401e48bd228baf6dd6 
>   src/tests/slave_tests.cpp 22a0295086ae4f4ec26df00a0e077eecfa27f1fb 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>