Posted to user@mesos.apache.org by Meng Zhu <mz...@mesosphere.com> on 2018/12/03 20:25:53 UTC

New scheduler API proposal: unsuppress and clear_filter

Hi:

tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress and
clear_filter in order to decouple the dual-semantics of the current revive
call.

As pointed out in the Mesos framework scalability guide
<http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability>,
utilizing the suppress
<http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
call is the key to get your cluster to a large number of frameworks
<https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf>.
In short, when a framework is idle with no intention of launching any tasks,
it should suppress to tell Mesos to stop sending it offers. The framework
should then revive
<http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
when new work arrives. This way, the allocator will skip the framework when
performing resource allocations. As a result, thorny issues such as offer
starvation and resource fragmentation would be greatly mitigated.
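
A minimal sketch of the suppress-when-idle / revive-on-work pattern described
above. It only builds the JSON bodies a scheduler would POST to the V1
scheduler endpoint and does no networking. The body shape (a `framework_id`
plus a `type`) is assumed from the scheduler HTTP API documentation linked
above; the helper names `scheduler_call` and `on_work_queue_change` are
hypothetical.

```python
import json


def scheduler_call(framework_id: str, call_type: str) -> str:
    """Build the JSON body for a payload-free V1 scheduler call (assumed shape)."""
    body = {"framework_id": {"value": framework_id}, "type": call_type}
    return json.dumps(body, sort_keys=True)


def on_work_queue_change(framework_id: str, pending_tasks: int, suppressed: bool):
    """Decide which call, if any, to send when the amount of pending work changes.

    Returns a (call_body_or_None, new_suppressed_flag) tuple.
    """
    if pending_tasks == 0 and not suppressed:
        # Idle: ask Mesos to stop sending offers.
        return scheduler_call(framework_id, "SUPPRESS"), True
    if pending_tasks > 0 and suppressed:
        # New work arrived while suppressed: ask for offers again.
        return scheduler_call(framework_id, "REVIVE"), False
    # State already matches the workload; nothing to send.
    return None, suppressed
```

The framework tracks its own `suppressed` flag here so that it only sends a
call on a state change rather than on every scheduling cycle.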

That being said, the suppress/revive calls are currently a little
unwieldy due to MESOS-9028
<https://issues.apache.org/jira/browse/MESOS-9028>:

The revive call has two semantics. It unsuppresses the framework AND clears
all the existing filters. The latter makes the revive call non-idempotent.
Also, users may sometimes want to keep the existing filters when reviving,
which is not possible at the moment.

To decouple the semantics, as suggested in the ticket, we propose to add
two new V1 scheduler calls:

(1) The `UNSUPPRESS` call asks Mesos to resume sending offers;
(2) The `CLEAR_FILTER` call explicitly clears all the existing filters.

To make life easier, both calls will return 200 OK (as opposed to 202
returned by most existing scheduler calls, including `SUPPRESS` and
`REVIVE`).

We will keep the revive call and its semantics (i.e. unsuppress AND clear
filters) for backward compatibility.
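
To make the decoupling concrete, here is a toy model of the per-framework
state involved. It is a sketch only: the class `FrameworkState` and its fields
are hypothetical and do not mirror the real allocator data structures, and
filters are reduced to a set of agent IDs.

```python
from dataclasses import dataclass, field


@dataclass
class FrameworkState:
    """Toy stand-in for the state the allocator keeps per framework."""
    suppressed: bool = False
    filters: set = field(default_factory=set)  # e.g. agent IDs with a decline filter

    def suppress(self) -> None:
        self.suppressed = True

    def unsuppress(self) -> None:
        # Proposed UNSUPPRESS: only resumes offers; repeating it changes nothing.
        self.suppressed = False

    def clear_filters(self) -> None:
        # Proposed CLEAR_FILTER: only drops filters; suppression is untouched.
        self.filters.clear()

    def revive(self) -> None:
        # Existing REVIVE: both effects at once, kept for backward compatibility.
        self.unsuppress()
        self.clear_filters()
```

In this model a repeated `unsuppress()` leaves any filters built up in the
meantime intact, whereas a repeated `revive()` wipes them, which is one way to
read the non-idempotency concern above.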

Note, the changes are proposed for the V1 API only. Thus, once the changes
have landed, framework developers are encouraged to move to the V1 API to
take advantage of the new calls (among many other benefits).

Any feedback/comments are welcome.

-Meng

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Benjamin Mahler <bm...@apache.org>.

Thanks for bringing REQUEST_RESOURCES up for discussion, it's one of the
mechanisms that we've been considering for further scaling pessimistic
offers before we make the migration to optimistic offers. It's also been
referred to as "demand" rather than "request", but for the sake of this
discussion consider them the same.

I couldn't quite tell how you were imagining this would work, but let me
spell out the two models that I've been considering, and you can tell me if
one of these matches what you had in mind or if you had a different model
in mind:

(1) "Effective limit" or "give me this much more": when a scheduler
expresses its "request" for a role, it would be equivalent to setting an
"effective limit" on the framework leaf node underneath the role node (i.e.
.../role/<framework>). The effective limit would probably be set to
(request + existing .../role/<framework> allocation). Due to this, the
demand would be expressed only as a quantity with no metadata and no
"chunks". When mesos performs allocation, it would simply enforce the limit
or the effective limit if applicable, whichever is lower. Of course, this
wouldn't allow a framework to say that it is specifically interested in
say, a set of reservations in a role.

(2) "Matchers" or "give me things that look like this": when a scheduler
expresses its "request" for a role, it would act as a "matcher" (opposite
of filter). When mesos is allocating resources, it only proceeds if
(requests.matches(resources) && !filters.filtered(resources)). The open
ended aspect here is what a matcher would consist of. Consider a case where
a matcher is a resource quantity and multiple are allowed; if any matcher
matches, the result is a match. This would be equivalent to letting
frameworks specify their own --min_allocatable_resources for a role (which
is something that has been considered). The "matchers" could be more
sophisticated: full resource objects just like filters (but global), full
resource objects but with quantities for non-scalar resources like ports,
etc.
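
A rough sketch of the two models, to make the comparison concrete. Everything
here is hypothetical: resources are plain dicts of scalar quantities, a
matcher is a dict of minimum quantities, and `filtered` is any callable
standing in for the existing filters. How an absent request should behave is
not settled in this thread; in this sketch an empty matcher list matches
nothing.

```python
def effective_limit(request: float, current_allocation: float) -> float:
    """Model (1): "give me this much more" on top of what is already allocated."""
    return request + current_allocation


def allowed_allocation(limit, eff_limit) -> float:
    """Model (1): enforce the limit or the effective limit, whichever is lower.

    Either value may be None, meaning "not set".
    """
    candidates = [x for x in (limit, eff_limit) if x is not None]
    return min(candidates) if candidates else float("inf")


def matches(matchers, resources) -> bool:
    """Model (2): a match if ANY matcher's minimum quantities are all satisfied."""
    return any(
        all(resources.get(name, 0) >= quantity for name, quantity in m.items())
        for m in matchers
    )


def should_offer(matchers, filtered, resources) -> bool:
    """Model (2): requests.matches(resources) && !filters.filtered(resources)."""
    return matches(matchers, resources) and not filtered(resources)
```

For example, with the single matcher `{"cpus": 1, "mem": 10}`, an agent with
`{"cpus": 2, "mem": 16}` would be offered unless a filter applies, while an
agent with only `{"cpus": 0.5, "mem": 16}` would be skipped.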

I think in both approaches, we could explore where it gets expressed (e.g.
inside unsuppress, inside revive, inside subscribe, etc).

With regard to incentives, the incentive today for adhering to suppress is
that your framework will be doing less processing of offers when it has no
work to do and that other instances of your own framework as well as other
frameworks would get resources faster. The second aspect is indeed
indirect. The incentive structure with "request" / "demand" does indeed
seem to be more direct (while still having the indirect benefit on other
frameworks / roles): "I'll tell you what to show me so that I get it
faster".

However, as far as performance is concerned, we still need suppress
adoption and not just request adoption. Suppress is actually the bigger
performance win at the current time, unless we think that frameworks with
no work would "effectively suppress" via requests (e.g. "no work? set a 0
request so nothing matches"). Note though, that "effectively suppressing"
via requests has the same incentive structure as suppress itself, right?

On Tue, Dec 4, 2018 at 4:50 AM Benjamin Bannier <
benjamin.bannier@mesosphere.io> wrote:

> Hi Meng,
>
> thanks for the proposal, I agree that the way these two aspects are
> currently entangled is an issue (e.g., for master/allocator performance
> reasons). At the same time, the workflow we currently expect frameworks to
> follow is conceptually not hard to grasp,
>
> (1) If the framework has work, then
>     (i) put the framework in the unsuppressed state,
>     (ii) decline non-matching offers with a long filter duration.
> (2) If an offer matches, accept.
> (3) If there is no more work, suppress. GOTO (1).
>
> Here the framework does not need to track its filters across allocation
> cycles (they are an unexposed implementation detail of the hierarchical
> allocator anyway) which e.g., allows metaschedulers like Marathon or Apache
> Aurora to decouple the scheduling of different workloads. A downside of
> this interface is that
>
> * there is little incentive for frameworks to use SUPPRESS in addition to
> filters, and
> * unsuppression is all-or-nothing, forcing the master to send potentially
> all unused resources to one framework, even if it is only interested in a
> fraction. This can cause non-optimal allocation behavior, at least
> temporarily.
>
> It seems to me that even though adding UNSUPPRESS and CLEAR_FILTERS would
> give frameworks more control, it would only be a small improvement. In the
> above framework workflow we would allow a small improvement if the
> framework knows that a new workload matches a previously running workflow
> (i.e., it can infer that no filters for the resources it is interested in
> are active) so that it can issue UNSUPPRESS instead of CLEAR_FILTERS.
> Incidentally, there seems little local benefit for frameworks to use these
> new calls as they’d mostly help the master and I’d imagine we wouldn’t want
> to imply that clearing filters would unsuppress the framework. This seems
> too little to me, and we run the danger that frameworks would just always
> pair UNSUPPRESS and CLEAR_FILTERS (or keep using REVIVE) to simplify their
> workflow. If we’d model the interface more along framework needs, there
> would be clear benefit which would help adoption.
>
> A more interesting call for me would be REQUEST_RESOURCES. It maps very
> well onto framework needs (e.g., “I want to launch a task requiring these
> resources”), and clearly communicates a requirement to the master so that
> it e.g., doesn’t need to remove all filters for a framework. It also seems
> to fit the allocator model pretty well which doesn’t explicitly expose
> filters. I believe implementing it should not be too hard if we'd restrict
> its semantics to only communicate to the master that a framework _is
> interested in a certain resource_ without promising that the framework
> _will get them in any amount of time_ (i.e., no need to rethink DRF
> fairness semantics in the hierarchical allocator). I also feel that if we
> have REQUEST_RESOURCES we would have some freedom to perform further
> improvements around filters in the master/allocator (e.g., filter
> compactification, work around increasing the default filter duration, …).
>
>
> A possible zeroth implementation for REQUEST_RESOURCES with the
> hierarchical allocator would be to have it remove any filters containing
> the requested resource and likely to unsuppress the framework. A
> REQUEST_RESOURCES call would hold an optional resource and an optional
> AgentID; the case where both are empty would map onto CLEAR_FILTERS.
>
>
> That being said, it might still be useful in the future to expose a
> low-level knob for frameworks, allowing them to explicitly manage their
> filters.
>
>
> Cheers,
>
> Benjamin
>
>
> On Dec 4, 2018, at 5:44 AM, Meng Zhu <mz...@mesosphere.com> wrote:
> >
> > See my comments inline.
> >
> > On Mon, Dec 3, 2018 at 5:43 PM Vinod Kone <vi...@apache.org> wrote:
> >
> >> Thanks Meng for the explanation.
> >>
> >> I imagine most frameworks do not remember what stuff they filtered, much
> >> less figure out how previously filtered stuff can satisfy new operations.
> >> That sounds complicated!
> >>
> >
> > Frameworks do not need to remember what filters they currently have. Just
> > knowing the resource profiles of the current vs. the previous operation
> > would help a lot.
> > But yeah, even this may be too much complexity.
> >
> >>
> >> But I like your example. So a suggestion we could make to frameworks
> >> could be to use CLEAR_FILTERS when they have new work, e.g., scale
> >> up/down, new app (they might want to use this even if they aren't
> >> suppressed!); and to use UNSUPPRESS when they are rescheduling old work?
> >>
> >
> > Yeah, that is the general guideline.
> >
> > I want to echo and reemphasize that CLEAR_FILTERS is orthogonal to
> > suppression. Frameworks should consider clearing filters regardless of
> > suppression.
> >
> > Ideally, when there is new, different work, old irrelevant filters should
> > be cleared. This helps the framework get more offers and makes the
> > allocator run faster (filters can take up the bulk of the allocation time
> > when they build up). On the flip side, calling CLEAR_FILTERS too often
> > might also have performance implications (esp. if the master/allocator
> > actors are already stressed).
> >
> > Thoughts?
> >>
> >> On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu <mz...@mesosphere.com> wrote:
> >>
> >>> Hi Vinod:
> >>>
> >>> Yeah, `CLEAR_FILTERS` sounds good.
> >>>
> >>> UNSUPPRESS should be used whenever a currently suppressed framework
> >>> wants to resume getting offers after a previous SUPPRESS call.
> >>>
> >>> As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to
> >>> call it whenever the framework wants to clear all the existing filters.
> >>>
> >>> To elaborate, a framework declines offers and accumulates filters while
> >>> it is trying to satisfy a particular set of requirements/constraints to
> >>> perform an operation. Once the operation is done and the next operation
> >>> comes, if the new operation has the same (or strictly more) resource
> >>> requirements/constraints compared to the last one, then it is more
> >>> efficient to KEEP the existing filters instead of getting useless offers
> >>> and rebuilding the filters again.
> >>>
> >>> On the other hand, if the requirements/constraints are different (i.e.
> >>> some of the previous requirements could be loosened), then the existing
> >>> filters no longer make sense. Then it might be a good idea to clear all
> >>> the existing filters to improve the chance of getting more offers.
> >>>
> >>> Note, although we introduce `CLEAR_FILTERS` as part of decoupling the
> >>> `REVIVE` call, its usage should be independent of suppression/revival.
> >>> The decision to clear the filters only depends on whether the existing
> >>> filters make sense for the current operation constraints/requirements.
> >>>
> >>> Examples:
> >>> If a framework first launches a task, then wants to launch a replacement
> >>> task (because the first task failed), then it should keep the filters
> >>> built up during the first launch. However, if the framework wants to
> >>> launch a second task with a completely different resource profile, then
> >>> clearing filters might help to get more (otherwise filtered) offers and
> >>> hence speed up the deployment.
> >>>
> >>> -Meng
> >>>
> >>> On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone <vi...@apache.org> wrote:
> >>>
> >>>> Hi Meng,
> >>>>
> >>>> What would be the recommendation for framework authors on when to use
> >>>> UNSUPPRESS vs CLEAR_FILTER?
> >>>>
> >>>> Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
>
>

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Meng Zhu <mz...@mesosphere.com>.

Thanks Ben. Some thoughts below:

> From a scheduler's perspective the difference between the two models is:
>
> (1) expressing "how much more" you need
> (2) expressing an offer "matcher"
>
> So:
>
> (1) covers the middle part of the demand quantity spectrum we currently
> have: unsuppressed -> infinite additional demand, suppressed -> 0
> additional demand, and now also unsuppressed w/ request of X -> X
> additional demand
>

I am not quite sure the middle ground (expressing "how much more") is
needed. Even with matchers, the framework may still find itself cycling
through several offers before finding the right resource. Setting an
"effective limit" will surely prolong this process. I guess the motivation
here is to avoid e.g. sending too many resources to a just-unsuppressed
framework that only wants to launch a small task. I would say the
inefficiency of flooding the framework with offers would be tolerable if the
framework rejects most offers in time, as we are making progress. Even in
cases where such limiting is desired (e.g. the number of frameworks is too
large), I think it is more appropriate to rely on operators to configure the
cluster priority by e.g. setting limits, than to expect individual
frameworks to perform such an altruistic action to limit their own offers
(while still having pending work).


> (2) is a global filtering mechanism to avoid getting offers in an unusable
> shape
>

Yeah, as you mentioned, I think we all agree that adding global matchers to
filter out undesired resources is a good direction, which I think is what
matters most here. I think the small difference lies in how the framework
should communicate the information: via a more declarative approach or by
exposing the global matchers to frameworks directly.


> They both solve inefficiencies we have, and they're complementary: a
> "request" could actually consist of (1) and (2), e.g. "I need an additional
> 10 cpus, 100GB mem, and I want offers to contain [1cpu, 10GB mem]".
>
> I'll schedule a meeting to discuss further. We should also make sure we
> come back to the original problem in this thread around REVIVE retries.
>
> On Mon, Dec 10, 2018 at 11:58 AM Benjamin Bannier <
> benjamin.bannier@mesosphere.io> wrote:
>
> > Hi Ben et al.,
> >
> > I'd expect frameworks to *always* know how to accept or decline offers in
> > general. More involved frameworks might know how to suppress offers. I
> > don't expect that any framework models filters and their associated
> > durations in detail (that's why I called them a Mesos implementation
> > detail) since there is not much benefit to a framework's primary goal of
> > running tasks as quickly as possible.
> >
> > > I couldn't quite tell how you were imagining this would work, but let
> > > me spell out the two models that I've been considering, and you can
> > > tell me if one of these matches what you had in mind or if you had a
> > > different model in mind:
> >
> > > (1) "Effective limit" or "give me this much more" ...
> >
> > This sounds more like an operator-type than a framework-type API to me.
> > I'd assume that frameworks would not worry about their total limit the
> > way an operator would, but instead care about getting resources to run a
> > certain task at a point in time. I could also imagine this being easy to
> > use incorrectly as frameworks would likely need to understand their total
> > limit when issuing the call which could require state or coordination
> > among internal framework components (think: multi-purpose frameworks like
> > Marathon or Aurora).
> >
> > > (2) "Matchers" or "give me things that look like this": when a
> > > scheduler expresses its "request" for a role, it would act as a
> > > "matcher" (opposite of filter). When mesos is allocating resources, it
> > > only proceeds if
> > > (requests.matches(resources) && !filters.filtered(resources)). The open
> > > ended aspect here is what a matcher would consist of. Consider a case
> > > where a matcher is a resource quantity and multiple are allowed; if any
> > > matcher matches, the result is a match. This would be equivalent to
> > > letting frameworks specify their own --min_allocatable_resources for a
> > > role (which is something that has been considered). The "matchers" could
> > > be more sophisticated: full resource objects just like filters (but
> > > global), full resource objects but with quantities for non-scalar
> > > resources like ports, etc.
> >
> > I was thinking in this direction, but what you described is more involved
> > than what I had in mind as a possible first attempt. I'd expect that
> > frameworks currently use `REVIVE` as a proxy for `REQUEST_RESOURCES`, not
> > as a way to manage their filter state tracked in the allocator. Assuming
> > we have some way to express resource quantities (i.e., MESOS-9314), we
> > should be able to improve on `REVIVE` by providing a `REQUEST_RESOURCES`
> > which clears all filters for resources containing the requested resources
> > (or all filters if there is no explicit resource request). Even if that
> > led to more offers than needed it would likely still perform better than
> > `REVIVE` (or `CLEAR_FILTERS` which has similar semantics). If we keep the
> > scope of these calls narrow and clear we have freedom to be smarter in the
> > future internally.
> >
> > This should not only be pretty straightforward to implement in Mesos, but
> > I'd imagine also map pretty well onto framework use cases (i.e., I assume
> > frameworks are interested in controlling the resources they are offered,
> > not in managing filters we maintain for them).
> >
> > > With regard to incentives, the incentive today for adhering to suppress
> > > is that your framework will be doing less processing of offers when it
> > > has no work to do and that other instances of your own framework as well
> > > as other frameworks would get resources faster. The second aspect is
> > > indeed indirect. The incentive structure with "request" / "demand" does
> > > indeed seem to be more direct (while still having the indirect benefit on
> > > other frameworks / roles): "I'll tell you what to show me so that I get
> > > it faster".
> >
> > Additionally, by potentially explicitly introducing filters as a
> > framework API concept, we ask the majority of framework authors to reason
> > about an aspect they didn't have to worry about up until then (previously:
> > "if work arrives, revive, and decline until an offer can be accepted, then
> > suppress"). If we provided them something which fits their *current mental
> > model* while also giving them more control, we have a higher chance of it
> > being globally useful and adopted than if we'd add an expert-level knob.
> >
> > > However, as far as performance is concerned, we still need suppress
> > > adoption and not just request adoption. Suppress is actually the bigger
> > > performance win at the current time, unless we think that frameworks
> > > with no work would "effectively suppress" via requests (e.g. "no work?
> > > set a 0 request so nothing matches"). Note though, that "effectively
> > > suppressing" via requests has the same incentive structure as suppress
> > > itself, right?
> >
> > I was also wondering about how what I suggested would fit here as we have
> > two concepts controlling if and which offers a framework gets (a single
> > global flag for suppress, and a zoo of many fine-grained filters).
> > Currently we only expose `SUPPRESS`, `DECLINE`, and `REVIVE`. It seems
> > that explicitly adding framework control over filters to that might
> > restrict what we can do internally in the future. Right now the API gives
> > us some freedom in how we interpret declines: we could e.g., merge filters
> > which expire at the same time, or even interpret filters on all cluster
> > resources interchangeably with a suppressed state (the API would actually
> > allow us to put a framework into suppressed state -- maybe for some time
> > -- even before it has seen all resources). If we exposed filters we would
> > lose some of that implementation freedom, and we should make sure it is
> > worth it.
> >
> > As for incentives, if we finally added `REQUEST_RESOURCES` we’d allow
> > frameworks to make their interaction with Mesos more declarative yet
> > conceptually not much harder. Even if we (Mesos) wouldn’t be able to
> > implement optimal handling right away, it could already be useful
> > with an MVP implementation on the Mesos side. Also, it would open up
> > potential for future optimizations with frameworks already "speaking the
> > right protocol".
> >
> >
> >
> > Cheers,
> >
> > Benjamin
> >
> >
>

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Meng Zhu <mz...@mesosphere.com>.

Thanks Ben. Some thoughts below:

From a scheduler's perspective the difference between the two models is:
>
> (1) expressing "how much more" you need
> (2) expressing an offer "matcher"
>
> So:
>
> (1) covers the middle part of the demand quantity spectrum we currently
> have: unsuppressed -> infinite additional demand, suppressed -> 0
> additional demand, and now also unsuppressed w/ request of X -> X
> additional demand
>

I am not quite sure if the middle ground (expressing "how much more")
is needed. Even with matchers, the framework may still find itself to cycle
through several offers before finding the right resource. Setting
"effective limit"
will surely prolong this process. I guess the motivation here is to avoid
e.g. sending
too much resources to a just-unsuppressed framework that only wants to
launch a small task. I would say the inefficiency of flooding the framework
with offers would be tolerable if the framework rejects most offers in time,
as we are making progress. Even in cases where such limiting is desired
(e.g. the number of frameworks is too large), I think it is more appropirate
to rely on operators to configure the cluster prioirty by e.g. setting
limits,
than to expect individual frameworks to perform such altruistc action to
limit its own offers (while still having pending work).


> (2) is a global filtering mechanism to avoid getting offers in an unusable
> shape
>

Yeah, as you mentioned, I think we all agree that adding global matchers to
filter-out undesired resources is a good direction--which I think is what
matters most here. I think the small difference lies in how should the
framework
communicate the information: whether a more declarative approach or
exposing the global matchers to frameworks directly.


> They both solve inefficiencies we have, and they're complementary: a
> "request" could actually consist of (1) and (2), e.g. "I need an additional
> 10 cpus, 100GB mem, and I want offers to contain [1cpu, 10GB mem]".
>
> I'll schedule a meeting to discuss further. We should also make sure we
> come back to the original problem in this thread around REVIVE retries.
>
> On Mon, Dec 10, 2018 at 11:58 AM Benjamin Bannier <
> benjamin.bannier@mesosphere.io> wrote:
>
> > Hi Ben et al.,
> >
> > I'd expect frameworks to *always* know how to accept or decline offers in
> > general. More involved frameworks might know how to suppress offers. I
> > don't expect that any framework models filters and their associated
> > durations in detail (that's why I called them a Mesos implementation
> > detail) since there is not much benefit to a framework's primary goal of
> > running tasks as quickly as possible.
> >
> > > I couldn't quite tell how you were imagining this would work, but let
> me
> > spell out the two models that I've been considering, and you can tell me
> if
> > one of these matches what you had in mind or if you had a different model
> > in mind:
> >
> > > (1) "Effective limit" or "give me this much more" ...
> >
> > This sounds more like an operator-type than a framework-type API to me.
> > I'd assume that frameworks would not worry about their total limit the
> way
> > an operator would, but instead care about getting resources to run a
> > certain task at a point in time. I could also imagine this being easy to
> > use incorrectly as frameworks would likely need to understand their total
> > limit when issuing the call which could require state or coordination
> among
> > internal framework components (think: multi-purpose frameworks like
> > Marathon or Aurora).
> >
> > > (2) "Matchers" or "give me things that look like this": when a
> scheduler
> > expresses its "request" for a role, it would act as a "matcher" (opposite
> > of filter). When mesos is allocating resources, it only proceeds if
> > (requests.matches(resources) && !filters.filtered(resources)). The open
> > ended aspect here is what a matcher would consist of. Consider a case
> where
> > a matcher is a resource quantity and multiple are allowed; if any matcher
> > matches, the result is a match. This would be equivalent to letting
> > frameworks specify their own --min_allocatable_resources for a role
> (which
> > is something that has been considered). The "matchers" could be more
> > sophisticated: full resource objects just like filters (but global), full
> > resource objects but with quantities for non-scalar resources like ports,
> > etc.
> >
> > I was thinking in this direction, but what you described is more involved
> > than what I had in mind as a possible first attempt. I'd expect that
> > frameworks currently use `REVIVE` as a proxy for `REQUEST_RESOURCES`, not
> > as a way to manage their filter state tracked in the allocator. Assuming
> we
> > have some way to express resource quantities (i.e., MESOS-9314), we
> should
> > be able to improve on `REVIVE` by providing a `REQUEST_RESOURCES` which
> > clears all filters for resource containing the requested resources (or
> all
> > filters if no explicit resource request). Even if that let to more offers
> > than needed it would likely still perform better than `REVIVE` (or
> > `CLEAR_FILTERS` which has similar semantics). If we keep the scope of
> these
> > calls narrow and clear we have freedom to be smarter in the future
> > internally.
> >
> > This should not only be pretty straight-forward to implement in Mesos,
> but
> > I'd imagine also map pretty well onto framework use cases (i.e., I assume
> > frameworks are interested in controlling the resources they are offered,
> > not in managing filters we maintain for them).
> >
> > > With regard to incentives, the incentive today for adhering to suppress
> > is that your framework will be doing less processing of offers when it
> has
> > no work to do and that other instances of your own framework as well as
> > other frameworks would get resources faster. The second aspect is indeed
> > indirect. The incentive structure with "request" / "demand" does indeed
> > seem to be more direct (while still having the indirect benefit on other
> > frameworks / roles): "I'll tell you what to show me so that I get it
> > faster".
> >
> > Additionally, by potentially explicitly introducing filters as a
> framework
> > API concept, we ask the majority of framework authors to reason about an
> > aspect they didn't have to worry about up until then (previously: "if
> work
> > arrives, revive, and decline until an offer can be accepted, then
> > suppress"). If we provided them something which fits their *current
> mental
> > model* while also gives them more control, we have a higher chance of it
> > being globally useful and adopted than if we'd add an expert-level knob.
> >
> > > However, as far as performance is concerned, we still need suppress
> > adoption and not just request adoption. Suppress is actually the bigger
> > performance win at the current time, unless we think that frameworks with
> > no work would "effectively suppress" via requests (e.g. "no work? set a 0
> > request so nothing matches"). Note though, that "effectively suppressing"
> > via requests has the same incentive structure as suppress itself, right?
> >
> > I was also wondering about how what I suggested would fit here as we have
> > two concepts controlling if and which offers a framework gets (a single
> > global flag for suppress, and a zoo of many fine-grained filters).
> > Currently we only expose `SUPPRESS`, `DECLINE`, and `REVIVE`. It seems
> that
> > explicitly adding framework control over filters to that might restrict
> > what we can do internally in the future. Right now the API gives us some
> > freedom how we interpret declines, we could e.g., merge filters which
> > expire at the same time, or even interpret filters on all cluster
> resources
> > interchangeably with a suppressed state (the API would actually allow us
> to
> > put a framework into suppressed state -- maybe for some time -- even
> before
> > it has seen all resources). If we exposed filters we lose some of that
> > implementation freedom, and we should make sure it is worth it.
> >
> > As for incentives, if we finally added `REQUEST_RESOURCES` we’d allow
> > frameworks to make their interaction with Mesos more declarative yet
> > conceptually not much harder. Even if we (Mesos) wouldn’t be able to
> > implement optimal handling right away, it could already be useful
> > with an MVP implementation on the Mesos side. Also, it would open up
> > potential for future optimizations with frameworks already "speaking the
> > right protocol".
> >
> >
> >
> > Cheers,
> >
> > Benjamin
> >
> >
>

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Benjamin Mahler <bm...@apache.org>.

I think we're agreed:

    -There are no schedulers modeling the existing per-agent time-based
filters that mesos is tracking, and we shouldn't go in a direction that
encourages frameworks to try to model and manage these. So, we should be
very careful in considering something like CLEAR_FILTERS. We're probably
also agreed that the current filters aren't so great. :)
    -Letting a scheduler have more explicit control over the offers it gets
(both in shape of the offers and overall quantity of resources) is a good
direction to go in to reduce the inefficiency in the pessimistic offer
model.
    -Combining matchers of model (2) with REVIVE may eliminate the need for
CLEAR_FILTERS. I think once you have global matchers in play, it eliminates
the need for the existing decline filters to involve resource subsets and
we may be able to move new schedulers forward with a better model without
breaking old schedulers.

I don’t think model (1) was understood as intended. Schedulers would not be
expressing limits, they would be expressing a "request" equivalent to “how
much more they want”. The internal effective limit (equal to
allocation+request) is just an implementation detail here that demonstrates
how it fits cleanly into the allocation algorithm. So, if a scheduler needs
to run 10 tasks with [1 cpu, 10GB mem], they would express a request of
[10 cpus, 100GB mem] regardless of how much else is already allocated at
that role/scheduler node.
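
As a rough illustration of the arithmetic in model (1), here is a minimal
Python sketch. The `Quantities` type and both helper functions are
hypothetical names made up for this example and are not part of any Mesos
API; the only thing taken from the description above is that the request is
"how much more" and the internal effective limit equals allocation+request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantities:
    """Hypothetical scalar resource quantities (not a Mesos type)."""
    cpus: float = 0.0
    mem_gb: float = 0.0

    def __add__(self, other: "Quantities") -> "Quantities":
        return Quantities(self.cpus + other.cpus, self.mem_gb + other.mem_gb)

    def scaled(self, n: int) -> "Quantities":
        return Quantities(self.cpus * n, self.mem_gb * n)


def request_for(tasks: int, per_task: Quantities) -> Quantities:
    """What the scheduler would express: how much more it wants."""
    return per_task.scaled(tasks)


def effective_limit(allocation: Quantities, request: Quantities) -> Quantities:
    """Internal detail on the allocator side: allocation + request."""
    return allocation + request


# 10 tasks of [1 cpu, 10GB mem] give a request of [10 cpus, 100GB mem],
# independent of what is already allocated (the allocation is made up here).
request = request_for(10, Quantities(cpus=1, mem_gb=10))
limit = effective_limit(Quantities(cpus=4, mem_gb=32), request)
```

The scheduler only ever computes `request`; `limit` is something the
allocator could derive on its own.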

From a scheduler's perspective the difference between the two models is:

(1) expressing "how much more" you need
(2) expressing an offer "matcher"

So:

(1) covers the middle part of the demand quantity spectrum we currently
have: unsuppressed -> infinite additional demand, suppressed -> 0
additional demand, and now also unsuppressed w/ request of X -> X
additional demand

(2) is a global filtering mechanism to avoid getting offers in an unusable
shape

They both solve inefficiencies we have, and they're complementary: a
"request" could actually consist of (1) and (2), e.g. "I need an additional
10 cpus, 100GB mem, and I want offers to contain [1cpu, 10GB mem]".
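
To make the combination concrete, here is a small Python sketch of a
per-scheduler state that carries both a quantity request (1) and a list of
matchers (2). Everything in it is an assumption for illustration: the class
and field names are invented, resources are reduced to plain name/quantity
maps, an empty matcher list is treated as "no shape restriction", and the
existing decline filters are left out entirely.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Resources = Dict[str, float]  # e.g. {"cpus": 1, "mem_gb": 10}


def contains(offer: Resources, shape: Resources) -> bool:
    """True if the offer holds at least the quantities named in shape."""
    return all(offer.get(name, 0) >= qty for name, qty in shape.items())


@dataclass
class SchedulerState:
    """Hypothetical allocator-side view of one scheduler."""
    suppressed: bool = False
    request: Optional[Resources] = None  # model (1): "how much more"
    matchers: List[Resources] = field(default_factory=list)  # model (2)

    def additional_demand(self, name: str) -> float:
        if self.suppressed:
            return 0.0           # suppressed -> 0 additional demand
        if self.request is None:
            return math.inf      # unsuppressed -> infinite additional demand
        return self.request.get(name, 0.0)  # request of X -> X

    def matches(self, offer: Resources) -> bool:
        # Assumption: no matchers means any shape is acceptable;
        # otherwise a single matching matcher is enough.
        return not self.matchers or any(contains(offer, m) for m in self.matchers)


state = SchedulerState(request={"cpus": 10, "mem_gb": 100},
                       matchers=[{"cpus": 1, "mem_gb": 10}])
```

With this state, an offer of `{"cpus": 0.5, "mem_gb": 64}` would be skipped
because it is in an unusable shape, while the demand for cpus stays at 10.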

I'll schedule a meeting to discuss further. We should also make sure we
come back to the original problem in this thread around REVIVE retries.

On Mon, Dec 10, 2018 at 11:58 AM Benjamin Bannier <
benjamin.bannier@mesosphere.io> wrote:

> Hi Ben et al.,
>
> I'd expect frameworks to *always* know how to accept or decline offers in
> general. More involved frameworks might know how to suppress offers. I
> don't expect that any framework models filters and their associated
> durations in detail (that's why I called them a Mesos implementation
> detail) since there is not much benefit to a framework's primary goal of
> running tasks as quickly as possible.
>
> > I couldn't quite tell how you were imagining this would work, but let me
> spell out the two models that I've been considering, and you can tell me if
> one of these matches what you had in mind or if you had a different model
> in mind:
>
> > (1) "Effective limit" or "give me this much more" ...
>
> This sounds more like an operator-type than a framework-type API to me.
> I'd assume that frameworks would not worry about their total limit the way
> an operator would, but instead care about getting resources to run a
> certain task at a point in time. I could also imagine this being easy to
> use incorrectly as frameworks would likely need to understand their total
> limit when issuing the call which could require state or coordination among
> internal framework components (think: multi-purpose frameworks like
> Marathon or Aurora).
>
> > (2) "Matchers" or "give me things that look like this": when a scheduler
> expresses its "request" for a role, it would act as a "matcher" (opposite
> of filter). When mesos is allocating resources, it only proceeds if
> (requests.matches(resources) && !filters.filtered(resources)). The open
> ended aspect here is what a matcher would consist of. Consider a case where
> a matcher is a resource quantity and multiple are allowed; if any matcher
> matches, the result is a match. This would be equivalent to letting
> frameworks specify their own --min_allocatable_resources for a role (which
> is something that has been considered). The "matchers" could be more
> sophisticated: full resource objects just like filters (but global), full
> resource objects but with quantities for non-scalar resources like ports,
> etc.
>
> I was thinking in this direction, but what you described is more involved
> than what I had in mind as a possible first attempt. I'd expect that
> frameworks currently use `REVIVE` as a proxy for `REQUEST_RESOURCES`, not
> as a way to manage their filter state tracked in the allocator. Assuming we
> have some way to express resource quantities (i.e., MESOS-9314), we should
> be able to improve on `REVIVE` by providing a `REQUEST_RESOURCES` which
> clears all filters for resource containing the requested resources (or all
> filters if no explicit resource request). Even if that led to more offers
> than needed it would likely still perform better than `REVIVE` (or
> `CLEAR_FILTERS` which has similar semantics). If we keep the scope of these
> calls narrow and clear we have freedom to be smarter in the future
> internally.
>
> This should not only be pretty straight-forward to implement in Mesos, but
> I'd imagine also map pretty well onto framework use cases (i.e., I assume
> frameworks are interested in controlling the resources they are offered,
> not in managing filters we maintain for them).
>
> > With regard to incentives, the incentive today for adhering to suppress
> is that your framework will be doing less processing of offers when it has
> no work to do and that other instances of your own framework as well as
> other frameworks would get resources faster. The second aspect is indeed
> indirect. The incentive structure with "request" / "demand" does indeed
> seem to be more direct (while still having the indirect benefit on other
> frameworks / roles): "I'll tell you what to show me so that I get it
> faster".
>
> Additionally, by potentially explicitly introducing filters as a framework
> API concept, we ask the majority of framework authors to reason about an
> aspect they didn't have to worry about up until then (previously: "if work
> arrives, revive, and decline until an offer can be accepted, then
> suppress"). If we provided them something which fits their *current mental
> model* while also gives them more control, we have a higher chance of it
> being globally useful and adopted than if we'd add an expert-level knob.
>
> > However, as far as performance is concerned, we still need suppress
> adoption and not just request adoption. Suppress is actually the bigger
> performance win at the current time, unless we think that frameworks with
> no work would "effectively suppress" via requests (e.g. "no work? set a 0
> request so nothing matches"). Note though, that "effectively suppressing"
> via requests has the same incentive structure as suppress itself, right?
>
> I was also wondering about how what I suggested would fit here as we have
> two concepts controlling if and which offers a framework gets (a single
> global flag for suppress, and a zoo of many fine-grained filters).
> Currently we only expose `SUPPRESS`, `DECLINE`, and `REVIVE`. It seems that
> explicitly adding framework control over filters to that might restrict
> what we can do internally in the future. Right now the API gives us some
> freedom how we interpret declines, we could e.g., merge filters which
> expire at the same time, or even interpret filters on all cluster resources
> interchangeably with a suppressed state (the API would actually allow us to
> put a framework into suppressed state -- maybe for some time -- even before
> it has seen all resources). If we exposed filters we lose some of that
> implementation freedom, and we should make sure it is worth it.
>
> As for incentives, if we finally added `REQUEST_RESOURCES` we’d allow
> frameworks to make their interaction with Mesos more declarative yet
> conceptually not much harder. Even if we (Mesos) wouldn’t be able to
> implement optimal handling right away, it could already be useful
> with an MVP implementation on the Mesos side. Also, it would open up
> potential for future optimizations with frameworks already "speaking the
> right protocol".
>
>
>
> Cheers,
>
> Benjamin
>
>

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Benjamin Bannier <be...@mesosphere.io>.

Hi Ben et al.,

I'd expect frameworks to *always* know how to accept or decline offers in general. More involved frameworks might know how to suppress offers. I don't expect that any framework models filters and their associated durations in detail (that's why I called them a Mesos implementation detail) since there is not much benefit to a framework's primary goal of running tasks as quickly as possible.

> I couldn't quite tell how you were imagining this would work, but let me spell out the two models that I've been considering, and you can tell me if one of these matches what you had in mind or if you had a different model in mind:

> (1) "Effective limit" or "give me this much more" ...

This sounds more like an operator-type than a framework-type API to me. I'd assume that frameworks would not worry about their total limit the way an operator would, but instead care about getting resources to run a certain task at a point in time. I could also imagine this being easy to use incorrectly as frameworks would likely need to understand their total limit when issuing the call which could require state or coordination among internal framework components (think: multi-purpose frameworks like Marathon or Aurora).

> (2) "Matchers" or "give me things that look like this": when a scheduler expresses its "request" for a role, it would act as a "matcher" (opposite of filter). When mesos is allocating resources, it only proceeds if (requests.matches(resources) && !filters.filtered(resources)). The open ended aspect here is what a matcher would consist of. Consider a case where a matcher is a resource quantity and multiple are allowed; if any matcher matches, the result is a match. This would be equivalent to letting frameworks specify their own --min_allocatable_resources for a role (which is something that has been considered). The "matchers" could be more sophisticated: full resource objects just like filters (but global), full resource objects but with quantities for non-scalar resources like ports, etc.

I was thinking in this direction, but what you described is more involved than what I had in mind as a possible first attempt. I'd expect that frameworks currently use `REVIVE` as a proxy for `REQUEST_RESOURCES`, not as a way to manage their filter state tracked in the allocator. Assuming we have some way to express resource quantities (i.e., MESOS-9314), we should be able to improve on `REVIVE` by providing a `REQUEST_RESOURCES` which clears all filters for resources containing the requested resources (or all filters if no explicit resource request). Even if that led to more offers than needed it would likely still perform better than `REVIVE` (or `CLEAR_FILTERS` which has similar semantics). If we keep the scope of these calls narrow and clear we have freedom to be smarter in the future internally.
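
A minimal Python sketch of that filter-clearing rule might look as follows. It is only an illustration under loud assumptions: `DeclineFilter`, `contains` and `request_resources` are hypothetical names, a filter is reduced to an agent ID plus a name/quantity map, and "containing" is read as "has at least the requested quantity of every named resource". The real allocator tracks filters differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Resources = Dict[str, float]  # e.g. {"cpus": 2, "mem_gb": 4}


@dataclass
class DeclineFilter:
    """Hypothetical stand-in for a filter installed by a decline."""
    agent_id: str
    resources: Resources  # the resources that were declined on that agent


def contains(resources: Resources, requested: Resources) -> bool:
    """True if resources holds at least the requested quantities."""
    return all(resources.get(name, 0) >= qty for name, qty in requested.items())


def request_resources(filters: List[DeclineFilter],
                      requested: Optional[Resources] = None) -> List[DeclineFilter]:
    """Return the filters that stay in place after the call."""
    if not requested:
        # No explicit resource request: clear all filters.
        return []
    # Otherwise clear only filters on resources containing the request.
    return [f for f in filters if not contains(f.resources, requested)]


filters = [
    DeclineFilter("agent-1", {"cpus": 0.5, "mem_gb": 1}),
    DeclineFilter("agent-2", {"cpus": 8, "mem_gb": 32}),
]
remaining = request_resources(filters, {"cpus": 2, "mem_gb": 4})
```

Here only the filter on `agent-2` is cleared, since `agent-1` could not satisfy the request anyway; calling `request_resources(filters)` without a request clears both.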

This should not only be pretty straight-forward to implement in Mesos, but I'd imagine also map pretty well onto framework use cases (i.e., I assume frameworks are interested in controlling the resources they are offered, not in managing filters we maintain for them).

> With regard to incentives, the incentive today for adhering to suppress is that your framework will be doing less processing of offers when it has no work to do and that other instances of your own framework as well as other frameworks would get resources faster. The second aspect is indeed indirect. The incentive structure with "request" / "demand" does indeed seem to be more direct (while still having the indirect benefit on other frameworks / roles): "I'll tell you what to show me so that I get it faster".

Additionally, by potentially explicitly introducing filters as a framework API concept, we ask the majority of framework authors to reason about an aspect they didn't have to worry about up until then (previously: "if work arrives, revive, and decline until an offer can be accepted, then suppress"). If we provided them something which fits their *current mental model* while also gives them more control, we have a higher chance of it being globally useful and adopted than if we'd add an expert-level knob.
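
For reference, that current mental model fits in a few lines. The Python sketch below is a simplification with made-up names (`next_calls`, `fits`) and a single pending task; it only returns the names of the scheduler calls a framework would send and does not talk to a master.

```python
from typing import Dict, List

Resources = Dict[str, float]


def fits(offer: Resources, needed: Resources) -> bool:
    """True if the offer covers everything the pending task needs."""
    return all(offer.get(name, 0) >= qty for name, qty in needed.items())


def next_calls(has_work: bool, suppressed: bool,
               offers: List[Resources], needed: Resources) -> List[str]:
    """Calls a framework following the workflow would send, in order."""
    calls = []
    if has_work and suppressed:
        calls.append("REVIVE")      # work arrived: ask for offers again
        suppressed = False
    for offer in offers:
        if has_work and fits(offer, needed):
            calls.append("ACCEPT")  # an offer can be used
            has_work = False        # single pending task in this sketch
        else:
            calls.append("DECLINE") # not usable: decline (installs a filter)
    if not has_work and not suppressed:
        calls.append("SUPPRESS")    # nothing left to do
    return calls


calls = next_calls(True, True, [{"cpus": 0.5}, {"cpus": 2}], {"cpus": 1})
```

Note that nothing in this loop needs to know which filters exist, which is the point being made about the current API.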

> However, as far as performance is concerned, we still need suppress adoption and not just request adoption. Suppress is actually the bigger performance win at the current time, unless we think that frameworks with no work would "effectively suppress" via requests (e.g. "no work? set a 0 request so nothing matches"). Note though, that "effectively suppressing" via requests has the same incentive structure as suppress itself, right?

I was also wondering about how what I suggested would fit here as we have two concepts controlling if and which offers a framework gets (a single global flag for suppress, and a zoo of many fine-grained filters). Currently we only expose `SUPPRESS`, `DECLINE`, and `REVIVE`. It seems that explicitly adding framework control over filters to that might restrict what we can do internally in the future. Right now the API gives us some freedom in how we interpret declines: we could, e.g., merge filters which expire at the same time, or even interpret filters on all cluster resources interchangeably with a suppressed state (the API would actually allow us to put a framework into suppressed state -- maybe for some time -- even before it has seen all resources). If we exposed filters we would lose some of that implementation freedom, and we should make sure it is worth it.

As for incentives, if we finally added `REQUEST_RESOURCES` we’d allow frameworks to make their interaction with Mesos more declarative yet conceptually not much harder. Even if we (Mesos) wouldn’t be able to implement optimal handling right away, it could already be useful with an MVP implementation on the Mesos side. Also, it would open up potential for future optimizations with frameworks already "speaking the right protocol".



Cheers,

Benjamin

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Meng Zhu <mz...@mesosphere.com>.

Hi Benjamin:

Thanks for the great feedback.

I like the idea of giving frameworks more meaningful and fine-grained
control over which filters to remove, especially since this is likely to
help adoption. For example, letting the framework send an optional agentID
which instructs Mesos to only clear filters on that agent might help a task
launch with an agent constraint.

However, when it comes to framework-sent desired resource profiles, we
should give it more thought. There is always the question of to what degree
we support the various metadata in the resource schema. I feel the
current schema is too complex for expressing resource needs, let alone
respecting it in the allocator (even just for the purpose of removing
filters). We probably want to first introduce a more concise format (such
as resourceQuantity) for all purposes of specifying desired resource
profiles (clear filters, quota guarantee, min_allocatable_resources,
etc.) and start from there.

I suggest just adding the optional agentID for now; we can always add
support for specifying resource requirements in the future. And since its
semantics are far from "requesting resources", I suggest keeping the
name CLEAR(or REMOVE)_FILTERS.

What do you think?

-Meng

On Tue, Dec 4, 2018 at 1:50 AM Benjamin Bannier <
benjamin.bannier@mesosphere.io> wrote:

> Hi Meng,
>
> thanks for the proposal, I agree that the way these two aspects are
> currently entangled is an issue (e.g., for master/allocator performance
> reasons). At the same time, the workflow we currently expect frameworks to
> follow is conceptually not hard to grasp,
>
> (1) If framework has work then
> (i) put framework in unsuppressed state,
> (ii) decline non-matching offers with a long filter duration.
> (2) If an offer matches, accept.
> (3) If there is no more work, suppress. GOTO (1).
>
> Here the framework does not need to track its filters across allocation
> cycles (they are an unexposed implementation detail of the hierarchical
> allocator anyway) which e.g., allows metaschedulers like Marathon or Apache
> Aurora to decouple the scheduling of different workloads. A downside of
> this interface is that
>
> * there is little incentive for frameworks to use SUPPRESS in addition to
> filters, and
> * unsuppression is all-or-nothing, forcing the master to send potentially
> all unused resources to one framework, even if it is only interested in a
> fraction. This can cause, at least temporarily, non-optimal allocation
> behavior.
>
> It seems to me that even though adding UNSUPPRESS and CLEAR_FILTERS would
> give frameworks more control, it would only be a small improvement. In
> the above framework workflow we would allow a small improvement if the
> framework knows that a new workload matches a previously running workload
> (i.e., it can infer that no filters for the resources it is interested in
> are active) so that it can issue UNSUPPRESS instead of CLEAR_FILTERS.
> Incidentally, there seems little local benefit for frameworks to use these
> new calls as they’d mostly help the master and I’d imagine we wouldn’t want
> to imply that clearing filters would unsuppress the framework. This seems
> too little to me, and we run the danger that frameworks would just always
> pair UNSUPPRESS and CLEAR_FILTERS (or keep using REVIVE) to simplify their
> workflow. If we’d model the interface more along framework needs, there
> would be clear benefit which would help adoption.
>
> A more interesting call for me would be REQUEST_RESOURCES. It maps very
> well onto framework needs (e.g., “I want to launch a task requiring these
> resources”), and clearly communicates a requirement to the master so that
> it e.g., doesn’t need to remove all filters for a framework. It also seems
> to fit the allocator model pretty well which doesn’t explicitly expose
> filters. I believe implementing it should not be too hard if we'd restrict
> its semantics to only communicate to the master that a framework _is
> interested in a certain resource_ without promising that the framework
> _will get them in any amount of time_ (i.e., no need to rethink DRF
> fairness semantics in the hierarchical allocator). I also feel that if we
> have REQUEST_RESOURCES we would have some freedom to perform further
> improvements around filters in the master/allocator (e.g., filter
> compactification, work around increasing the default filter duration, …).
>
>
> A possible zeroth implementation for REQUEST_RESOURCES with the
> hierarchical allocator would be to have it remove any filters containing
> the requested resource and likely to unsuppress the framework. A
> REQUEST_RESOURCES call would hold an optional resource and an optional
> AgentID; the case where both are empty would map onto CLEAR_FILTERS.
>
>
> That being said, it might still be useful in the future to expose a
> low-level knob allowing frameworks to explicitly manage their
> filters.
>
>
> Cheers,
>
> Benjamin
>
>
> On Dec 4, 2018, at 5:44 AM, Meng Zhu <mz...@mesosphere.com> wrote:
> >
> > See my comments inline.
> >
> > On Mon, Dec 3, 2018 at 5:43 PM Vinod Kone <vi...@apache.org> wrote:
> >
> >> Thanks Meng for the explanation.
> >>
> >> I imagine most frameworks do not remember what stuff they filtered much
> >> less figure out how previously filtered stuff can satisfy new operations.
> >> That sounds complicated!
> >>
> >
> > Frameworks do not need to remember what filters they currently have. Only
> > knowing the resource profiles of the current vs. the previous operation
> > would help a lot.
> > But yeah, even this may be too much complexity.
> >
> >>
> >> But I like your example. So a suggestion we could make to frameworks could
> >> be to use CLEAR_FILTERS when they have new work, e.g., scale up/down, new
> >> app (they might want to use this even if they aren't suppressed!); and to
> >> use UNSUPPRESS when they are rescheduling old work?
> >>
> >
> > Yeah, these are the general guidelines.
> >
> > I want to echo and reemphasize that CLEAR_FILTERS is orthogonal to
> > suppression. Frameworks should consider clearing filters regardless of
> > suppression.
> >
> > Ideally, when there is new, different work, old irrelevant filters should
> > be cleared. This helps the framework get more offers and makes the
> > allocator run faster (filters could take up the bulk of the allocation
> > time when they build up). On the flip side, calling CLEAR_FILTERS too
> > often might also have performance implications (esp. if the
> > master/allocator actors are already stressed).
> >
> > Thoughts?
> >>
> >> On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu <mz...@mesosphere.com> wrote:
> >>
> >>> Hi Vinod:
> >>>
> >>> Yeah, `CLEAR_FILTERS` sounds good.
> >>>
> >>> UNSUPPRESS should be used whenever a currently suppressed framework wants
> >>> to resume getting offers after a previous SUPPRESS call.
> >>>
> >>> As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to
> >>> call it whenever the framework wants to clear all the existing filters.
> >>>
> >>> To elaborate, a framework declines offers and accumulates filters when it
> >>> is trying to satisfy a particular set of requirements/constraints to
> >>> perform an operation. Once the operation is done and the next operation
> >>> comes, if the new operation has the same (or strictly more) resource
> >>> requirements/constraints compared to the last one, then it is more
> >>> efficient to KEEP the existing filters instead of getting useless offers
> >>> and rebuilding the filters again.
> >>>
> >>> On the other hand, if the requirements/constraints are different (i.e.
> >>> some of the previous requirements could be loosened), then the existing
> >>> filters no longer make sense. Then it might be a good idea to clear all
> >>> the existing filters to improve the chance of getting more offers.
> >>>
> >>> Note, although we introduce `CLEAR_FILTERS` as part of decoupling the
> >>> `REVIVE` call, its usage should be independent of suppression/revival.
> >>> The decision to clear the filters only depends on whether the existing
> >>> filters make sense for the current operation constraints/requirements.
> >>>
> >>> Examples:
> >>> If a framework first launches a task, then wants to launch a replacement
> >>> task (because the first task failed), then it should keep the filters
> >>> built up during the first launch. However, if the framework wants to
> >>> launch a second task with a completely different resource profile, then
> >>> clearing filters might help to get more (otherwise filtered) offers and
> >>> hence speed up the deployment.
> >>>
> >>> -Meng
> >>>
> >>> On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone <vi...@apache.org> wrote:
> >>>
> >>>> Hi Meng,
> >>>>
> >>>> What would be the recommendation for framework authors on when to use
> >>>> UNSUPPRESS vs CLEAR_FILTER?
> >>>>
> >>>> Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
> >>>>
> >>>> On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <mz...@mesosphere.com> wrote:
> >>>>
> >>>>> Hi:
> >>>>>
> >>>>> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress and
> >>>>> clear_filter in order to decouple the dual-semantics of the current
> >>>>> revive call.
> >>>>>
> >>>>> As pointed out in the Mesos framework scalability guide
> >>>>> <http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability>,
> >>>>> utilizing the suppress
> >>>>> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> >>>>> call is the key to get your cluster to a large number of frameworks
> >>>>> <https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf>.
> >>>>> In short, when a framework is idling with no intention to launch any
> >>>>> tasks, it should suppress to inform the Mesos to stop sending any more
> >>>>> offers. And the framework should revive
> >>>>> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> >>>>> when new work arrives. This way, the allocator will skip the framework
> >>>>> when performing resource allocations. As a result, thorny issues such as
> >>>>> offer starvation and resource fragmentation would be greatly mitigated.
> >>>>>
> >>>>> That being said, the suppress/revive calls currently are a little bit
> >>>>> unwieldy due to MESOS-9028
> >>>>> <https://issues.apache.org/jira/browse/MESOS-9028>:
> >>>>>
> >>>>> The revive call has two semantics. It unsuppresses the framework AND
> >>>>> clears all the existing filters. The latter makes the revive call
> >>>>> non-idempotent. And sometimes users may want to keep the existing
> >>>>> filters when reviving, which is not possible atm.
> >>>>>
> >>>>> To decouple the semantics, as suggested in the ticket, we propose to
> >>>>> add two new V1 scheduler calls:
> >>>>>
> >>>>> (1) `UNSUPPRESS` call requests the Mesos to resume sending offers;
> >>>>> (2) `CLEAR_FILTER` call will explicitly clear all the existing
> >>>>> filters.
> >>>>>
> >>>>> To make life easier, both calls will return 200 OK (as opposed to 202
> >>>>> returned by most existing scheduler calls, including `SUPPRESS` and
> >>>>> `REVIVE`).
> >>>>>
> >>>>> We will keep the revive call and its semantics (i.e. unsuppress AND
> >>>>> clear filters) for backward compatibility.
> >>>>>
> >>>>> Note, the changes are proposed for V1 API only. Thus, once the changes
> >>>>> are landed, framework developers are encouraged to move to V1 API to
> >>>>> take advantage of the new calls (among many other benefits).
> >>>>>
> >>>>> Any feedback/comments are welcome.
> >>>>>
> >>>>> -Meng
> >>>>>
> >>>>
> >>>
> >>
>
>

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Benjamin Mahler <bm...@apache.org>.

Thanks for bringing REQUEST_RESOURCES up for discussion, it's one of the
mechanisms that we've been considering for further scaling pessimistic
offers before we make the migration to optimistic offers. It's also been
referred to as "demand" rather than "request", but for the sake of this
discussion consider them the same.

I couldn't quite tell how you were imagining this would work, but let me
spell out the two models that I've been considering, and you can tell me if
one of these matches what you had in mind or if you had a different model
in mind:

(1) "Effective limit" or "give me this much more": when a scheduler
expresses its "request" for a role, it would be equivalent to setting an
"effective limit" on the framework leaf node underneath the role node (i.e.
.../role/<framework>). The effective limit would probably be set to
(request + existing .../role/<framework> allocation). Due to this, the
demand would be expressed only as a quantity with no metadata and no
"chunks". When mesos performs allocation, it would simply enforce the limit
or the effective limit if applicable, whichever is lower. Of course, this
wouldn't allow a framework to say that it is specifically interested in
say, a set of reservations in a role.
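
As a rough illustration of model (1), the Python sketch below computes the effective limit as (request + existing allocation) and then applies whichever of the limit and the effective limit is lower. The function names, the `Quantities` dictionary type, and the rule that a resource missing from `limit` is unlimited are assumptions made for this example, not allocator code.

```python
from typing import Dict, Optional

# Resource name -> scalar amount, e.g. {"cpus": 4.0, "mem": 2048.0}.
Quantities = Dict[str, float]


def effective_limit(request: Quantities, allocation: Quantities) -> Quantities:
    """Effective limit = request + existing allocation, per resource name."""
    names = set(request) | set(allocation)
    return {n: request.get(n, 0.0) + allocation.get(n, 0.0) for n in names}


def headroom(allocation: Quantities,
             request: Quantities,
             limit: Optional[Quantities] = None) -> Quantities:
    """How much more the allocator may hand out to this framework.

    The lower of the (optional) limit and the effective limit applies.
    Assumption: a resource absent from `limit` is treated as unlimited.
    """
    result: Quantities = {}
    for name, cap in effective_limit(request, allocation).items():
        if limit is not None and name in limit:
            cap = min(cap, limit[name])
        result[name] = max(0.0, cap - allocation.get(name, 0.0))
    return result
```

Because the request is a bare quantity, an empty (zero) request leaves no headroom at all in this sketch, so the framework would receive nothing beyond what it already holds.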

(2) "Matchers" or "give me things that look like this": when a scheduler
expresses its "request" for a role, it would act as a "matcher" (opposite
of filter). When mesos is allocating resources, it only proceeds if
(requests.matches(resources) && !filters.filtered(resources)). The open
ended aspect here is what a matcher would consist of. Consider a case where
a matcher is a resource quantity and multiple are allowed; if any matcher
matches, the result is a match. This would be equivalent to letting
frameworks specify their own --min_allocatable_resources for a role (which
is something that has been considered). The "matchers" could be more
sophisticated: full resource objects just like filters (but global), full
resource objects but with quantities for non-scalar resources like ports,
etc.
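
Model (2) could be sketched as follows, taking the simplest case described above: each matcher is a resource quantity, several are allowed, and any single match suffices. `Requests`, `Filters`, `contains` and `should_offer` are hypothetical names. The filter rule used here (resources are filtered when a previously declined bundle contains them) is an assumption chosen to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Dict, List

# Resource name -> scalar amount, e.g. {"cpus": 2.0, "mem": 1024.0}.
Quantities = Dict[str, float]


def contains(resources: Quantities, quantity: Quantities) -> bool:
    """True if `resources` holds at least `quantity` of every named resource."""
    return all(resources.get(n, 0.0) >= amount for n, amount in quantity.items())


@dataclass
class Requests:
    matchers: List[Quantities]

    def matches(self, resources: Quantities) -> bool:
        # Any single matcher matching is enough.
        return any(contains(resources, m) for m in self.matchers)


@dataclass
class Filters:
    declined: List[Quantities]

    def filtered(self, resources: Quantities) -> bool:
        # Assumed rule: filtered if a previously declined bundle contains them.
        return any(contains(d, resources) for d in self.declined)


def should_offer(requests: Requests, filters: Filters,
                 resources: Quantities) -> bool:
    """requests.matches(resources) && !filters.filtered(resources)"""
    return requests.matches(resources) and not filters.filtered(resources)
```

One consequence of this particular sketch is that a framework with an empty matcher list is never offered anything, which is a design choice of the example rather than something the proposal specifies.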

I think in both approaches, we could explore where it gets expressed (e.g.
inside unsuppress, inside revive, inside subscribe, etc).

With regard to incentives, the incentive today for adhering to suppress is
that your framework will be doing less processing of offers when it has no
work to do and that other instances of your own framework as well as other
frameworks would get resources faster. The second aspect is indeed
indirect. The incentive structure with "request" / "demand" does indeed
seem to be more direct (while still having the indirect benefit on other
frameworks / roles): "I'll tell you what to show me so that I get it
faster".

However, as far as performance is concerned, we still need suppress
adoption and not just request adoption. Suppress is actually the bigger
performance win at the current time, unless we think that frameworks with
no work would "effectively suppress" via requests (e.g. "no work? set a 0
request so nothing matches"). Note though, that "effectively suppressing"
via requests has the same incentive structure as suppress itself, right?


Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Meng Zhu <mz...@mesosphere.com>.

Hi Benjamin:

Thanks for the great feedback.

I like the idea of giving frameworks more meaningful and fine-grained
control over which filters to remove, especially since this is likely to help
adoption. For example, letting the framework send an optional agentID which
instructs Mesos to only clear filters on that agent might help a task
launch with an agent constraint.

However, when it comes to framework-sent desired resource profiles, we
should give it more thought. There is always the question of to what degree
we support the various metadata in the resource schema. I feel the
current schema is too complex for expressing resource needs, let alone
respecting it in the allocator (even just for the purpose of removing
filters). We probably want to first introduce a more concise format (such
as resourceQuantity) for all purposes of specifying desired resource
profiles (clear filters, quota guarantee, min_allocatable_resources,
etc.) and start from there.

I suggest just adding the optional agentID for now; we can always add
support for specifying resource requirements in the future. And since its
semantics are far from "requesting resources", I suggest keeping the
name CLEAR(or REMOVE)_FILTERS.

What do you think?

-Meng

On Tue, Dec 4, 2018 at 1:50 AM Benjamin Bannier <
benjamin.bannier@mesosphere.io> wrote:

> Hi Meng,
>
> thanks for the proposal, I agree that the way these two aspects are
> currently entangled is an issue (e.g., for master/allocator performance
> reasons). At the same time, the workflow we currently expect frameworks to
> follow is conceptually not hard to grasp,
>
> (1) If framework has work then
> (i) put framework in unsuppressed state,
> (ii) decline non-matching offers with a long filter duration.
> (2) If an offer matches, accept.
> (3) If there is no more work, suppress. GOTO (1).
>
> Here the framework does not need to track its filters across allocation
> cycles (they are an unexposed implementation detail of the hierarchical
> allocator anyway) which e.g., allows metaschedulers like Marathon or Apache
> Aurora to decouple the scheduling of different workloads. A downside of
> this interface is that
>
> * there is little incentive for frameworks to use SUPPRESS in addition to
> filters, and
> * unsuppression is all-or-nothing, forcing the master to send potentially
> all unused resources to one framework, even if it is only interested in a
> fraction. This can cause, at least temporarily, non-optimal allocation
> behavior.
>
> It seems to me that even though adding UNSUPPRESS and CLEAR_FILTERS would
> give frameworks more control, it would only be a small improvement. In
> the above framework workflow we would allow a small improvement if the
> framework knows that a new workload matches a previously running workload
> (i.e., it can infer that no filters for the resources it is interested in
> are active) so that it can issue UNSUPPRESS instead of CLEAR_FILTERS.
> Incidentally, there seems little local benefit for frameworks to use these
> new calls as they’d mostly help the master and I’d imagine we wouldn’t want
> to imply that clearing filters would unsuppress the framework. This seems
> too little to me, and we run the danger that frameworks would just always
> pair UNSUPPRESS and CLEAR_FILTERS (or keep using REVIVE) to simplify their
> workflow. If we’d model the interface more along framework needs, there
> would be clear benefit which would help adoption.
>
> A more interesting call for me would be REQUEST_RESOURCES. It maps very
> well onto framework needs (e.g., “I want to launch a task requiring these
> resources”), and clearly communicates a requirement to the master so that
> it e.g., doesn’t need to remove all filters for a framework. It also seems
> to fit the allocator model pretty well which doesn’t explicitly expose
> filters. I believe implementing it should not be too hard if we'd restrict
> its semantics to only communicate to the master that a framework _is
> interested in a certain resource_ without promising that the framework
> _will get them in any amount of time_ (i.e., no need to rethink DRF
> fairness semantics in the hierarchical allocator). I also feel that if we
> have REQUEST_RESOURCES we would have some freedom to perform further
> improvements around filters in the master/allocator (e.g., filter
> compactification, work around increasing the default filter duration, …).
>
>
> A possible zeroth implementation for REQUEST_RESOURCES with the
> hierarchical allocator would be to have it remove any filters containing
> the requested resource and likely to unsuppress the framework. A
> REQUEST_RESOURCES call would hold an optional resource and an optional
> AgentID; the case where both are empty would map onto CLEAR_FILTERS.
>
>
> That being said, it might still be useful in the future to expose a
> low-level knob allowing frameworks to explicitly manage their
> filters.
>
>
> Cheers,
>
> Benjamin
>
>
> On Dec 4, 2018, at 5:44 AM, Meng Zhu <mz...@mesosphere.com> wrote:
> >
> > See my comments inline.
> >
> > On Mon, Dec 3, 2018 at 5:43 PM Vinod Kone <vi...@apache.org> wrote:
> >
> >> Thanks Meng for the explanation.
> >>
> >> I imagine most frameworks do not remember what stuff they filtered much
> >> less figure out how previously filtered stuff can satisfy new operations.
> >> That sounds complicated!
> >>
> >
> > Frameworks do not need to remember what filters they currently have. Only
> > knowing the resource profiles of the current vs. the previous operation
> > would help a lot.
> > But yeah, even this may be too much complexity.
> >
> >>
> >> But I like your example. So a suggestion we could make to frameworks could
> >> be to use CLEAR_FILTERS when they have new work, e.g., scale up/down, new
> >> app (they might want to use this even if they aren't suppressed!); and to
> >> use UNSUPPRESS when they are rescheduling old work?
> >>
> >
> > Yeah, these are the general guidelines.
> >
> > I want to echo and reemphasize that CLEAR_FILTERS is orthogonal to
> > suppression. Frameworks should consider clearing filters regardless of
> > suppression.
> >
> > Ideally, when there is new, different work, old irrelevant filters should
> > be cleared. This helps the framework get more offers and makes the
> > allocator run faster (filters could take up the bulk of the allocation
> > time when they build up). On the flip side, calling CLEAR_FILTERS too
> > often might also have performance implications (esp. if the
> > master/allocator actors are already stressed).
> >
> > Thoughts?
> >>
> >> On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu <mz...@mesosphere.com> wrote:
> >>
> >>> Hi Vinod:
> >>>
> >>> Yeah, `CLEAR_FILTERS` sounds good.
> >>>
> >>> UNSUPPRESS should be used whenever currently suppressed framework wants
> >> to
> >>> resume getting offers after a previous SUPPRESS call.
> >>>
> >>> As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is
> to
> >>> call it whenever the framework wants to clear all the existing filters.
> >>>
> >>> To elaborate, a framework declines offers and accumulates filters while
> >>> it is trying to satisfy a particular set of requirements/constraints to
> >>> perform an operation. Once the operation is done and the next operation
> >>> comes, if the new operation has the same (or strictly more) resource
> >>> requirements/constraints compared to the last one, then it is more
> >>> efficient to KEEP the existing filters instead of getting useless offers
> >>> and rebuilding the filters again.
> >>>
> >>> On the other hand, if the requirements/constraints are different (i.e.
> >>> some of the previous requirements could be loosened), then the existing
> >>> filters no longer make sense. It might then be a good idea to clear all
> >>> the existing filters to improve the chance of getting more offers.
> >>>
> >>> Note, although we introduce `CLEAR_FILTERS` as part of decoupling the
> >>> `REVIVE` call, its usage should be independent of suppression/revival.
> >> The
> >>> decision to clear the filters only depends on whether the existing
> >> filters
> >>> make sense for the current operation constraints/requirements.
> >>>
> >>> Examples:
> >>> If a framework first launches a task, then wants to launch a
> replacement
> >>> task (because the first task failed), then it should keep the filters
> >> built
> >>> up during the first launch. However, if the framework wants to launch a
> >>> second task with a completely different resource profile, then clearing
> >>> filters might help to get more (otherwise filtered) offers and hence
> >> speed
> >>> up the deployment.
> >>>
> >>> -Meng
> >>>
> >>> On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone <vi...@apache.org>
> wrote:
> >>>
> >>>> Hi Meng,
> >>>>
> >>>> What would be the recommendation for framework authors on when to use
> >>>> UNSUPPRESS vs CLEAR_FILTER?
> >>>>
> >>>> Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
> >>>>
> >>>> On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <mz...@mesosphere.com> wrote:
> >>>>
> >>>>> Hi:
> >>>>>
> >>>>> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress
> >> and
> >>>>> clear_filter in order to decouple the dual-semantics of the current
> >>> revive
> >>>>> call.
> >>>>>
> >>>>> As pointed out in the Mesos framework scalability guide
> >>>>> <
> >>>
> >>
> http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability
> >>>> ,
> >>>>> utilizing the suppress
> >>>>> <
> >>>
> >>
> http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> >>>>> call is the key to get your cluster to a large number of frameworks
> >>>>> <
> >>>
> >>
> https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf
> >>>> .
> >>>>> In short, when a framework is idling with no intention to launch any
> >>> tasks,
> >>>>> it should suppress to inform the Mesos to stop sending any more
> >> offers.
> >>> And
> >>>>> the framework should revive
> >>>>> <
> >>>
> http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> >>>>> when new work arrives. This way, the allocator will skip the
> framework
> >>> when
> >>>>> performing resource allocations. As a result, thorny issues such as
> >>> offer
> >>>>> starvation and resource fragmentation would be greatly mitigated.
> >>>>>
> >>>>> That being said, the suppress/revive calls are currently a little bit
> >>>>> unwieldy due to MESOS-9028
> >>>>> <https://issues.apache.org/jira/browse/MESOS-9028>:
> >>>>>
> >>>>> The revive call has two semantics. It unsuppresses the framework AND
> >>>>> clears all the existing filters. The latter makes the revive call
> >>>>> non-idempotent. And sometimes users may want to keep the existing
> >>>>> filters when reviving, which is not possible at the moment.
> >>>>>
> >>>>> To decouple the semantics, as suggested in the ticket, we propose to
> >> add
> >>>>> two new V1 scheduler calls:
> >>>>>
> >>>>> (1) `UNSUPPRESS` call requests the Mesos to resume sending offers;
> >>>>> (2) `CLEAR_FILTER` call will explicitly clear all the existing
> >> filters.
> >>>>>
> >>>>> To make life easier, both calls will return 200 OK (as opposed to 202
> >>>>> returned by most existing scheduler calls, including `SUPPRESS` and
> >>>>> `REVIVE`).
> >>>>>
> >>>>> We will keep the revive call and its semantics (i.e. unsuppress AND
> >>>>> clear filters) for backward compatibility.
> >>>>>
> >>>>> Note, the changes are proposed for V1 API only. Thus, once the
> changes
> >>>>> are landed, framework developers are encouraged to move to V1 API to
> >>> take
> >>>>> advantage of the new calls (among many other benefits).
> >>>>>
> >>>>> Any feedback/comments are welcome.
> >>>>>
> >>>>> -Meng
> >>>>>
> >>>>
> >>>
> >>
>
>

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Benjamin Bannier <be...@mesosphere.io>.

Hi Meng,

Thanks for the proposal. I agree that the way these two aspects are currently entangled is an issue (e.g., for master/allocator performance reasons). At the same time, the workflow we currently expect frameworks to follow is conceptually not hard to grasp:

(1) If the framework has work, then
(i) put the framework in the unsuppressed state,
(ii) decline non-matching offers with a long filter duration.
(2) If an offer matches, accept.
(3) If there is no more work, suppress. GOTO (1).
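
The workflow above could be sketched roughly as follows. This is a minimal, illustrative Python model and not Mesos code: the `WorkflowScheduler` and `Offer` classes, the recorded call tuples, and the one-hour refusal duration are all assumptions made for the example. It uses REVIVE for step (1)(i), since that is the existing way to leave the suppressed state.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed value for a "long" filter duration; not a Mesos default.
LONG_FILTER_SECONDS = 3600.0


@dataclass
class Offer:
    offer_id: str
    resources: Dict[str, float]


@dataclass
class WorkflowScheduler:
    """Hypothetical model that records the calls a framework would send."""
    pending: List[Dict[str, float]] = field(default_factory=list)
    suppressed: bool = True
    calls: List[tuple] = field(default_factory=list)

    def add_work(self, requirements: Dict[str, float]) -> None:
        # Step (1)(i): new work arrives, so leave the suppressed state.
        self.pending.append(requirements)
        if self.suppressed:
            self.calls.append(("REVIVE",))
            self.suppressed = False

    def on_offer(self, offer: Offer) -> None:
        for i, need in enumerate(self.pending):
            if all(offer.resources.get(k, 0.0) >= v for k, v in need.items()):
                # Step (2): the offer matches a pending item, accept it.
                del self.pending[i]
                self.calls.append(("ACCEPT", offer.offer_id))
                break
        else:
            # Step (1)(ii): no match, decline with a long filter duration.
            self.calls.append(("DECLINE", offer.offer_id, LONG_FILTER_SECONDS))
        # Step (3): nothing left to schedule, so suppress again.
        if not self.pending and not self.suppressed:
            self.calls.append(("SUPPRESS",))
            self.suppressed = True
```

Note that the model never inspects which filters exist; it only tracks pending work and the suppressed flag, which is the property the next paragraph relies on.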

Here the framework does not need to track its filters across allocation cycles (they are an unexposed implementation detail of the hierarchical allocator anyway), which, e.g., allows metaschedulers like Marathon or Apache Aurora to decouple the scheduling of different workloads. A downside of this interface is that

* there is little incentive for frameworks to use SUPPRESS in addition to filters, and
* unsuppression is all-or-nothing, forcing the master to send potentially all unused resources to one framework, even if it is only interested in a fraction. This can cause, at least temporarily, non-optimal allocation behavior.

It seems to me that even though adding UNSUPPRESS and CLEAR_FILTERS would give frameworks more control, it would only be a small improvement. In the above framework workflow, we would allow a small improvement if the framework knows that a new workload matches a previously running workload (i.e., it can infer that no filters for the resources it is interested in are active), so that it can issue UNSUPPRESS instead of CLEAR_FILTERS. Incidentally, there seems to be little local benefit for frameworks in using these new calls, as they’d mostly help the master, and I’d imagine we wouldn’t want to imply that clearing filters would unsuppress the framework. This seems too little to me, and we run the danger that frameworks would just always pair UNSUPPRESS and CLEAR_FILTERS (or keep using REVIVE) to simplify their workflow. If we modeled the interface more along framework needs, there would be a clear benefit, which would help adoption.

A more interesting call for me would be REQUEST_RESOURCES. It maps very well onto framework needs (e.g., “I want to launch a task requiring these resources”), and clearly communicates a requirement to the master so that it, e.g., doesn’t need to remove all filters for a framework. It also seems to fit the allocator model pretty well, which doesn’t explicitly expose filters. I believe implementing it should not be too hard if we restricted its semantics to only communicating to the master that a framework _is interested in a certain resource_ without promising that the framework _will get it in any amount of time_ (i.e., no need to rethink DRF fairness semantics in the hierarchical allocator). I also feel that if we had REQUEST_RESOURCES, we would have some freedom to perform further improvements around filters in the master/allocator (e.g., filter compaction, work on increasing the default filter duration, …).

A possible zeroth implementation for REQUEST_RESOURCES with the hierarchical allocator would be to have it remove any filters containing the requested resource and likely also unsuppress the framework. A REQUEST_RESOURCES call would hold an optional resource and an optional AgentID; the case where both are empty would map onto CLEAR_FILTERS.
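
A rough sketch of such a zeroth implementation, written as a toy Python model rather than allocator code. Everything here is hypothetical: `OfferFilter`, `FrameworkState`, and `request_resources` are names invented for illustration, and a filter is simplified to an agent ID plus a set of resource names.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class OfferFilter:
    """Simplified stand-in for an allocator offer filter."""
    agent_id: str
    resource_names: FrozenSet[str]


@dataclass
class FrameworkState:
    suppressed: bool = False
    filters: List[OfferFilter] = field(default_factory=list)


def request_resources(
    state: FrameworkState,
    resource_name: Optional[str] = None,
    agent_id: Optional[str] = None,
) -> None:
    """Drop the filters that cover the request, then unsuppress."""

    def covers_request(f: OfferFilter) -> bool:
        if agent_id is not None and f.agent_id != agent_id:
            return False
        if resource_name is not None and resource_name not in f.resource_names:
            return False
        # With both arguments empty, every filter matches, which is the
        # case that maps onto CLEAR_FILTERS.
        return True

    state.filters = [f for f in state.filters if not covers_request(f)]
    state.suppressed = False
```

For example, requesting `"gpus"` would remove only the filters that mention GPUs and leave filters on other resources in place.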

That being said, it might still be useful in the future to expose a low-level knob that allows frameworks to explicitly manage their filters.

Cheers,

Benjamin

On Dec 4, 2018, at 5:44 AM, Meng Zhu <mz...@mesosphere.com> wrote:
> 
> See my comments inline.
> 
> On Mon, Dec 3, 2018 at 5:43 PM Vinod Kone <vi...@apache.org> wrote:
> 
>> Thanks Meng for the explanation.
>> 
>> I imagine most frameworks do not remember what stuff they filtered much
>> less figure out how previously filtered stuff  can satisfy new operations.
>> That sounds complicated!
>> 
> 
> Frameworks do not need to remember what filters they currently have. Only
> knowing
> the resource profiles of the current vs. the previous operation would help
> a lot.
> But yeah, even this may be too much complexity.
> 
>> 
>> But I like your example. So a suggestion we could make to frameworks could
>> be to use CLEAR_FILTERS when they have new work, e.g., scale up/down, new
>> app (they might want to use this even if they aren't suppressed!); and to
>> use UNSUPPRESS when they are rescheduling old work?
>> 
> 
> Yeah, these are the general guidelines.
> 
> I want to echo and reemphasize that CLEAR_FILTERS is orthogonal to
> suppression. Frameworks should consider clearing filters regardless of
> suppression.
> 
> Ideally, when there is new, different work, old irrelevant filters should be
> cleared. This helps the framework get more offers and makes the allocator
> run faster (filters could take up the bulk of the allocation time when they
> build up). On the flip side, calling CLEAR_FILTERS too often might also have
> performance implications (esp. if the master/allocator actors are already
> stressed).
> 
> Thoughts?
>> 
>> On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu <mz...@mesosphere.com> wrote:
>> 
>>> Hi Vinod:
>>> 
>>> Yeah, `CLEAR_FILTERS` sounds good.
>>> 
>>> UNSUPPRESS should be used whenever a currently suppressed framework wants
>>> to resume getting offers after a previous SUPPRESS call.
>>> 
>>> As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to
>>> call it whenever the framework wants to clear all the existing filters.
>>> 
>>> To elaborate, a framework declines offers and accumulates filters while it
>>> is trying to satisfy a particular set of requirements/constraints to
>>> perform an operation. Once the operation is done and the next operation
>>> comes, if the new operation has the same (or strictly more) resource
>>> requirements/constraints compared to the last one, then it is more
>>> efficient to KEEP the existing filters instead of getting useless offers
>>> and rebuilding the filters again.
>>> 
>>> On the other hand, if the requirements/constraints are different (i.e.
>>> some of the previous requirements could be loosened), then the existing
>>> filters no longer make sense. It might then be a good idea to clear all
>>> the existing filters to improve the chance of getting more offers.
>>> 
>>> Note, although we introduce `CLEAR_FILTERS` as part of decoupling the
>>> `REVIVE` call, its usage should be independent of suppression/revival.
>> The
>>> decision to clear the filters only depends on whether the existing
>> filters
>>> make sense for the current operation constraints/requirements.
>>> 
>>> Examples:
>>> If a framework first launches a task, then wants to launch a replacement
>>> task (because the first task failed), then it should keep the filters
>> built
>>> up during the first launch. However, if the framework wants to launch a
>>> second task with a completely different resource profile, then clearing
>>> filters might help to get more (otherwise filtered) offers and hence
>> speed
>>> up the deployment.
>>> 
>>> -Meng
>>> 
>>> On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone <vi...@apache.org> wrote:
>>> 
>>>> Hi Meng,
>>>> 
>>>> What would be the recommendation for framework authors on when to use
>>>> UNSUPPRESS vs CLEAR_FILTER?
>>>> 
>>>> Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
>>>> 
>>>> On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <mz...@mesosphere.com> wrote:
>>>> 
>>>>> Hi:
>>>>> 
>>>>> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress
>> and
>>>>> clear_filter in order to decouple the dual-semantics of the current
>>> revive
>>>>> call.
>>>>> 
>>>>> As pointed out in the Mesos framework scalability guide
>>>>> <
>>> 
>> http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability
>>>> ,
>>>>> utilizing the suppress
>>>>> <
>>> 
>> http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
>>>>> call is the key to get your cluster to a large number of frameworks
>>>>> <
>>> 
>> https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf
>>>> .
>>>>> In short, when a framework is idling with no intention to launch any
>>> tasks,
>>>>> it should suppress to inform the Mesos to stop sending any more
>> offers.
>>> And
>>>>> the framework should revive
>>>>> <
>>> http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
>>>>> when new work arrives. This way, the allocator will skip the framework
>>> when
>>>>> performing resource allocations. As a result, thorny issues such as
>>> offer
>>>>> starvation and resource fragmentation would be greatly mitigated.
>>>>> 
>>>>> That being said, the suppress/revive calls are currently a little bit
>>>>> unwieldy due to MESOS-9028
>>>>> <https://issues.apache.org/jira/browse/MESOS-9028>:
>>>>> 
>>>>> The revive call has two semantics. It unsuppresses the framework AND
>>>>> clears all the existing filters. The latter makes the revive call
>>>>> non-idempotent. And sometimes users may want to keep the existing
>>>>> filters when reviving, which is not possible at the moment.
>>>>> 
>>>>> To decouple the semantics, as suggested in the ticket, we propose to
>> add
>>>>> two new V1 scheduler calls:
>>>>> 
>>>>> (1) `UNSUPPRESS` call requests the Mesos to resume sending offers;
>>>>> (2) `CLEAR_FILTER` call will explicitly clear all the existing
>> filters.
>>>>> 
>>>>> To make life easier, both calls will return 200 OK (as opposed to 202
>>>>> returned by most existing scheduler calls, including `SUPPRESS` and
>>>>> `REVIVE`).
>>>>> 
>>>>> We will keep the revive call and its semantics (i.e. unsuppress AND
>>>>> clear filters) for backward compatibility.
>>>>> 
>>>>> Note, the changes are proposed for V1 API only. Thus, once the changes
>>>>> are landed, framework developers are encouraged to move to V1 API to
>>> take
>>>>> advantage of the new calls (among many other benefits).
>>>>> 
>>>>> Any feedback/comments are welcome.
>>>>> 
>>>>> -Meng
>>>>> 
>>>> 
>>> 
>>

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Meng Zhu <mz...@mesosphere.com>.

See my comments inline.

On Mon, Dec 3, 2018 at 5:43 PM Vinod Kone <vi...@apache.org> wrote:

> Thanks Meng for the explanation.
>
> I imagine most frameworks do not remember what stuff they filtered much
> less figure out how previously filtered stuff  can satisfy new operations.
> That sounds complicated!
>

Frameworks do not need to remember what filters they currently have. Knowing
only the resource profiles of the current vs. the previous operation would
help a lot.
But yeah, even this may be too much complexity.
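
As a rough illustration of comparing resource profiles, a framework could apply a check like the one below. This is only a sketch under assumptions: the function name `should_clear_filters` and the representation of a profile as a mapping from resource name to required amount are invented here, and real constraints (roles, attributes, agent placement) are ignored.

```python
from typing import Dict


def should_clear_filters(
    previous: Dict[str, float], current: Dict[str, float]
) -> bool:
    """Return True if any previous requirement was loosened or dropped.

    If the current operation needs the same or strictly more of every
    resource, offers declined earlier would still be useless, so the
    existing filters are worth keeping.
    """
    return any(
        current.get(name, 0.0) < amount for name, amount in previous.items()
    )
```

A replacement task with an identical profile yields `False` (keep the filters), while a task with a completely different profile yields `True` (clearing may surface previously filtered offers).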

>
> But I like your example. So a suggestion we could make to frameworks could
> be to use CLEAR_FILTERS when they have new work, e.g., scale up/down, new
> app (they might want to use this even if they aren't suppressed!); and to
> use UNSUPPRESS when they are rescheduling old work?
>

Yeah, these are the general guidelines.

I want to echo and reemphasize that CLEAR_FILTERS is orthogonal to
suppression. Frameworks should consider clearing filters regardless of
suppression.

Ideally, when there is new, different work, old irrelevant filters should be
cleared. This helps the framework get more offers and makes the allocator run
faster (filters could take up the bulk of the allocation time when they build
up). On the flip side, calling CLEAR_FILTERS too often might also have
performance implications (esp. if the master/allocator actors are already
stressed).
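
To illustrate the orthogonality, here is a small sketch of how a framework might pick its calls when new work arrives. The request body shape (`framework_id` plus `type`) follows the existing V1 scheduler HTTP API for SUPPRESS and REVIVE; UNSUPPRESS and CLEAR_FILTERS are only the names proposed in this thread and are assumed to take the same shape. The helper names are hypothetical.

```python
import json
from typing import List

EXISTING_CALLS = {"SUPPRESS", "REVIVE"}
# Proposed in this thread; not part of the released API.
PROPOSED_CALLS = {"UNSUPPRESS", "CLEAR_FILTERS"}


def scheduler_call(framework_id: str, call_type: str) -> str:
    """Build the JSON body for a parameterless scheduler call."""
    if call_type not in EXISTING_CALLS | PROPOSED_CALLS:
        raise ValueError(f"unknown call type: {call_type}")
    body = {"framework_id": {"value": framework_id}, "type": call_type}
    return json.dumps(body, sort_keys=True)


def calls_for_new_work(
    framework_id: str, suppressed: bool, filters_stale: bool
) -> List[str]:
    """Decide the two questions independently of each other."""
    calls = []
    if filters_stale:
        # Clearing filters does not depend on the suppression state.
        calls.append(scheduler_call(framework_id, "CLEAR_FILTERS"))
    if suppressed:
        calls.append(scheduler_call(framework_id, "UNSUPPRESS"))
    return calls
```

A framework that is not suppressed but has stale filters would send only CLEAR_FILTERS, and one that is suppressed with still-useful filters would send only UNSUPPRESS.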

Thoughts?
>
> On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu <mz...@mesosphere.com> wrote:
>
> > Hi Vinod:
> >
> > Yeah, `CLEAR_FILTERS` sounds good.
> >
> > UNSUPPRESS should be used whenever a currently suppressed framework wants
> > to resume getting offers after a previous SUPPRESS call.
> >
> > As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to
> > call it whenever the framework wants to clear all the existing filters.
> >
> > To elaborate, a framework declines offers and accumulates filters while it
> > is trying to satisfy a particular set of requirements/constraints to
> > perform an operation. Once the operation is done and the next operation
> > comes, if the new operation has the same (or strictly more) resource
> > requirements/constraints compared to the last one, then it is more
> > efficient to KEEP the existing filters instead of getting useless offers
> > and rebuilding the filters again.
> >
> > On the other hand, if the requirements/constraints are different (i.e.
> > some of the previous requirements could be loosened), then the existing
> > filters no longer make sense. It might then be a good idea to clear all
> > the existing filters to improve the chance of getting more offers.
> >
> > Note, although we introduce `CLEAR_FILTERS` as part of decoupling the
> > `REVIVE` call, its usage should be independent of suppression/revival.
> The
> > decision to clear the filters only depends on whether the existing
> filters
> > make sense for the current operation constraints/requirements.
> >
> > Examples:
> > If a framework first launches a task, then wants to launch a replacement
> > task (because the first task failed), then it should keep the filters
> built
> > up during the first launch. However, if the framework wants to launch a
> > second task with a completely different resource profile, then clearing
> > filters might help to get more (otherwise filtered) offers and hence
> speed
> > up the deployment.
> >
> > -Meng
> >
> > On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone <vi...@apache.org> wrote:
> >
> > > Hi Meng,
> > >
> > > What would be the recommendation for framework authors on when to use
> > > UNSUPPRESS vs CLEAR_FILTER?
> > >
> > > Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
> > >
> > > On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <mz...@mesosphere.com> wrote:
> > >
> > >> Hi:
> > >>
> > >> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress
> and
> > >> clear_filter in order to decouple the dual-semantics of the current
> > revive
> > >> call.
> > >>
> > >> As pointed out in the Mesos framework scalability guide
> > >> <
> >
> http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability
> > >,
> > >> utilizing the suppress
> > >> <
> >
> http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> > >> call is the key to get your cluster to a large number of frameworks
> > >> <
> >
> https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf
> > >.
> > >> In short, when a framework is idling with no intention to launch any
> > tasks,
> > >> it should suppress to inform the Mesos to stop sending any more
> offers.
> > And
> > >> the framework should revive
> > >> <
> > http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> > >> when new work arrives. This way, the allocator will skip the framework
> > when
> > >> performing resource allocations. As a result, thorny issues such as
> > offer
> > >> starvation and resource fragmentation would be greatly mitigated.
> > >>
> > >> That being said, the suppress/revive calls are currently a little bit
> > >> unwieldy due to MESOS-9028
> > >> <https://issues.apache.org/jira/browse/MESOS-9028>:
> > >>
> > >> The revive call has two semantics. It unsuppresses the framework AND
> > >> clears all the existing filters. The latter makes the revive call
> > >> non-idempotent. And sometimes users may want to keep the existing
> > >> filters when reviving, which is not possible at the moment.
> > >>
> > >> To decouple the semantics, as suggested in the ticket, we propose to
> add
> > >> two new V1 scheduler calls:
> > >>
> > >> (1) `UNSUPPRESS` call requests the Mesos to resume sending offers;
> > >> (2) `CLEAR_FILTER` call will explicitly clear all the existing
> filters.
> > >>
> > >> To make life easier, both calls will return 200 OK (as opposed to 202
> > >> returned by most existing scheduler calls, including `SUPPRESS` and
> > >> `REVIVE`).
> > >>
> > >> We will keep the revive call and its semantics (i.e. unsuppress AND
> > >> clear filters) for backward compatibility.
> > >>
> > >> Note, the changes are proposed for V1 API only. Thus, once the changes
> > >> are landed, framework developers are encouraged to move to V1 API to
> > take
> > >> advantage of the new calls (among many other benefits).
> > >>
> > >> Any feedback/comments are welcome.
> > >>
> > >> -Meng
> > >>
> > >
> >
>

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Meng Zhu <mz...@mesosphere.com>.

See my comments inline.

On Mon, Dec 3, 2018 at 5:43 PM Vinod Kone <vi...@apache.org> wrote:

> Thanks Meng for the explanation.
>
> I imagine most frameworks do not remember what stuff they filtered much
> less figure out how previously filtered stuff  can satisfy new operations.
> That sounds complicated!
>

Frameworks do not need to remember what filters they currently have. Knowing
only the resource profiles of the current vs. the previous operation would
help a lot.
But yeah, even this may be too much complexity.

>
> But I like your example. So a suggestion we could make to frameworks could
> be to use CLEAR_FILTERS when they have new work, e.g., scale up/down, new
> app (they might want to use this even if they aren't suppressed!); and to
> use UNSUPPRESS when they are rescheduling old work?
>

Yeah, these are the general guidelines.

I want to echo and reemphasize that CLEAR_FILTERS is orthogonal to
suppression. Frameworks should consider clearing filters regardless of
suppression.

Ideally, when there is new, different work, old irrelevant filters should be
cleared. This helps the framework get more offers and makes the allocator
run faster (filters can take up the bulk of the allocation time when they
build up). On the flip side, calling CLEAR_FILTERS too often might also have
performance implications (esp. if the master/allocator actors are already
stressed).
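
To make the guideline concrete, here is a rough Python sketch of how a
framework might pick which calls to send. It is only an illustration:
`calls_for` and `call_body` are made-up helper names, and the `UNSUPPRESS` and
`CLEAR_FILTERS` type names are the ones proposed in this thread, not calls that
exist today. The JSON envelope is assumed to match the one the existing
`SUPPRESS` and `REVIVE` calls use.

```python
import json
from typing import List


def calls_for(new_work: bool, suppressed: bool) -> List[str]:
    """Pick the (proposed) scheduler call types to send for the current situation."""
    calls = []
    if new_work:
        # New or different work: old filters may hide offers that would fit now.
        calls.append("CLEAR_FILTERS")
    if suppressed:
        # Orthogonal to filters: ask the master to resume sending offers.
        calls.append("UNSUPPRESS")
    return calls


def call_body(framework_id: str, call_type: str) -> str:
    """Build a JSON body, assuming the same envelope as SUPPRESS/REVIVE."""
    return json.dumps({"framework_id": {"value": framework_id}, "type": call_type})


# A suppressed framework that is only rescheduling old work keeps its filters.
for call_type in calls_for(new_work=False, suppressed=True):
    print(call_body("hypothetical-framework-id", call_type))
```

Note that a framework that is not suppressed and gets new work would only send
`CLEAR_FILTERS` here, which matches the point that the two calls are
independent of each other.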

> Thoughts?
>
> On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu <mz...@mesosphere.com> wrote:
>
> > Hi Vinod:
> >
> > Yeah, `CLEAR_FILTERS` sounds good.
> >
> > UNSUPPRESS should be used whenever currently suppressed framework wants
> to
> > resume getting offers after a previous SUPPRESS call.
> >
> > As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to
> > call it whenever the framework wants to clear all the existing filters.
> >
> > To elaborate it, frameworks decline and accumulate filters when it is
> > trying to satisfy a particular set of requirements/constraints to perform
> > an operation. Once the operation is done and the next operation comes, if
> > the new operation has the same (or strictly more) resource
> > requirements/constraints compared to the last one, then it is more
> > efficient to KEEP the existing filters instead of getting useless offers
> > and rebuild the filters again.
> >
> > On the other hand, if the requirements/constraints are different (i.e.
> some
> > of the previous requirements could be loosened), then it means the
> existing
> > filter no longer make sense. Then it might be a good idea to clear all
> the
> > existing filters to improve the chance of getting more offers.
> >
> > Note, although we introduce `CLEAR_FILTERS` as part of decoupling the
> > `REVIVE` call, its usage should be independent of suppression/revival.
> The
> > decision to clear the filters only depends on whether the existing
> filters
> > make sense for the current operation constraints/requirements.
> >
> > Examples:
> > If a framework first launches a task, then wants to launch a replacement
> > task (because the first task failed), then it should keep the filters
> built
> > up during the first launch. However, if the framework wants to launch a
> > second task with a completely different resource profile, then clearing
> > filters might help to get more (otherwise filtered) offers and hence
> speed
> > up the deployment.
> >
> > -Meng
> >
> > On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone <vi...@apache.org> wrote:
> >
> > > Hi Meng,
> > >
> > > What would be the recommendation for framework authors on when to use
> > > UNSUPPRESS vs CLEAR_FILTER?
> > >
> > > Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
> > >
> > > On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <mz...@mesosphere.com> wrote:
> > >
> > >> Hi:
> > >>
> > >> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress
> and
> > >> clear_filter in order to decouple the dual-semantics of the current
> > revive
> > >> call.
> > >>
> > >> As pointed out in the Mesos framework scalability guide
> > >> <
> >
> http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability
> > >,
> > >> utilizing the suppress
> > >> <
> >
> http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> > >> call is the key to get your cluster to a large number of frameworks
> > >> <
> >
> https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf
> > >.
> > >> In short, when a framework is idling with no intention to launch any
> > tasks,
> > >> it should suppress to inform the Mesos to stop sending any more
> offers.
> > And
> > >> the framework should revive
> > >> <
> > http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> > >> when new work arrives. This way, the allocator will skip the framework
> > when
> > >> performing resource allocations. As a result, thorny issues such as
> > offer
> > >> starvation and resource fragmentation would be greatly mitigated.
> > >>
> > >> That being said. The suppress/revive calls currently are a little bit
> > >> unwieldy due to MESOS-9028
> > >> <https://issues.apache.org/jira/browse/MESOS-9028>:
> > >>
> > >> The revive call has two semantics. It unsuppresses the framework AND
> > >> clears all the existing filters. The latter makes the revive call
> > >> non-idempotent. And sometimes users may want to keep the existing filters
> > >> when reviving, which is not possible atm.
> > >>
> > >> To decouple the semantics, as suggested in the ticket, we propose to
> add
> > >> two new V1 scheduler calls:
> > >>
> > >> (1) `UNSUPPRESS` call requests the Mesos to resume sending offers;
> > >> (2) `CLEAR_FILTER` call will explicitly clear all the existing
> filters.
> > >>
> > >> To make life easier, both calls will return 200 OK (as opposed to 202
> > >> returned by most existing scheduler calls, including `SUPPRESS` and
> > >> `REVIVE`).
> > >>
> > >> We will keep the revive call and its semantics (i.e. unsuppress AND
> > >> clear filters) for backward compatibility.
> > >>
> > >> Note, the changes are proposed for V1 API only. Thus, once the changes
> > >> are landed, framework developers are encouraged to move to V1 API to
> > take
> > >> advantage of the new calls (among many other benefits).
> > >>
> > >> Any feedback/comments are welcome.
> > >>
> > >> -Meng
> > >>
> > >
> >
>

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Vinod Kone <vi...@apache.org>.

Thanks Meng for the explanation.

I imagine most frameworks do not remember what stuff they filtered, much
less figure out how previously filtered stuff can satisfy new operations.
That sounds complicated!

But I like your example. So a suggestion we could make to frameworks could
be to use CLEAR_FILTERS when they have new work, e.g., scale up/down, new
app (they might want to use this even if they aren't suppressed!); and to
use UNSUPPRESS when they are rescheduling old work?

Thoughts?

On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu <mz...@mesosphere.com> wrote:

> Hi Vinod:
>
> Yeah, `CLEAR_FILTERS` sounds good.
>
> UNSUPPRESS should be used whenever currently suppressed framework wants to
> resume getting offers after a previous SUPPRESS call.
>
> As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to
> call it whenever the framework wants to clear all the existing filters.
>
> To elaborate it, frameworks decline and accumulate filters when it is
> trying to satisfy a particular set of requirements/constraints to perform
> an operation. Once the operation is done and the next operation comes, if
> the new operation has the same (or strictly more) resource
> requirements/constraints compared to the last one, then it is more
> efficient to KEEP the existing filters instead of getting useless offers
> and rebuild the filters again.
>
> On the other hand, if the requirements/constraints are different (i.e. some
> of the previous requirements could be loosened), then it means the existing
> filter no longer make sense. Then it might be a good idea to clear all the
> existing filters to improve the chance of getting more offers.
>
> Note, although we introduce `CLEAR_FILTERS` as part of decoupling the
> `REVIVE` call, its usage should be independent of suppression/revival. The
> decision to clear the filters only depends on whether the existing filters
> make sense for the current operation constraints/requirements.
>
> Examples:
> If a framework first launches a task, then wants to launch a replacement
> task (because the first task failed), then it should keep the filters built
> up during the first launch. However, if the framework wants to launch a
> second task with a completely different resource profile, then clearing
> filters might help to get more (otherwise filtered) offers and hence speed
> up the deployment.
>
> -Meng
>
> On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone <vi...@apache.org> wrote:
>
> > Hi Meng,
> >
> > What would be the recommendation for framework authors on when to use
> > UNSUPPRESS vs CLEAR_FILTER?
> >
> > Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
> >
> > On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <mz...@mesosphere.com> wrote:
> >
> >> Hi:
> >>
> >> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress and
> >> clear_filter in order to decouple the dual-semantics of the current
> revive
> >> call.
> >>
> >> As pointed out in the Mesos framework scalability guide
> >> <
> http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability
> >,
> >> utilizing the suppress
> >> <
> http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> >> call is the key to get your cluster to a large number of frameworks
> >> <
> https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf
> >.
> >> In short, when a framework is idling with no intention to launch any
> tasks,
> >> it should suppress to inform the Mesos to stop sending any more offers.
> And
> >> the framework should revive
> >> <
> http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> >> when new work arrives. This way, the allocator will skip the framework
> when
> >> performing resource allocations. As a result, thorny issues such as
> offer
> >> starvation and resource fragmentation would be greatly mitigated.
> >>
> >> That being said. The suppress/revive calls currently are a little bit
> >> unwieldy due to MESOS-9028
> >> <https://issues.apache.org/jira/browse/MESOS-9028>:
> >>
> >> The revive call has two semantics. It unsuppresses the framework AND
> >> clears all the existing filters. The latter makes the revive call
> >> non-idempotent. And sometimes users may want to keep the existing filters
> >> when reviving, which is not possible atm.
> >>
> >> To decouple the semantics, as suggested in the ticket, we propose to add
> >> two new V1 scheduler calls:
> >>
> >> (1) `UNSUPPRESS` call requests the Mesos to resume sending offers;
> >> (2) `CLEAR_FILTER` call will explicitly clear all the existing filters.
> >>
> >> To make life easier, both calls will return 200 OK (as opposed to 202
> >> returned by most existing scheduler calls, including `SUPPRESS` and
> >> `REVIVE`).
> >>
> >> We will keep the revive call and its semantics (i.e. unsuppress AND
> >> clear filters) for backward compatibility.
> >>
> >> Note, the changes are proposed for V1 API only. Thus, once the changes
> >> are landed, framework developers are encouraged to move to V1 API to
> take
> >> advantage of the new calls (among many other benefits).
> >>
> >> Any feedback/comments are welcome.
> >>
> >> -Meng
> >>
> >
>

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Meng Zhu <mz...@mesosphere.com>.

Hi Vinod:

Yeah, `CLEAR_FILTERS` sounds good.

UNSUPPRESS should be used whenever a currently suppressed framework wants to
resume getting offers after a previous SUPPRESS call.

As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to
call it whenever the framework wants to clear all the existing filters.

To elaborate, a framework declines offers and accumulates filters while it is
trying to satisfy a particular set of requirements/constraints to perform
an operation. Once the operation is done and the next operation comes, if
the new operation has the same (or strictly more) resource
requirements/constraints compared to the last one, then it is more
efficient to KEEP the existing filters instead of getting useless offers
and rebuilding the filters again.

On the other hand, if the requirements/constraints are different (i.e. some
of the previous requirements could be loosened), then the existing filters
no longer make sense. In that case it might be a good idea to clear all the
existing filters to improve the chance of getting more offers.

Note, although we introduce `CLEAR_FILTERS` as part of decoupling the
`REVIVE` call, its usage should be independent of suppression/revival. The
decision to clear the filters only depends on whether the existing filters
make sense for the current operation constraints/requirements.

Examples:
If a framework first launches a task, then wants to launch a replacement
task (because the first task failed), then it should keep the filters built
up during the first launch. However, if the framework wants to launch a
second task with a completely different resource profile, then clearing
filters might help to get more (otherwise filtered) offers and hence speed
up the deployment.
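
The keep-vs-clear rule above could be sketched roughly like this in Python.
This is just an illustration under simplifying assumptions: requirements are
modelled as a plain dict of scalar resource amounts, and
`should_clear_filters` is a made-up helper name, not part of any Mesos API.

```python
from typing import Dict


def should_clear_filters(previous: Dict[str, float], current: Dict[str, float]) -> bool:
    """Return True if any requirement of the previous operation was loosened."""
    for resource, prev_amount in previous.items():
        # A smaller (or dropped) requirement means offers declined earlier might fit now.
        if current.get(resource, 0.0) < prev_amount:
            return True
    # Same or strictly more requirements: previously declined offers stay unusable.
    return False


# Replacement task with the same profile: keep the filters.
replacement = should_clear_filters({"cpus": 2, "mem": 1024}, {"cpus": 2, "mem": 1024})
# Different profile (fewer cpus, more mem): clearing may surface usable offers.
different = should_clear_filters({"cpus": 2, "mem": 1024}, {"cpus": 0.5, "mem": 4096})
print(replacement, different)
```

Real constraints (placement, roles, attributes) are richer than scalar
amounts, so a real framework would need its own notion of "loosened".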

-Meng

On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone <vi...@apache.org> wrote:

> Hi Meng,
>
> What would be the recommendation for framework authors on when to use
> UNSUPPRESS vs CLEAR_FILTER?
>
> Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
>
> On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <mz...@mesosphere.com> wrote:
>
>> Hi:
>>
>> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress and
>> clear_filter in order to decouple the dual-semantics of the current revive
>> call.
>>
>> As pointed out in the Mesos framework scalability guide
>> <http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability>,
>> utilizing the suppress
>> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
>> call is the key to get your cluster to a large number of frameworks
>> <https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf>.
>> In short, when a framework is idling with no intention to launch any tasks,
>> it should suppress to inform the Mesos to stop sending any more offers. And
>> the framework should revive
>> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
>> when new work arrives. This way, the allocator will skip the framework when
>> performing resource allocations. As a result, thorny issues such as offer
>> starvation and resource fragmentation would be greatly mitigated.
>>
>> That being said. The suppress/revive calls currently are a little bit
>> unwieldy due to MESOS-9028
>> <https://issues.apache.org/jira/browse/MESOS-9028>:
>>
>> The revive call has two semantics. It unsuppresses the framework AND
>> clears all the existing filters. The latter makes the revive call
>> non-idempotent. And sometimes users may want to keep the existing filters
>> when reviving, which is not possible atm.
>>
>> To decouple the semantics, as suggested in the ticket, we propose to add
>> two new V1 scheduler calls:
>>
>> (1) `UNSUPPRESS` call requests the Mesos to resume sending offers;
>> (2) `CLEAR_FILTER` call will explicitly clear all the existing filters.
>>
>> To make life easier, both calls will return 200 OK (as opposed to 202
>> returned by most existing scheduler calls, including `SUPPRESS` and
>> `REVIVE`).
>>
>> We will keep the revive call and its semantics (i.e. unsuppress AND
>> clear filters) for backward compatibility.
>>
>> Note, the changes are proposed for V1 API only. Thus, once the changes
>> are landed, framework developers are encouraged to move to V1 API to take
>> advantage of the new calls (among many other benefits).
>>
>> Any feedback/comments are welcome.
>>
>> -Meng
>>
>

Re: New scheduler API proposal: unsuppress and clear_filter

Posted by Vinod Kone <vi...@apache.org>.

Hi Meng,

What would be the recommendation for framework authors on when to use
UNSUPPRESS vs CLEAR_FILTER?

Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?

On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <mz...@mesosphere.com> wrote:

> Hi:
>
> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress and
> clear_filter in order to decouple the dual-semantics of the current revive
> call.
>
> As pointed out in the Mesos framework scalability guide
> <http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability>,
> utilizing the suppress
> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> call is the key to get your cluster to a large number of frameworks
> <https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf>.
> In short, when a framework is idling with no intention to launch any tasks,
> it should suppress to inform the Mesos to stop sending any more offers. And
> the framework should revive
> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> when new work arrives. This way, the allocator will skip the framework when
> performing resource allocations. As a result, thorny issues such as offer
> starvation and resource fragmentation would be greatly mitigated.
>
> That being said. The suppress/revive calls currently are a little bit
> unwieldy due to MESOS-9028
> <https://issues.apache.org/jira/browse/MESOS-9028>:
>
> The revive call has two semantics. It unsuppresses the framework AND
> clears all the existing filters. The latter makes the revive call
> non-idempotent. And sometimes users may want to keep the existing filters
> when reviving, which is not possible atm.
>
> To decouple the semantics, as suggested in the ticket, we propose to add
> two new V1 scheduler calls:
>
> (1) `UNSUPPRESS` call requests the Mesos to resume sending offers;
> (2) `CLEAR_FILTER` call will explicitly clear all the existing filters.
>
> To make life easier, both calls will return 200 OK (as opposed to 202
> returned by most existing scheduler calls, including `SUPPRESS` and
> `REVIVE`).
>
> We will keep the revive call and its semantics (i.e. unsuppress AND clear
> filters) for backward compatibility.
>
> Note, the changes are proposed for V1 API only. Thus, once the changes are
> landed, framework developers are encouraged to move to V1 API to take
> advantage of the new calls (among many other benefits).
>
> Any feedback/comments are welcome.
>
> -Meng
>
