Posted to dev@mesos.apache.org by Alex Rukletsov <al...@mesosphere.com> on 2015/07/04 12:15:42 UTC

Quota Design Doc v1

Folks,

Jörg and I are working on adding *quota* support to Mesos. Quota can be
described as cluster-wide dynamic reservation. I would like to share the
design doc [1] to gather community feedback early in the design phase.

The doc is a work in progress, especially the part related to quota support
in the allocator. We think we can start working on adding quota support to
the Mesos Master while fleshing out the design for how quota is handled by
the built-in allocator.

While working on the design, we faced some challenges and design questions.
One of them is which decisions should be deferred to the allocator and which
can be decided by the Master. We elaborate on this in the doc.

Looking forward to your feedback!

[1]:
https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I/edit?usp=sharing

Re: Quota Design Doc v1

Posted by Alex Rukletsov <al...@mesosphere.com>.
First off, I would like to thank everybody for the valuable feedback on the
design doc! We will update it soon and will call for the next round.

Now, there are some high-level questions that I would like to discuss prior
to updating the doc. I have put them together based on comments in the
design doc.

1. "Won't allocators want to be involved in granting quota requests? some
users will want custom logic that determines whether quota can be granted
(e.g. queries LDAP, or a quota database, etc). This will lead to two
choices AFAICT: (1) have customizable quota management, separate from the
allocator interface, interaction between allocator and quota management is
not clear, (2) have quota management be part of implementing an allocator,
easier but more tedious to customize allocation or quota only)."; "The
externally stored quota (e.g. in a quota database, file, etc) will get fed
into mesos continually, so it seems a bit odd for mesos to be rejecting
these requests, rather than accepting them and trying our best to
eventually satisfy them?"

I would like us to distinguish between two actions: *accepting* the quota
request and *satisfying* the quota. In the design doc, we imply that we
accept a request only if we can satisfy it, which is why we call it
"granting a request". However, we may rethink this and eliminate the problem
by accepting all requests and notifying the allocator (or whatever entity is
responsible for quota management). Obviously, this may lead to accepted
requests that are never satisfied.
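
To make the accept/satisfy distinction concrete, here is a minimal Python
sketch. Every name in it (QuotaState, QuotaRequest, accept, try_satisfy) is
hypothetical and chosen for illustration only; none of it is Mesos code or
taken from the design doc. It assumes the "accept everything" variant, where
a request is recorded first and only later marked as satisfied.

```python
from dataclasses import dataclass
from enum import Enum


class QuotaState(Enum):
    ACCEPTED = "accepted"    # request recorded, nothing guaranteed yet
    SATISFIED = "satisfied"  # enough resources have been set aside


@dataclass
class QuotaRequest:
    role: str
    cpus: float
    mem_mb: int
    state: QuotaState = QuotaState.ACCEPTED


def accept(role, cpus, mem_mb):
    """Accept unconditionally: no capacity check is made at this point."""
    return QuotaRequest(role, cpus, mem_mb)


def try_satisfy(request, free_cpus, free_mem_mb):
    """Hypothetical allocator step: mark the quota as satisfied only if
    the currently free resources cover the whole request."""
    if request.cpus <= free_cpus and request.mem_mb <= free_mem_mb:
        request.state = QuotaState.SATISFIED
    return request.state
```

In this sketch a request can stay in the ACCEPTED state indefinitely, which
is exactly the "accepted but never satisfied" case mentioned above.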

If we do want to check whether a quota request is "grantable" prior to
accepting it, should we defer this decision to an allocator (or any "quota
management entity"), or can we make it in the Master? I do not have a strong
stance on that; both have their pros and cons. However, Jörg came up with
an interesting observation that I may have failed to describe properly in
the doc: the built-in allocator does not persist its state, hence after
failover it has no more data to base a decision on than the Master has
(though it may apply a different decision algorithm to the same data). I
think the problem can be formulated this way: how far do we want to
generalize the quota decision part for the MVP? Shall we accept all requests
(as mentioned above)? Shall we do a rough resource estimate in the Master
and notify an allocator (as currently proposed in the design doc)? Shall we
defer the decision to an allocator? Shall we defer the decision to an
allocator and persist some allocator state that the allocator needs to make
this decision? Or shall we introduce a separate module for that, as per
BenM's and Tomás' suggestion (in which case an allocator could also use
means other than the Master's replicated log for persisting state)?
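
As an illustration of the "rough resource estimate in the Master" option,
here is a minimal Python sketch. It is an assumption about what such an
estimate could look like, not the check from the design doc: the function
name roughly_grantable and the dictionary layout are hypothetical. It only
compares total cluster capacity against the sum of already accepted quotas,
and ignores current allocations, outstanding offers and reservations.

```python
def roughly_grantable(request, cluster_total, existing_quotas):
    """Return True if total capacity minus all existing quotas still
    covers the request, resource by resource.

    request:         {"cpus": 8, "mem": 4096}       (hypothetical layout)
    cluster_total:   {"cpus": 100, "mem": 65536}
    existing_quotas: {"role-a": {"cpus": 60}, ...}
    """
    for resource, amount in request.items():
        # Sum what other roles have already been promised for this resource.
        committed = sum(q.get(resource, 0) for q in existing_quotas.values())
        if cluster_total.get(resource, 0) - committed < amount:
            return False
    return True
```

Because the estimate uses only data the Master already has, it is a best
guess: a request that passes may still turn out to be unsatisfiable for the
allocator.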


2. "How come the master not know if a role is under quota considering it
knows 'quota' and allocations? What extra information does the allocator
have? Since master stores outstanding offers it won't be hard for master to
club allocated and offered resources for quota decisions."

This question is related to the first one, but is a bit different : ).
I think the Master *can* *decide* whether to accept or grant a quota
request; think of it as a best guess that is not guaranteed to be satisfied
by an allocator. But for exactly the same reason, the Master *should not*
judge whether a role is within or under quota. It is the allocator that
satisfies quota, therefore we should query the allocator for quota status.
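
The split of responsibilities described above can be sketched as follows.
This is a hypothetical Python illustration, not the Mesos allocator
interface: the class and method names are made up, quota is reduced to a
single CPU number per role, and "under quota" is assumed to mean that a role
currently holds less than its quota.

```python
class Allocator:
    """Owns allocation state, so it is the authority on quota status."""

    def __init__(self):
        self.quotas = {}     # role -> guaranteed cpus
        self.allocated = {}  # role -> cpus currently allocated

    def set_quota(self, role, cpus):
        self.quotas[role] = cpus

    def allocate(self, role, cpus):
        self.allocated[role] = self.allocated.get(role, 0) + cpus

    def quota_status(self, role):
        quota = self.quotas.get(role)
        if quota is None:
            return "no quota"
        if self.allocated.get(role, 0) < quota:
            return "under quota"
        return "within quota"


class Master:
    """Accepts quota requests but does not judge quota status itself."""

    def __init__(self, allocator):
        self.allocator = allocator

    def quota_status(self, role):
        # Delegate instead of recomputing from the Master's own view.
        return self.allocator.quota_status(role)
```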



3. "I imagine some folks may not want to use an endpoint for this. For
example, they may want the master to consult an external quota database,
file, etc as it is a bit easier to guarantee convergence with this
approach."; "Make the quota management module composable with allocation
modules."


I am not convinced we should design a separate module for quota management
for the MVP. I see some value in making such a module pluggable, but I think
it is good enough for now to let quota be handled by an allocator and have a
Master endpoint for managing it. This may require some scripts from
operators, but it seems like a fair trade-off for the MVP.
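
As a rough idea of what such an operator script might compute, here is a
minimal Python sketch that diffs quota stored in an external source against
what the Master currently holds. The function name, the "set"/"remove"
actions and the data layout are all hypothetical; the sketch deliberately
does not call any endpoint, because the endpoint's name and format are not
fixed in this thread.

```python
def plan_quota_updates(desired, current):
    """Return the actions needed to make `current` match `desired`.

    desired: role -> quota, as read from an external database or file
    current: role -> quota, as currently known to the Master
    """
    actions = []
    for role in sorted(desired):
        # New role, or quota changed in the external source.
        if current.get(role) != desired[role]:
            actions.append(("set", role, desired[role]))
    for role in sorted(current):
        # Quota no longer present in the external source.
        if role not in desired:
            actions.append(("remove", role))
    return actions
```

Running such a diff periodically and replaying the resulting actions against
the Master endpoint is one way to approximate the convergence that an
external quota database would give.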


I think we should agree on these questions before we proceed with the
design.

On Tue, Jul 7, 2015 at 7:55 AM, Benjamin Mahler <be...@gmail.com>
wrote:

> Great work guys! Couple of related high level questions, mostly around
> customization:
>
> (1) I imagine some folks may not want to use an endpoint for this. For
> example, they may want the master to consult an external quota database,
> file, etc as it is a bit easier to guarantee convergence with this
> approach. Also might be easier when there are a large number of users (e.g.
> O(1000s)). How does this fit into the current design?
>
> (2) The design has the master acting as a quota database, which seems to
> imply that quota policy lies outside mesos (e.g. it's not up to mesos to
> determine whether ben is _allowed_ to have 100GB of memory in the cluster,
> or if that is _too much_), since these policies can be arbitrarily
> complicated. If this is the case, the externally stored quota (e.g. in a
> quota database, file, etc) will get fed into mesos continually, so it seems
> a bit odd for mesos to be rejecting these requests, rather than accepting
> them and trying our best to eventually satisfy them?
>
> Perhaps we're missing a user story related to this?

Re: Quota Design Doc v1

Posted by Benjamin Mahler <be...@gmail.com>.
Great work guys! Couple of related high level questions, mostly around
customization:

(1) I imagine some folks may not want to use an endpoint for this. For
example, they may want the master to consult an external quota database,
file, etc as it is a bit easier to guarantee convergence with this
approach. Also might be easier when there are a large number of users (e.g.
O(1000s)). How does this fit into the current design?

(2) The design has the master acting as a quota database, which seems to
imply that quota policy lies outside mesos (e.g. it's not up to mesos to
determine whether ben is _allowed_ to have 100GB of memory in the cluster,
or if that is _too much_), since these policies can be arbitrarily
complicated. If this is the case, the externally stored quota (e.g. in a
quota database, file, etc) will get fed into mesos continually, so it seems
a bit odd for mesos to be rejecting these requests, rather than accepting
them and trying our best to eventually satisfy them?

Perhaps we're missing a user story related to this?

On Sat, Jul 4, 2015 at 3:15 AM, Alex Rukletsov <al...@mesosphere.com> wrote:

> Folks,
>
> Jörg and I are working on adding *quota* support to Mesos. Quota can be
> described as cluster-wide dynamic reservation. I would like to share the
> design doc [1] to gather community feedback early in the design phase.
>
> The doc is work in progress, especially the part related to quota support
> in the allocator. We think we can start working on adding quota support to
> Mesos Master while fleshing out the design for how quota is handled by the
> built-in allocator.
>
> While working on the design, we faced some challenges and design questions.
> One of them is what decisions should be deferred to allocator and what can
> be decided by the Master. We elaborate on this in the doc.
>
> Looking forward to your feedback!
>
> [1]:
>
> https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I/edit?usp=sharing
>

Re: Quota Design Doc v1

Posted by Tomás Senart <to...@mesosphere.io>.
What about "Global Reservations"?

On Thu, Jul 9, 2015 at 3:25 PM, Marco Massenzio <ma...@mesosphere.io> wrote:

> I've added my twocent in the doc - my vote goes for "Guaranteed Allocation"
> - not as catchy as "Quota" (and will make classes' naming a challenge!) but
> maybe more helpful in the long-term.
>
> Anyone has a better suggestion, please do... I can't really say I'm
> super-excited by Guaranteed Allocation myself!
>
> *Marco Massenzio*
> *Distributed Systems Engineer*

Re: Quota Design Doc v1

Posted by Marco Massenzio <ma...@mesosphere.io>.
I've added my two cents in the doc - my vote goes for "Guaranteed Allocation"
- not as catchy as "Quota" (and it will make naming classes a challenge!) but
maybe more helpful in the long term.

If anyone has a better suggestion, please share it... I can't really say I'm
super-excited by Guaranteed Allocation myself!

*Marco Massenzio*
*Distributed Systems Engineer*

On Thu, Jul 9, 2015 at 1:48 AM, Alex Rukletsov <al...@mesosphere.com> wrote:

> And you're not the only one who were confused by the terminology! One of
> the alternatives that didn't make it to the public doc was "cluster-wide
> dynamic reservations". The reason we preferred "quota" to " ...
> reservation" is because the latter is already overloaded with meanings in
> Mesos world (static reservations, dynamic reservations). I have hoped the
> Terminology section would have helped to avoid the confusion, but I see it
> doesn't. We'll think about how we can solve the problem, we definitely
> don't want to create one more "libprocess process represented as a thread
> in an OS process" ; ).
>
> I see your point regarding authorization, you're not alone here either : ).
> Some folks mentioned that the lack of authz is a blocker and will prevent
> them from upgrading the cluster. I would propose to treat MVP as
> experimental feature: use it at your own risk or disable endpoints related
> to quota and hence the entire feature. Does it make sense?

Re: Quota Design Doc v1

Posted by Alex Rukletsov <al...@mesosphere.com>.
And you're not the only one who was confused by the terminology! One of
the alternatives that didn't make it to the public doc was "cluster-wide
dynamic reservations". The reason we preferred "quota" to " ...
reservation" is that the latter is already overloaded with meanings in
the Mesos world (static reservations, dynamic reservations). I had hoped the
Terminology section would help to avoid the confusion, but I see it
doesn't. We'll think about how we can solve the problem; we definitely
don't want to create one more "libprocess process represented as a thread
in an OS process" ; ).

I see your point regarding authorization; you're not alone here either : ).
Some folks mentioned that the lack of authz is a blocker and will prevent
them from upgrading the cluster. I would propose to treat the MVP as an
experimental feature: use it at your own risk, or disable the endpoints
related to quota and hence the entire feature. Does that make sense?

On Wed, Jul 8, 2015 at 7:10 PM, James Peach <jo...@gmail.com> wrote:

>
> > On Jul 4, 2015, at 3:15 AM, Alex Rukletsov <al...@mesosphere.com> wrote:
> >
> > Folks,
> >
> > Jörg and I are working on adding *quota* support to Mesos. Quota can be
> > described as cluster-wide dynamic reservation. I would like to share the
> > design doc [1] to gather community feedback early in the design phase.
>
> The most confusing part of this document to me was the 'quota'
> terminology. Quotas normally refer to administrative limits (esp. disk
> quotas with hard and soft limits), not reserving resources. Since what you
> are describing is an extension to the resource reservation system, it would
> be clearer if it was described in those terms.
>
> I was also concerned that access control / authorization is not planned
> for the initial implementation. I think that if Mesos is to have an
> authorization policy, it should be applied uniformly following the
> principle of least surprise.
>
> > The doc is work in progress, especially the part related to quota support
> > in the allocator. We think we can start working on adding quota support to
> > Mesos Master while fleshing out the design for how quota is handled by the
> > built-in allocator.
> >
> > While working on the design, we faced some challenges and design questions.
> > One of them is what decisions should be deferred to allocator and what can
> > be decided by the Master. We elaborate on this in the doc.
> >
> > Looking forward to your feedback!
> >
> > [1]:
> > https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I/edit?usp=sharing

Re: Quota Design Doc v1

Posted by James Peach <jo...@gmail.com>.
> On Jul 4, 2015, at 3:15 AM, Alex Rukletsov <al...@mesosphere.com> wrote:
> 
> Folks,
> 
> Jörg and I are working on adding *quota* support to Mesos. Quota can be
> described as cluster-wide dynamic reservation. I would like to share the
> design doc [1] to gather community feedback early in the design phase.

The most confusing part of this document to me was the 'quota' terminology. Quotas normally refer to administrative limits (esp. disk quotas with hard and soft limits), not reserving resources. Since what you are describing is an extension to the resource reservation system, it would be clearer if it was described in those terms.

I was also concerned that access control / authorization is not planned for the initial implementation. I think that if Mesos is to have an authorization policy, it should be applied uniformly following the principle of least surprise.

> The doc is work in progress, especially the part related to quota support
> in the allocator. We think we can start working on adding quota support to
> Mesos Master while fleshing out the design for how quota is handled by the
> built-in allocator.
> 
> While working on the design, we faced some challenges and design questions.
> One of them is what decisions should be deferred to allocator and what can
> be decided by the Master. We elaborate on this in the doc.
> 
> Looking forward to your feedback!
> 
> [1]:
> https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I/edit?usp=sharing