Posted to reviews@mesos.apache.org by Alexander Rukletsov <ru...@gmail.com> on 2015/11/16 16:43:27 UTC

Review Request 40351: Quota: Added rescinding offers for set quota requests.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/
-----------------------------------------------------------

Review request for mesos, Bernd Mathiske, Joerg Schad, Joris Van Remoortere, and Joseph Wu.


Bugs: MESOS-3912
    https://issues.apache.org/jira/browse/MESOS-3912


Repository: mesos


Description
-------

See summary.


Diffs
-----

  src/master/master.hpp ead8520b7108a0f2c3a0bb11ae7b543897d111a2 
  src/master/quota_handler.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/40351/diff/


Testing
-------

make check (Mac OS X 10.10.4)


Thanks,

Alexander Rukletsov


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/
-----------------------------------------------------------

(Updated Nov. 19, 2015, 5:10 p.m.)


Review request for mesos, Bernd Mathiske, Joerg Schad, Joris Van Remoortere, and Joseph Wu.


Changes
-------

Fixed a typo.


Bugs: MESOS-3912
    https://issues.apache.org/jira/browse/MESOS-3912


Repository: mesos


Description
-------

See summary.


Diffs (updated)
-----

  src/master/master.hpp 5e5a575dc7dd49324f3c837028df8a7f75cd1f80 
  src/master/quota_handler.cpp 03cef4117c52da7599a2800060f65483ca33bc3f 

Diff: https://reviews.apache.org/r/40351/diff/


Testing
-------

make check (Mac OS X 10.10.4)


Thanks,

Alexander Rukletsov


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/
-----------------------------------------------------------

(Updated Nov. 19, 2015, 5:01 p.m.)


Review request for mesos, Bernd Mathiske, Joerg Schad, Joris Van Remoortere, and Joseph Wu.


Changes
-------

Added a missing blank line.


Bugs: MESOS-3912
    https://issues.apache.org/jira/browse/MESOS-3912


Repository: mesos


Description
-------

See summary.


Diffs (updated)
-----

  src/master/master.hpp 5e5a575dc7dd49324f3c837028df8a7f75cd1f80 
  src/master/quota_handler.cpp 03cef4117c52da7599a2800060f65483ca33bc3f 

Diff: https://reviews.apache.org/r/40351/diff/


Testing
-------

make check (Mac OS X 10.10.4)


Thanks,

Alexander Rukletsov


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Qian Zhang <zh...@cn.ibm.com>.

> On Nov. 18, 2015, 3:51 p.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 180
> > <https://reviews.apache.org/r/40351/diff/3/?file=1128793#file1128793line180>
> >
> >     Why do we want to rescind the offers that do not contribute to satisfying the quota request?
> 
> Alexander Rukletsov wrote:
>     Because we may rescind more than necessary to satisfy quota request (remember minimal agent count). If we have a check in place, this will effectively prevent us from doing so. Does it make sense to you?

Suppose the quota request asks for 20GB of disk for a role, and there is an offer that includes only 2 CPUs and 2GB of memory and no disk resources at all. Will we rescind this offer too? This seems a little unfair to me.
Also, can you please clarify why we want to rescind offers from at least `numF` agents? Is the reason that we want to ensure each framework in that role has a chance to get an offer in the next allocation cycle?
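
To make the scenario concrete, here is a minimal Python sketch of the kind of per-offer check under discussion. It is not the Mesos implementation: the dictionary representation of resources and the helper name `contributes_to_quota` are assumptions made purely for illustration.

```python
def contributes_to_quota(offer, remaining):
    """Return True if the offer holds any resource the quota still needs.

    Both arguments are hypothetical {resource_name: amount} dictionaries,
    not Mesos types.
    """
    return any(
        offer.get(name, 0) > 0 and needed > 0
        for name, needed in remaining.items()
    )


# The quota still needs 20GB of disk (amounts in MB); the offer has no disk.
remaining = {"disk": 20 * 1024}
offer = {"cpus": 2, "mem": 2 * 1024}
print(contributes_to_quota(offer, remaining))  # prints False
```

With a check like this, the offer in the example would be skipped. The question is whether it should be rescinded regardless.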


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106977
-----------------------------------------------------------


On Nov. 20, 2015, 1:15 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40351/
> -----------------------------------------------------------
> 
> (Updated Nov. 20, 2015, 1:15 a.m.)
> 
> 
> Review request for mesos, Bernd Mathiske, Joerg Schad, Joris Van Remoortere, Joseph Wu, and Qian Zhang.
> 
> 
> Bugs: MESOS-3912
>     https://issues.apache.org/jira/browse/MESOS-3912
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 5e5a575dc7dd49324f3c837028df8a7f75cd1f80 
>   src/master/quota_handler.cpp 03cef4117c52da7599a2800060f65483ca33bc3f 
> 
> Diff: https://reviews.apache.org/r/40351/diff/
> 
> 
> Testing
> -------
> 
> make check (Mac OS X 10.10.4)
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Qian Zhang <zh...@cn.ibm.com>.

> On Nov. 18, 2015, 3:51 p.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 180
> > <https://reviews.apache.org/r/40351/diff/3/?file=1128793#file1128793line180>
> >
> >     Why do we want to rescind the offers that do not contribute to satisfying the quota request?
> 
> Alexander Rukletsov wrote:
>     Because we may rescind more than necessary to satisfy quota request (remember minimal agent count). If we have a check in place, this will effectively prevent us from doing so. Does it make sense to you?
> 
> Qian Zhang wrote:
>     Suppose the quota request asks for 20GB of disk for a role, and there is an offer that includes only 2 CPUs and 2GB of memory and no disk resources at all. Will we rescind this offer too? This seems a little unfair to me.
>     Also, can you please clarify why we want to rescind offers from at least `numF` agents? Is the reason that we want to ensure each framework in that role has a chance to get an offer in the next allocation cycle?
> 
> Alexander Rukletsov wrote:
>     That's correct, we will rescind that offer and yes, it's a bit unfair. Let me explain why I decided to remove this check. Suppose a quota request is for 6 CPUs for a role with 3 frameworks. The first offer we rescind is 10 CPUs, 10GB MEM. Technically, we have enough resources to satisfy quota, but we would like to rescind offers from at least 2 more agents. Having a check in place will prevent us from doing so. Do you think greedy rescinding can be a problem?
>     
>     Yes, we would like to facilitate allocation for each framework in the role, for which quota is set.

What is most unclear to me is why we need to rescind offers from at least numF agents, i.e., in your example above, why do we want to rescind offers from at least 2 more agents after quota has been satisfied? Can you please let me know the motivation behind it? I think quota is a kind of global concept that should not have a direct relation to agents and frameworks; it should stay at the role level. So I am not sure why we want to facilitate allocation for each framework in the role. Is that something we mentioned in the design doc? Maybe I forgot ... :-)


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106977
-----------------------------------------------------------


On Nov. 23, 2015, 8:57 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40351/
> -----------------------------------------------------------
> 
> (Updated Nov. 23, 2015, 8:57 a.m.)
> 
> 
> Review request for mesos, Bernd Mathiske, Joerg Schad, Joris Van Remoortere, Joseph Wu, and Qian Zhang.
> 
> 
> Bugs: MESOS-3912
>     https://issues.apache.org/jira/browse/MESOS-3912
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp d4b1edde98925fd51e056f253758afea779be9ed 
>   src/master/quota_handler.cpp 86d7331aa79adb1d9a3009552fc4c2aed0229804 
> 
> Diff: https://reviews.apache.org/r/40351/diff/
> 
> 
> Testing
> -------
> 
> make check (Mac OS X 10.10.4)
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Qian Zhang <zh...@cn.ibm.com>.

> On Nov. 18, 2015, 3:51 p.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 180
> > <https://reviews.apache.org/r/40351/diff/3/?file=1128793#file1128793line180>
> >
> >     Why do we want to rescind the offers that do not contribute to satisfying the quota request?
> 
> Alexander Rukletsov wrote:
>     Because we may rescind more than necessary to satisfy quota request (remember minimal agent count). If we have a check in place, this will effectively prevent us from doing so. Does it make sense to you?
> 
> Qian Zhang wrote:
>     Suppose the quota request asks for 20GB of disk for a role, and there is an offer that includes only 2 CPUs and 2GB of memory and no disk resources at all. Will we rescind this offer too? This seems a little unfair to me.
>     Also, can you please clarify why we want to rescind offers from at least `numF` agents? Is the reason that we want to ensure each framework in that role has a chance to get an offer in the next allocation cycle?
> 
> Alexander Rukletsov wrote:
>     That's correct, we will rescind that offer and yes, it's a bit unfair. Let me explain why I decided to remove this check. Suppose a quota request is for 6 CPUs for a role with 3 frameworks. The first offer we rescind is 10 CPUs, 10GB MEM. Technically, we have enough resources to satisfy quota, but we would like to rescind offers from at least 2 more agents. Having a check in place will prevent us from doing so. Do you think greedy rescinding can be a problem?
>     
>     Yes, we would like to facilitate allocation for each framework in the role, for which quota is set.
> 
> Qian Zhang wrote:
>     What is most unclear to me is why we need to rescind offers from at least numF agents, i.e., in your example above, why do we want to rescind offers from at least 2 more agents after quota has been satisfied? Can you please let me know the motivation behind it? I think quota is a kind of global concept that should not have a direct relation to agents and frameworks; it should stay at the role level. So I am not sure why we want to facilitate allocation for each framework in the role. Is that something we mentioned in the design doc? Maybe I forgot ... :-)
> 
> Alexander Rukletsov wrote:
>     Nope, it wasn't in the design doc, that's something we decided recently. The main motivation is to improve user experience and simplify debugging. Because the built-in allocator is used in 99% of clusters, it makes sense to exploit some knowledge about how it works. Because of coarse-grained allocations, to facilitate fairness we may want to rescind from more agents than necessary to satisfy quota numbers.
> 
> Joris Van Remoortere wrote:
>     `why do we want to rescind offers from at least 2 more agents after quota has been satisfied?`
>     Just to be clear: it's not numF or more agents *on top of* quota. It's at least numF agents in case the quota itself doesn't already rescind offers from that many.
>     
>     I'm not sure this is really "un-fair", as these are *offers*, and not *allocations*. We are not pre-empting tasks. If the resources in the offers that are rescinded are not needed for quota, then they will be re-offered using the same fair-sharing logic that they were before. In fact, this is *more* fair, as we might end up making better offers due to information that has changed in the cluster.
>     
>     The argument for the `numF` condition that Alex is making is one I pushed for. We often end up debugging clusters around new features, and even not-so-new features. Although the `numF` condition by no means guarantees that every framework in the role will receive an offer, it does increase the chances greatly. The fact that they will receive any offer at all means we will see messages flowing to the framework, and hopefully log lines at the framework after receiving the offer. If the offer is still too small to launch a task, at least we will see a message at the framework level to that effect. **What we are optimizing for** is the ability to quickly eliminate (in most cases) the possibility that there is a bug in quota because the framework didn't receive any offers.
>     
>     Please let me know if this is not clear, as I believe it is very important. The more of us understand why this extra condition is here, the fewer framework writers and cluster operators will be coming on IRC / dev list with debug logs that don't allow us to easily eliminate quota as the source of the problem.

Thanks Joris. So the motivation for this extra condition is to improve debuggability, right? But I am still not clear on why increasing the chances for frameworks to receive offers will improve the debuggability of Mesos. Can you please clarify?


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106977
-----------------------------------------------------------


On Nov. 25, 2015, 12:29 a.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40351/
> -----------------------------------------------------------
> 
> (Updated Nov. 25, 2015, 12:29 a.m.)
> 
> 
> Review request for mesos, Bernd Mathiske, Joerg Schad, Joris Van Remoortere, Joseph Wu, and Qian Zhang.
> 
> 
> Bugs: MESOS-3912
>     https://issues.apache.org/jira/browse/MESOS-3912
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp e5e0ed01a56d869cc535687c8dbb6b99f6295b66 
>   src/master/quota_handler.cpp b8e501be43de6bc02aebfa5bd415b4212a96da31 
> 
> Diff: https://reviews.apache.org/r/40351/diff/
> 
> 
> Testing
> -------
> 
> make check (Mac OS X 10.10.4)
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Nov. 18, 2015, 7:51 a.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 180
> > <https://reviews.apache.org/r/40351/diff/3/?file=1128793#file1128793line180>
> >
> >     Why do we want to rescind the offers that do not contribute to satisfying the quota request?
> 
> Alexander Rukletsov wrote:
>     Because we may rescind more than necessary to satisfy quota request (remember minimal agent count). If we have a check in place, this will effectively prevent us from doing so. Does it make sense to you?
> 
> Qian Zhang wrote:
>     Suppose the quota request asks for 20GB of disk for a role, and there is an offer that includes only 2 CPUs and 2GB of memory and no disk resources at all. Will we rescind this offer too? This seems a little unfair to me.
>     Also, can you please clarify why we want to rescind offers from at least `numF` agents? Is the reason that we want to ensure each framework in that role has a chance to get an offer in the next allocation cycle?

That's correct, we will rescind that offer and yes, it's a bit unfair. Let me explain why I decided to remove this check. Suppose a quota request is for 6 CPUs for a role with 3 frameworks. The first offer we rescind is 10 CPUs, 10GB MEM. Technically, we have enough resources to satisfy quota, but we would like to rescind offers from at least 2 more agents. Having a check in place will prevent us from doing so. Do you think greedy rescinding can be a problem?

Yes, we would like to facilitate allocation for each framework in the role, for which quota is set.
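
The example above can be sketched roughly as follows. This is an illustrative Python sketch only, not the code in `src/master/quota_handler.cpp`: the function name `offers_to_rescind`, the list-of-pairs offer representation, and the agent names are all assumptions. It keeps rescinding until the quota amounts are covered and offers from at least `num_frameworks` (numF) agents have been rescinded, with no per-offer check.

```python
def offers_to_rescind(offers, quota, num_frameworks):
    """Pick offers to rescind, greedily and without a per-offer check.

    offers: list of (agent_id, {resource_name: amount}) pairs (hypothetical).
    quota: {resource_name: amount} requested for the role.
    num_frameworks: number of frameworks in the role (numF in this thread).
    """
    rescinded = []
    freed = {}      # resources freed so far by rescinding
    agents = set()  # agents from which at least one offer was rescinded

    for agent_id, resources in offers:
        quota_met = all(
            freed.get(name, 0) >= amount for name, amount in quota.items()
        )
        # Stop only when both conditions hold: the quota is covered AND
        # offers from at least num_frameworks agents have been rescinded.
        if quota_met and len(agents) >= num_frameworks:
            break

        rescinded.append((agent_id, resources))
        agents.add(agent_id)
        for name, amount in resources.items():
            freed[name] = freed.get(name, 0) + amount

    return rescinded


offers = [
    ("agent1", {"cpus": 10, "mem": 10240}),
    ("agent2", {"cpus": 2, "mem": 2048}),
    ("agent3", {"mem": 1024}),
    ("agent4", {"cpus": 4, "mem": 4096}),
]
picked = offers_to_rescind(offers, {"cpus": 6}, num_frameworks=3)
print([agent for agent, _ in picked])  # prints ['agent1', 'agent2', 'agent3']
```

Note that the offer from `agent3` carries no CPUs and is still rescinded. A check that skipped non-contributing offers would stop the loop from reaching three agents in this case.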


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106977
-----------------------------------------------------------


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Nov. 18, 2015, 7:51 a.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 180
> > <https://reviews.apache.org/r/40351/diff/3/?file=1128793#file1128793line180>
> >
> >     Why do we want to rescind the offers that do not contribute to satisfying the quota request?
> 
> Alexander Rukletsov wrote:
>     Because we may rescind more than necessary to satisfy quota request (remember minimal agent count). If we have a check in place, this will effectively prevent us from doing so. Does it make sense to you?
> 
> Qian Zhang wrote:
>     Suppose the quota request asks for 20GB of disk for a role, and there is an offer that includes only 2 CPUs and 2GB of memory and no disk resources at all. Will we rescind this offer too? This seems a little unfair to me.
>     Also, can you please clarify why we want to rescind offers from at least `numF` agents? Is the reason that we want to ensure each framework in that role has a chance to get an offer in the next allocation cycle?
> 
> Alexander Rukletsov wrote:
>     That's correct, we will rescind that offer and yes, it's a bit unfair. Let me explain why I decided to remove this check. Suppose a quota request is for 6 CPUs for a role with 3 frameworks. The first offer we rescind is 10 CPUs, 10GB MEM. Technically, we have enough resources to satisfy quota, but we would like to rescind offers from at least 2 more agents. Having a check in place will prevent us from doing so. Do you think greedy rescinding can be a problem?
>     
>     Yes, we would like to facilitate allocation for each framework in the role, for which quota is set.
> 
> Qian Zhang wrote:
>     What is most unclear to me is why we need to rescind offers from at least numF agents, i.e., in your example above, why do we want to rescind offers from at least 2 more agents after quota has been satisfied? Can you please let me know the motivation behind it? I think quota is a kind of global concept that should not have a direct relation to agents and frameworks; it should stay at the role level. So I am not sure why we want to facilitate allocation for each framework in the role. Is that something we mentioned in the design doc? Maybe I forgot ... :-)

Nope, it wasn't in the design doc, that's something we decided recently. The main motivation is to improve user experience and simplify debugging. Because the built-in allocator is used in 99% of clusters, it makes sense to exploit some knowledge about how it works. Because of coarse-grained allocations, to facilitate fairness we may want to rescind from more agents than necessary to satisfy quota numbers.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106977
-----------------------------------------------------------


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Nov. 18, 2015, 7:51 a.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 180
> > <https://reviews.apache.org/r/40351/diff/3/?file=1128793#file1128793line180>
> >
> >     Why do we want to rescind the offers that do not contribute to satisfying the quota request?

Because we may rescind more than necessary to satisfy quota request (remember minimal agent count). If we have a check in place, this will effectively prevent us from doing so. Does it make sense to you?


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106977
-----------------------------------------------------------


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Joris Van Remoortere <jo...@gmail.com>.

> On Nov. 18, 2015, 7:51 a.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 180
> > <https://reviews.apache.org/r/40351/diff/3/?file=1128793#file1128793line180>
> >
> >     Why do we want to rescind the offers that do not contribute to satisfying the quota request?
> 
> Alexander Rukletsov wrote:
>     Because we may rescind more than necessary to satisfy quota request (remember minimal agent count). If we have a check in place, this will effectively prevent us from doing so. Does it make sense to you?
> 
> Qian Zhang wrote:
>     Suppose the quota request asks for 20GB of disk for a role, and there is an offer that includes only 2 CPUs and 2GB of memory and no disk resources at all. Will we rescind this offer too? This seems a little unfair to me.
>     Also, can you please clarify why we want to rescind offers from at least `numF` agents? Is the reason that we want to ensure each framework in that role has a chance to get an offer in the next allocation cycle?
> 
> Alexander Rukletsov wrote:
>     That's correct, we will rescind that offer and yes, it's a bit unfair. Let me explain why I decided to remove this check. Suppose a quota request is for 6 CPUs for a role with 3 frameworks. The first offer we rescind is 10 CPUs, 10GB MEM. Technically, we have enough resources to satisfy quota, but we would like to rescind offers from at least 2 more agents. Having a check in place will prevent us from doing so. Do you think greedy rescinding can be a problem?
>     
>     Yes, we would like to facilitate allocation for each framework in the role, for which quota is set.
> 
> Qian Zhang wrote:
>     What is most unclear to me is why we need to rescind offers from at least numF agents, i.e., in your example above, why do we want to rescind offers from at least 2 more agents after quota has been satisfied? Can you please let me know the motivation behind it? I think quota is a kind of global concept that should not have a direct relation to agents and frameworks; it should stay at the role level. So I am not sure why we want to facilitate allocation for each framework in the role. Is that something we mentioned in the design doc? Maybe I forgot ... :-)
> 
> Alexander Rukletsov wrote:
>     Nope, it wasn't in the design doc, that's something we decided recently. The main motivation is to improve user experience and simplify debugging. Because the built-in allocator is used in 99% of clusters, it makes sense to exploit some knowledge about how it works. Because of coarse-grained allocations, to facilitate fairness we may want to rescind from more agents than necessary to satisfy quota numbers.

`why do we want to rescind offers from at least 2 more agents after quota has been satisfied?`
Just to be clear: it's not numF or more agents *on top of* quota. It's at least numF agents in case the quota itself doesn't already rescind offers from that many.

I'm not sure this is really "un-fair", as these are *offers*, and not *allocations*. We are not pre-empting tasks. If the resources in the offers that are rescinded are not needed for quota, then they will be re-offered using the same fair-sharing logic that they were before. In fact, this is *more* fair, as we might end up making better offers due to information that has changed in the cluster.

The argument for the `numF` condition that Alex is making is one I pushed for. We often end up debugging clusters around new features, and even not-so-new features. Although the `numF` condition by no means guarantees that every framework in the role will receive an offer, it does increase the chances greatly. The fact that they will receive any offer at all means we will see messages flowing to the framework, and hopefully log lines at the framework after receiving the offer. If the offer is still too small to launch a task, at least we will see a message at the framework level to that effect. **What we are optimizing for** is the ability to quickly eliminate (in most cases) the possibility that there is a bug in quota because the framework didn't receive any offers.

Please let me know if this is not clear, as I believe it is very important. The more of us understand why this extra condition is here, the fewer framework writers and cluster operators will be coming on IRC / dev list with debug logs that don't allow us to easily eliminate quota as the source of the problem.


- Joris


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106977
-----------------------------------------------------------


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Qian Zhang <zh...@cn.ibm.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106977
-----------------------------------------------------------



src/master/master.hpp (line 904)
<https://reviews.apache.org/r/40351/#comment165853>

    s/make makes/make



src/master/quota_handler.cpp (line 180)
<https://reviews.apache.org/r/40351/#comment165870>

    Why do we want to rescind the offers that do not contribute to satisfying the quota request?


- Qian Zhang


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/
-----------------------------------------------------------

(Updated Nov. 17, 2015, 8:15 p.m.)


Review request for mesos, Bernd Mathiske, Joerg Schad, Joris Van Remoortere, and Joseph Wu.


Changes
-------

Refactored.


Bugs: MESOS-3912
    https://issues.apache.org/jira/browse/MESOS-3912


Repository: mesos


Description
-------

See summary.


Diffs (updated)
-----

  src/master/master.hpp 5e5a575dc7dd49324f3c837028df8a7f75cd1f80 
  src/master/quota_handler.cpp 03cef4117c52da7599a2800060f65483ca33bc3f 

Diff: https://reviews.apache.org/r/40351/diff/


Testing
-------

make check (Mac OS X 10.10.4)


Thanks,

Alexander Rukletsov


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Nov. 17, 2015, 6:16 a.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 211
> > <https://reviews.apache.org/r/40351/diff/2/?file=1126317#file1126317line211>
> >
> >     For ```offer->resources()```, before doing the math, do we need to call ```flatten()``` to remove role too? For dynamic reservation, in ```Http::_operation()```, I see we do not call ```flatten()``` for offered resources too, is it a bug?
> 
> Alexander Rukletsov wrote:
>     In this case everything should be fine. IIUC, there is only one reason why `offer->resources()` has non '*' role: it's a statically reserved resource. Quota is orthogonal to static reservations, hence we should not rescind those offers.
>     
>     However, I think this check should be removed here. We may rescind more offers than necessary to satisfy remaining resources (because we want to rescind from a certain number of agents). I'll think about it.
> 
> Qian Zhang wrote:
>     But I think that for a dynamically reserved resource, ```offer->resources()``` also has a non-'*' role, along with a non-empty ```ReservationInfo```, right?

Here is a "cheat sheet" from @MPark:
```
has_reservation && role == "*": invalid
has_reservation && role != "*": dynamically reserved
!has_reservation && role == "*": unreserved
!has_reservation && role != "*": statically reserved
```

According to it, you're right.
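
The four rows can also be written down as a small function. This is only a sketch: `ResourceView`, `ReservationKind`, `classify` and `checkCheatSheet` are hypothetical names made up for illustration and are not Mesos types or APIs. It only encodes the cheat sheet above, under the assumption that the role and the presence of a reservation are the two inputs that matter.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of a resource: just the two properties
// the cheat sheet inspects. This is NOT the Mesos `Resource` message.
struct ResourceView {
  std::string role;     // "*" is the default role
  bool hasReservation;  // true if a ReservationInfo is attached
};

enum class ReservationKind {
  Invalid,
  DynamicallyReserved,
  Unreserved,
  StaticallyReserved
};

// Direct transcription of the four rows of the cheat sheet.
inline ReservationKind classify(const ResourceView& r) {
  const bool defaultRole = (r.role == "*");
  if (r.hasReservation) {
    return defaultRole ? ReservationKind::Invalid
                       : ReservationKind::DynamicallyReserved;
  }
  return defaultRole ? ReservationKind::Unreserved
                     : ReservationKind::StaticallyReserved;
}

// Exercises each row once; "prod" is an arbitrary example role.
inline void checkCheatSheet() {
  assert(classify({"*", true}) == ReservationKind::Invalid);
  assert(classify({"prod", true}) == ReservationKind::DynamicallyReserved);
  assert(classify({"*", false}) == ReservationKind::Unreserved);
  assert(classify({"prod", false}) == ReservationKind::StaticallyReserved);
}
```

The point for this review is the second and fourth rows: a non-"*" role alone does not distinguish a static reservation from a dynamic one, so the reservation info has to be checked as well.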


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106799
-----------------------------------------------------------




Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Qian Zhang <zh...@cn.ibm.com>.

> On Nov. 17, 2015, 2:16 p.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 211
> > <https://reviews.apache.org/r/40351/diff/2/?file=1126317#file1126317line211>
> >
> >     For ```offer->resources()```, before doing the math, do we need to call ```flatten()``` to remove the role too? For dynamic reservations, in ```Http::_operation()```, I see we do not call ```flatten()``` for offered resources either; is that a bug?
> 
> Alexander Rukletsov wrote:
>     In this case everything should be fine. IIUC, there is only one reason why `offer->resources()` has a non-'*' role: it's a statically reserved resource. Quota is orthogonal to static reservations, hence we should not rescind those offers.
>     
>     However, I think this check should be removed here. We may rescind more offers than necessary to satisfy remaining resources (because we want to rescind from a certain number of agents). I'll think about it.

But I think that for a dynamically reserved resource, ```offer->resources()``` also has a non-'*' role, along with a non-empty ```ReservationInfo```, right?


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106799
-----------------------------------------------------------




Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Alexander Rukletsov <ru...@gmail.com>.

> On Nov. 17, 2015, 6:16 a.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 171
> > <https://reviews.apache.org/r/40351/diff/2/?file=1126317#file1126317line171>
> >
> >     For ```request.guarantee()```, I think we need to call ```flatten()``` to remove the role first.

That's exactly right. Good catch!
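
To illustrate why the role gets in the way of the math, here is a rough sketch. It assumes that the same resource under two different roles is treated as two distinct resources when subtracting, which is what the comment above implies. `ScalarResource`, `flattened`, `remainingAfter` and `checkFlattenExample` are hypothetical names for illustration only; they are not the Mesos `Resources` API, and the real `flatten()` is only mimicked here to the extent described in this thread (removing the role).

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical scalar resource; the real Mesos `Resources` class is richer.
struct ScalarResource {
  std::string name;  // e.g. "cpus"
  std::string role;  // "*" is the default role
  double amount;
};

// Stand-in for what `flatten()` is used for in this thread: drop the role.
inline ScalarResource flattened(ScalarResource r) {
  r.role = "*";
  return r;
}

// Amount of `wanted` still outstanding after counting `offered` against it.
// ASSUMPTION: two resources match only if both name and role agree.
inline double remainingAfter(const ScalarResource& wanted,
                             const ScalarResource& offered) {
  if (wanted.name != offered.name || wanted.role != offered.role) {
    return wanted.amount;  // different name or role: the offer does not count
  }
  return std::max(0.0, wanted.amount - offered.amount);
}

// A guarantee that carries a role versus an offer of unreserved resources.
inline void checkFlattenExample() {
  const ScalarResource guarantee{"cpus", "prod", 4.0};
  const ScalarResource offered{"cpus", "*", 1.5};

  // Without removing the role, the offer is never counted.
  assert(remainingAfter(guarantee, offered) == 4.0);

  // After removing the role, the offer reduces what is still needed.
  assert(remainingAfter(flattened(guarantee), offered) == 2.5);
}
```

Under these assumptions, skipping the flattening step means no unreserved offer would ever count towards the guarantee, so more offers than necessary could end up being rescinded.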


> On Nov. 17, 2015, 6:16 a.m., Qian Zhang wrote:
> > src/master/quota_handler.cpp, line 211
> > <https://reviews.apache.org/r/40351/diff/2/?file=1126317#file1126317line211>
> >
> >     For ```offer->resources()```, before doing the math, do we need to call ```flatten()``` to remove the role too? For dynamic reservations, in ```Http::_operation()```, I see we do not call ```flatten()``` for offered resources either; is that a bug?

In this case everything should be fine. IIUC, there is only one reason why `offer->resources()` has a non-'*' role: it's a statically reserved resource. Quota is orthogonal to static reservations, hence we should not rescind those offers.

However, I think this check should be removed here. We may rescind more offers than necessary to satisfy remaining resources (because we want to rescind from a certain number of agents). I'll think about it.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106799
-----------------------------------------------------------




Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Qian Zhang <zh...@cn.ibm.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106799
-----------------------------------------------------------



src/master/quota_handler.cpp (line 171)
<https://reviews.apache.org/r/40351/#comment165575>

    For ```request.guarantee()```, I think we need to call ```flatten()``` to remove the role first.



src/master/quota_handler.cpp (line 211)
<https://reviews.apache.org/r/40351/#comment165580>

    For ```offer->resources()```, before doing the math, do we need to call ```flatten()``` to remove the role too? For dynamic reservations, in ```Http::_operation()```, I see we do not call ```flatten()``` for offered resources either; is that a bug?


- Qian Zhang




Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Alexander Rukletsov <ru...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/
-----------------------------------------------------------

(Updated Nov. 16, 2015, 4:46 p.m.)


Review request for mesos, Bernd Mathiske, Joerg Schad, Joris Van Remoortere, and Joseph Wu.


Changes
-------

Rebased.


Bugs: MESOS-3912
    https://issues.apache.org/jira/browse/MESOS-3912


Repository: mesos


Description
-------

See summary.


Diffs (updated)
-----

  src/master/master.hpp dc5790a5f0751f5f4644ef1105a0a0c5b2b30fc1 
  src/master/quota_handler.cpp 3db3c55e51470392f72568a768efe8e66fa3dca0 

Diff: https://reviews.apache.org/r/40351/diff/


Testing
-------

make check (Mac OS X 10.10.4)


Thanks,

Alexander Rukletsov


Re: Review Request 40351: Quota: Added rescinding offers for set quota requests.

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40351/#review106672
-----------------------------------------------------------


Bad patch!

Reviews applied: [39211, 39018, 39102, 36913]

Failed command: ./support/apply-review.sh -n -r 36913

Error:
 2015-11-16 15:48:31 URL:https://reviews.apache.org/r/36913/diff/raw/ [7367/7367] -> "36913.patch" [1]
error: patch failed: src/Makefile.am:511
error: src/Makefile.am: patch does not apply
Failed to apply patch

- Mesos ReviewBot

