Posted to reviews@aurora.apache.org by Stephan Erb <se...@apache.org> on 2016/04/23 18:22:35 UTC

Review Request 46603: Introduce command line option to control the offer filter duration

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/
-----------------------------------------------------------

Review request for Aurora, Maxim Khutornenko and Bill Farner.


Bugs: AURORA-1658
    https://issues.apache.org/jira/browse/AURORA-1658


Repository: aurora


Description
-------

Aurora declines Mesos offers implicitly when launching a task and explicitly when compacting multiple offers of a slave into a single one.
The filter duration instructs Mesos to return the declined resources to us only after a timeout of X seconds, even if no other framework wants them. If no filter is supplied, the hardcoded default of 5 seconds is used.

By making this value configurable, Aurora can be tuned for either single or multi-framework deployment.
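
To illustrate what the filter duration means in practice, here is a minimal, self-contained sketch. It is not code from the patch and does not use the Mesos API; the class and method names are hypothetical. It only models the rule described above: resources declined at some point in time become eligible to be re-offered to the same framework once the filter duration has elapsed.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of a decline filter; not Aurora or Mesos code. */
public class OfferFilterSketch {

    /** Earliest time at which declined resources may be offered to us again. */
    static Instant earliestReoffer(Instant declinedAt, Duration filterDuration) {
        if (filterDuration.isNegative()) {
            throw new IllegalArgumentException("filter duration must not be negative");
        }
        return declinedAt.plus(filterDuration);
    }

    public static void main(String[] args) {
        Instant declinedAt = Instant.parse("2016-04-23T18:22:35Z");
        // Former hardcoded default: resources are held back for 5 seconds.
        System.out.println(earliestReoffer(declinedAt, Duration.ofSeconds(5)));
        // Filter of 0: resources are eligible again right away.
        System.out.println(earliestReoffer(declinedAt, Duration.ZERO));
    }
}
```

With a duration of zero the earliest re-offer time equals the decline time, which is the setting a single-framework deployment would likely favour.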


Diffs
-----

  RELEASE-NOTES.md 4b810f2d808cbf0d91c753147d98d1e389106d22 
  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 1d725c03d16116257e1c4242ebf60f5931d4600f 
  src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java d1bb8f29c9bed42c27624204b9d34ab1893468f7 
  src/main/java/org/apache/aurora/scheduler/mesos/Driver.java 013c50cf70fe45fc2a74c1ea5dccccfaba14225c 
  src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java 7ff3e3e5dc70187066b914f7feb65d99f2145303 
  src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 452451f239a964c1b55ede3d6fbde0bd805e4b00 
  src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 90f8abf830478ad48f9a8a62c1c42423ab0f8d57 
  src/main/java/org/apache/aurora/scheduler/offers/RandomJitterReturnDelay.java a52fd4e8cd5c32d9560d4d72958a54bef820d81c 
  src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 76da6d80d91221336e50d596cc2f49e890451fd1 

Diff: https://reviews.apache.org/r/46603/diff/


Testing
-------

* ./gradlew -Pq build 
* ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
 
I have also conducted an (unscientific) benchmark in Vagrant and started a job with 5 instances and recorded the time from `PENDING` to `RUNNING` for the slowest ones:

* 7s startup time for a filter duration of 0 seconds
* 29s startup time for the hardcoded former default of 5 seconds
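
One rough way to read these two numbers is sketched below. This is a back-of-the-envelope model and not a measurement: it assumes a single agent (as in a typical Vagrant setup), one task launched per offer, and that every launch filters the remaining resources of that agent for the full filter duration. The class and method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope estimate; not part of the patch. */
public class StartupEstimateSketch {

    /**
     * Estimates the startup time of the slowest instance, assuming that each
     * launch after the first has to wait for one full filter duration.
     */
    static long estimateSeconds(long baselineSeconds, int instances, long filterSeconds) {
        if (instances < 1) {
            throw new IllegalArgumentException("need at least one instance");
        }
        return baselineSeconds + (long) (instances - 1) * filterSeconds;
    }

    public static void main(String[] args) {
        // Baseline of 7s taken from the run with a filter duration of 0 seconds.
        System.out.println(estimateSeconds(7, 5, 0)); // 7
        System.out.println(estimateSeconds(7, 5, 5)); // 27
    }
}
```

Under these assumptions the model yields 27s for a 5 second filter, which is in the same range as the 29s observed, so serialized waiting on the filter is a plausible explanation for most of the difference.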


Thanks,

Stephan Erb


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130260
-----------------------------------------------------------




src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java (line 51)
<https://reviews.apache.org/r/46603/#comment193979>

    For a deployment where Aurora is the only framework, 0 tends to be the preferred value. It is also used by Marathon when launching tasks.
    
    Within the linked Jira ticket, Maxim proposed to use the former default of 5 seconds here. What do others think? Maybe we can also get input from people running multiple frameworks next to Aurora.


- Stephan Erb


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130262
-----------------------------------------------------------


Ship it!




Master (d339036) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130855
-----------------------------------------------------------


Ship it!




Master (7e30ebe) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Maxim Khutornenko <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130859
-----------------------------------------------------------


Ship it!




Ship It!

- Maxim Khutornenko


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/
-----------------------------------------------------------

(Updated April 28, 2016, 12:12 a.m.)


Review request for Aurora, Maxim Khutornenko and Bill Farner.


Changes
-------

Use default value (as before the patch).


Bugs: AURORA-1658
    https://issues.apache.org/jira/browse/AURORA-1658


Repository: aurora


Description
-------

Aurora declines Mesos offers implicitly when launching a task and explicitly when compacting multiple offers of a slave into a single one.
The filter duration instructs Mesos to return the declined resources to us only after a timeout of X seconds, even if no other framework wants them. If no filter is supplied, the hardcoded default of 5 seconds is used.

By making this value configurable, Aurora can be tuned for either single or multi-framework deployment.


Diffs (updated)
-----

  RELEASE-NOTES.md 4b810f2d808cbf0d91c753147d98d1e389106d22 
  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 1d725c03d16116257e1c4242ebf60f5931d4600f 
  src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java d1bb8f29c9bed42c27624204b9d34ab1893468f7 
  src/main/java/org/apache/aurora/scheduler/mesos/Driver.java 013c50cf70fe45fc2a74c1ea5dccccfaba14225c 
  src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java 7ff3e3e5dc70187066b914f7feb65d99f2145303 
  src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 452451f239a964c1b55ede3d6fbde0bd805e4b00 
  src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 90f8abf830478ad48f9a8a62c1c42423ab0f8d57 
  src/main/java/org/apache/aurora/scheduler/offers/RandomJitterReturnDelay.java a52fd4e8cd5c32d9560d4d72958a54bef820d81c 
  src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 76da6d80d91221336e50d596cc2f49e890451fd1 

Diff: https://reviews.apache.org/r/46603/diff/


Testing
-------

* ./gradlew -Pq build 
* ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
 
I have also conducted an (unscientific) benchmark in Vagrant and started a job with 5 instances and recorded the time from `PENDING` to `RUNNING` for the slowest ones:

* 7s startup time for a filter duration of 0 seconds
* 29s startup time for the hardcoded former default of 5 seconds


Thanks,

Stephan Erb


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/
-----------------------------------------------------------

(Updated April 27, 2016, 11:49 p.m.)


Review request for Aurora, Maxim Khutornenko and Bill Farner.


Changes
-------

Bill's feedback.


Bugs: AURORA-1658
    https://issues.apache.org/jira/browse/AURORA-1658


Repository: aurora


Description
-------

Aurora declines Mesos offers implicitly when launching a task and explicitly when compacting multiple offers of a slave into a single one.
The filter duration instructs Mesos to return the declined resources to us only after a timeout of X seconds, even if no other framework wants them. If no filter is supplied, the hardcoded default of 5 seconds is used.

By making this value configurable, Aurora can be tuned for either single or multi-framework deployment.


Diffs (updated)
-----

  RELEASE-NOTES.md 4b810f2d808cbf0d91c753147d98d1e389106d22 
  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 1d725c03d16116257e1c4242ebf60f5931d4600f 
  src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java d1bb8f29c9bed42c27624204b9d34ab1893468f7 
  src/main/java/org/apache/aurora/scheduler/mesos/Driver.java 013c50cf70fe45fc2a74c1ea5dccccfaba14225c 
  src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java 7ff3e3e5dc70187066b914f7feb65d99f2145303 
  src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 452451f239a964c1b55ede3d6fbde0bd805e4b00 
  src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 90f8abf830478ad48f9a8a62c1c42423ab0f8d57 
  src/main/java/org/apache/aurora/scheduler/offers/RandomJitterReturnDelay.java a52fd4e8cd5c32d9560d4d72958a54bef820d81c 
  src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 76da6d80d91221336e50d596cc2f49e890451fd1 

Diff: https://reviews.apache.org/r/46603/diff/


Testing
-------

* ./gradlew -Pq build 
* ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
 
I have also conducted an (unscientific) benchmark in Vagrant and started a job with 5 instances and recorded the time from `PENDING` to `RUNNING` for the slowest ones:

* 7s startup time for a filter duration of 0 seconds
* 29s startup time for the hardcoded former default of 5 seconds


Thanks,

Stephan Erb


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Maxim Khutornenko <ma...@apache.org>.

> On April 24, 2016, 3:48 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java, line 51
> > <https://reviews.apache.org/r/46603/diff/2/?file=1358596#file1358596line51>
> >
> >     Does this default value effect the same behavior as before the patch?
> 
> Stephan Erb wrote:
>     Using a default of `0` is indeed a behaviour change. I am happy to discuss if we want this change or not. 
>     
>     With a timeout of `5` secs (this was the former hardcoded default):
>     
>     * When launching a task, Mesos will only re-offer the unused resources in the offer after 5 seconds. 
>     * When declining offers in order to merge two offers into one, Mesos will only re-offer resources of this slave after 5s.
>     
>     With timeout of `0` secs:
>     
>     * The resources can be returned instantly within the next offer-cycle of the Mesos allocator.
>     
>     We tend to have the problem that a timeout of 5 breaks the maintenance feature for us. We regularly schedule jobs with #instances > #nodes in the cluster. In this case, all available offers are quickly depleted and Aurora begins to schedule onto nodes which were supposed to be put into maintenance mode. Only after the timeout of 5 seconds has passed will Mesos re-offer resources to Aurora. I believe we might not be the only ones with this problem and therefore think 0 is a good default.

It would be great to reach out to Mesos folks to better understand the reasons behind choosing a 5 second default timeout. Last I checked, lower values _may_ result in an increased load on the Mesos master. If that proves to be true I'd prefer holding on to the current behavior as a safer bet.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130306
-----------------------------------------------------------


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Stephan Erb <se...@apache.org>.

> On April 24, 2016, 5:48 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java, line 51
> > <https://reviews.apache.org/r/46603/diff/2/?file=1358596#file1358596line51>
> >
> >     Does this default value effect the same behavior as before the patch?

Using a default of `0` is indeed a behaviour change. I am happy to discuss if we want this change or not. 

With a timeout of `5` secs (this was the former hardcoded default):

* When launching a task, Mesos will only re-offer the unused resources in the offer after 5 seconds. 
* When declining offers in order to merge two offers into one, Mesos will only re-offer resources of this slave after 5s.

With timeout of `0` secs:

* The resources can be returned instantly within the next offer-cycle of the Mesos allocator.

We tend to have the problem that a timeout of 5 breaks the maintenance feature for us. We regularly schedule jobs with #instances > #nodes in the cluster. In this case, all available offers are quickly depleted and Aurora begins to schedule onto nodes which were supposed to be put into maintenance mode. Only after the timeout of 5 seconds has passed will Mesos re-offer resources to Aurora. I believe we might not be the only ones with this problem and therefore think 0 is a good default.
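
The effect can be reproduced with a toy model (a sketch only, not Aurora or Mesos code). The host names, the one-launch-per-second timing, and the rule "prefer hosts that are not in maintenance" are assumptions made for illustration, and every host is assumed to have spare capacity for further tasks.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy model only; the names and the preference rule are hypothetical. */
public class MaintenanceSketch {
    /** Host name -> whether the host is supposed to go into maintenance. */
    private final Map<String, Boolean> inMaintenance = new LinkedHashMap<>();
    /** Host name -> time (in seconds) before which it is not re-offered. */
    private final Map<String, Long> filteredUntil = new LinkedHashMap<>();
    private final long filterSeconds;

    MaintenanceSketch(long filterSeconds) {
        this.filterSeconds = filterSeconds;
    }

    void addHost(String name, boolean maintenance) {
        inMaintenance.put(name, maintenance);
        filteredUntil.put(name, 0L);
    }

    /** Launches one task at time now, preferring hosts not in maintenance. */
    Optional<String> launch(long now) {
        Optional<String> pick = inMaintenance.keySet().stream()
            .filter(host -> filteredUntil.get(host) <= now)
            // false (healthy) sorts before true (maintenance)
            .min(Comparator.comparing((String host) -> inMaintenance.get(host)));
        // Launching implicitly declines the rest of the offer for filterSeconds.
        pick.ifPresent(host -> filteredUntil.put(host, now + filterSeconds));
        return pick;
    }

    /** Launches three tasks, one per second, on hosts a, b (healthy) and c (maintenance). */
    static List<String> run(long filterSeconds) {
        MaintenanceSketch cluster = new MaintenanceSketch(filterSeconds);
        cluster.addHost("a", false);
        cluster.addHost("b", false);
        cluster.addHost("c", true);
        List<String> placements = new ArrayList<>();
        for (long now = 0; now < 3; now++) {
            placements.add(cluster.launch(now).orElse("none"));
        }
        return placements;
    }

    public static void main(String[] args) {
        System.out.println("filter 5s: " + run(5)); // [a, b, c]
        System.out.println("filter 0s: " + run(0)); // [a, a, a]
    }
}
```

With a 5 second filter the third task lands on the maintenance host because both healthy hosts are still filtered, whereas with a filter of 0 the healthy host is available again for every launch.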


- Stephan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130306
-----------------------------------------------------------


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Stephan Erb <se...@apache.org>.

> On April 24, 2016, 5:48 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java, line 51
> > <https://reviews.apache.org/r/46603/diff/2/?file=1358596#file1358596line51>
> >
> >     Does this default value effect the same behavior as before the patch?
> 
> Stephan Erb wrote:
>     Using a default of `0` is indeed a behaviour change. I am happy to discuss if we want this change or not. 
>     
>     With a timeout of `5` secs (this was the former hardcoded default):
>     
>     * When launching a task, Mesos will only re-offer the unused resources in the offer after 5 seconds. 
>     * When declining offers in order to merge two offers into one, Mesos will only re-offer resources of this slave after 5s.
>     
>     With timeout of `0` secs:
>     
>     * The resources can be returned instantly within the next offer-cycle of the Mesos allocator.
>     
>     We tend to have the problem that a timeout of 5 breaks the maintenance feature for us. We regularly schedule jobs with #instances > #nodes in the cluster. In this case, all available offers are quickly depleted and Aurora begins to schedule onto nodes which were supposed to be put into maintenance mode. Only after the timeout of 5 seconds has passed will Mesos re-offer resources to Aurora. I believe we might not be the only ones with this problem and therefore think 0 is a good default.
> 
> Maxim Khutornenko wrote:
>     It would be great to reach out to Mesos folks to better understand the reasons behind choosing a 5 second default timeout. Last I checked, lower values _may_ result in an increased load on the Mesos master. If that proves to be true I'd prefer holding on to the current behavior as a safer bet.

The 5s timeout is also described in the original Mesos publication. Let's just stick to it.


- Stephan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130306
-----------------------------------------------------------


> 
> Diff: https://reviews.apache.org/r/46603/diff/
> 
> 
> Testing
> -------
> 
> * ./gradlew -Pq build 
> * ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
>  
> I have also conducted an (unscientific) benchmark in Vagrant and started a job with 5 instances and recorded the time from `PENDING` to `RUNNING` for the slowest ones:
> 
> * 7s startup time for a filter duration of 0 seconds
> * 29s startup time for the hardcoded former default of 5 seconds
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130306
-----------------------------------------------------------


Fix it, then Ship it!




LGTM overall, one question that I've opened an issue for just to flag the discussion.


src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java (line 45)
<https://reviews.apache.org/r/46603/#comment194018>

    Please include a comment explaining this field.  At a quick glance, it's not clear why this is a supplier and the other field is not.  (Context - it's to allow injection of jitter.)
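
    To illustrate the distinction raised here, the following is a minimal sketch and not the code from the patch. The class name `OfferSettingsSketch`, its field names, and the `jittered` helper are all hypothetical. It only shows why a fixed setting can be stored as a plain value while a jittered delay is more naturally a supplier that is evaluated on every call.

```java
import java.util.Random;
import java.util.function.Supplier;

/** Hypothetical settings holder; names are illustrative and not taken from the patch. */
final class OfferSettingsSketch {
  // Plain value: configured once, identical on every read.
  private final long filterDurationSecs;

  // Supplier: evaluated on every read, so each caller can receive a freshly jittered delay.
  private final Supplier<Long> returnDelayMsSupplier;

  public OfferSettingsSketch(long filterDurationSecs, Supplier<Long> returnDelayMsSupplier) {
    this.filterDurationSecs = filterDurationSecs;
    this.returnDelayMsSupplier = returnDelayMsSupplier;
  }

  public long getFilterDurationSecs() {
    return filterDurationSecs;
  }

  public long getReturnDelayMs() {
    return returnDelayMsSupplier.get();
  }

  /**
   * Builds a supplier returning minMs plus a random jitter in [0, jitterWindowMs).
   * jitterWindowMs must be positive, as required by Random.nextInt(int).
   */
  public static Supplier<Long> jittered(long minMs, int jitterWindowMs, Random random) {
    return () -> minMs + random.nextInt(jitterWindowMs);
  }

  public static void main(String[] args) {
    OfferSettingsSketch settings =
        new OfferSettingsSketch(5, jittered(1000, 500, new Random()));
    // The filter duration never changes; the return delay may differ on each call.
    System.out.println(settings.getFilterDurationSecs());
    System.out.println(settings.getReturnDelayMs());
    System.out.println(settings.getReturnDelayMs());
  }
}
```

    A field comment along the lines of the two comments in this sketch would probably be enough to answer the question.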



src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java (line 48)
<https://reviews.apache.org/r/46603/#comment194019>

    remove extra WS



src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java (line 51)
<https://reviews.apache.org/r/46603/#comment194020>

    Does this default value effect the same behavior as before the patch?


- Bill Farner


On April 23, 2016, 9:35 a.m., Stephan Erb wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46603/
> -----------------------------------------------------------
> 
> (Updated April 23, 2016, 9:35 a.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko and Bill Farner.
> 
> 
> Bugs: AURORA-1658
>     https://issues.apache.org/jira/browse/AURORA-1658
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Aurora is declining Mesos offers implicitly when launching a task and explicitly when compacting multiple offers of a slave into a single one.
> The filter duration instructs Mesos to return the declined resources to us only after a timeout of X seconds, even if there is no other framework that wants them. If no filter is supplied, the hardcoded default of 5 seconds would be used.
> 
> By making this value configurable, Aurora can be tuned for either single or multi-framework deployment.
> 
> 
> Diffs
> -----
> 
>   RELEASE-NOTES.md 4b810f2d808cbf0d91c753147d98d1e389106d22 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 1d725c03d16116257e1c4242ebf60f5931d4600f 
>   src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java d1bb8f29c9bed42c27624204b9d34ab1893468f7 
>   src/main/java/org/apache/aurora/scheduler/mesos/Driver.java 013c50cf70fe45fc2a74c1ea5dccccfaba14225c 
>   src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java 7ff3e3e5dc70187066b914f7feb65d99f2145303 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 452451f239a964c1b55ede3d6fbde0bd805e4b00 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 90f8abf830478ad48f9a8a62c1c42423ab0f8d57 
>   src/main/java/org/apache/aurora/scheduler/offers/RandomJitterReturnDelay.java a52fd4e8cd5c32d9560d4d72958a54bef820d81c 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 76da6d80d91221336e50d596cc2f49e890451fd1 
> 
> Diff: https://reviews.apache.org/r/46603/diff/
> 
> 
> Testing
> -------
> 
> * ./gradlew -Pq build 
> * ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
>  
> I have also conducted an (unscientific) benchmark in Vagrant and started a job with 5 instances and recorded the time from `PENDING` to `RUNNING` for the slowest ones:
> 
> * 7s startup time for a filter duration of 0 seconds
> * 29s startup time for the hardcoded former default of 5 seconds
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/
-----------------------------------------------------------

(Updated April 23, 2016, 6:35 p.m.)


Review request for Aurora, Maxim Khutornenko and Bill Farner.


Changes
-------

Add missing file


Bugs: AURORA-1658
    https://issues.apache.org/jira/browse/AURORA-1658


Repository: aurora


Description
-------

Aurora is declining Mesos offers implicitly when launching a task and explicitly when compacting multiple offers of a slave into a single one.
The filter duration instructs Mesos to return the declined resources to us only after a timeout of X seconds, even if there is no other framework that wants them. If no filter is supplied, the hardcoded default of 5 seconds would be used.

By making this value configurable, Aurora can be tuned for either single or multi-framework deployment.
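
The filter semantics described above can be sketched as a small, self-contained model. This is only an illustration under simplifying assumptions, not Aurora or Mesos code: the class `OfferFilterSketch` and its methods are hypothetical, and the model ignores the allocator's own offer cycle, so a duration of 0 means "eligible at the next cycle" rather than literally instant.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical model of an offer filter; not part of Aurora or Mesos. */
final class OfferFilterSketch {
  /** The hardcoded default mentioned in the description: 5 seconds. */
  public static final Duration MESOS_DEFAULT = Duration.ofSeconds(5);

  private OfferFilterSketch() {}

  /** Earliest instant at which declined resources may be offered to the same framework again. */
  public static Instant earliestReoffer(Instant declinedAt, Duration filterDuration) {
    return declinedAt.plus(filterDuration);
  }

  /** True once the filter has expired, i.e. now is at or after the earliest re-offer time. */
  public static boolean mayReoffer(Instant now, Instant declinedAt, Duration filterDuration) {
    return !now.isBefore(earliestReoffer(declinedAt, filterDuration));
  }

  public static void main(String[] args) {
    Instant declinedAt = Instant.parse("2016-04-23T18:00:00Z");
    Instant twoSecondsLater = declinedAt.plusSeconds(2);

    // With the 5 second default the resources are still filtered after 2 seconds.
    System.out.println(mayReoffer(twoSecondsLater, declinedAt, MESOS_DEFAULT)); // false

    // With a duration of 0 the filter has already expired.
    System.out.println(mayReoffer(twoSecondsLater, declinedAt, Duration.ZERO)); // true
  }
}
```

In a single-framework deployment no other framework can use the resources while the filter is active, which is why a shorter duration may reduce scheduling latency there.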


Diffs (updated)
-----

  RELEASE-NOTES.md 4b810f2d808cbf0d91c753147d98d1e389106d22 
  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 1d725c03d16116257e1c4242ebf60f5931d4600f 
  src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java d1bb8f29c9bed42c27624204b9d34ab1893468f7 
  src/main/java/org/apache/aurora/scheduler/mesos/Driver.java 013c50cf70fe45fc2a74c1ea5dccccfaba14225c 
  src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java 7ff3e3e5dc70187066b914f7feb65d99f2145303 
  src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 452451f239a964c1b55ede3d6fbde0bd805e4b00 
  src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 90f8abf830478ad48f9a8a62c1c42423ab0f8d57 
  src/main/java/org/apache/aurora/scheduler/offers/RandomJitterReturnDelay.java a52fd4e8cd5c32d9560d4d72958a54bef820d81c 
  src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 76da6d80d91221336e50d596cc2f49e890451fd1 

Diff: https://reviews.apache.org/r/46603/diff/


Testing
-------

* ./gradlew -Pq build 
* ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
 
I have also conducted an (unscientific) benchmark in Vagrant and started a job with 5 instances and recorded the time from `PENDING` to `RUNNING` for the slowest ones:

* 7s startup time for a filter duration of 0 seconds
* 29s startup time for the hardcoded former default of 5 seconds


Thanks,

Stephan Erb


Re: Review Request 46603: Introduce command line option to control the offer filter duration

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46603/#review130261
-----------------------------------------------------------



Master (d339036) is red with this patch.
  ./build-support/jenkins/build.sh

Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/commons/dist/classes/main/META-INF/compiler/resource-mappings/org.apache.aurora.common.args.apt.CmdLineProcessor
Note: /home/jenkins/jenkins-slave/workspace/AuroraBot/commons/src/main/java/org/apache/aurora/common/testing/easymock/EasyMockTest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

:commons:generateThriftResources
:commons:processResources
:commons:classes
:commons:jar
:compileJava/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/storage/log/WriteAheadStorage.java:74: Note: Wrote forwarder org.apache.aurora.scheduler.storage.log.WriteAheadStorageForwarder
@Forward({
^
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java:152: error: cannot find symbol
    private final OfferSettings offerSettings;
                  ^
  symbol:   class OfferSettings
  location: class OfferManagerImpl
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java:159: error: cannot find symbol
        OfferSettings offerSettings,
        ^
  symbol:   class OfferSettings
  location: class OfferManagerImpl
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/org/apache/aurora/common/args/apt/cmdline.arg.info.txt.2
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/META-INF/compiler/resource-mappings/org.apache.aurora.common.args.apt.CmdLineProcessor
2 errors
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 1 mins 20.361 secs


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On April 23, 2016, 4:22 p.m., Stephan Erb wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46603/
> -----------------------------------------------------------
> 
> (Updated April 23, 2016, 4:22 p.m.)
> 
> 
> Review request for Aurora, Maxim Khutornenko and Bill Farner.
> 
> 
> Bugs: AURORA-1658
>     https://issues.apache.org/jira/browse/AURORA-1658
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Aurora is declining Mesos offers implicitly when launching a task and explicitly when compacting multiple offers of a slave into a single one.
> The filter duration instructs Mesos to return the declined resources to us only after a timeout of X seconds, even if there is no other framework that wants them. If no filter is supplied, the hardcoded default of 5 seconds would be used.
> 
> By making this value configurable, Aurora can be tuned for either single or multi-framework deployment.
> 
> 
> Diffs
> -----
> 
>   RELEASE-NOTES.md 4b810f2d808cbf0d91c753147d98d1e389106d22 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 1d725c03d16116257e1c4242ebf60f5931d4600f 
>   src/jmh/java/org/apache/aurora/benchmark/fakes/FakeDriver.java d1bb8f29c9bed42c27624204b9d34ab1893468f7 
>   src/main/java/org/apache/aurora/scheduler/mesos/Driver.java 013c50cf70fe45fc2a74c1ea5dccccfaba14225c 
>   src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverService.java 7ff3e3e5dc70187066b914f7feb65d99f2145303 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 452451f239a964c1b55ede3d6fbde0bd805e4b00 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 90f8abf830478ad48f9a8a62c1c42423ab0f8d57 
>   src/main/java/org/apache/aurora/scheduler/offers/RandomJitterReturnDelay.java a52fd4e8cd5c32d9560d4d72958a54bef820d81c 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 76da6d80d91221336e50d596cc2f49e890451fd1 
> 
> Diff: https://reviews.apache.org/r/46603/diff/
> 
> 
> Testing
> -------
> 
> * ./gradlew -Pq build 
> * ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
>  
> I have also conducted an (unscientific) benchmark in Vagrant and started a job with 5 instances and recorded the time from `PENDING` to `RUNNING` for the slowest ones:
> 
> * 7s startup time for a filter duration of 0 seconds
> * 29s startup time for the hardcoded former default of 5 seconds
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>