Posted to reviews@aurora.apache.org by David McLaughlin <da...@dmclaughlin.com> on 2017/02/18 00:13:53 UTC

Review Request 56797: Move task conversion during reconciliation into the delayed closure.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56797/
-----------------------------------------------------------

Review request for Aurora, Mehrdad Nurolahzade and Zameer Manji.


Repository: aurora


Description
-------

This is a small change to relieve GC pressure while explicit reconciliation runs. It moves the IScheduledTask -> TaskStatus conversion into the batch processing closure so that any object allocation and collection overhead is delayed until the batch is actually processed. It has a noticeable effect on GC for large numbers of RUNNING tasks.
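
The difference between converting up front and converting inside the batch closure can be sketched roughly as follows. This is only an illustration, not the actual TaskReconciler code: `Task` and `Status` are hypothetical stand-ins for IScheduledTask and TaskStatus, and `partition`, `eager`, `deferred` and the `conversions` counter are names made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class DeferredConversionSketch {
  // Hypothetical stand-ins for IScheduledTask and TaskStatus (not Aurora's real types).
  static final class Task {
    final String id;
    Task(String id) { this.id = id; }
  }

  static final class Status {
    final String taskId;
    Status(String taskId) { this.taskId = taskId; }
  }

  // Counts conversions so the moment of allocation can be observed.
  static int conversions = 0;

  static Status toStatus(Task task) {
    conversions++;
    return new Status(task.id);
  }

  static List<Status> convert(List<Task> tasks) {
    List<Status> out = new ArrayList<>(tasks.size());
    for (Task t : tasks) out.add(toStatus(t));
    return out;
  }

  // Splits items into consecutive views of at most batchSize elements.
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < items.size(); i += batchSize) {
      batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
    }
    return batches;
  }

  // Eager: every Status is allocated up front and stays reachable until its batch runs.
  static List<Runnable> eager(List<Task> tasks, int batchSize, Consumer<List<Status>> sink) {
    List<Runnable> work = new ArrayList<>();
    for (List<Status> batch : partition(convert(tasks), batchSize)) {
      work.add(() -> sink.accept(batch));
    }
    return work;
  }

  // Deferred: each closure holds only the source tasks and converts when it runs.
  static List<Runnable> deferred(List<Task> tasks, int batchSize, Consumer<List<Status>> sink) {
    List<Runnable> work = new ArrayList<>();
    for (List<Task> batch : partition(tasks, batchSize)) {
      work.add(() -> sink.accept(convert(batch)));
    }
    return work;
  }

  public static void main(String[] args) {
    List<Task> tasks = new ArrayList<>();
    for (int i = 0; i < 10; i++) tasks.add(new Task("task-" + i));

    List<Runnable> work = deferred(tasks, 4, batch -> { });
    System.out.println("conversions before any batch runs: " + conversions);
    work.get(0).run();
    System.out.println("conversions after the first batch: " + conversions);
  }
}
```

In the deferred variant the converted objects for a batch only exist while that batch is being handled, so they can be collected before the next batch is scheduled. How much that helps in practice presumably depends on the batch size and the delay between batches.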


Diffs
-----

  src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java ec7ccafcd360c00beceb067963bc430b6b8d8256 

Diff: https://reviews.apache.org/r/56797/diff/


Testing
-------

This is running in prod at Twitter. With this change, our post-snapshot stop-the-world GC hit is dramatically reduced, maybe about 80% of the time.


Thanks,

David McLaughlin


Re: Review Request 56797: Move task conversion during reconciliation into the delayed closure.

Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56797/#review166041
-----------------------------------------------------------


Ship it!

- Reza Motamedi




Re: Review Request 56797: Move task conversion during reconciliation into the delayed closure.

Posted by Mehrdad Nurolahzade <me...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56797/#review166014
-----------------------------------------------------------


Ship it!




This brings up the discussion we had around `TaskHistoryPruner` design alternatives ([rb](https://reviews.apache.org/r/56575/)):
1. Load all expired tasks at once, filter and delete.
2. Load in smaller batch sizes (perhaps per job), filter, and delete (maybe also add a `Thread.sleep()` pause).

The takeaway here is that converting tasks from `IScheduledTask` to `TaskStatus` in smaller batches, with delays in between, relieves heap pressure. By the same logic, I would assume pruning expired tasks in batches (option 2 above) would produce less heap pressure (even though it is not as efficient).
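
A rough sketch of what option 2 could look like, assuming a per-job batch and an optional pause. This is not the `TaskHistoryPruner` implementation: the nested-map store layout, `pruneByJob`, `cutoffMs` and `pauseMs` are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BatchedPruneSketch {
  // Hypothetical store layout: job key -> (task id -> time the task finished, in ms).
  static int pruneByJob(Map<String, Map<String, Long>> store, long cutoffMs, long pauseMs) {
    int deleted = 0;
    // Iterate over a snapshot of the job keys, one job (batch) at a time.
    for (String job : new ArrayList<>(store.keySet())) {
      Map<String, Long> tasks = store.get(job);
      // Filter: only this job's expired ids are held in memory at once.
      List<String> expired = new ArrayList<>();
      for (Map.Entry<String, Long> e : tasks.entrySet()) {
        if (e.getValue() < cutoffMs) expired.add(e.getKey());
      }
      // Delete this batch before moving on to the next job.
      for (String id : expired) tasks.remove(id);
      deleted += expired.size();
      // Optional pause between batches; stop early if the thread is interrupted.
      if (pauseMs > 0) {
        try {
          Thread.sleep(pauseMs);
        } catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
          return deleted;
        }
      }
    }
    return deleted;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Long>> store = new HashMap<>();
    Map<String, Long> jobA = new HashMap<>();
    jobA.put("a1", 100L);
    jobA.put("a2", 900L);
    store.put("role/env/jobA", jobA);

    int deleted = pruneByJob(store, 500L, 10L);
    System.out.println("deleted=" + deleted + " remaining=" + jobA.keySet());
  }
}
```

The trade-off is the one noted above: more passes and pauses make the prune slower overall, but each pass keeps a smaller working set alive.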

- Mehrdad Nurolahzade




Re: Review Request 56797: Move task conversion during reconciliation into the delayed closure.

Posted by Zameer Manji <zm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56797/#review166007
-----------------------------------------------------------


Ship it!

- Zameer Manji




Re: Review Request 56797: Move task conversion during reconciliation into the delayed closure.

Posted by Santhosh Kumar Shanmugham <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56797/#review166013
-----------------------------------------------------------


Ship it!

- Santhosh Kumar Shanmugham




Re: Review Request 56797: Move task conversion during reconciliation into the delayed closure.

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56797/#review166008
-----------------------------------------------------------



Master (4ab4b2b) is red with this patch.
  ./build-support/jenkins/build.sh

  Test coverage missing for org/apache/aurora/scheduler/events/Webhook
  Test coverage missing for org/apache/aurora/scheduler/events/WebhookInfo
  Test coverage missing for org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl
  Test coverage missing for org/apache/aurora/scheduler/storage/log/EntrySerializer$EntrySerializerImpl$1
  Test coverage missing for org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl$8
  Test coverage missing for org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl$7
  Test coverage missing for org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl$4
  Test coverage missing for org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl$3
  Test coverage missing for org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl$6
  Test coverage missing for org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl$5
  Test coverage missing for org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl$2
  Test coverage missing for org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl$1
  Test coverage missing for org/apache/aurora/scheduler/storage/log/LogStorage$Settings
  Test coverage missing for org/apache/aurora/scheduler/storage/log/LogStorage$ScheduledExecutorSchedulingService
  Test coverage missing for org/apache/aurora/scheduler/storage/log/LogStorageModule
  Test coverage missing for org/apache/aurora/scheduler/storage/backup/TemporaryStorage$TemporaryStorageFactory$1
  Test coverage missing for org/apache/aurora/scheduler/storage/backup/BackupModule
  Test coverage missing for org/apache/aurora/scheduler/storage/backup/Recovery$RecoveryImpl
  Test coverage missing for org/apache/aurora/scheduler/storage/backup/TemporaryStorage$TemporaryStorageFactory
  Test coverage missing for org/apache/aurora/scheduler/storage/backup/Recovery$RecoveryImpl$PendingRecovery
  Test coverage missing for org/apache/aurora/scheduler/TaskVars
  Test coverage missing for org/apache/aurora/scheduler/SchedulerLifecycle$DefaultDelayedActions
  Test coverage missing for org/apache/aurora/scheduler/TierManager$TierManagerImpl$TierConfig
  Test coverage missing for org/apache/aurora/scheduler/TaskVars$Counter
  Test coverage missing for org/apache/aurora/scheduler/TaskVars$1
  Test coverage missing for org/apache/aurora/scheduler/SchedulerModule$TaskEventBatchWorker
  Test coverage missing for org/apache/aurora/scheduler/HostOffer$1
  Test coverage missing for org/apache/aurora/scheduler/SchedulerModule
  Test coverage missing for org/apache/aurora/scheduler/TaskIdGenerator$TaskIdGeneratorImpl
  Test coverage missing for org/apache/aurora/scheduler/SchedulerModule$1
  Test coverage missing for org/apache/aurora/scheduler/TaskStatusHandlerImpl
  Test coverage missing for org/apache/aurora/scheduler/TaskStatusHandlerImpl$1

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
==============================================================================

BUILD FAILED

Total time: 5 mins 40.989 secs


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot

