Posted to reviews@aurora.apache.org by Bill Farner <wf...@apache.org> on 2018/01/24 00:32:28 UTC
Review Request 65303: Improve performance of MemTaskStore queries
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/
-----------------------------------------------------------
Review request for Aurora and Jordan Ly.
Repository: aurora
Description
-------
Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional. I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
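To illustrate the shape of that first change, here is a minimal sketch rather than the actual patch. `FetchSketch`, `fetchFunctional`, and `fetchImperative` are hypothetical names, and plain strings stand in for tasks. It contrasts a stream collected into a `HashSet` with a plain loop appending to an `ArrayDeque`:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the MemTaskStore code.
final class FetchSketch {
  // Functional style: stream the candidates and collect matches into a HashSet.
  static <T> Set<T> fetchFunctional(Collection<T> tasks, Predicate<? super T> filter) {
    return tasks.stream().filter(filter).collect(Collectors.toCollection(HashSet::new));
  }

  // Imperative style: a plain loop that appends matches to an ArrayDeque.
  static <T> Collection<T> fetchImperative(Collection<T> tasks, Predicate<? super T> filter) {
    ArrayDeque<T> matches = new ArrayDeque<>();
    for (T task : tasks) {
      if (filter.test(task)) {
        matches.add(task);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<String> tasks = Arrays.asList("job-a/0", "job-a/1", "job-b/0");
    Predicate<String> isJobA = id -> id.startsWith("job-a/");
    System.out.println(fetchFunctional(tasks, isJobA));
    System.out.println(fetchImperative(tasks, isJobA));
  }
}
```

One trade-off to keep in mind: `ArrayDeque` does not deduplicate and rejects `null` elements, so a swap like this only makes sense where the source already yields distinct, non-null tasks.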
This patch also enables stack and heap profilers in jmh (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes. I started this change with a heap profiler as the primary motivation, and ended up using it to guide this improvement.
Diffs
-----
build.gradle 64af7ae
src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999c
Diff: https://reviews.apache.org/r/65303/diff/1/
Testing
-------
Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at the bottom, but here is an abridged version. It shows that task fetch throughput improves by at least 2x at every task count, and normalized heap allocation drops by roughly 2x or more. Overall GC time increases slightly in this run, but the stddev was anecdotally high across runs, so I chose to present this output as a caveat and a discussion point.
If you scroll to the full output at the bottom, you will see some more granular allocation data. Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change. Quoting the jmh sample link above:
```quote
It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
```
Prior to this patch:
```console
Benchmark           (numTasks)         Score         Error  Units

                         10000      1066.632 ±     266.924  ops/s
·gc.alloc.rate.norm      10000    289227.205 ±    8888.051  B/op
·gc.count                10000        24.000                counts
·gc.time                 10000       103.000                ms

                         50000        84.444 ±      32.620  ops/s
·gc.alloc.rate.norm      50000   3831210.967 ±  840844.713  B/op
·gc.count                50000        21.000                counts
·gc.time                 50000      1407.000                ms

                        100000        38.645 ±      20.557  ops/s
·gc.alloc.rate.norm     100000  13555430.931 ± 6787344.701  B/op
·gc.count               100000        52.000                counts
·gc.time                100000      3304.000                ms
```
With this patch:
```console
Benchmark           (numTasks)         Score         Error  Units

                         10000      2851.288 ±     481.472  ops/s
·gc.alloc.rate.norm      10000    145281.908 ±    2223.621  B/op
·gc.count                10000        39.000                counts
·gc.time                 10000       130.000                ms

                         50000       297.380 ±      35.681  ops/s
·gc.alloc.rate.norm      50000   1183791.866 ±   77487.278  B/op
·gc.count                50000        25.000                counts
·gc.time                 50000      1821.000                ms

                        100000       122.211 ±      81.618  ops/s
·gc.alloc.rate.norm     100000   4364450.973 ± 2856586.882  B/op
·gc.count               100000        52.000                counts
·gc.time                100000      3698.000                ms
```
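For anyone who wants to check the ratios behind the summary, the arithmetic can be sketched as below. The numbers are copied from the two abridged tables; the class and method names (`BenchmarkRatios`, `throughputSpeedups`, `allocationReductions`) are hypothetical, and the calculation uses the point scores only, ignoring the reported error margins.

```java
// Hypothetical helper: ratios computed from the abridged benchmark scores.
final class BenchmarkRatios {
  static final int[] NUM_TASKS = {10000, 50000, 100000};

  // Throughput in ops/s, before and after the patch.
  static final double[] OPS_BEFORE = {1066.632, 84.444, 38.645};
  static final double[] OPS_AFTER = {2851.288, 297.380, 122.211};

  // gc.alloc.rate.norm in B/op, before and after the patch.
  static final double[] ALLOC_BEFORE = {289227.205, 3831210.967, 13555430.931};
  static final double[] ALLOC_AFTER = {145281.908, 1183791.866, 4364450.973};

  // Speedup = after / before (higher throughput is better).
  static double[] throughputSpeedups() {
    double[] out = new double[NUM_TASKS.length];
    for (int i = 0; i < out.length; i++) {
      out[i] = OPS_AFTER[i] / OPS_BEFORE[i];
    }
    return out;
  }

  // Reduction = before / after (lower allocation per op is better).
  static double[] allocationReductions() {
    double[] out = new double[NUM_TASKS.length];
    for (int i = 0; i < out.length; i++) {
      out[i] = ALLOC_BEFORE[i] / ALLOC_AFTER[i];
    }
    return out;
  }

  public static void main(String[] args) {
    double[] speedups = throughputSpeedups();
    double[] reductions = allocationReductions();
    for (int i = 0; i < NUM_TASKS.length; i++) {
      System.out.printf("numTasks=%d throughput x%.2f, B/op reduced x%.2f%n",
          NUM_TASKS[i], speedups[i], reductions[i]);
    }
  }
}
```

By this calculation throughput improves by roughly 2.7x, 3.5x, and 3.2x, and normalized allocation drops by roughly 2.0x, 3.2x, and 3.1x, for 10000, 50000, and 100000 tasks respectively. The error margins on several rows are wide, so these figures are best read as indicative.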
**Full benchmark output**
Prior to this patch:
```console
Benchmark (numTasks) Mode Cnt Score Error Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 1066.632 ± 266.924 ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 10000 thrpt 5 286.647 ± 62.371 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 10000 thrpt 5 289227.205 ± 8888.051 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 10000 thrpt 5 291.263 ± 159.266 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 10000 thrpt 5 294277.617 ± 166069.041 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 10000 thrpt 5 1.218 ± 1.029 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 10000 thrpt 5 1220.540 ± 708.455 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 10000 thrpt 5 24.000 counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 10000 thrpt 5 103.000 ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 10000 thrpt NaN ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 84.444 ± 32.620 ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 50000 thrpt 5 267.018 ± 27.389 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 50000 thrpt 5 3831210.967 ± 840844.713 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 50000 thrpt 5 258.565 ± 149.845 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 50000 thrpt 5 3707563.530 ± 2262218.319 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen 50000 thrpt 5 4.487 ± 18.053 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm 50000 thrpt 5 63848.757 ± 264487.651 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 50000 thrpt 5 6.034 ± 3.651 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 50000 thrpt 5 87385.381 ± 75159.508 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 50000 thrpt 5 21.000 counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 50000 thrpt 5 1407.000 ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 50000 thrpt NaN ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 38.645 ± 20.557 ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 100000 thrpt 5 381.453 ± 63.491 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 100000 thrpt 5 13555430.931 ± 6787344.701 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 100000 thrpt 5 389.816 ± 123.320 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 100000 thrpt 5 13823571.735 ± 6642604.600 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen 100000 thrpt 5 1.947 ± 16.766 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm 100000 thrpt 5 92330.241 ± 794991.221 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 100000 thrpt 5 11.934 ± 18.565 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 100000 thrpt 5 414896.926 ± 551658.959 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 100000 thrpt 5 52.000 counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 100000 thrpt 5 3304.000 ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 100000 thrpt NaN ---
```
With this patch:
```console
Benchmark (numTasks) Mode Cnt Score Error Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 2851.288 ± 481.472 ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 10000 thrpt 5 384.383 ± 58.697 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 10000 thrpt 5 145281.908 ± 2223.621 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 10000 thrpt 5 388.851 ± 114.120 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 10000 thrpt 5 147171.915 ± 50430.527 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 10000 thrpt 5 1.264 ± 0.980 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 10000 thrpt 5 479.848 ± 420.881 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 10000 thrpt 5 39.000 counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 10000 thrpt 5 130.000 ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 10000 thrpt NaN ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 297.380 ± 35.681 ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 50000 thrpt 5 288.839 ± 19.035 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 50000 thrpt 5 1183791.866 ± 77487.278 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 50000 thrpt 5 296.587 ± 125.148 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 50000 thrpt 5 1214497.578 ± 457975.153 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen 50000 thrpt 5 6.942 ± 23.492 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm 50000 thrpt 5 28880.733 ± 99593.659 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 50000 thrpt 5 6.440 ± 3.887 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 50000 thrpt 5 26354.762 ± 14876.857 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 50000 thrpt 5 25.000 counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 50000 thrpt 5 1821.000 ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 50000 thrpt NaN ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 122.211 ± 81.618 ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 100000 thrpt 5 377.099 ± 77.146 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 100000 thrpt 5 4364450.973 ± 2856586.882 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 100000 thrpt 5 381.570 ± 119.260 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 100000 thrpt 5 4415115.428 ± 3000198.792 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen 100000 thrpt 5 1.914 ± 16.479 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm 100000 thrpt 5 31833.830 ± 274098.881 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 100000 thrpt 5 12.117 ± 20.931 MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 100000 thrpt 5 136001.918 ± 196459.666 B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 100000 thrpt 5 52.000 counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 100000 thrpt 5 3698.000 ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 100000 thrpt NaN ---
```
Thanks,
Bill Farner
Re: Review Request 65303: Improve performance of MemTaskStore queries
Posted by Bill Farner <wf...@apache.org>.
> On Jan. 24, 2018, 3:40 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> > Line 234 (original), 235 (patched)
> > <https://reviews.apache.org/r/65303/diff/1/?file=1944709#file1944709line237>
> >
> > Have you considered passing the predicate filter in here? For index scans, this should help eliminate a large number of allocations.
A fine idea! I will be out of contact for a few days, but will try this out when I get back.
- Bill
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196107
-----------------------------------------------------------
Re: Review Request 65303: Improve performance of MemTaskStore queries
Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196107
-----------------------------------------------------------
src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
Line 234 (original), 235 (patched)
<https://reviews.apache.org/r/65303/#comment275620>
Have you considered passing the predicate filter in here? For index scans, this should help eliminate a large number of allocations.
- Stephan Erb
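As a rough sketch of what this suggestion could look like, assuming a simple map-backed secondary index (the class `IndexScanSketch` and its methods are hypothetical and do not reflect the actual `MemTaskStore` internals): the first `scan` copies every entry under a key and leaves filtering to the caller, while the overload applies the predicate during the scan so only matches are ever copied.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical secondary index mapping a key (for example a job name) to task IDs.
final class IndexScanSketch {
  private final Map<String, List<String>> index = new HashMap<>();

  void put(String key, String taskId) {
    index.computeIfAbsent(key, k -> new ArrayList<>()).add(taskId);
  }

  // Without push-down: every entry under the key is copied, and the
  // caller filters the intermediate collection afterwards.
  Collection<String> scan(String key) {
    return new ArrayDeque<>(index.getOrDefault(key, Collections.emptyList()));
  }

  // With push-down: the predicate runs during the scan, so entries that
  // do not match are never copied into the result.
  Collection<String> scan(String key, Predicate<? super String> filter) {
    ArrayDeque<String> matches = new ArrayDeque<>();
    for (String taskId : index.getOrDefault(key, Collections.emptyList())) {
      if (filter.test(taskId)) {
        matches.add(taskId);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    IndexScanSketch store = new IndexScanSketch();
    store.put("job-a", "job-a/0");
    store.put("job-a", "job-a/1");
    store.put("job-b", "job-b/0");
    System.out.println(store.scan("job-a"));
    System.out.println(store.scan("job-a", id -> id.endsWith("/1")));
  }
}
```

Whether this saves a meaningful number of allocations in practice would depend on how selective the predicate is, which is something the benchmark would need to confirm.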
Re: Review Request 65303: Improve performance of MemTaskStore queries
Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196579
-----------------------------------------------------------
Master (787ccfe) is green with this patch.
./build-support/jenkins/build.sh
However, it appears that it might lack test coverage.
I will refresh this build result if you post a review containing "@ReviewBot retry"
- Aurora ReviewBot
On Jan. 31, 2018, 6:12 p.m., Bill Farner wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> -----------------------------------------------------------
>
> (Updated Jan. 31, 2018, 6:12 p.m.)
>
>
> Review request for Aurora and Jordan Ly.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional. I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
>
> This patch also enables stack and heap profilers in jmh (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes. I started this change with a heap profiler as the primary motivation, and ended up using it to guide this improvement.
>
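For readers without the diff at hand, here is a minimal sketch of the kind of change the description refers to. It is not the code from MemTaskStore.java: the class name `FetchSketch`, both method names, and the use of a plain `Predicate` as the query are assumptions made for illustration. It contrasts a stream that collects into a `HashSet` with a plain loop that appends to an `ArrayDeque`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from Aurora.
class FetchSketch {
  // Functional style: stream, filter, then collect into a HashSet.
  static <T> Set<T> fetchFunctional(Collection<T> tasks, Predicate<? super T> query) {
    return tasks.stream()
        .filter(query)
        .collect(Collectors.toCollection(HashSet::new));
  }

  // Imperative style: a plain loop that appends matches to an ArrayDeque.
  static <T> Collection<T> fetchImperative(Collection<T> tasks, Predicate<? super T> query) {
    ArrayDeque<T> result = new ArrayDeque<>();
    for (T task : tasks) {
      if (query.test(task)) {
        result.add(task);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> tasks = Arrays.asList("job-a/0", "job-a/1", "job-b/0");
    Predicate<String> query = id -> id.startsWith("job-a");
    System.out.println(fetchFunctional(tasks, query).size());
    System.out.println(fetchImperative(tasks, query).size());
  }
}
```

One behavioral difference to keep in mind: `ArrayDeque` keeps duplicates and rejects null elements, whereas `HashSet` deduplicates, so a swap like this is only safe when the source collection already holds distinct, non-null items.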
>
> Diffs
> -----
>
> build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1
> src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999ca9a5185e240ad729fefc6638476a4aecc
>
>
> Diff: https://reviews.apache.org/r/65303/diff/2/
>
>
> Testing
> -------
>
> The full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here is an abridged version. It shows that task fetch throughput improves by roughly 2x across the board (modulo error margins), and that heap allocation per operation drops by a comparable factor. Overall GC time does not improve consistently in the output captured here (it rises in some configurations and falls in others), and the stddev was anecdotally high across runs. I chose to present this output as a caveat and a discussion point.
>
> If you scroll to the full output at the bottom, you will see some more granular allocation data. Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change. Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
> ```
>
> Prior to this patch:
> ```console
> Benchmark (numTasks) Score Error Units
> FetchAll.run 10000 481.529 ± 184.751 ops/s
> FetchAll.run:·gc.alloc.rate.norm 10000 334970.771 ± 33544.960 B/op
>
> FetchAll.run 50000 78.652 ± 20.869 ops/s
> FetchAll.run:·gc.alloc.rate.norm 50000 3991107.524 ± 701585.657 B/op
>
> FetchAll.run 100000 38.371 ± 11.710 ops/s
> FetchAll.run:·gc.alloc.rate.norm 100000 13487028.139 ± 3369614.510 B/op
>
> IndexedFetchAndFilter.run 10000 296.557 ± 198.389 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 10000 655319.005 ± 98138.360 B/op
>
> IndexedFetchAndFilter.run 50000 50.300 ± 5.818 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 50000 6671548.381 ± 452020.849 B/op
>
> IndexedFetchAndFilter.run 100000 17.637 ± 3.739 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 100000 28100173.458 ± 4486308.188 B/op
> ```
>
> With this patch:
> ```console
> Benchmark (numTasks) Score Error Units
> FetchAll.run 10000 1653.572 ± 799.123 ops/s
> FetchAll.run:·gc.alloc.rate.norm 10000 155426.052 ± 10345.657 B/op
>
> FetchAll.run 50000 210.454 ± 54.340 ops/s
> FetchAll.run:·gc.alloc.rate.norm 50000 1457560.505 ± 228631.547 B/op
>
> FetchAll.run 100000 97.783 ± 42.130 ops/s
> FetchAll.run:·gc.alloc.rate.norm 100000 5096464.582 ± 1792136.191 B/op
>
> IndexedFetchAndFilter.run 10000 500.740 ± 210.675 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 10000 370760.068 ± 36813.071 B/op
>
> IndexedFetchAndFilter.run 50000 95.316 ± 23.084 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 50000 3389472.432 ± 550602.162 B/op
>
> IndexedFetchAndFilter.run 100000 41.572 ± 26.747 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 100000 12324183.188 ± 7537788.165 B/op
> ```
>
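As a quick sanity check of the "roughly 2x" reading, the ratios can be worked out directly from the two abridged tables above. The sketch below hard-codes the Score columns (error margins are ignored) and prints the throughput gain and the allocation reduction per row. The class and method names are made up for this illustration.

```java
import java.util.Locale;

// Hypothetical helper; the numbers are copied from the abridged tables.
class SpeedupSketch {
  // Improvement factor for a "higher is better" metric such as ops/s.
  static double gain(double before, double after) {
    return after / before;
  }

  // Reduction factor for a "lower is better" metric such as B/op.
  static double reduction(double before, double after) {
    return before / after;
  }

  public static void main(String[] args) {
    String[] rows = {
      "FetchAll 10000", "FetchAll 50000", "FetchAll 100000",
      "IndexedFetchAndFilter 10000", "IndexedFetchAndFilter 50000",
      "IndexedFetchAndFilter 100000"
    };
    // {before, after} throughput in ops/s.
    double[][] opsPerSec = {
      {481.529, 1653.572}, {78.652, 210.454}, {38.371, 97.783},
      {296.557, 500.740}, {50.300, 95.316}, {17.637, 41.572}
    };
    // {before, after} gc.alloc.rate.norm in B/op.
    double[][] bytesPerOp = {
      {334970.771, 155426.052}, {3991107.524, 1457560.505},
      {13487028.139, 5096464.582}, {655319.005, 370760.068},
      {6671548.381, 3389472.432}, {28100173.458, 12324183.188}
    };
    for (int i = 0; i < rows.length; i++) {
      System.out.printf(Locale.ROOT, "%-30s throughput x%.2f, allocation /%.2f%n",
          rows[i],
          gain(opsPerSec[i][0], opsPerSec[i][1]),
          reduction(bytesPerOp[i][0], bytesPerOp[i][1]));
    }
  }
}
```

Because the error margins are wide (some exceed 50% of the score), these point ratios are best read as rough indications rather than precise speedups.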
>
> **Full benchmark output**
>
> Prior to this patch:
> ```console
> Benchmark (numTasks) Score Error Units
> FetchAll.run 10000 481.529 ± 184.751 ops/s
> FetchAll.run:·gc.alloc.rate 10000 148.678 ± 42.890 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 10000 334970.771 ± 33544.960 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 10000 146.991 ± 135.486 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 10000 332983.005 ± 347401.950 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 10000 0.804 ± 1.823 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 10000 1784.147 ± 3904.546 B/op
> FetchAll.run:·gc.count 10000 9.000 counts
> FetchAll.run:·gc.time 10000 143.000 ms
>
> FetchAll.run 50000 78.652 ± 20.869 ops/s
> FetchAll.run:·gc.alloc.rate 50000 250.771 ± 34.190 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 50000 3991107.524 ± 701585.657 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 50000 250.131 ± 144.214 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 50000 3999003.844 ± 2907196.744 B/op
> FetchAll.run:·gc.churn.PS_Old_Gen 50000 6.937 ± 20.180 MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm 50000 111462.141 ± 322286.235 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 50000 6.056 ± 4.371 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 50000 96534.909 ± 73072.098 B/op
> FetchAll.run:·gc.count 50000 22.000 counts
> FetchAll.run:·gc.time 50000 3222.000 ms
>
> FetchAll.run 100000 38.371 ± 11.710 ops/s
> FetchAll.run:·gc.alloc.rate 100000 343.280 ± 63.923 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 100000 13487028.139 ± 3369614.510 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 100000 343.804 ± 147.542 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 100000 13524848.537 ± 7132093.384 B/op
> FetchAll.run:·gc.churn.PS_Old_Gen 100000 7.251 ± 26.847 MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm 100000 286256.200 ± 1043939.286 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 100000 11.448 ± 16.645 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 100000 440924.671 ± 539369.420 B/op
> FetchAll.run:·gc.count 100000 53.000 counts
> FetchAll.run:·gc.time 100000 8664.000 ms
>
> IndexedFetchAndFilter.run 10000 296.557 ± 198.389 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 10000 178.657 ± 96.891 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 10000 655319.005 ± 98138.360 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 10000 181.829 ± 115.598 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 10000 669894.533 ± 362265.228 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 10000 1.017 ± 2.764 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 10000 3509.419 ± 8933.232 B/op
> IndexedFetchAndFilter.run:·gc.count 10000 11.000 counts
> IndexedFetchAndFilter.run:·gc.time 10000 174.000 ms
>
> IndexedFetchAndFilter.run 50000 50.300 ± 5.818 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 50000 271.042 ± 35.522 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 50000 6671548.381 ± 452020.849 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 50000 278.006 ± 188.990 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 50000 6835542.988 ± 4208216.383 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 50000 7.836 ± 22.513 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 50000 194944.435 ± 557587.333 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 50000 6.063 ± 2.432 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 50000 148960.731 ± 42282.391 B/op
> IndexedFetchAndFilter.run:·gc.count 50000 24.000 counts
> IndexedFetchAndFilter.run:·gc.time 50000 3059.000 ms
>
> IndexedFetchAndFilter.run 100000 17.637 ± 3.739 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 100000 336.740 ± 69.527 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 100000 28100173.458 ± 4486308.188 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 100000 336.494 ± 88.830 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 100000 28063164.240 ± 4888826.638 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 100000 8.028 ± 37.263 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 100000 672808.968 ± 2924497.150 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 100000 11.351 ± 17.881 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 100000 930977.737 ± 1252367.282 B/op
> IndexedFetchAndFilter.run:·gc.count 100000 47.000 counts
> IndexedFetchAndFilter.run:·gc.time 100000 7245.000 ms
> ```
>
> With this patch:
> ```console
> Benchmark (numTasks) Score Error Units
> FetchAll.run 10000 1653.572 ± 799.123 ops/s
> FetchAll.run:·gc.alloc.rate 10000 236.532 ± 98.709 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 10000 155426.052 ± 10345.657 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 10000 247.755 ± 55.490 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 10000 163873.606 ± 59092.580 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 10000 1.328 ± 1.540 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 10000 883.684 ± 1120.393 B/op
> FetchAll.run:·gc.count 10000 18.000 counts
> FetchAll.run:·gc.time 10000 191.000 ms
>
> FetchAll.run 50000 210.454 ± 54.340 ops/s
> FetchAll.run:·gc.alloc.rate 50000 248.216 ± 15.196 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 50000 1457560.505 ± 228631.547 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 50000 239.336 ± 174.541 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 50000 1409078.860 ± 1141224.117 B/op
> FetchAll.run:·gc.churn.PS_Old_Gen 50000 6.504 ± 17.220 MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm 50000 38644.950 ± 105262.889 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 50000 5.994 ± 4.160 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 50000 35246.411 ± 25958.915 B/op
> FetchAll.run:·gc.count 50000 21.000 counts
> FetchAll.run:·gc.time 50000 2875.000 ms
>
> FetchAll.run 100000 97.783 ± 42.130 ops/s
> FetchAll.run:·gc.alloc.rate 100000 336.209 ± 80.094 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 100000 5096464.582 ± 1792136.191 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 100000 342.190 ± 144.180 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 100000 5167420.986 ± 1634774.992 B/op
> FetchAll.run:·gc.churn.PS_Old_Gen 100000 11.783 ± 36.073 MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm 100000 182947.872 ± 525172.467 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 100000 12.299 ± 13.795 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 100000 184635.309 ± 199254.266 B/op
> FetchAll.run:·gc.count 100000 46.000 counts
> FetchAll.run:·gc.time 100000 7778.000 ms
>
> IndexedFetchAndFilter.run 10000 500.740 ± 210.675 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 10000 171.305 ± 57.968 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 10000 370760.068 ± 36813.071 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 10000 176.084 ± 103.579 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 10000 387100.753 ± 376481.454 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 10000 1.305 ± 1.866 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 10000 2812.059 ± 3518.689 B/op
> IndexedFetchAndFilter.run:·gc.count 10000 11.000 counts
> IndexedFetchAndFilter.run:·gc.time 10000 170.000 ms
>
> IndexedFetchAndFilter.run 50000 95.316 ± 23.084 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 50000 258.291 ± 30.111 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 50000 3389472.432 ± 550602.162 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 50000 250.887 ± 148.296 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 50000 3308741.831 ± 2461004.974 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 50000 5.218 ± 21.710 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 50000 69254.269 ± 282577.478 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 50000 5.803 ± 2.885 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 50000 76523.177 ± 51120.227 B/op
> IndexedFetchAndFilter.run:·gc.count 50000 21.000 counts
> IndexedFetchAndFilter.run:·gc.time 50000 2775.000 ms
>
> IndexedFetchAndFilter.run 100000 41.572 ± 26.747 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 100000 331.638 ± 50.813 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 100000 12324183.188 ± 7537788.165 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 100000 333.474 ± 116.673 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 100000 12357891.009 ± 7285356.875 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 100000 10.296 ± 27.573 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 100000 371782.085 ± 910072.098 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 100000 11.815 ± 10.161 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 100000 428555.780 ± 184610.507 B/op
> IndexedFetchAndFilter.run:·gc.count 100000 49.000 counts
> IndexedFetchAndFilter.run:·gc.time 100000 8602.000 ms
> ```
>
>
> Thanks,
>
> Bill Farner
>
>
Re: Review Request 65303: Improve performance of MemTaskStore queries
Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196581
-----------------------------------------------------------
Ship it!
- David McLaughlin
Re: Review Request 65303: Improve performance of MemTaskStore queries
Posted by Jordan Ly <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196592
-----------------------------------------------------------
Ship it!
- Jordan Ly
On Jan. 31, 2018, 6:12 p.m., Bill Farner wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> -----------------------------------------------------------
>
> (Updated Jan. 31, 2018, 6:12 p.m.)
>
>
> Review request for Aurora and Jordan Ly.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional. I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
>
> This patch also enables stack and heap profilers in jmh (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes. I started this change with a heap profiler as the primary motivation, and ended up using it to guide this improvement.
>
>
> Diffs
> -----
>
> build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1
> src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999ca9a5185e240ad729fefc6638476a4aecc
>
>
> Diff: https://reviews.apache.org/r/65303/diff/2/
>
>
> Testing
> -------
>
> The full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here is an abridged version. It shows that task fetch throughput improves by roughly 2x across the board (modulo error margins), and that heap allocation per operation drops by a comparable factor. Overall GC time does not improve consistently in the output captured here (it rises in some configurations and falls in others), and the stddev was anecdotally high across runs. I chose to present this output as a caveat and a discussion point.
>
> If you scroll to the full output at the bottom, you will see some more granular allocation data. Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change. Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
> ```
>
> Prior to this patch:
> ```console
> Benchmark (numTasks) Score Error Units
> FetchAll.run 10000 481.529 ± 184.751 ops/s
> FetchAll.run:·gc.alloc.rate.norm 10000 334970.771 ± 33544.960 B/op
>
> FetchAll.run 50000 78.652 ± 20.869 ops/s
> FetchAll.run:·gc.alloc.rate.norm 50000 3991107.524 ± 701585.657 B/op
>
> FetchAll.run 100000 38.371 ± 11.710 ops/s
> FetchAll.run:·gc.alloc.rate.norm 100000 13487028.139 ± 3369614.510 B/op
>
> IndexedFetchAndFilter.run 10000 296.557 ± 198.389 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 10000 655319.005 ± 98138.360 B/op
>
> IndexedFetchAndFilter.run 50000 50.300 ± 5.818 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 50000 6671548.381 ± 452020.849 B/op
>
> IndexedFetchAndFilter.run 100000 17.637 ± 3.739 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 100000 28100173.458 ± 4486308.188 B/op
> ```
>
> With this patch:
> ```console
> Benchmark (numTasks) Score Error Units
> FetchAll.run 10000 1653.572 ± 799.123 ops/s
> FetchAll.run:·gc.alloc.rate.norm 10000 155426.052 ± 10345.657 B/op
>
> FetchAll.run 50000 210.454 ± 54.340 ops/s
> FetchAll.run:·gc.alloc.rate.norm 50000 1457560.505 ± 228631.547 B/op
>
> FetchAll.run 100000 97.783 ± 42.130 ops/s
> FetchAll.run:·gc.alloc.rate.norm 100000 5096464.582 ± 1792136.191 B/op
>
> IndexedFetchAndFilter.run 10000 500.740 ± 210.675 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 10000 370760.068 ± 36813.071 B/op
>
> IndexedFetchAndFilter.run 50000 95.316 ± 23.084 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 50000 3389472.432 ± 550602.162 B/op
>
> IndexedFetchAndFilter.run 100000 41.572 ± 26.747 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 100000 12324183.188 ± 7537788.165 B/op
> ```
>
>
> **Full benchmark output**
>
> Prior to this patch:
> ```console
> Benchmark (numTasks) Score Error Units
> FetchAll.run 10000 481.529 ± 184.751 ops/s
> FetchAll.run:·gc.alloc.rate 10000 148.678 ± 42.890 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 10000 334970.771 ± 33544.960 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 10000 146.991 ± 135.486 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 10000 332983.005 ± 347401.950 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 10000 0.804 ± 1.823 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 10000 1784.147 ± 3904.546 B/op
> FetchAll.run:·gc.count 10000 9.000 counts
> FetchAll.run:·gc.time 10000 143.000 ms
>
> FetchAll.run 50000 78.652 ± 20.869 ops/s
> FetchAll.run:·gc.alloc.rate 50000 250.771 ± 34.190 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 50000 3991107.524 ± 701585.657 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 50000 250.131 ± 144.214 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 50000 3999003.844 ± 2907196.744 B/op
> FetchAll.run:·gc.churn.PS_Old_Gen 50000 6.937 ± 20.180 MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm 50000 111462.141 ± 322286.235 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 50000 6.056 ± 4.371 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 50000 96534.909 ± 73072.098 B/op
> FetchAll.run:·gc.count 50000 22.000 counts
> FetchAll.run:·gc.time 50000 3222.000 ms
>
> FetchAll.run 100000 38.371 ± 11.710 ops/s
> FetchAll.run:·gc.alloc.rate 100000 343.280 ± 63.923 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 100000 13487028.139 ± 3369614.510 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 100000 343.804 ± 147.542 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 100000 13524848.537 ± 7132093.384 B/op
> FetchAll.run:·gc.churn.PS_Old_Gen 100000 7.251 ± 26.847 MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm 100000 286256.200 ± 1043939.286 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 100000 11.448 ± 16.645 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 100000 440924.671 ± 539369.420 B/op
> FetchAll.run:·gc.count 100000 53.000 counts
> FetchAll.run:·gc.time 100000 8664.000 ms
>
> IndexedFetchAndFilter.run 10000 296.557 ± 198.389 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 10000 178.657 ± 96.891 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 10000 655319.005 ± 98138.360 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 10000 181.829 ± 115.598 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 10000 669894.533 ± 362265.228 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 10000 1.017 ± 2.764 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 10000 3509.419 ± 8933.232 B/op
> IndexedFetchAndFilter.run:·gc.count 10000 11.000 counts
> IndexedFetchAndFilter.run:·gc.time 10000 174.000 ms
>
> IndexedFetchAndFilter.run 50000 50.300 ± 5.818 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 50000 271.042 ± 35.522 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 50000 6671548.381 ± 452020.849 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 50000 278.006 ± 188.990 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 50000 6835542.988 ± 4208216.383 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 50000 7.836 ± 22.513 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 50000 194944.435 ± 557587.333 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 50000 6.063 ± 2.432 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 50000 148960.731 ± 42282.391 B/op
> IndexedFetchAndFilter.run:·gc.count 50000 24.000 counts
> IndexedFetchAndFilter.run:·gc.time 50000 3059.000 ms
>
> IndexedFetchAndFilter.run 100000 17.637 ± 3.739 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 100000 336.740 ± 69.527 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 100000 28100173.458 ± 4486308.188 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 100000 336.494 ± 88.830 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 100000 28063164.240 ± 4888826.638 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 100000 8.028 ± 37.263 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 100000 672808.968 ± 2924497.150 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 100000 11.351 ± 17.881 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 100000 930977.737 ± 1252367.282 B/op
> IndexedFetchAndFilter.run:·gc.count 100000 47.000 counts
> IndexedFetchAndFilter.run:·gc.time 100000 7245.000 ms
> ```
>
> With this patch:
> ```console
> Benchmark (numTasks) Score Error Units
> FetchAll.run 10000 1653.572 ± 799.123 ops/s
> FetchAll.run:·gc.alloc.rate 10000 236.532 ± 98.709 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 10000 155426.052 ± 10345.657 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 10000 247.755 ± 55.490 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 10000 163873.606 ± 59092.580 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 10000 1.328 ± 1.540 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 10000 883.684 ± 1120.393 B/op
> FetchAll.run:·gc.count 10000 18.000 counts
> FetchAll.run:·gc.time 10000 191.000 ms
>
> FetchAll.run 50000 210.454 ± 54.340 ops/s
> FetchAll.run:·gc.alloc.rate 50000 248.216 ± 15.196 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 50000 1457560.505 ± 228631.547 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 50000 239.336 ± 174.541 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 50000 1409078.860 ± 1141224.117 B/op
> FetchAll.run:·gc.churn.PS_Old_Gen 50000 6.504 ± 17.220 MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm 50000 38644.950 ± 105262.889 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 50000 5.994 ± 4.160 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 50000 35246.411 ± 25958.915 B/op
> FetchAll.run:·gc.count 50000 21.000 counts
> FetchAll.run:·gc.time 50000 2875.000 ms
>
> FetchAll.run 100000 97.783 ± 42.130 ops/s
> FetchAll.run:·gc.alloc.rate 100000 336.209 ± 80.094 MB/sec
> FetchAll.run:·gc.alloc.rate.norm 100000 5096464.582 ± 1792136.191 B/op
> FetchAll.run:·gc.churn.PS_Eden_Space 100000 342.190 ± 144.180 MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm 100000 5167420.986 ± 1634774.992 B/op
> FetchAll.run:·gc.churn.PS_Old_Gen 100000 11.783 ± 36.073 MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm 100000 182947.872 ± 525172.467 B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space 100000 12.299 ± 13.795 MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm 100000 184635.309 ± 199254.266 B/op
> FetchAll.run:·gc.count 100000 46.000 counts
> FetchAll.run:·gc.time 100000 7778.000 ms
>
> IndexedFetchAndFilter.run 10000 500.740 ± 210.675 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 10000 171.305 ± 57.968 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 10000 370760.068 ± 36813.071 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 10000 176.084 ± 103.579 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 10000 387100.753 ± 376481.454 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 10000 1.305 ± 1.866 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 10000 2812.059 ± 3518.689 B/op
> IndexedFetchAndFilter.run:·gc.count 10000 11.000 counts
> IndexedFetchAndFilter.run:·gc.time 10000 170.000 ms
>
> IndexedFetchAndFilter.run 50000 95.316 ± 23.084 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 50000 258.291 ± 30.111 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 50000 3389472.432 ± 550602.162 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 50000 250.887 ± 148.296 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 50000 3308741.831 ± 2461004.974 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 50000 5.218 ± 21.710 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 50000 69254.269 ± 282577.478 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 50000 5.803 ± 2.885 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 50000 76523.177 ± 51120.227 B/op
> IndexedFetchAndFilter.run:·gc.count 50000 21.000 counts
> IndexedFetchAndFilter.run:·gc.time 50000 2775.000 ms
>
> IndexedFetchAndFilter.run 100000 41.572 ± 26.747 ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate 100000 331.638 ± 50.813 MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 100000 12324183.188 ± 7537788.165 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 100000 333.474 ± 116.673 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 100000 12357891.009 ± 7285356.875 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 100000 10.296 ± 27.573 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 100000 371782.085 ± 910072.098 B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 100000 11.815 ± 10.161 MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 100000 428555.780 ± 184610.507 B/op
> IndexedFetchAndFilter.run:·gc.count 100000 49.000 counts
> IndexedFetchAndFilter.run:·gc.time 100000 8602.000 ms
> ```
>
>
> Thanks,
>
> Bill Farner
>
>
Re: Review Request 65303: Improve performance of MemTaskStore queries
Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196583
-----------------------------------------------------------
Ship it!
- Stephan Erb
On Jan. 31, 2018, 7:12 p.m., Bill Farner wrote:
Re: Review Request 65303: Improve performance of MemTaskStore queries
Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/
-----------------------------------------------------------
(Updated Jan. 31, 2018, 10:12 a.m.)
Review request for Aurora and Jordan Ly.
Changes
-------
Applied Stephan's suggestion, added a benchmark to validate.
Repository: aurora
Description
-------
Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional. I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
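To illustrate the shape of the change described above, here is a minimal sketch, not the actual `MemTaskStore` code. The class `FetchSketch` and both method names are hypothetical, and tasks are stood in for by a generic type. It assumes the source elements are already unique and non-null, since `ArrayDeque` neither de-duplicates nor accepts `null`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for the store's fetch path; names are illustrative only.
final class FetchSketch {
  // "Before" shape: functional pipeline that collects matches into a HashSet.
  static <T> Set<T> fetchFunctional(Collection<T> tasks, Predicate<T> filter) {
    return tasks.stream()
        .filter(filter)
        .collect(Collectors.toCollection(HashSet::new));
  }

  // "After" shape: plain loop that appends matches to an ArrayDeque.
  // Assumes inputs are unique and non-null (ArrayDeque rejects null elements).
  static <T> Collection<T> fetchImperative(Collection<T> tasks, Predicate<T> filter) {
    ArrayDeque<T> result = new ArrayDeque<>();
    for (T task : tasks) {
      if (filter.test(task)) {
        result.add(task);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> tasks = Arrays.asList("a-1", "b-2", "a-3");
    // Both variants select the same elements; only the container differs.
    System.out.println(fetchFunctional(tasks, t -> t.startsWith("a")));
    System.out.println(fetchImperative(tasks, t -> t.startsWith("a")));
  }
}
```

The deque keeps matches in a backing array instead of allocating a hash entry per element, which is one plausible reason for the lower `B/op` figures reported under Testing.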
This patch also enables stack and heap profilers in jmh (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes. I started this change with a heap profiler as the primary motivation, and ended up using it to guide this improvement.
Diffs (updated)
-----
build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1
src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd
src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999ca9a5185e240ad729fefc6638476a4aecc
Diff: https://reviews.apache.org/r/65303/diff/2/
Changes: https://reviews.apache.org/r/65303/diff/1-2/
Testing (updated)
-------
Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here is an abridged version. It shows that task fetch throughput improves in every configuration, by roughly 1.7x to 3.4x (mod error margins), and normalized heap allocation drops by roughly 1.8x to 2.7x. GC time is mixed as captured here: it increases for `FetchAll` at 10000 tasks and `IndexedFetchAndFilter` at 100000 tasks and decreases elsewhere, but the stddev was anecdotally high across runs. I chose to present this output as a caveat and a discussion point.
If you scroll to the full output at the bottom, you will see some more granular allocation data. Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change. Quoting the jmh sample link above:
```quote
It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
```
Prior to this patch:
```console
Benchmark                                      (numTasks)          Score          Error  Units
FetchAll.run                                        10000        481.529 ±      184.751  ops/s
FetchAll.run:·gc.alloc.rate.norm                    10000     334970.771 ±    33544.960  B/op

FetchAll.run                                        50000         78.652 ±       20.869  ops/s
FetchAll.run:·gc.alloc.rate.norm                    50000    3991107.524 ±   701585.657  B/op

FetchAll.run                                       100000         38.371 ±       11.710  ops/s
FetchAll.run:·gc.alloc.rate.norm                   100000   13487028.139 ±  3369614.510  B/op

IndexedFetchAndFilter.run                           10000        296.557 ±      198.389  ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm       10000     655319.005 ±    98138.360  B/op

IndexedFetchAndFilter.run                           50000         50.300 ±        5.818  ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm       50000    6671548.381 ±   452020.849  B/op

IndexedFetchAndFilter.run                          100000         17.637 ±        3.739  ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm      100000   28100173.458 ±  4486308.188  B/op
```
With this patch:
```console
Benchmark                                      (numTasks)          Score          Error  Units
FetchAll.run                                        10000       1653.572 ±      799.123  ops/s
FetchAll.run:·gc.alloc.rate.norm                    10000     155426.052 ±    10345.657  B/op

FetchAll.run                                        50000        210.454 ±       54.340  ops/s
FetchAll.run:·gc.alloc.rate.norm                    50000    1457560.505 ±   228631.547  B/op

FetchAll.run                                       100000         97.783 ±       42.130  ops/s
FetchAll.run:·gc.alloc.rate.norm                   100000    5096464.582 ±  1792136.191  B/op

IndexedFetchAndFilter.run                           10000        500.740 ±      210.675  ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm       10000     370760.068 ±    36813.071  B/op

IndexedFetchAndFilter.run                           50000         95.316 ±       23.084  ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm       50000    3389472.432 ±   550602.162  B/op

IndexedFetchAndFilter.run                          100000         41.572 ±       26.747  ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm      100000   12324183.188 ±  7537788.165  B/op
```
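For readers who want the before/after ratios rather than raw scores, they can be computed directly from the two abridged tables above. The following is a small sketch: the class `BenchmarkRatios` and its method names are made up for illustration, and the numbers are copied from the tables as point estimates, ignoring the error margins.

```java
import java.util.Locale;

// Illustrative helper; not part of the patch. Values are the point estimates
// from the abridged "Prior to this patch" and "With this patch" tables.
final class BenchmarkRatios {
  static final String[] NAMES = {
    "FetchAll 10000", "FetchAll 50000", "FetchAll 100000",
    "IndexedFetchAndFilter 10000", "IndexedFetchAndFilter 50000", "IndexedFetchAndFilter 100000"
  };
  // Throughput in ops/s.
  static final double[] OPS_BEFORE = {481.529, 78.652, 38.371, 296.557, 50.300, 17.637};
  static final double[] OPS_AFTER = {1653.572, 210.454, 97.783, 500.740, 95.316, 41.572};
  // gc.alloc.rate.norm in B/op.
  static final double[] ALLOC_BEFORE = {
    334970.771, 3991107.524, 13487028.139, 655319.005, 6671548.381, 28100173.458
  };
  static final double[] ALLOC_AFTER = {
    155426.052, 1457560.505, 5096464.582, 370760.068, 3389472.432, 12324183.188
  };

  // Higher throughput is better, so the speedup is after / before.
  static double speedup(double before, double after) {
    return after / before;
  }

  // Lower allocation is better, so the reduction factor is before / after.
  static double reduction(double before, double after) {
    return before / after;
  }

  public static void main(String[] args) {
    for (int i = 0; i < NAMES.length; i++) {
      System.out.printf(Locale.ROOT, "%-30s throughput x%.2f, B/op reduced x%.2f%n",
          NAMES[i],
          speedup(OPS_BEFORE[i], OPS_AFTER[i]),
          reduction(ALLOC_BEFORE[i], ALLOC_AFTER[i]));
    }
  }
}
```

Given the wide error margins in the tables, these ratios are best read as rough indicators rather than precise factors.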
**Full benchmark output**
Prior to this patch:
```console
Benchmark (numTasks) Score Error Units
FetchAll.run 10000 481.529 ± 184.751 ops/s
FetchAll.run:·gc.alloc.rate 10000 148.678 ± 42.890 MB/sec
FetchAll.run:·gc.alloc.rate.norm 10000 334970.771 ± 33544.960 B/op
FetchAll.run:·gc.churn.PS_Eden_Space 10000 146.991 ± 135.486 MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm 10000 332983.005 ± 347401.950 B/op
FetchAll.run:·gc.churn.PS_Survivor_Space 10000 0.804 ± 1.823 MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm 10000 1784.147 ± 3904.546 B/op
FetchAll.run:·gc.count 10000 9.000 counts
FetchAll.run:·gc.time 10000 143.000 ms
FetchAll.run 50000 78.652 ± 20.869 ops/s
FetchAll.run:·gc.alloc.rate 50000 250.771 ± 34.190 MB/sec
FetchAll.run:·gc.alloc.rate.norm 50000 3991107.524 ± 701585.657 B/op
FetchAll.run:·gc.churn.PS_Eden_Space 50000 250.131 ± 144.214 MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm 50000 3999003.844 ± 2907196.744 B/op
FetchAll.run:·gc.churn.PS_Old_Gen 50000 6.937 ± 20.180 MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm 50000 111462.141 ± 322286.235 B/op
FetchAll.run:·gc.churn.PS_Survivor_Space 50000 6.056 ± 4.371 MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm 50000 96534.909 ± 73072.098 B/op
FetchAll.run:·gc.count 50000 22.000 counts
FetchAll.run:·gc.time 50000 3222.000 ms
FetchAll.run 100000 38.371 ± 11.710 ops/s
FetchAll.run:·gc.alloc.rate 100000 343.280 ± 63.923 MB/sec
FetchAll.run:·gc.alloc.rate.norm 100000 13487028.139 ± 3369614.510 B/op
FetchAll.run:·gc.churn.PS_Eden_Space 100000 343.804 ± 147.542 MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm 100000 13524848.537 ± 7132093.384 B/op
FetchAll.run:·gc.churn.PS_Old_Gen 100000 7.251 ± 26.847 MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm 100000 286256.200 ± 1043939.286 B/op
FetchAll.run:·gc.churn.PS_Survivor_Space 100000 11.448 ± 16.645 MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm 100000 440924.671 ± 539369.420 B/op
FetchAll.run:·gc.count 100000 53.000 counts
FetchAll.run:·gc.time 100000 8664.000 ms
IndexedFetchAndFilter.run 10000 296.557 ± 198.389 ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate 10000 178.657 ± 96.891 MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm 10000 655319.005 ± 98138.360 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 10000 181.829 ± 115.598 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 10000 669894.533 ± 362265.228 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 10000 1.017 ± 2.764 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 10000 3509.419 ± 8933.232 B/op
IndexedFetchAndFilter.run:·gc.count 10000 11.000 counts
IndexedFetchAndFilter.run:·gc.time 10000 174.000 ms
IndexedFetchAndFilter.run 50000 50.300 ± 5.818 ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate 50000 271.042 ± 35.522 MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm 50000 6671548.381 ± 452020.849 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 50000 278.006 ± 188.990 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 50000 6835542.988 ± 4208216.383 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 50000 7.836 ± 22.513 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 50000 194944.435 ± 557587.333 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 50000 6.063 ± 2.432 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 50000 148960.731 ± 42282.391 B/op
IndexedFetchAndFilter.run:·gc.count 50000 24.000 counts
IndexedFetchAndFilter.run:·gc.time 50000 3059.000 ms
IndexedFetchAndFilter.run 100000 17.637 ± 3.739 ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate 100000 336.740 ± 69.527 MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm 100000 28100173.458 ± 4486308.188 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 100000 336.494 ± 88.830 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 100000 28063164.240 ± 4888826.638 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 100000 8.028 ± 37.263 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 100000 672808.968 ± 2924497.150 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 100000 11.351 ± 17.881 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 100000 930977.737 ± 1252367.282 B/op
IndexedFetchAndFilter.run:·gc.count 100000 47.000 counts
IndexedFetchAndFilter.run:·gc.time 100000 7245.000 ms
```
With this patch:
```console
Benchmark (numTasks) Score Error Units
FetchAll.run 10000 1653.572 ± 799.123 ops/s
FetchAll.run:·gc.alloc.rate 10000 236.532 ± 98.709 MB/sec
FetchAll.run:·gc.alloc.rate.norm 10000 155426.052 ± 10345.657 B/op
FetchAll.run:·gc.churn.PS_Eden_Space 10000 247.755 ± 55.490 MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm 10000 163873.606 ± 59092.580 B/op
FetchAll.run:·gc.churn.PS_Survivor_Space 10000 1.328 ± 1.540 MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm 10000 883.684 ± 1120.393 B/op
FetchAll.run:·gc.count 10000 18.000 counts
FetchAll.run:·gc.time 10000 191.000 ms
FetchAll.run 50000 210.454 ± 54.340 ops/s
FetchAll.run:·gc.alloc.rate 50000 248.216 ± 15.196 MB/sec
FetchAll.run:·gc.alloc.rate.norm 50000 1457560.505 ± 228631.547 B/op
FetchAll.run:·gc.churn.PS_Eden_Space 50000 239.336 ± 174.541 MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm 50000 1409078.860 ± 1141224.117 B/op
FetchAll.run:·gc.churn.PS_Old_Gen 50000 6.504 ± 17.220 MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm 50000 38644.950 ± 105262.889 B/op
FetchAll.run:·gc.churn.PS_Survivor_Space 50000 5.994 ± 4.160 MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm 50000 35246.411 ± 25958.915 B/op
FetchAll.run:·gc.count 50000 21.000 counts
FetchAll.run:·gc.time 50000 2875.000 ms
FetchAll.run 100000 97.783 ± 42.130 ops/s
FetchAll.run:·gc.alloc.rate 100000 336.209 ± 80.094 MB/sec
FetchAll.run:·gc.alloc.rate.norm 100000 5096464.582 ± 1792136.191 B/op
FetchAll.run:·gc.churn.PS_Eden_Space 100000 342.190 ± 144.180 MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm 100000 5167420.986 ± 1634774.992 B/op
FetchAll.run:·gc.churn.PS_Old_Gen 100000 11.783 ± 36.073 MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm 100000 182947.872 ± 525172.467 B/op
FetchAll.run:·gc.churn.PS_Survivor_Space 100000 12.299 ± 13.795 MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm 100000 184635.309 ± 199254.266 B/op
FetchAll.run:·gc.count 100000 46.000 counts
FetchAll.run:·gc.time 100000 7778.000 ms
IndexedFetchAndFilter.run 10000 500.740 ± 210.675 ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate 10000 171.305 ± 57.968 MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm 10000 370760.068 ± 36813.071 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 10000 176.084 ± 103.579 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 10000 387100.753 ± 376481.454 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 10000 1.305 ± 1.866 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 10000 2812.059 ± 3518.689 B/op
IndexedFetchAndFilter.run:·gc.count 10000 11.000 counts
IndexedFetchAndFilter.run:·gc.time 10000 170.000 ms
IndexedFetchAndFilter.run 50000 95.316 ± 23.084 ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate 50000 258.291 ± 30.111 MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm 50000 3389472.432 ± 550602.162 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 50000 250.887 ± 148.296 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 50000 3308741.831 ± 2461004.974 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 50000 5.218 ± 21.710 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 50000 69254.269 ± 282577.478 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 50000 5.803 ± 2.885 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 50000 76523.177 ± 51120.227 B/op
IndexedFetchAndFilter.run:·gc.count 50000 21.000 counts
IndexedFetchAndFilter.run:·gc.time 50000 2775.000 ms
IndexedFetchAndFilter.run 100000 41.572 ± 26.747 ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate 100000 331.638 ± 50.813 MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm 100000 12324183.188 ± 7537788.165 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space 100000 333.474 ± 116.673 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm 100000 12357891.009 ± 7285356.875 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen 100000 10.296 ± 27.573 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm 100000 371782.085 ± 910072.098 B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space 100000 11.815 ± 10.161 MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm 100000 428555.780 ± 184610.507 B/op
IndexedFetchAndFilter.run:·gc.count 100000 49.000 counts
IndexedFetchAndFilter.run:·gc.time 100000 8602.000 ms
```
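For readers without the diff open, here is a minimal sketch of the kind of change described above: collecting filtered results with a plain loop into an `ArrayDeque` instead of streaming into a `HashSet`. The class and method names (`FetchTasksSketch`, `fetchFunctional`, `fetchImperative`) and the string task IDs are hypothetical stand-ins, not the actual `MemTaskStore` code, which is only available through the diff link.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical illustration only; not the real MemTaskStore implementation. */
final class FetchTasksSketch {

  // Functional style: stream, filter, collect into a HashSet.
  // Every element is hashed and wrapped in a per-entry node.
  static <T> Set<T> fetchFunctional(Collection<T> tasks, Predicate<T> filter) {
    return tasks.stream().filter(filter).collect(Collectors.toCollection(HashSet::new));
  }

  // Imperative style: plain loop appending to an ArrayDeque.
  // The deque is backed by a resizable array, so there is no hashing
  // and no per-element node allocation. Note that ArrayDeque rejects nulls.
  static <T> Collection<T> fetchImperative(Collection<T> tasks, Predicate<T> filter) {
    ArrayDeque<T> result = new ArrayDeque<>();
    for (T task : tasks) {
      if (filter.test(task)) {
        result.add(task);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> tasks = Arrays.asList("job-a/0", "job-a/1", "job-b/0");
    Predicate<String> isJobA = id -> id.startsWith("job-a/");
    System.out.println(fetchFunctional(tasks, isJobA).size());
    System.out.println(fetchImperative(tasks, isJobA).size());
  }
}
```

One trade-off worth keeping in mind: an `ArrayDeque` does not deduplicate and has no constant-time `contains`, so this swap only makes sense where callers iterate over the result rather than probe it.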
Thanks,
Bill Farner
Re: Review Request 65303: Improve performance of MemTaskStore queries
Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196085
-----------------------------------------------------------
Master (dbe7137) is green with this patch.
./build-support/jenkins/build.sh
However, it appears that it might lack test coverage.
I will refresh this build result if you post a review containing "@ReviewBot retry"
- Aurora ReviewBot
On Jan. 24, 2018, 12:32 a.m., Bill Farner wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> -----------------------------------------------------------
>
> (Updated Jan. 24, 2018, 12:32 a.m.)
>
>
> Review request for Aurora and Jordan Ly.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional. I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
>
> This patch also enables stack and heap profilers in jmh (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes. I started this change with a heap profiler as the primary motivation, and ended up using it to guide this improvement.
>
>
> Diffs
> -----
>
> build.gradle 64af7ae
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999c
>
>
> Diff: https://reviews.apache.org/r/65303/diff/1/
>
>
> Testing
> -------
>
> Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at the bottom, but here is an abridged version. It shows that task fetch throughput universally improves by at least 2x, and heap allocation reduces by at least the same factor. Overall GC time increases slightly as captured here, but the stddev was anecdotally high across runs. I chose to present this output as a caveat and a discussion point.
>
> If you scroll to the full output at the bottom, you will see some more granular allocation data. Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change. Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
> ```
>
> Prior to this patch:
> ```console
> Benchmark           (numTasks)          Score          Error   Units
>
>                          10000       1066.632 ±      266.924   ops/s
> ·gc.alloc.rate.norm      10000     289227.205 ±     8888.051    B/op
> ·gc.count                10000         24.000                 counts
> ·gc.time                 10000        103.000                     ms
>
>                          50000         84.444 ±       32.620   ops/s
> ·gc.alloc.rate.norm      50000    3831210.967 ±   840844.713    B/op
> ·gc.count                50000         21.000                 counts
> ·gc.time                 50000       1407.000                     ms
>
>                         100000         38.645 ±       20.557   ops/s
> ·gc.alloc.rate.norm     100000   13555430.931 ±  6787344.701    B/op
> ·gc.count               100000         52.000                 counts
> ·gc.time                100000       3304.000                     ms
> ```
>
> With this patch:
> ```console
> Benchmark           (numTasks)          Score          Error   Units
>
>                          10000       2851.288 ±      481.472   ops/s
> ·gc.alloc.rate.norm      10000     145281.908 ±     2223.621    B/op
> ·gc.count                10000         39.000                 counts
> ·gc.time                 10000        130.000                     ms
>
>                          50000        297.380 ±       35.681   ops/s
> ·gc.alloc.rate.norm      50000    1183791.866 ±    77487.278    B/op
> ·gc.count                50000         25.000                 counts
> ·gc.time                 50000       1821.000                     ms
>
>                         100000        122.211 ±       81.618   ops/s
> ·gc.alloc.rate.norm     100000    4364450.973 ±  2856586.882    B/op
> ·gc.count               100000         52.000                 counts
> ·gc.time                100000       3698.000                     ms
> ```
>
>
> **Full benchmark output**
>
> Prior to this patch:
> ```console
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 1066.632 ± 266.924 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 10000 thrpt 5 286.647 ± 62.371 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 10000 thrpt 5 289227.205 ± 8888.051 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 10000 thrpt 5 291.263 ± 159.266 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 10000 thrpt 5 294277.617 ± 166069.041 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 10000 thrpt 5 1.218 ± 1.029 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 10000 thrpt 5 1220.540 ± 708.455 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 10000 thrpt 5 24.000 counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 10000 thrpt 5 103.000 ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 10000 thrpt NaN ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 84.444 ± 32.620 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 50000 thrpt 5 267.018 ± 27.389 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 50000 thrpt 5 3831210.967 ± 840844.713 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 50000 thrpt 5 258.565 ± 149.845 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 50000 thrpt 5 3707563.530 ± 2262218.319 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen 50000 thrpt 5 4.487 ± 18.053 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm 50000 thrpt 5 63848.757 ± 264487.651 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 50000 thrpt 5 6.034 ± 3.651 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 50000 thrpt 5 87385.381 ± 75159.508 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 50000 thrpt 5 21.000 counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 50000 thrpt 5 1407.000 ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 50000 thrpt NaN ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 38.645 ± 20.557 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 100000 thrpt 5 381.453 ± 63.491 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 100000 thrpt 5 13555430.931 ± 6787344.701 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 100000 thrpt 5 389.816 ± 123.320 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 100000 thrpt 5 13823571.735 ± 6642604.600 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen 100000 thrpt 5 1.947 ± 16.766 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm 100000 thrpt 5 92330.241 ± 794991.221 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 100000 thrpt 5 11.934 ± 18.565 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 100000 thrpt 5 414896.926 ± 551658.959 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 100000 thrpt 5 52.000 counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 100000 thrpt 5 3304.000 ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 100000 thrpt NaN ---
> ```
>
> With this patch:
> ```console
> Benchmark (numTasks) Mode Cnt Score Error Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 10000 thrpt 5 2851.288 ± 481.472 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 10000 thrpt 5 384.383 ± 58.697 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 10000 thrpt 5 145281.908 ± 2223.621 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 10000 thrpt 5 388.851 ± 114.120 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 10000 thrpt 5 147171.915 ± 50430.527 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 10000 thrpt 5 1.264 ± 0.980 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 10000 thrpt 5 479.848 ± 420.881 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 10000 thrpt 5 39.000 counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 10000 thrpt 5 130.000 ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 10000 thrpt NaN ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 50000 thrpt 5 297.380 ± 35.681 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 50000 thrpt 5 288.839 ± 19.035 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 50000 thrpt 5 1183791.866 ± 77487.278 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 50000 thrpt 5 296.587 ± 125.148 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 50000 thrpt 5 1214497.578 ± 457975.153 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen 50000 thrpt 5 6.942 ± 23.492 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm 50000 thrpt 5 28880.733 ± 99593.659 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 50000 thrpt 5 6.440 ± 3.887 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 50000 thrpt 5 26354.762 ± 14876.857 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 50000 thrpt 5 25.000 counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 50000 thrpt 5 1821.000 ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 50000 thrpt NaN ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run 100000 thrpt 5 122.211 ± 81.618 ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate 100000 thrpt 5 377.099 ± 77.146 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm 100000 thrpt 5 4364450.973 ± 2856586.882 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space 100000 thrpt 5 381.570 ± 119.260 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 100000 thrpt 5 4415115.428 ± 3000198.792 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen 100000 thrpt 5 1.914 ± 16.479 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm 100000 thrpt 5 31833.830 ± 274098.881 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space 100000 thrpt 5 12.117 ± 20.931 MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm 100000 thrpt 5 136001.918 ± 196459.666 B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count 100000 thrpt 5 52.000 counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time 100000 thrpt 5 3698.000 ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack 100000 thrpt NaN ---
> ```
>
>
> Thanks,
>
> Bill Farner
>
>
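The "at least 2x" claim in the quoted Testing section can be checked against the abridged `MemFetchTasksBenchmark` numbers. The sketch below simply divides the scores copied from those tables; the class name `BenchmarkRatios` and its `ratio` helper are made up for illustration and are not part of the patch or of jmh.

```java
/** Hypothetical helper: before/after ratios from the abridged tables in this thread. */
final class BenchmarkRatios {
  // Scores copied from the quoted abridged tables, indexed by numTasks.
  static final int[] NUM_TASKS = {10000, 50000, 100000};
  static final double[] OPS_BEFORE = {1066.632, 84.444, 38.645};
  static final double[] OPS_AFTER = {2851.288, 297.380, 122.211};
  static final double[] BYTES_PER_OP_BEFORE = {289227.205, 3831210.967, 13555430.931};
  static final double[] BYTES_PER_OP_AFTER = {145281.908, 1183791.866, 4364450.973};

  static double ratio(double numerator, double denominator) {
    return numerator / denominator;
  }

  public static void main(String[] args) {
    for (int i = 0; i < NUM_TASKS.length; i++) {
      // Throughput: higher is better, so divide after by before.
      double speedup = ratio(OPS_AFTER[i], OPS_BEFORE[i]);
      // Allocation per op: lower is better, so divide before by after.
      double allocReduction = ratio(BYTES_PER_OP_BEFORE[i], BYTES_PER_OP_AFTER[i]);
      System.out.printf(
          "numTasks=%d throughput x%.2f, allocation /%.2f%n",
          NUM_TASKS[i], speedup, allocReduction);
    }
  }
}
```

On these point estimates, throughput improves by roughly 2.7x, 3.5x, and 3.2x, and normalized allocation drops by about 3.2x and 3.1x at 50000 and 100000 tasks but lands just under 2x at 10000 tasks. The error columns are wide, particularly at 100000 tasks, so the ratios are indicative rather than precise.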