Posted to reviews@aurora.apache.org by Bill Farner <wf...@apache.org> on 2018/01/24 00:32:28 UTC

Review Request 65303: Improve performance of MemTaskStore queries

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/
-----------------------------------------------------------

Review request for Aurora and Jordan Ly.


Repository: aurora


Description
-------

Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional.  I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
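
For readers without the diff open, here is a minimal sketch of the two styles being compared.  It is not the actual `MemTaskStore` change: the class name, the method names, and the use of `Integer` as a stand-in for tasks are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is not the MemTaskStore code.
final class FetchStyleSketch {

  // Functional style: stream the candidates and collect matches into a HashSet.
  static <T> Set<T> fetchFunctional(Collection<T> candidates, Predicate<T> filter) {
    return candidates.stream()
        .filter(filter)
        .collect(Collectors.toCollection(HashSet::new));
  }

  // Imperative style: a plain loop appending matches to an ArrayDeque.
  static <T> Collection<T> fetchImperative(Collection<T> candidates, Predicate<T> filter) {
    ArrayDeque<T> matches = new ArrayDeque<>();
    for (T candidate : candidates) {
      if (filter.test(candidate)) {
        matches.add(candidate);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<Integer> tasks = Arrays.asList(1, 2, 3, 4, 5, 6);
    System.out.println(fetchFunctional(tasks, t -> t % 2 == 0).size());
    System.out.println(fetchImperative(tasks, t -> t % 2 == 0).size());
  }
}
```

A `HashSet` is backed by a `HashMap` and allocates a node per element, while an `ArrayDeque` keeps its elements in a single resizable array, which is one plausible reason for the lower allocation.  Note that `ArrayDeque` does not deduplicate and rejects `null` elements, so this substitution assumes the candidates are already unique and non-null.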

This patch also enables stack and heap profilers in jmh (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes.  I started this change with a heap profiler as the primary motivation, and ended up using it to guide this improvement.


Diffs
-----

  build.gradle 64af7ae 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999c 


Diff: https://reviews.apache.org/r/65303/diff/1/


Testing
-------

Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at the bottom, but here is an abridged version.  It shows that task fetch throughput improves by at least 2.6x at every task count tested, and normalized heap allocation (`gc.alloc.rate.norm`) drops by roughly 2x to 3.2x.  Overall GC time increases slightly as captured here, but the stddev was anecdotally high across runs.  I chose to present this output as a caveat and a discussion point.

If you scroll to the full output at the bottom, you will see some more granular allocation data.  Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change.  Quoting the jmh sample link above:
```quote
It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
```

Prior to this patch:
```console
Benchmark                 (numTasks)    Score         Error   Units

                          10000      1066.632 ±     266.924   ops/s
·gc.alloc.rate.norm       10000    289227.205 ±    8888.051    B/op
·gc.count                 10000        24.000                counts
·gc.time                  10000       103.000                    ms

                          50000        84.444 ±      32.620   ops/s
·gc.alloc.rate.norm       50000   3831210.967 ±  840844.713    B/op
·gc.count                 50000        21.000                counts
·gc.time                  50000      1407.000                    ms

                         100000        38.645 ±      20.557   ops/s
·gc.alloc.rate.norm      100000  13555430.931 ± 6787344.701    B/op
·gc.count                100000        52.000                counts
·gc.time                 100000      3304.000                    ms
```

With this patch:
```console
Benchmark               (numTasks)   Score         Error   Units

                         10000    2851.288 ±     481.472   ops/s
·gc.alloc.rate.norm      10000  145281.908 ±    2223.621    B/op
·gc.count                10000      39.000                counts
·gc.time                 10000     130.000                    ms

                         50000     297.380 ±      35.681   ops/s
·gc.alloc.rate.norm      50000 1183791.866 ±   77487.278    B/op
·gc.count                50000      25.000                counts
·gc.time                 50000    1821.000                    ms

                        100000     122.211 ±      81.618   ops/s                        
·gc.alloc.rate.norm     100000 4364450.973 ± 2856586.882    B/op
·gc.count               100000      52.000                counts
·gc.time                100000    3698.000                    ms
```
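
The improvement factors can be recomputed from the two abridged tables above.  The sketch below copies the `ops/s` and `gc.alloc.rate.norm` scores verbatim; the class and method names are made up for this example.

```java
import java.util.Locale;

// Recomputes before/after factors from the abridged tables (values copied as reported).
final class BenchmarkRatios {
  static final int[] NUM_TASKS = {10000, 50000, 100000};
  static final double[] OPS_BEFORE = {1066.632, 84.444, 38.645};
  static final double[] OPS_AFTER = {2851.288, 297.380, 122.211};
  static final double[] BYTES_PER_OP_BEFORE = {289227.205, 3831210.967, 13555430.931};
  static final double[] BYTES_PER_OP_AFTER = {145281.908, 1183791.866, 4364450.973};

  // Throughput factor: higher ops/s is better, so divide after by before.
  static double speedup(int i) {
    return OPS_AFTER[i] / OPS_BEFORE[i];
  }

  // Allocation factor: lower B/op is better, so divide before by after.
  static double allocReduction(int i) {
    return BYTES_PER_OP_BEFORE[i] / BYTES_PER_OP_AFTER[i];
  }

  public static void main(String[] args) {
    for (int i = 0; i < NUM_TASKS.length; i++) {
      System.out.println(String.format(Locale.ROOT,
          "numTasks=%d throughput x%.2f, B/op reduced x%.2f",
          NUM_TASKS[i], speedup(i), allocReduction(i)));
    }
  }
}
```

These are point estimates only.  The reported error intervals are wide (for example 122.211 ± 81.618 ops/s at 100000 tasks), so the factors should be read as indicative rather than exact.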


**Full benchmark output**

Prior to this patch:
```console
Benchmark                                                                        (numTasks)   Mode  Cnt         Score         Error   Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        10000  thrpt    5      1066.632 ±     266.924   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         10000  thrpt    5       286.647 ±      62.371  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    10000  thrpt    5    289227.205 ±    8888.051    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                10000  thrpt    5       291.263 ±     159.266  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           10000  thrpt    5    294277.617 ±  166069.041    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            10000  thrpt    5         1.218 ±       1.029  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       10000  thrpt    5      1220.540 ±     708.455    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              10000  thrpt    5        24.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               10000  thrpt    5       103.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 10000  thrpt                NaN                   ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        50000  thrpt    5        84.444 ±      32.620   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         50000  thrpt    5       267.018 ±      27.389  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    50000  thrpt    5   3831210.967 ±  840844.713    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                50000  thrpt    5       258.565 ±     149.845  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           50000  thrpt    5   3707563.530 ± 2262218.319    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                   50000  thrpt    5         4.487 ±      18.053  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm              50000  thrpt    5     63848.757 ±  264487.651    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            50000  thrpt    5         6.034 ±       3.651  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       50000  thrpt    5     87385.381 ±   75159.508    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              50000  thrpt    5        21.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               50000  thrpt    5      1407.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 50000  thrpt                NaN                   ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                       100000  thrpt    5        38.645 ±      20.557   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                        100000  thrpt    5       381.453 ±      63.491  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                   100000  thrpt    5  13555430.931 ± 6787344.701    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space               100000  thrpt    5       389.816 ±     123.320  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm          100000  thrpt    5  13823571.735 ± 6642604.600    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                  100000  thrpt    5         1.947 ±      16.766  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm             100000  thrpt    5     92330.241 ±  794991.221    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space           100000  thrpt    5        11.934 ±      18.565  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm      100000  thrpt    5    414896.926 ±  551658.959    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                             100000  thrpt    5        52.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                              100000  thrpt    5      3304.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                100000  thrpt                NaN                   ---
```

With this patch:
```console
Benchmark                                                                        (numTasks)   Mode  Cnt        Score         Error   Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        10000  thrpt    5     2851.288 ±     481.472   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         10000  thrpt    5      384.383 ±      58.697  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    10000  thrpt    5   145281.908 ±    2223.621    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                10000  thrpt    5      388.851 ±     114.120  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           10000  thrpt    5   147171.915 ±   50430.527    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            10000  thrpt    5        1.264 ±       0.980  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       10000  thrpt    5      479.848 ±     420.881    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              10000  thrpt    5       39.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               10000  thrpt    5      130.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 10000  thrpt               NaN                   ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        50000  thrpt    5      297.380 ±      35.681   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         50000  thrpt    5      288.839 ±      19.035  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    50000  thrpt    5  1183791.866 ±   77487.278    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                50000  thrpt    5      296.587 ±     125.148  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           50000  thrpt    5  1214497.578 ±  457975.153    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                   50000  thrpt    5        6.942 ±      23.492  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm              50000  thrpt    5    28880.733 ±   99593.659    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            50000  thrpt    5        6.440 ±       3.887  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       50000  thrpt    5    26354.762 ±   14876.857    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              50000  thrpt    5       25.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               50000  thrpt    5     1821.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 50000  thrpt               NaN                   ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                       100000  thrpt    5      122.211 ±      81.618   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                        100000  thrpt    5      377.099 ±      77.146  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                   100000  thrpt    5  4364450.973 ± 2856586.882    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space               100000  thrpt    5      381.570 ±     119.260  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm          100000  thrpt    5  4415115.428 ± 3000198.792    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                  100000  thrpt    5        1.914 ±      16.479  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm             100000  thrpt    5    31833.830 ±  274098.881    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space           100000  thrpt    5       12.117 ±      20.931  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm      100000  thrpt    5   136001.918 ±  196459.666    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                             100000  thrpt    5       52.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                              100000  thrpt    5     3698.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                100000  thrpt               NaN                   ---
```


Thanks,

Bill Farner


Re: Review Request 65303: Improve performance of MemTaskStore queries

Posted by Bill Farner <wf...@apache.org>.

> On Jan. 24, 2018, 3:40 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> > Line 234 (original), 235 (patched)
> > <https://reviews.apache.org/r/65303/diff/1/?file=1944709#file1944709line237>
> >
> >     Have you considered passing in the predicate filter in here? For index scans this should help to eliminate a large amount of allocations.

A fine idea!  I will be out of contact for a few days, but will try this out when I get back.


- Bill


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196107
-----------------------------------------------------------




Re: Review Request 65303: Improve performance of MemTaskStore queries

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196107
-----------------------------------------------------------




src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
Line 234 (original), 235 (patched)
<https://reviews.apache.org/r/65303/#comment275620>

    Have you considered passing in the predicate filter in here? For index scans this should help to eliminate a large amount of allocations.
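
To make the suggestion concrete, here is a rough sketch of what pushing the filter into an index scan could look like.  It does not reflect how `MemTaskStore` is actually structured: the `Task` class, the job-keyed index, and both method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical miniature store; names are illustrative, not Aurora's API.
final class IndexScanSketch {
  static final class Task {
    final String id;
    final String job;
    final String status;

    Task(String id, String job, String status) {
      this.id = id;
      this.job = job;
      this.status = status;
    }
  }

  private final Map<String, Task> tasksById = new HashMap<>();
  // Assumed secondary index: job name -> IDs of tasks in that job.
  private final Map<String, Set<String>> idsByJob = new HashMap<>();

  void save(Task task) {
    tasksById.put(task.id, task);
    idsByJob.computeIfAbsent(task.job, k -> new LinkedHashSet<>()).add(task.id);
  }

  // Two passes: copy every task found via the index, then filter into a second collection.
  Collection<Task> scanThenFilter(String job, Predicate<Task> filter) {
    ArrayDeque<Task> all = new ArrayDeque<>();
    for (String id : idsByJob.getOrDefault(job, Collections.emptySet())) {
      all.add(tasksById.get(id));
    }
    ArrayDeque<Task> matches = new ArrayDeque<>();
    for (Task task : all) {
      if (filter.test(task)) {
        matches.add(task);
      }
    }
    return matches;
  }

  // One pass: the filter is applied during the scan, so no intermediate collection is built.
  Collection<Task> scanWithFilter(String job, Predicate<Task> filter) {
    ArrayDeque<Task> matches = new ArrayDeque<>();
    for (String id : idsByJob.getOrDefault(job, Collections.emptySet())) {
      Task task = tasksById.get(id);
      if (filter.test(task)) {
        matches.add(task);
      }
    }
    return matches;
  }
}
```

Both methods return the same tasks; the one-pass variant simply avoids holding the non-matching tasks in a temporary collection, which is where the allocation savings would be expected to come from.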


- Stephan Erb


On Jan. 24, 2018, 1:32 a.m., Bill Farner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2018, 1:32 a.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional.  I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes.  I started this change with a heap profiler as the primary motivation, and ended up using it to guide this improvement.
> 
> 
> Diffs
> -----
> 
>   build.gradle 64af7ae 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999c 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/1/
> 
> 
> Testing
> -------
> 
> Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at the bottom, but here is an abridged version.  It shows that task fetch throughput universally improves by at least 2x, and heap allocation reduces by at least the same factor.  Overall GC time increases slightly as captured here, but the stddev was anecdotally high across runs.  I chose to present this output as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more granular allocation data.  Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark                 (numTasks)    Score         Error   Units
> 
>                           10000      1066.632 ±     266.924   ops/s
> ·gc.alloc.rate.norm       10000    289227.205 ±    8888.051    B/op
> ·gc.count                 10000        24.000                counts
> ·gc.time                  10000       103.000                    ms
> 
>                           50000        84.444 ±      32.620   ops/s
> ·gc.alloc.rate.norm       50000   3831210.967 ±  840844.713    B/op
> ·gc.count                 50000        21.000                counts
> ·gc.time                  50000      1407.000                    ms
> 
>                          100000        38.645 ±      20.557   ops/s
> ·gc.alloc.rate.norm      100000  13555430.931 ± 6787344.701    B/op
> ·gc.count                100000        52.000                counts
> ·gc.time                 100000      3304.000                    ms
> ```
> 
> With this patch:
> ```console
> Benchmark               (numTasks)   Score         Error   Units
> 
>                          10000    2851.288 ±     481.472   ops/s
> ·gc.alloc.rate.norm      10000  145281.908 ±    2223.621    B/op
> ·gc.count                10000      39.000                counts
> ·gc.time                 10000     130.000                    ms
> 
>                          50000     297.380 ±      35.681   ops/s
> ·gc.alloc.rate.norm      50000 1183791.866 ±   77487.278    B/op
> ·gc.count                50000      25.000                counts
> ·gc.time                 50000    1821.000                    ms
> 
>                         100000     122.211 ±      81.618   ops/s
> ·gc.alloc.rate.norm     100000 4364450.973 ± 2856586.882    B/op
> ·gc.count               100000      52.000                counts
> ·gc.time                100000    3698.000                    ms
> ```
> 
> 
> **Full benchmark output**
> 
> Prior to this patch:
> ```console
> Benchmark                                                                        (numTasks)   Mode  Cnt         Score         Error   Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        10000  thrpt    5      1066.632 ±     266.924   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         10000  thrpt    5       286.647 ±      62.371  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    10000  thrpt    5    289227.205 ±    8888.051    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                10000  thrpt    5       291.263 ±     159.266  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           10000  thrpt    5    294277.617 ±  166069.041    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            10000  thrpt    5         1.218 ±       1.029  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       10000  thrpt    5      1220.540 ±     708.455    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              10000  thrpt    5        24.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               10000  thrpt    5       103.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 10000  thrpt                NaN                   ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        50000  thrpt    5        84.444 ±      32.620   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         50000  thrpt    5       267.018 ±      27.389  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    50000  thrpt    5   3831210.967 ±  840844.713    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                50000  thrpt    5       258.565 ±     149.845  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           50000  thrpt    5   3707563.530 ± 2262218.319    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                   50000  thrpt    5         4.487 ±      18.053  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm              50000  thrpt    5     63848.757 ±  264487.651    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            50000  thrpt    5         6.034 ±       3.651  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       50000  thrpt    5     87385.381 ±   75159.508    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              50000  thrpt    5        21.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               50000  thrpt    5      1407.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 50000  thrpt                NaN                   ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                       100000  thrpt    5        38.645 ±      20.557   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                        100000  thrpt    5       381.453 ±      63.491  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                   100000  thrpt    5  13555430.931 ± 6787344.701    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space               100000  thrpt    5       389.816 ±     123.320  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm          100000  thrpt    5  13823571.735 ± 6642604.600    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                  100000  thrpt    5         1.947 ±      16.766  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm             100000  thrpt    5     92330.241 ±  794991.221    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space           100000  thrpt    5        11.934 ±      18.565  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm      100000  thrpt    5    414896.926 ±  551658.959    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                             100000  thrpt    5        52.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                              100000  thrpt    5      3304.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                100000  thrpt                NaN                   ---
> ```
> 
> With this patch:
> ```console
> Benchmark                                                                        (numTasks)   Mode  Cnt        Score         Error   Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        10000  thrpt    5     2851.288 ±     481.472   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         10000  thrpt    5      384.383 ±      58.697  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    10000  thrpt    5   145281.908 ±    2223.621    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                10000  thrpt    5      388.851 ±     114.120  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           10000  thrpt    5   147171.915 ±   50430.527    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            10000  thrpt    5        1.264 ±       0.980  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       10000  thrpt    5      479.848 ±     420.881    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              10000  thrpt    5       39.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               10000  thrpt    5      130.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 10000  thrpt               NaN                   ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        50000  thrpt    5      297.380 ±      35.681   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         50000  thrpt    5      288.839 ±      19.035  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    50000  thrpt    5  1183791.866 ±   77487.278    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                50000  thrpt    5      296.587 ±     125.148  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           50000  thrpt    5  1214497.578 ±  457975.153    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                   50000  thrpt    5        6.942 ±      23.492  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm              50000  thrpt    5    28880.733 ±   99593.659    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            50000  thrpt    5        6.440 ±       3.887  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       50000  thrpt    5    26354.762 ±   14876.857    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              50000  thrpt    5       25.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               50000  thrpt    5     1821.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 50000  thrpt               NaN                   ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                       100000  thrpt    5      122.211 ±      81.618   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                        100000  thrpt    5      377.099 ±      77.146  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                   100000  thrpt    5  4364450.973 ± 2856586.882    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space               100000  thrpt    5      381.570 ±     119.260  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm          100000  thrpt    5  4415115.428 ± 3000198.792    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                  100000  thrpt    5        1.914 ±      16.479  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm             100000  thrpt    5    31833.830 ±  274098.881    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space           100000  thrpt    5       12.117 ±      20.931  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm      100000  thrpt    5   136001.918 ±  196459.666    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                             100000  thrpt    5       52.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                              100000  thrpt    5     3698.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                100000  thrpt               NaN                   ---
> ```
> 
> 
> Thanks,
> 
> Bill Farner
> 
>
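
The description quoted above swaps a functional `HashSet` collection for an imperative loop that appends to an `ArrayDeque`. As a rough illustration of that idea only (this is not the code from the patch; `FetchSketch`, `fetchFunctional`, and `fetchImperative` are hypothetical names, and plain strings stand in for tasks), the two styles might look like this:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch; names and types are assumptions, not Aurora's API.
public class FetchSketch {

  // Functional style: stream, filter, and collect the matches into a HashSet.
  static <T> Set<T> fetchFunctional(Collection<T> tasks, Predicate<T> filter) {
    return tasks.stream()
        .filter(filter)
        .collect(Collectors.toCollection(HashSet::new));
  }

  // Imperative style: a plain loop that appends matches to an ArrayDeque.
  static <T> Collection<T> fetchImperative(Collection<T> tasks, Predicate<T> filter) {
    ArrayDeque<T> result = new ArrayDeque<>();
    for (T task : tasks) {
      if (filter.test(task)) {
        result.add(task);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> tasks = Arrays.asList("job-a/0", "job-b/0", "job-a/1");
    Predicate<String> isJobA = t -> t.startsWith("job-a");
    System.out.println(fetchFunctional(tasks, isJobA).size());
    System.out.println(fetchImperative(tasks, isJobA).size());
  }
}
```

`HashSet` is backed by a `HashMap`, so every stored element is hashed and gets its own entry object, while `ArrayDeque` appends into a resizable array. The trade-off is that `ArrayDeque` does not deduplicate and rejects `null` elements, so this substitution only fits when the input is already free of duplicates and nulls.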


Re: Review Request 65303: Improve performance of MemTaskStore queries

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196579
-----------------------------------------------------------



Master (787ccfe) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Jan. 31, 2018, 6:12 p.m., Bill Farner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> -----------------------------------------------------------
> 
> (Updated Jan. 31, 2018, 6:12 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional.  I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes.  I started this change with a heap profiler as the primary motivation, and ended up using it to guide this improvement.
> 
> 
> Diffs
> -----
> 
>   build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999ca9a5185e240ad729fefc6638476a4aecc 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/2/
> 
> 
> Testing
> -------
> 
> Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here is an abridged version.  It shows that task fetch throughput universally improves by ~2x (modulo error margins), and heap allocation drops by roughly the same factor.  Overall GC time increases slightly as captured here, but the stddev was anecdotally high across runs.  I chose to present this output as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more granular allocation data.  Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark                                    (numTasks)         Score         Error   Units
> FetchAll.run                                      10000       481.529 ±     184.751   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  10000    334970.771 ±   33544.960    B/op
> 
> FetchAll.run                                      50000        78.652 ±      20.869   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  50000   3991107.524 ±  701585.657    B/op
> 
> FetchAll.run                                     100000        38.371 ±      11.710   ops/s
> FetchAll.run:·gc.alloc.rate.norm                 100000  13487028.139 ± 3369614.510    B/op
> 
> IndexedFetchAndFilter.run                         10000       296.557 ±     198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    655319.005 ±   98138.360    B/op
> 
> IndexedFetchAndFilter.run                         50000        50.300 ±       5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   6671548.381 ±  452020.849    B/op
> 
> IndexedFetchAndFilter.run                        100000        17.637 ±       3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  28100173.458 ± 4486308.188    B/op
> ```
> 
> With this patch:
> ```console
> Benchmark                                    (numTasks)         Score         Error   Units
> FetchAll.run                                      10000      1653.572 ±     799.123   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  10000    155426.052 ±   10345.657    B/op
> 
> FetchAll.run                                      50000       210.454 ±      54.340   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  50000   1457560.505 ±  228631.547    B/op
> 
> FetchAll.run                                     100000        97.783 ±      42.130   ops/s
> FetchAll.run:·gc.alloc.rate.norm                 100000   5096464.582 ± 1792136.191    B/op
> 
> IndexedFetchAndFilter.run                         10000       500.740 ±     210.675   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    370760.068 ±   36813.071    B/op
> 
> IndexedFetchAndFilter.run                         50000        95.316 ±      23.084   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   3389472.432 ±  550602.162    B/op
> 
> IndexedFetchAndFilter.run                        100000        41.572 ±      26.747   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  12324183.188 ± 7537788.165    B/op
> ```
> 
> 
> **Full benchmark output**
> 
> Prior to this patch:
> ```console
> Benchmark                                                   (numTasks)         Score         Error   Units
> FetchAll.run                                                     10000       481.529 ±     184.751   ops/s
> FetchAll.run:·gc.alloc.rate                                      10000       148.678 ±      42.890  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 10000    334970.771 ±   33544.960    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             10000       146.991 ±     135.486  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    332983.005 ±  347401.950    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         0.804 ±       1.823  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000      1784.147 ±    3904.546    B/op
> FetchAll.run:·gc.count                                           10000         9.000                counts
> FetchAll.run:·gc.time                                            10000       143.000                    ms
> 
> FetchAll.run                                                     50000        78.652 ±      20.869   ops/s
> FetchAll.run:·gc.alloc.rate                                      50000       250.771 ±      34.190  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 50000   3991107.524 ±  701585.657    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             50000       250.131 ±     144.214  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   3999003.844 ± 2907196.744    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.937 ±      20.180  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000    111462.141 ±  322286.235    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         6.056 ±       4.371  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     96534.909 ±   73072.098    B/op
> FetchAll.run:·gc.count                                           50000        22.000                counts
> FetchAll.run:·gc.time                                            50000      3222.000                    ms
> 
> FetchAll.run                                                    100000        38.371 ±      11.710   ops/s
> FetchAll.run:·gc.alloc.rate                                     100000       343.280 ±      63.923  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                100000  13487028.139 ± 3369614.510    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                            100000       343.804 ±     147.542  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000  13524848.537 ± 7132093.384    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                               100000         7.251 ±      26.847  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    286256.200 ± 1043939.286    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        11.448 ±      16.645  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    440924.671 ±  539369.420    B/op
> FetchAll.run:·gc.count                                          100000        53.000                counts
> FetchAll.run:·gc.time                                           100000      8664.000                    ms
> 
> IndexedFetchAndFilter.run                                        10000       296.557 ±     198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       178.657 ±      96.891  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    655319.005 ±   98138.360    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       181.829 ±     115.598  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    669894.533 ±  362265.228    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.017 ±       2.764  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      3509.419 ±    8933.232    B/op
> IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
> IndexedFetchAndFilter.run:·gc.time                               10000       174.000                    ms
> 
> IndexedFetchAndFilter.run                                        50000        50.300 ±       5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       271.042 ±      35.522  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   6671548.381 ±  452020.849    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       278.006 ±     188.990  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   6835542.988 ± 4208216.383    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         7.836 ±      22.513  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000    194944.435 ±  557587.333    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         6.063 ±       2.432  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000    148960.731 ±   42282.391    B/op
> IndexedFetchAndFilter.run:·gc.count                              50000        24.000                counts
> IndexedFetchAndFilter.run:·gc.time                               50000      3059.000                    ms
> 
> IndexedFetchAndFilter.run                                       100000        17.637 ±       3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       336.740 ±      69.527  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  28100173.458 ± 4486308.188    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       336.494 ±      88.830  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  28063164.240 ± 4888826.638    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000         8.028 ±      37.263  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    672808.968 ± 2924497.150    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.351 ±      17.881  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    930977.737 ± 1252367.282    B/op
> IndexedFetchAndFilter.run:·gc.count                             100000        47.000                counts
> IndexedFetchAndFilter.run:·gc.time                              100000      7245.000                    ms
> ```
> 
> With this patch:
> ```console
> Benchmark                                                   (numTasks)         Score         Error   Units
> FetchAll.run                                                     10000      1653.572 ±     799.123   ops/s
> FetchAll.run:·gc.alloc.rate                                      10000       236.532 ±      98.709  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 10000    155426.052 ±   10345.657    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             10000       247.755 ±      55.490  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    163873.606 ±   59092.580    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         1.328 ±       1.540  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000       883.684 ±    1120.393    B/op
> FetchAll.run:·gc.count                                           10000        18.000                counts
> FetchAll.run:·gc.time                                            10000       191.000                    ms
> 
> FetchAll.run                                                     50000       210.454 ±      54.340   ops/s
> FetchAll.run:·gc.alloc.rate                                      50000       248.216 ±      15.196  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 50000   1457560.505 ±  228631.547    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             50000       239.336 ±     174.541  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   1409078.860 ± 1141224.117    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.504 ±      17.220  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000     38644.950 ±  105262.889    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         5.994 ±       4.160  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     35246.411 ±   25958.915    B/op
> FetchAll.run:·gc.count                                           50000        21.000                counts
> FetchAll.run:·gc.time                                            50000      2875.000                    ms
> 
> FetchAll.run                                                    100000        97.783 ±      42.130   ops/s
> FetchAll.run:·gc.alloc.rate                                     100000       336.209 ±      80.094  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                100000   5096464.582 ± 1792136.191    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                            100000       342.190 ±     144.180  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000   5167420.986 ± 1634774.992    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                               100000        11.783 ±      36.073  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    182947.872 ±  525172.467    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        12.299 ±      13.795  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    184635.309 ±  199254.266    B/op
> FetchAll.run:·gc.count                                          100000        46.000                counts
> FetchAll.run:·gc.time                                           100000      7778.000                    ms
> 
> IndexedFetchAndFilter.run                                        10000       500.740 ±     210.675   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       171.305 ±      57.968  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    370760.068 ±   36813.071    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       176.084 ±     103.579  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    387100.753 ±  376481.454    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.305 ±       1.866  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      2812.059 ±    3518.689    B/op
> IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
> IndexedFetchAndFilter.run:·gc.time                               10000       170.000                    ms
> 
> IndexedFetchAndFilter.run                                        50000        95.316 ±      23.084   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       258.291 ±      30.111  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   3389472.432 ±  550602.162    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       250.887 ±     148.296  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   3308741.831 ± 2461004.974    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         5.218 ±      21.710  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000     69254.269 ±  282577.478    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         5.803 ±       2.885  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000     76523.177 ±   51120.227    B/op
> IndexedFetchAndFilter.run:·gc.count                              50000        21.000                counts
> IndexedFetchAndFilter.run:·gc.time                               50000      2775.000                    ms
> 
> IndexedFetchAndFilter.run                                       100000        41.572 ±      26.747   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       331.638 ±      50.813  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  12324183.188 ± 7537788.165    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       333.474 ±     116.673  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  12357891.009 ± 7285356.875    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000        10.296 ±      27.573  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    371782.085 ±  910072.098    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.815 ±      10.161  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    428555.780 ±  184610.507    B/op
> IndexedFetchAndFilter.run:·gc.count                             100000        49.000                counts
> IndexedFetchAndFilter.run:·gc.time                              100000      8602.000                    ms
> ```
> 
> 
> Thanks,
> 
> Bill Farner
> 
>
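
To check the "~2x" summary against the abridged tables quoted above, the per-row ratios can be computed directly. The sketch below is only a convenience calculation: the class and method names are made up for illustration, the scores are copied from the abridged `FetchAll` and `IndexedFetchAndFilter` rows, and it uses point estimates only, ignoring the reported error margins.

```java
// Hypothetical helper; scores are copied from the abridged tables above.
public class BenchmarkRatios {

  static double ratio(double numerator, double denominator) {
    return numerator / denominator;
  }

  // Smallest element-wise ratio across all rows.
  static double minRatio(double[] numerators, double[] denominators) {
    double min = Double.MAX_VALUE;
    for (int i = 0; i < numerators.length; i++) {
      min = Math.min(min, ratio(numerators[i], denominators[i]));
    }
    return min;
  }

  // Row order: FetchAll (10000, 50000, 100000), then IndexedFetchAndFilter (same sizes).
  static final double[] OPS_BEFORE = {481.529, 78.652, 38.371, 296.557, 50.300, 17.637};
  static final double[] OPS_AFTER = {1653.572, 210.454, 97.783, 500.740, 95.316, 41.572};
  static final double[] BYTES_BEFORE = {
      334970.771, 3991107.524, 13487028.139, 655319.005, 6671548.381, 28100173.458};
  static final double[] BYTES_AFTER = {
      155426.052, 1457560.505, 5096464.582, 370760.068, 3389472.432, 12324183.188};

  public static void main(String[] args) {
    for (int i = 0; i < OPS_BEFORE.length; i++) {
      // Throughput gain is after/before; allocation reduction is before/after.
      System.out.printf("row %d: throughput x%.2f, allocation /%.2f%n",
          i, ratio(OPS_AFTER[i], OPS_BEFORE[i]), ratio(BYTES_BEFORE[i], BYTES_AFTER[i]));
    }
    System.out.printf("min throughput gain: %.2f%n", minRatio(OPS_AFTER, OPS_BEFORE));
    System.out.printf("min allocation reduction: %.2f%n", minRatio(BYTES_BEFORE, BYTES_AFTER));
  }
}
```

Because the error columns are wide relative to several of the scores, these ratios are best read as rough indications rather than precise speedups.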


Re: Review Request 65303: Improve performance of MemTaskStore queries

Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196581
-----------------------------------------------------------


Ship It!

- David McLaughlin


On Jan. 31, 2018, 6:12 p.m., Bill Farner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> -----------------------------------------------------------
> 
> (Updated Jan. 31, 2018, 6:12 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional.  I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes.  I started this change with a heap profiler as the primary motivation, and ended up using it to guide this improvement.
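
To illustrate the kind of change described above, here is a minimal sketch contrasting a functional-style fetch that collects into a `HashSet` with an imperative loop that appends to an `ArrayDeque`. It is not the actual `MemTaskStore` code: the class name `FetchStyles`, both method names, and the use of plain strings as stand-ins for tasks are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration; names and types are not taken from the patch.
public class FetchStyles {
  // Functional style: stream, filter, then collect into a HashSet.
  // Every element is hashed and wrapped in a map entry.
  static <T> Collection<T> fetchFunctional(Collection<T> tasks, Predicate<T> filter) {
    return tasks.stream().filter(filter).collect(Collectors.toCollection(HashSet::new));
  }

  // Imperative style: a plain loop appending matches to an ArrayDeque,
  // which is backed by a resizable array and allocates no per-element nodes.
  // Note: ArrayDeque rejects null elements and does not de-duplicate, so this
  // sketch assumes the input already holds distinct, non-null tasks.
  static <T> Collection<T> fetchImperative(Collection<T> tasks, Predicate<T> filter) {
    ArrayDeque<T> result = new ArrayDeque<>();
    for (T task : tasks) {
      if (filter.test(task)) {
        result.add(task);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> tasks = Arrays.asList("job-a/0", "job-a/1", "job-b/0");
    Predicate<String> isJobA = id -> id.startsWith("job-a/");
    System.out.println(fetchFunctional(tasks, isJobA).size());
    System.out.println(fetchImperative(tasks, isJobA).size());
  }
}
```

Both methods return the same elements for distinct inputs; the difference is in the intermediate objects created along the way, which is what the heap profiler output in this thread measures.
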
> 
> 
> Diffs
> -----
> 
>   build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999ca9a5185e240ad729fefc6638476a4aecc 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/2/
> 
> 
> Testing
> -------
> 
> Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here is an abridged version.  It shows that task fetch throughput universally improves by ~2x (modulo error margins), and heap allocation reduces by at least the same factor.  Overall GC time increases slightly as captured here, but the stddev was anecdotally high across runs.  I chose to present this output as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more granular allocation data.  Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark                                    (numTasks)         Score         Error   Units
> FetchAll.run                                      10000       481.529 ±     184.751   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  10000    334970.771 ±   33544.960    B/op
> 
> FetchAll.run                                      50000        78.652 ±      20.869   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  50000   3991107.524 ±  701585.657    B/op
> 
> FetchAll.run                                     100000        38.371 ±      11.710   ops/s
> FetchAll.run:·gc.alloc.rate.norm                 100000  13487028.139 ± 3369614.510    B/op
> 
> IndexedFetchAndFilter.run                         10000       296.557 ±     198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    655319.005 ±   98138.360    B/op
> 
> IndexedFetchAndFilter.run                         50000        50.300 ±       5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   6671548.381 ±  452020.849    B/op
> 
> IndexedFetchAndFilter.run                        100000        17.637 ±       3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  28100173.458 ± 4486308.188    B/op
> ```
> 
> With this patch:
> ```console
> Benchmark                                    (numTasks)         Score         Error   Units
> FetchAll.run                                      10000      1653.572 ±     799.123   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  10000    155426.052 ±   10345.657    B/op
> 
> FetchAll.run                                      50000       210.454 ±      54.340   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  50000   1457560.505 ±  228631.547    B/op
> 
> FetchAll.run                                     100000        97.783 ±      42.130   ops/s
> FetchAll.run:·gc.alloc.rate.norm                 100000   5096464.582 ± 1792136.191    B/op
> 
> IndexedFetchAndFilter.run                         10000       500.740 ±     210.675   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    370760.068 ±   36813.071    B/op
> 
> IndexedFetchAndFilter.run                         50000        95.316 ±      23.084   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   3389472.432 ±  550602.162    B/op
> 
> IndexedFetchAndFilter.run                        100000        41.572 ±      26.747   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  12324183.188 ± 7537788.165    B/op
> ```
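
As a rough cross-check of the "~2x" summary, the before/after ratios can be computed directly from the two abridged tables above. The sketch below copies the point estimates from those tables and ignores the error margins entirely; the class name `BenchmarkRatios` and its field names are made up for this example.

```java
// Hypothetical helper; the numbers are the Score column from the abridged tables.
public class BenchmarkRatios {
  static final String[] LABELS = {
    "FetchAll 10000", "FetchAll 50000", "FetchAll 100000",
    "IndexedFetchAndFilter 10000", "IndexedFetchAndFilter 50000", "IndexedFetchAndFilter 100000"
  };
  // Throughput in ops/s, before and after the patch.
  static final double[] OPS_BEFORE = {481.529, 78.652, 38.371, 296.557, 50.300, 17.637};
  static final double[] OPS_AFTER = {1653.572, 210.454, 97.783, 500.740, 95.316, 41.572};
  // gc.alloc.rate.norm in B/op, before and after the patch.
  static final double[] BYTES_BEFORE =
      {334970.771, 3991107.524, 13487028.139, 655319.005, 6671548.381, 28100173.458};
  static final double[] BYTES_AFTER =
      {155426.052, 1457560.505, 5096464.582, 370760.068, 3389472.432, 12324183.188};

  static double ratio(double numerator, double denominator) {
    return numerator / denominator;
  }

  public static void main(String[] args) {
    for (int i = 0; i < LABELS.length; i++) {
      // Throughput gain: after / before. Allocation reduction: before / after.
      System.out.printf("%-30s throughput x%.2f, allocation /%.2f%n",
          LABELS[i],
          ratio(OPS_AFTER[i], OPS_BEFORE[i]),
          ratio(BYTES_BEFORE[i], BYTES_AFTER[i]));
    }
  }
}
```

On these point estimates the throughput gain ranges from about 1.7x to 3.4x and the per-operation allocation reduction from about 1.8x to 2.7x. Given the wide error margins in the tables, these ratios are indicative only.
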
> 
> 
> **Full benchmark output**
> 
> Prior to this patch:
> ```console
> Benchmark                                                   (numTasks)         Score         Error   Units
> FetchAll.run                                                     10000       481.529 ±     184.751   ops/s
> FetchAll.run:·gc.alloc.rate                                      10000       148.678 ±      42.890  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 10000    334970.771 ±   33544.960    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             10000       146.991 ±     135.486  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    332983.005 ±  347401.950    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         0.804 ±       1.823  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000      1784.147 ±    3904.546    B/op
> FetchAll.run:·gc.count                                           10000         9.000                counts
> FetchAll.run:·gc.time                                            10000       143.000                    ms
> 
> FetchAll.run                                                     50000        78.652 ±      20.869   ops/s
> FetchAll.run:·gc.alloc.rate                                      50000       250.771 ±      34.190  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 50000   3991107.524 ±  701585.657    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             50000       250.131 ±     144.214  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   3999003.844 ± 2907196.744    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.937 ±      20.180  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000    111462.141 ±  322286.235    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         6.056 ±       4.371  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     96534.909 ±   73072.098    B/op
> FetchAll.run:·gc.count                                           50000        22.000                counts
> FetchAll.run:·gc.time                                            50000      3222.000                    ms
> 
> FetchAll.run                                                    100000        38.371 ±      11.710   ops/s
> FetchAll.run:·gc.alloc.rate                                     100000       343.280 ±      63.923  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                100000  13487028.139 ± 3369614.510    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                            100000       343.804 ±     147.542  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000  13524848.537 ± 7132093.384    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                               100000         7.251 ±      26.847  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    286256.200 ± 1043939.286    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        11.448 ±      16.645  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    440924.671 ±  539369.420    B/op
> FetchAll.run:·gc.count                                          100000        53.000                counts
> FetchAll.run:·gc.time                                           100000      8664.000                    ms
> 
> IndexedFetchAndFilter.run                                        10000       296.557 ±     198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       178.657 ±      96.891  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    655319.005 ±   98138.360    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       181.829 ±     115.598  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    669894.533 ±  362265.228    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.017 ±       2.764  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      3509.419 ±    8933.232    B/op
> IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
> IndexedFetchAndFilter.run:·gc.time                               10000       174.000                    ms
> 
> IndexedFetchAndFilter.run                                        50000        50.300 ±       5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       271.042 ±      35.522  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   6671548.381 ±  452020.849    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       278.006 ±     188.990  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   6835542.988 ± 4208216.383    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         7.836 ±      22.513  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000    194944.435 ±  557587.333    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         6.063 ±       2.432  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000    148960.731 ±   42282.391    B/op
> IndexedFetchAndFilter.run:·gc.count                              50000        24.000                counts
> IndexedFetchAndFilter.run:·gc.time                               50000      3059.000                    ms
> 
> IndexedFetchAndFilter.run                                       100000        17.637 ±       3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       336.740 ±      69.527  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  28100173.458 ± 4486308.188    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       336.494 ±      88.830  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  28063164.240 ± 4888826.638    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000         8.028 ±      37.263  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    672808.968 ± 2924497.150    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.351 ±      17.881  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    930977.737 ± 1252367.282    B/op
> IndexedFetchAndFilter.run:·gc.count                             100000        47.000                counts
> IndexedFetchAndFilter.run:·gc.time                              100000      7245.000                    ms
> ```
> 
> With this patch:
> ```console
> Benchmark                                                   (numTasks)         Score         Error   Units
> FetchAll.run                                                     10000      1653.572 ±     799.123   ops/s
> FetchAll.run:·gc.alloc.rate                                      10000       236.532 ±      98.709  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 10000    155426.052 ±   10345.657    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             10000       247.755 ±      55.490  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    163873.606 ±   59092.580    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         1.328 ±       1.540  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000       883.684 ±    1120.393    B/op
> FetchAll.run:·gc.count                                           10000        18.000                counts
> FetchAll.run:·gc.time                                            10000       191.000                    ms
> 
> FetchAll.run                                                     50000       210.454 ±      54.340   ops/s
> FetchAll.run:·gc.alloc.rate                                      50000       248.216 ±      15.196  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 50000   1457560.505 ±  228631.547    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             50000       239.336 ±     174.541  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   1409078.860 ± 1141224.117    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.504 ±      17.220  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000     38644.950 ±  105262.889    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         5.994 ±       4.160  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     35246.411 ±   25958.915    B/op
> FetchAll.run:·gc.count                                           50000        21.000                counts
> FetchAll.run:·gc.time                                            50000      2875.000                    ms
> 
> FetchAll.run                                                    100000        97.783 ±      42.130   ops/s
> FetchAll.run:·gc.alloc.rate                                     100000       336.209 ±      80.094  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                100000   5096464.582 ± 1792136.191    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                            100000       342.190 ±     144.180  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000   5167420.986 ± 1634774.992    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                               100000        11.783 ±      36.073  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    182947.872 ±  525172.467    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        12.299 ±      13.795  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    184635.309 ±  199254.266    B/op
> FetchAll.run:·gc.count                                          100000        46.000                counts
> FetchAll.run:·gc.time                                           100000      7778.000                    ms
> 
> IndexedFetchAndFilter.run                                        10000       500.740 ±     210.675   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       171.305 ±      57.968  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    370760.068 ±   36813.071    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       176.084 ±     103.579  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    387100.753 ±  376481.454    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.305 ±       1.866  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      2812.059 ±    3518.689    B/op
> IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
> IndexedFetchAndFilter.run:·gc.time                               10000       170.000                    ms
> 
> IndexedFetchAndFilter.run                                        50000        95.316 ±      23.084   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       258.291 ±      30.111  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   3389472.432 ±  550602.162    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       250.887 ±     148.296  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   3308741.831 ± 2461004.974    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         5.218 ±      21.710  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000     69254.269 ±  282577.478    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         5.803 ±       2.885  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000     76523.177 ±   51120.227    B/op
> IndexedFetchAndFilter.run:·gc.count                              50000        21.000                counts
> IndexedFetchAndFilter.run:·gc.time                               50000      2775.000                    ms
> 
> IndexedFetchAndFilter.run                                       100000        41.572 ±      26.747   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       331.638 ±      50.813  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  12324183.188 ± 7537788.165    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       333.474 ±     116.673  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  12357891.009 ± 7285356.875    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000        10.296 ±      27.573  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    371782.085 ±  910072.098    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.815 ±      10.161  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    428555.780 ±  184610.507    B/op
> IndexedFetchAndFilter.run:·gc.count                             100000        49.000                counts
> IndexedFetchAndFilter.run:·gc.time                              100000      8602.000                    ms
> ```
> 
> 
> Thanks,
> 
> Bill Farner
> 
>


Re: Review Request 65303: Improve performance of MemTaskStore queries

Posted by Jordan Ly <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196592
-----------------------------------------------------------


Ship it!




Ship It!

- Jordan Ly


On Jan. 31, 2018, 6:12 p.m., Bill Farner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> -----------------------------------------------------------
> 
> (Updated Jan. 31, 2018, 6:12 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional.  I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes.  I started this change with a heap profiler as the primary motivation, and ended up using it to guide this improvement.
> 
> 
> Diffs
> -----
> 
>   build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999ca9a5185e240ad729fefc6638476a4aecc 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/2/
> 
> 
> Testing
> -------
> 
> Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here is an abridged version.  It shows that task fetch throughput universally improves by ~2x (modulo error margins), and heap allocation reduces by at least the same factor.  Overall GC time increases slightly as captured here, but the stddev was anecdotally high across runs.  I chose to present this output as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more granular allocation data.  Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark                                    (numTasks)         Score         Error   Units
> FetchAll.run                                      10000       481.529 ±     184.751   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  10000    334970.771 ±   33544.960    B/op
> 
> FetchAll.run                                      50000        78.652 ±      20.869   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  50000   3991107.524 ±  701585.657    B/op
> 
> FetchAll.run                                     100000        38.371 ±      11.710   ops/s
> FetchAll.run:·gc.alloc.rate.norm                 100000  13487028.139 ± 3369614.510    B/op
> 
> IndexedFetchAndFilter.run                         10000       296.557 ±     198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    655319.005 ±   98138.360    B/op
> 
> IndexedFetchAndFilter.run                         50000        50.300 ±       5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   6671548.381 ±  452020.849    B/op
> 
> IndexedFetchAndFilter.run                        100000        17.637 ±       3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  28100173.458 ± 4486308.188    B/op
> ```
> 
> With this patch:
> ```console
> Benchmark                                    (numTasks)         Score         Error   Units
> FetchAll.run                                      10000      1653.572 ±     799.123   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  10000    155426.052 ±   10345.657    B/op
> 
> FetchAll.run                                      50000       210.454 ±      54.340   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  50000   1457560.505 ±  228631.547    B/op
> 
> FetchAll.run                                     100000        97.783 ±      42.130   ops/s
> FetchAll.run:·gc.alloc.rate.norm                 100000   5096464.582 ± 1792136.191    B/op
> 
> IndexedFetchAndFilter.run                         10000       500.740 ±     210.675   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    370760.068 ±   36813.071    B/op
> 
> IndexedFetchAndFilter.run                         50000        95.316 ±      23.084   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   3389472.432 ±  550602.162    B/op
> 
> IndexedFetchAndFilter.run                        100000        41.572 ±      26.747   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  12324183.188 ± 7537788.165    B/op
> ```
> 
> 
> **Full benchmark output**
> 
> Prior to this patch:
> ```console
> Benchmark                                                   (numTasks)         Score         Error   Units
> FetchAll.run                                                     10000       481.529 ±     184.751   ops/s
> FetchAll.run:·gc.alloc.rate                                      10000       148.678 ±      42.890  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 10000    334970.771 ±   33544.960    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             10000       146.991 ±     135.486  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    332983.005 ±  347401.950    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         0.804 ±       1.823  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000      1784.147 ±    3904.546    B/op
> FetchAll.run:·gc.count                                           10000         9.000                counts
> FetchAll.run:·gc.time                                            10000       143.000                    ms
> 
> FetchAll.run                                                     50000        78.652 ±      20.869   ops/s
> FetchAll.run:·gc.alloc.rate                                      50000       250.771 ±      34.190  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 50000   3991107.524 ±  701585.657    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             50000       250.131 ±     144.214  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   3999003.844 ± 2907196.744    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.937 ±      20.180  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000    111462.141 ±  322286.235    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         6.056 ±       4.371  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     96534.909 ±   73072.098    B/op
> FetchAll.run:·gc.count                                           50000        22.000                counts
> FetchAll.run:·gc.time                                            50000      3222.000                    ms
> 
> FetchAll.run                                                    100000        38.371 ±      11.710   ops/s
> FetchAll.run:·gc.alloc.rate                                     100000       343.280 ±      63.923  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                100000  13487028.139 ± 3369614.510    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                            100000       343.804 ±     147.542  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000  13524848.537 ± 7132093.384    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                               100000         7.251 ±      26.847  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    286256.200 ± 1043939.286    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        11.448 ±      16.645  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    440924.671 ±  539369.420    B/op
> FetchAll.run:·gc.count                                          100000        53.000                counts
> FetchAll.run:·gc.time                                           100000      8664.000                    ms
> 
> IndexedFetchAndFilter.run                                        10000       296.557 ±     198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       178.657 ±      96.891  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    655319.005 ±   98138.360    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       181.829 ±     115.598  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    669894.533 ±  362265.228    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.017 ±       2.764  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      3509.419 ±    8933.232    B/op
> IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
> IndexedFetchAndFilter.run:·gc.time                               10000       174.000                    ms
> 
> IndexedFetchAndFilter.run                                        50000        50.300 ±       5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       271.042 ±      35.522  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   6671548.381 ±  452020.849    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       278.006 ±     188.990  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   6835542.988 ± 4208216.383    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         7.836 ±      22.513  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000    194944.435 ±  557587.333    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         6.063 ±       2.432  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000    148960.731 ±   42282.391    B/op
> IndexedFetchAndFilter.run:·gc.count                              50000        24.000                counts
> IndexedFetchAndFilter.run:·gc.time                               50000      3059.000                    ms
> 
> IndexedFetchAndFilter.run                                       100000        17.637 ±       3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       336.740 ±      69.527  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  28100173.458 ± 4486308.188    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       336.494 ±      88.830  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  28063164.240 ± 4888826.638    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000         8.028 ±      37.263  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    672808.968 ± 2924497.150    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.351 ±      17.881  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    930977.737 ± 1252367.282    B/op
> IndexedFetchAndFilter.run:·gc.count                             100000        47.000                counts
> IndexedFetchAndFilter.run:·gc.time                              100000      7245.000                    ms
> ```
> 
> With this patch:
> ```console
> Benchmark                                                   (numTasks)         Score         Error   Units
> FetchAll.run                                                     10000      1653.572 ±     799.123   ops/s
> FetchAll.run:·gc.alloc.rate                                      10000       236.532 ±      98.709  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 10000    155426.052 ±   10345.657    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             10000       247.755 ±      55.490  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    163873.606 ±   59092.580    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         1.328 ±       1.540  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000       883.684 ±    1120.393    B/op
> FetchAll.run:·gc.count                                           10000        18.000                counts
> FetchAll.run:·gc.time                                            10000       191.000                    ms
> 
> FetchAll.run                                                     50000       210.454 ±      54.340   ops/s
> FetchAll.run:·gc.alloc.rate                                      50000       248.216 ±      15.196  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 50000   1457560.505 ±  228631.547    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             50000       239.336 ±     174.541  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   1409078.860 ± 1141224.117    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.504 ±      17.220  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000     38644.950 ±  105262.889    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         5.994 ±       4.160  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     35246.411 ±   25958.915    B/op
> FetchAll.run:·gc.count                                           50000        21.000                counts
> FetchAll.run:·gc.time                                            50000      2875.000                    ms
> 
> FetchAll.run                                                    100000        97.783 ±      42.130   ops/s
> FetchAll.run:·gc.alloc.rate                                     100000       336.209 ±      80.094  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                100000   5096464.582 ± 1792136.191    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                            100000       342.190 ±     144.180  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000   5167420.986 ± 1634774.992    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                               100000        11.783 ±      36.073  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    182947.872 ±  525172.467    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        12.299 ±      13.795  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    184635.309 ±  199254.266    B/op
> FetchAll.run:·gc.count                                          100000        46.000                counts
> FetchAll.run:·gc.time                                           100000      7778.000                    ms
> 
> IndexedFetchAndFilter.run                                        10000       500.740 ±     210.675   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       171.305 ±      57.968  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    370760.068 ±   36813.071    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       176.084 ±     103.579  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    387100.753 ±  376481.454    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.305 ±       1.866  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      2812.059 ±    3518.689    B/op
> IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
> IndexedFetchAndFilter.run:·gc.time                               10000       170.000                    ms
> 
> IndexedFetchAndFilter.run                                        50000        95.316 ±      23.084   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       258.291 ±      30.111  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   3389472.432 ±  550602.162    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       250.887 ±     148.296  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   3308741.831 ± 2461004.974    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         5.218 ±      21.710  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000     69254.269 ±  282577.478    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         5.803 ±       2.885  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000     76523.177 ±   51120.227    B/op
> IndexedFetchAndFilter.run:·gc.count                              50000        21.000                counts
> IndexedFetchAndFilter.run:·gc.time                               50000      2775.000                    ms
> 
> IndexedFetchAndFilter.run                                       100000        41.572 ±      26.747   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       331.638 ±      50.813  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  12324183.188 ± 7537788.165    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       333.474 ±     116.673  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  12357891.009 ± 7285356.875    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000        10.296 ±      27.573  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    371782.085 ±  910072.098    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.815 ±      10.161  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    428555.780 ±  184610.507    B/op
> IndexedFetchAndFilter.run:·gc.count                             100000        49.000                counts
> IndexedFetchAndFilter.run:·gc.time                              100000      8602.000                    ms
> ```
> 
> 
> Thanks,
> 
> Bill Farner
> 
>


Re: Review Request 65303: Improve performance of MemTaskStore queries

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196583
-----------------------------------------------------------


Ship it!

- Stephan Erb


On Jan. 31, 2018, 7:12 p.m., Bill Farner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> -----------------------------------------------------------
> 
> (Updated Jan. 31, 2018, 7:12 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional.  I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables the stack and heap profilers in JMH (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes.  Enabling the heap profiler was my primary motivation for this change, and I ended up using it to guide this improvement.
> 
> 
> Diffs
> -----
> 
>   build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999ca9a5185e240ad729fefc6638476a4aecc 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/2/
> 
> 
> Testing
> -------
> 
> Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here is an abridged version.  It shows that task fetch throughput universally improves by ~2x (modulo error margins), and heap allocation drops by at least the same factor.  Overall GC time increases slightly as captured here, but the stddev was anecdotally high across runs.  I chose to present this output as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more granular allocation data.  Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change.  Quoting the JMH sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark                                    (numTasks)         Score         Error   Units
> FetchAll.run                                      10000       481.529 ±     184.751   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  10000    334970.771 ±   33544.960    B/op
> 
> FetchAll.run                                      50000        78.652 ±      20.869   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  50000   3991107.524 ±  701585.657    B/op
> 
> FetchAll.run                                     100000        38.371 ±      11.710   ops/s
> FetchAll.run:·gc.alloc.rate.norm                 100000  13487028.139 ± 3369614.510    B/op
> 
> IndexedFetchAndFilter.run                         10000       296.557 ±     198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    655319.005 ±   98138.360    B/op
> 
> IndexedFetchAndFilter.run                         50000        50.300 ±       5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   6671548.381 ±  452020.849    B/op
> 
> IndexedFetchAndFilter.run                        100000        17.637 ±       3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  28100173.458 ± 4486308.188    B/op
> ```
> 
> With this patch:
> ```console
> Benchmark                                    (numTasks)         Score         Error   Units
> FetchAll.run                                      10000      1653.572 ±     799.123   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  10000    155426.052 ±   10345.657    B/op
> 
> FetchAll.run                                      50000       210.454 ±      54.340   ops/s
> FetchAll.run:·gc.alloc.rate.norm                  50000   1457560.505 ±  228631.547    B/op
> 
> FetchAll.run                                     100000        97.783 ±      42.130   ops/s
> FetchAll.run:·gc.alloc.rate.norm                 100000   5096464.582 ± 1792136.191    B/op
> 
> IndexedFetchAndFilter.run                         10000       500.740 ±     210.675   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    370760.068 ±   36813.071    B/op
> 
> IndexedFetchAndFilter.run                         50000        95.316 ±      23.084   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   3389472.432 ±  550602.162    B/op
> 
> IndexedFetchAndFilter.run                        100000        41.572 ±      26.747   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  12324183.188 ± 7537788.165    B/op
> ```
> 
> 
> **Full benchmark output**
> 
> Prior to this patch:
> ```console
> Benchmark                                                   (numTasks)         Score         Error   Units
> FetchAll.run                                                     10000       481.529 ±     184.751   ops/s
> FetchAll.run:·gc.alloc.rate                                      10000       148.678 ±      42.890  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 10000    334970.771 ±   33544.960    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             10000       146.991 ±     135.486  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    332983.005 ±  347401.950    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         0.804 ±       1.823  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000      1784.147 ±    3904.546    B/op
> FetchAll.run:·gc.count                                           10000         9.000                counts
> FetchAll.run:·gc.time                                            10000       143.000                    ms
> 
> FetchAll.run                                                     50000        78.652 ±      20.869   ops/s
> FetchAll.run:·gc.alloc.rate                                      50000       250.771 ±      34.190  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 50000   3991107.524 ±  701585.657    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             50000       250.131 ±     144.214  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   3999003.844 ± 2907196.744    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.937 ±      20.180  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000    111462.141 ±  322286.235    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         6.056 ±       4.371  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     96534.909 ±   73072.098    B/op
> FetchAll.run:·gc.count                                           50000        22.000                counts
> FetchAll.run:·gc.time                                            50000      3222.000                    ms
> 
> FetchAll.run                                                    100000        38.371 ±      11.710   ops/s
> FetchAll.run:·gc.alloc.rate                                     100000       343.280 ±      63.923  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                100000  13487028.139 ± 3369614.510    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                            100000       343.804 ±     147.542  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000  13524848.537 ± 7132093.384    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                               100000         7.251 ±      26.847  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    286256.200 ± 1043939.286    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        11.448 ±      16.645  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    440924.671 ±  539369.420    B/op
> FetchAll.run:·gc.count                                          100000        53.000                counts
> FetchAll.run:·gc.time                                           100000      8664.000                    ms
> 
> IndexedFetchAndFilter.run                                        10000       296.557 ±     198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       178.657 ±      96.891  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    655319.005 ±   98138.360    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       181.829 ±     115.598  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    669894.533 ±  362265.228    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.017 ±       2.764  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      3509.419 ±    8933.232    B/op
> IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
> IndexedFetchAndFilter.run:·gc.time                               10000       174.000                    ms
> 
> IndexedFetchAndFilter.run                                        50000        50.300 ±       5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       271.042 ±      35.522  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   6671548.381 ±  452020.849    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       278.006 ±     188.990  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   6835542.988 ± 4208216.383    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         7.836 ±      22.513  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000    194944.435 ±  557587.333    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         6.063 ±       2.432  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000    148960.731 ±   42282.391    B/op
> IndexedFetchAndFilter.run:·gc.count                              50000        24.000                counts
> IndexedFetchAndFilter.run:·gc.time                               50000      3059.000                    ms
> 
> IndexedFetchAndFilter.run                                       100000        17.637 ±       3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       336.740 ±      69.527  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  28100173.458 ± 4486308.188    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       336.494 ±      88.830  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  28063164.240 ± 4888826.638    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000         8.028 ±      37.263  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    672808.968 ± 2924497.150    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.351 ±      17.881  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    930977.737 ± 1252367.282    B/op
> IndexedFetchAndFilter.run:·gc.count                             100000        47.000                counts
> IndexedFetchAndFilter.run:·gc.time                              100000      7245.000                    ms
> ```
> 
> With this patch:
> ```console
> Benchmark                                                   (numTasks)         Score         Error   Units
> FetchAll.run                                                     10000      1653.572 ±     799.123   ops/s
> FetchAll.run:·gc.alloc.rate                                      10000       236.532 ±      98.709  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 10000    155426.052 ±   10345.657    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             10000       247.755 ±      55.490  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    163873.606 ±   59092.580    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         1.328 ±       1.540  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000       883.684 ±    1120.393    B/op
> FetchAll.run:·gc.count                                           10000        18.000                counts
> FetchAll.run:·gc.time                                            10000       191.000                    ms
> 
> FetchAll.run                                                     50000       210.454 ±      54.340   ops/s
> FetchAll.run:·gc.alloc.rate                                      50000       248.216 ±      15.196  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                 50000   1457560.505 ±  228631.547    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                             50000       239.336 ±     174.541  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   1409078.860 ± 1141224.117    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.504 ±      17.220  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000     38644.950 ±  105262.889    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         5.994 ±       4.160  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     35246.411 ±   25958.915    B/op
> FetchAll.run:·gc.count                                           50000        21.000                counts
> FetchAll.run:·gc.time                                            50000      2875.000                    ms
> 
> FetchAll.run                                                    100000        97.783 ±      42.130   ops/s
> FetchAll.run:·gc.alloc.rate                                     100000       336.209 ±      80.094  MB/sec
> FetchAll.run:·gc.alloc.rate.norm                                100000   5096464.582 ± 1792136.191    B/op
> FetchAll.run:·gc.churn.PS_Eden_Space                            100000       342.190 ±     144.180  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000   5167420.986 ± 1634774.992    B/op
> FetchAll.run:·gc.churn.PS_Old_Gen                               100000        11.783 ±      36.073  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    182947.872 ±  525172.467    B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        12.299 ±      13.795  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    184635.309 ±  199254.266    B/op
> FetchAll.run:·gc.count                                          100000        46.000                counts
> FetchAll.run:·gc.time                                           100000      7778.000                    ms
> 
> IndexedFetchAndFilter.run                                        10000       500.740 ±     210.675   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       171.305 ±      57.968  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    370760.068 ±   36813.071    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       176.084 ±     103.579  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    387100.753 ±  376481.454    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.305 ±       1.866  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      2812.059 ±    3518.689    B/op
> IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
> IndexedFetchAndFilter.run:·gc.time                               10000       170.000                    ms
> 
> IndexedFetchAndFilter.run                                        50000        95.316 ±      23.084   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       258.291 ±      30.111  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   3389472.432 ±  550602.162    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       250.887 ±     148.296  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   3308741.831 ± 2461004.974    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         5.218 ±      21.710  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000     69254.269 ±  282577.478    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         5.803 ±       2.885  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000     76523.177 ±   51120.227    B/op
> IndexedFetchAndFilter.run:·gc.count                              50000        21.000                counts
> IndexedFetchAndFilter.run:·gc.time                               50000      2775.000                    ms
> 
> IndexedFetchAndFilter.run                                       100000        41.572 ±      26.747   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       331.638 ±      50.813  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  12324183.188 ± 7537788.165    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       333.474 ±     116.673  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  12357891.009 ± 7285356.875    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000        10.296 ±      27.573  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    371782.085 ±  910072.098    B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.815 ±      10.161  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    428555.780 ±  184610.507    B/op
> IndexedFetchAndFilter.run:·gc.count                             100000        49.000                counts
> IndexedFetchAndFilter.run:·gc.time                              100000      8602.000                    ms
> ```
> 
> 
> Thanks,
> 
> Bill Farner
> 
>


Re: Review Request 65303: Improve performance of MemTaskStore queries

Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/
-----------------------------------------------------------

(Updated Jan. 31, 2018, 10:12 a.m.)


Review request for Aurora and Jordan Ly.


Changes
-------

Applied Stephan's suggestion, added a benchmark to validate.


Repository: aurora


Description
-------

Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional.  I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).

This patch also enables the stack and heap profilers in JMH (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes.  Enabling the heap profiler was my primary motivation for this change, and I ended up using it to guide this improvement.
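
For readers without the diff open, here is a minimal sketch of the kind of change the description refers to: a functional pipeline collecting into a `HashSet` versus an imperative loop appending to an `ArrayDeque`.  This is not the actual `MemTaskStore` code, which is not shown in this thread.  The class name `FetchStyles`, the use of plain `String` task IDs, and both method names are assumptions made only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the MemTaskStore implementation.
public class FetchStyles {

  // Functional style: stream, filter, and collect into a HashSet.
  // Each element is hashed and stored in a hash table entry.
  public static Set<String> fetchFunctional(
      Collection<String> tasks, Predicate<String> filter) {
    return tasks.stream()
        .filter(filter)
        .collect(Collectors.toCollection(HashSet::new));
  }

  // Imperative style: a plain loop appending to an ArrayDeque.
  // The deque is backed by a resizable array, with no per-element node.
  public static Collection<String> fetchImperative(
      Collection<String> tasks, Predicate<String> filter) {
    ArrayDeque<String> result = new ArrayDeque<>();
    for (String task : tasks) {
      if (filter.test(task)) {
        result.add(task);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> tasks = Arrays.asList("job-a/0", "job-a/1", "job-b/0");
    Predicate<String> inJobA = id -> id.startsWith("job-a/");
    System.out.println(fetchFunctional(tasks, inJobA).size());
    System.out.println(fetchImperative(tasks, inJobA).size());
  }
}
```

One trade-off worth keeping in mind with this sketch: an `ArrayDeque` does not de-duplicate and rejects `null` elements, so it is only a drop-in replacement for a `HashSet` result when the inputs are already unique and non-null.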


Diffs (updated)
-----

  build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
  src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999ca9a5185e240ad729fefc6638476a4aecc 


Diff: https://reviews.apache.org/r/65303/diff/2/

Changes: https://reviews.apache.org/r/65303/diff/1-2/


Testing (updated)
-------

Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here is an abridged version.  It shows that task fetch throughput universally improves by ~2x (modulo error margins), and heap allocation drops by at least the same factor.  Overall GC time increases slightly as captured here, but the stddev was anecdotally high across runs.  I chose to present this output as a caveat and a discussion point.

If you scroll to the full output at the bottom, you will see some more granular allocation data.  Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change.  Quoting the JMH sample link above:
```quote
It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
```

Prior to this patch:
```console
Benchmark                                    (numTasks)         Score         Error   Units
FetchAll.run                                      10000       481.529 ±     184.751   ops/s
FetchAll.run:·gc.alloc.rate.norm                  10000    334970.771 ±   33544.960    B/op

FetchAll.run                                      50000        78.652 ±      20.869   ops/s
FetchAll.run:·gc.alloc.rate.norm                  50000   3991107.524 ±  701585.657    B/op

FetchAll.run                                     100000        38.371 ±      11.710   ops/s
FetchAll.run:·gc.alloc.rate.norm                 100000  13487028.139 ± 3369614.510    B/op

IndexedFetchAndFilter.run                         10000       296.557 ±     198.389   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    655319.005 ±   98138.360    B/op

IndexedFetchAndFilter.run                         50000        50.300 ±       5.818   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   6671548.381 ±  452020.849    B/op

IndexedFetchAndFilter.run                        100000        17.637 ±       3.739   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  28100173.458 ± 4486308.188    B/op
```

With this patch:
```console
Benchmark                                    (numTasks)         Score         Error   Units
FetchAll.run                                      10000      1653.572 ±     799.123   ops/s
FetchAll.run:·gc.alloc.rate.norm                  10000    155426.052 ±   10345.657    B/op

FetchAll.run                                      50000       210.454 ±      54.340   ops/s
FetchAll.run:·gc.alloc.rate.norm                  50000   1457560.505 ±  228631.547    B/op

FetchAll.run                                     100000        97.783 ±      42.130   ops/s
FetchAll.run:·gc.alloc.rate.norm                 100000   5096464.582 ± 1792136.191    B/op

IndexedFetchAndFilter.run                         10000       500.740 ±     210.675   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    370760.068 ±   36813.071    B/op

IndexedFetchAndFilter.run                         50000        95.316 ±      23.084   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   3389472.432 ±  550602.162    B/op

IndexedFetchAndFilter.run                        100000        41.572 ±      26.747   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  12324183.188 ± 7537788.165    B/op
```
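
To make the before/after comparison easier to read, the ratios can be computed directly from the two abridged tables above.  The sketch below does this for the `FetchAll.run` rows only, using the point estimates and ignoring the error columns, so it should be read as a rough guide rather than a statistical claim.  The class name `BenchmarkRatios` and its `ratio` helper are hypothetical and exist only for this illustration.

```java
// Hypothetical helper for reading the abridged tables; not part of the patch.
public class BenchmarkRatios {

  // Ratio of two scores; values above 1 mean the numerator is larger.
  public static double ratio(double numerator, double denominator) {
    return numerator / denominator;
  }

  public static void main(String[] args) {
    int[] numTasks = {10000, 50000, 100000};

    // FetchAll.run throughput (ops/s), copied from the tables above.
    double[] opsBefore = {481.529, 78.652, 38.371};
    double[] opsAfter = {1653.572, 210.454, 97.783};

    // FetchAll.run gc.alloc.rate.norm (B/op), copied from the tables above.
    double[] bytesBefore = {334970.771, 3991107.524, 13487028.139};
    double[] bytesAfter = {155426.052, 1457560.505, 5096464.582};

    for (int i = 0; i < numTasks.length; i++) {
      // Throughput: higher is better, so divide after by before.
      double speedup = ratio(opsAfter[i], opsBefore[i]);
      // Allocation per op: lower is better, so divide before by after.
      double allocReduction = ratio(bytesBefore[i], bytesAfter[i]);
      System.out.printf(
          "numTasks=%d throughput x%.2f, bytes per op reduced by x%.2f%n",
          numTasks[i], speedup, allocReduction);
    }
  }
}
```

The same calculation applies to the `IndexedFetchAndFilter.run` rows by substituting their scores.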


**Full benchmark output**

Prior to this patch:
```console
Benchmark                                                   (numTasks)         Score         Error   Units
FetchAll.run                                                     10000       481.529 ±     184.751   ops/s
FetchAll.run:·gc.alloc.rate                                      10000       148.678 ±      42.890  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                 10000    334970.771 ±   33544.960    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                             10000       146.991 ±     135.486  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    332983.005 ±  347401.950    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         0.804 ±       1.823  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000      1784.147 ±    3904.546    B/op
FetchAll.run:·gc.count                                           10000         9.000                counts
FetchAll.run:·gc.time                                            10000       143.000                    ms

FetchAll.run                                                     50000        78.652 ±      20.869   ops/s
FetchAll.run:·gc.alloc.rate                                      50000       250.771 ±      34.190  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                 50000   3991107.524 ±  701585.657    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                             50000       250.131 ±     144.214  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   3999003.844 ± 2907196.744    B/op
FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.937 ±      20.180  MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000    111462.141 ±  322286.235    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         6.056 ±       4.371  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     96534.909 ±   73072.098    B/op
FetchAll.run:·gc.count                                           50000        22.000                counts
FetchAll.run:·gc.time                                            50000      3222.000                    ms

FetchAll.run                                                    100000        38.371 ±      11.710   ops/s
FetchAll.run:·gc.alloc.rate                                     100000       343.280 ±      63.923  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                100000  13487028.139 ± 3369614.510    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                            100000       343.804 ±     147.542  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000  13524848.537 ± 7132093.384    B/op
FetchAll.run:·gc.churn.PS_Old_Gen                               100000         7.251 ±      26.847  MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    286256.200 ± 1043939.286    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        11.448 ±      16.645  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    440924.671 ±  539369.420    B/op
FetchAll.run:·gc.count                                          100000        53.000                counts
FetchAll.run:·gc.time                                           100000      8664.000                    ms

IndexedFetchAndFilter.run                                        10000       296.557 ±     198.389   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       178.657 ±      96.891  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    655319.005 ±   98138.360    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       181.829 ±     115.598  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    669894.533 ±  362265.228    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.017 ±       2.764  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      3509.419 ±    8933.232    B/op
IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
IndexedFetchAndFilter.run:·gc.time                               10000       174.000                    ms

IndexedFetchAndFilter.run                                        50000        50.300 ±       5.818   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       271.042 ±      35.522  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   6671548.381 ±  452020.849    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       278.006 ±     188.990  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   6835542.988 ± 4208216.383    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         7.836 ±      22.513  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000    194944.435 ±  557587.333    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         6.063 ±       2.432  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000    148960.731 ±   42282.391    B/op
IndexedFetchAndFilter.run:·gc.count                              50000        24.000                counts
IndexedFetchAndFilter.run:·gc.time                               50000      3059.000                    ms

IndexedFetchAndFilter.run                                       100000        17.637 ±       3.739   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       336.740 ±      69.527  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  28100173.458 ± 4486308.188    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       336.494 ±      88.830  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  28063164.240 ± 4888826.638    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000         8.028 ±      37.263  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    672808.968 ± 2924497.150    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.351 ±      17.881  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    930977.737 ± 1252367.282    B/op
IndexedFetchAndFilter.run:·gc.count                             100000        47.000                counts
IndexedFetchAndFilter.run:·gc.time                              100000      7245.000                    ms
```

With this patch:
```console
Benchmark                                                   (numTasks)         Score         Error   Units
FetchAll.run                                                     10000      1653.572 ±     799.123   ops/s
FetchAll.run:·gc.alloc.rate                                      10000       236.532 ±      98.709  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                 10000    155426.052 ±   10345.657    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                             10000       247.755 ±      55.490  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    163873.606 ±   59092.580    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         1.328 ±       1.540  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000       883.684 ±    1120.393    B/op
FetchAll.run:·gc.count                                           10000        18.000                counts
FetchAll.run:·gc.time                                            10000       191.000                    ms

FetchAll.run                                                     50000       210.454 ±      54.340   ops/s
FetchAll.run:·gc.alloc.rate                                      50000       248.216 ±      15.196  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                 50000   1457560.505 ±  228631.547    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                             50000       239.336 ±     174.541  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   1409078.860 ± 1141224.117    B/op
FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.504 ±      17.220  MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000     38644.950 ±  105262.889    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         5.994 ±       4.160  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     35246.411 ±   25958.915    B/op
FetchAll.run:·gc.count                                           50000        21.000                counts
FetchAll.run:·gc.time                                            50000      2875.000                    ms

FetchAll.run                                                    100000        97.783 ±      42.130   ops/s
FetchAll.run:·gc.alloc.rate                                     100000       336.209 ±      80.094  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                100000   5096464.582 ± 1792136.191    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                            100000       342.190 ±     144.180  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000   5167420.986 ± 1634774.992    B/op
FetchAll.run:·gc.churn.PS_Old_Gen                               100000        11.783 ±      36.073  MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    182947.872 ±  525172.467    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        12.299 ±      13.795  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    184635.309 ±  199254.266    B/op
FetchAll.run:·gc.count                                          100000        46.000                counts
FetchAll.run:·gc.time                                           100000      7778.000                    ms

IndexedFetchAndFilter.run                                        10000       500.740 ±     210.675   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       171.305 ±      57.968  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    370760.068 ±   36813.071    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       176.084 ±     103.579  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    387100.753 ±  376481.454    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.305 ±       1.866  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      2812.059 ±    3518.689    B/op
IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
IndexedFetchAndFilter.run:·gc.time                               10000       170.000                    ms

IndexedFetchAndFilter.run                                        50000        95.316 ±      23.084   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       258.291 ±      30.111  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   3389472.432 ±  550602.162    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       250.887 ±     148.296  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   3308741.831 ± 2461004.974    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         5.218 ±      21.710  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000     69254.269 ±  282577.478    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         5.803 ±       2.885  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000     76523.177 ±   51120.227    B/op
IndexedFetchAndFilter.run:·gc.count                              50000        21.000                counts
IndexedFetchAndFilter.run:·gc.time                               50000      2775.000                    ms

IndexedFetchAndFilter.run                                       100000        41.572 ±      26.747   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       331.638 ±      50.813  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  12324183.188 ± 7537788.165    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       333.474 ±     116.673  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  12357891.009 ± 7285356.875    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000        10.296 ±      27.573  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    371782.085 ±  910072.098    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.815 ±      10.161  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    428555.780 ±  184610.507    B/op
IndexedFetchAndFilter.run:·gc.count                             100000        49.000                counts
IndexedFetchAndFilter.run:·gc.time                              100000      8602.000                    ms
```
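
To compare the two runs above without eyeballing the columns, a small helper along these lines can turn the `Score` values into before/after factors. The class and method names are made up for this sketch; the numbers are the `FetchAll.run` throughput and `gc.alloc.rate.norm` scores copied from the two tables. It uses point estimates only and ignores the `Error` column, which is wide for several rows, so the factors should be read as rough indicators.

```java
public class SpeedupSketch {
  /** Throughput factor: higher ops/s is better, so divide after by before. */
  public static double speedup(double before, double after) {
    return after / before;
  }

  /** Allocation factor: lower B/op is better, so divide before by after. */
  public static double reduction(double before, double after) {
    return before / after;
  }

  public static void main(String[] args) {
    int[] numTasks = {10000, 50000, 100000};
    // FetchAll.run scores (ops/s) from the "prior" and "with this patch" tables.
    double[] opsBefore = {481.529, 78.652, 38.371};
    double[] opsAfter = {1653.572, 210.454, 97.783};
    // FetchAll.run:·gc.alloc.rate.norm scores (B/op) from the same tables.
    double[] bytesBefore = {334970.771, 3991107.524, 13487028.139};
    double[] bytesAfter = {155426.052, 1457560.505, 5096464.582};

    for (int i = 0; i < numTasks.length; i++) {
      System.out.printf(
          "FetchAll numTasks=%d: throughput x%.2f, B/op reduced x%.2f%n",
          numTasks[i],
          speedup(opsBefore[i], opsAfter[i]),
          reduction(bytesBefore[i], bytesAfter[i]));
    }
  }
}
```

The `IndexedFetchAndFilter.run` rows can be plugged into the same arrays to get the corresponding factors for the indexed query path.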


Thanks,

Bill Farner


Re: Review Request 65303: Improve performance of MemTaskStore queries

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196085
-----------------------------------------------------------



Master (dbe7137) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Jan. 24, 2018, 12:32 a.m., Bill Farner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2018, 12:32 a.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style rather than functional.  I arrived at this result after running benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), providing insight into the heap impact of changes.  I started this change with a heap profiler as the primary motivation, and ended up using it to guide this improvement.
> 
> 
> Diffs
> -----
> 
>   build.gradle 64af7ae 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999c 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/1/
> 
> 
> Testing
> -------
> 
> Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at the bottom, but here is an abridged version.  It shows that task fetch throughput universally improves by at least 2x, and heap allocation reduces by at least the same factor.  Overall GC time increases slightly as captured here, but the stddev was anecdotally high across runs.  I chose to present this output as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more granular allocation data.  Please note that the `norm` stats are normalized for the number of operations, which I find to be the most useful measure for validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is allocation/GC-bound (figure the allocation pressure "ceiling" for your configuration!), and normalized counters to see the more precise benchmark behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark                 (numTasks)    Score         Error   Units
> 
>                           10000      1066.632 ±     266.924   ops/s
> ·gc.alloc.rate.norm       10000    289227.205 ±    8888.051    B/op
> ·gc.count                 10000        24.000                counts
> ·gc.time                  10000       103.000                    ms
> 
>                           50000        84.444 ±      32.620   ops/s
> ·gc.alloc.rate.norm       50000   3831210.967 ±  840844.713    B/op
> ·gc.count                 50000        21.000                counts
> ·gc.time                  50000      1407.000                    ms
> 
>                          100000        38.645 ±      20.557   ops/s
> ·gc.alloc.rate.norm      100000  13555430.931 ± 6787344.701    B/op
> ·gc.count                100000        52.000                counts
> ·gc.time                 100000      3304.000                    ms
> ```
> 
> With this patch:
> ```console
> Benchmark               (numTasks)   Score         Error   Units
> 
>                          10000    2851.288 ±     481.472   ops/s
> ·gc.alloc.rate.norm      10000  145281.908 ±    2223.621    B/op
> ·gc.count                10000      39.000                counts
> ·gc.time                 10000     130.000                    ms
> 
>                          50000     297.380 ±      35.681   ops/s
> ·gc.alloc.rate.norm      50000 1183791.866 ±   77487.278    B/op
> ·gc.count                50000      25.000                counts
> ·gc.time                 50000    1821.000                    ms
> 
>                         100000     122.211 ±      81.618   ops/s
> ·gc.alloc.rate.norm     100000 4364450.973 ± 2856586.882    B/op
> ·gc.count               100000      52.000                counts
> ·gc.time                100000    3698.000                    ms
> ```
> 
> 
> **Full benchmark output**
> 
> Prior to this patch:
> ```console
> Benchmark                                                                        (numTasks)   Mode  Cnt         Score         Error   Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        10000  thrpt    5      1066.632 ±     266.924   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         10000  thrpt    5       286.647 ±      62.371  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    10000  thrpt    5    289227.205 ±    8888.051    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                10000  thrpt    5       291.263 ±     159.266  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           10000  thrpt    5    294277.617 ±  166069.041    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            10000  thrpt    5         1.218 ±       1.029  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       10000  thrpt    5      1220.540 ±     708.455    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              10000  thrpt    5        24.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               10000  thrpt    5       103.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 10000  thrpt                NaN                   ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        50000  thrpt    5        84.444 ±      32.620   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         50000  thrpt    5       267.018 ±      27.389  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    50000  thrpt    5   3831210.967 ±  840844.713    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                50000  thrpt    5       258.565 ±     149.845  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           50000  thrpt    5   3707563.530 ± 2262218.319    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                   50000  thrpt    5         4.487 ±      18.053  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm              50000  thrpt    5     63848.757 ±  264487.651    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            50000  thrpt    5         6.034 ±       3.651  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       50000  thrpt    5     87385.381 ±   75159.508    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              50000  thrpt    5        21.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               50000  thrpt    5      1407.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 50000  thrpt                NaN                   ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                       100000  thrpt    5        38.645 ±      20.557   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                        100000  thrpt    5       381.453 ±      63.491  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                   100000  thrpt    5  13555430.931 ± 6787344.701    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space               100000  thrpt    5       389.816 ±     123.320  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm          100000  thrpt    5  13823571.735 ± 6642604.600    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                  100000  thrpt    5         1.947 ±      16.766  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm             100000  thrpt    5     92330.241 ±  794991.221    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space           100000  thrpt    5        11.934 ±      18.565  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm      100000  thrpt    5    414896.926 ±  551658.959    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                             100000  thrpt    5        52.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                              100000  thrpt    5      3304.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                100000  thrpt                NaN                   ---
> ```
> 
> With this patch:
> ```console
> Benchmark                                                                        (numTasks)   Mode  Cnt        Score         Error   Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        10000  thrpt    5     2851.288 ±     481.472   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         10000  thrpt    5      384.383 ±      58.697  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    10000  thrpt    5   145281.908 ±    2223.621    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                10000  thrpt    5      388.851 ±     114.120  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           10000  thrpt    5   147171.915 ±   50430.527    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            10000  thrpt    5        1.264 ±       0.980  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       10000  thrpt    5      479.848 ±     420.881    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              10000  thrpt    5       39.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               10000  thrpt    5      130.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 10000  thrpt               NaN                   ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        50000  thrpt    5      297.380 ±      35.681   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         50000  thrpt    5      288.839 ±      19.035  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    50000  thrpt    5  1183791.866 ±   77487.278    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                50000  thrpt    5      296.587 ±     125.148  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           50000  thrpt    5  1214497.578 ±  457975.153    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                   50000  thrpt    5        6.942 ±      23.492  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm              50000  thrpt    5    28880.733 ±   99593.659    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            50000  thrpt    5        6.440 ±       3.887  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       50000  thrpt    5    26354.762 ±   14876.857    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              50000  thrpt    5       25.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               50000  thrpt    5     1821.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 50000  thrpt               NaN                   ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                       100000  thrpt    5      122.211 ±      81.618   ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                        100000  thrpt    5      377.099 ±      77.146  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                   100000  thrpt    5  4364450.973 ± 2856586.882    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space               100000  thrpt    5      381.570 ±     119.260  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm          100000  thrpt    5  4415115.428 ± 3000198.792    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                  100000  thrpt    5        1.914 ±      16.479  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm             100000  thrpt    5    31833.830 ±  274098.881    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space           100000  thrpt    5       12.117 ±      20.931  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm      100000  thrpt    5   136001.918 ±  196459.666    B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                             100000  thrpt    5       52.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                              100000  thrpt    5     3698.000                    ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                100000  thrpt               NaN                   ---
> ```
> 
> 
> Thanks,
> 
> Bill Farner
> 
>