Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/04/21 15:30:53 UTC

[GitHub] [kafka] tombentley commented on a change in pull request #8417: KAFKA-8955: Add an AbstractResponse#errorCounts(Stream) and tidy

tombentley commented on a change in pull request #8417:
URL: https://github.com/apache/kafka/pull/8417#discussion_r412287986



##########
File path: clients/src/main/java/org/apache/kafka/common/requests/AbstractResponse.java
##########
@@ -57,6 +59,10 @@ public ByteBuffer serialize(ApiKeys apiKey, short version, int correlationId) {
         return Collections.singletonMap(error, 1);
     }
 
+    protected Map<Errors, Integer> errorCounts(Stream<Errors> errors) {
+        return errors.collect(Collectors.groupingBy(e -> e, Collectors.summingInt(e -> 1)));

Review comment:
      @ijuma I wrote a small microbenchmark (committed and reverted if you want to take a look) to compare performance. I picked `TxnOffsetCommitResponse` (more or less at random, though since it has two levels of nesting, topics and partitions, it exercises a double loop) with an unrepresentatively large number of topics and partitions.
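    
    The harness was a plain hand-rolled timing loop of roughly this shape (a sketch only; the committed-and-reverted benchmark, and the response it builds, may differ in the details):
    
    ```java
    import java.util.EnumMap;
    import java.util.Map;
    import java.util.function.Supplier;
    
    import org.apache.kafka.common.protocol.Errors;
    
    // Sketch of the timing loop: the supplier wraps something like response::errorCounts
    // on a pre-built TxnOffsetCommitResponse with many topics and partitions
    // (construction of that response is omitted here).
    public class ErrorCountsBenchmark {
    
        static void run(Supplier<Map<Errors, Integer>> errorCounts, int runs, int times) {
            for (int run = 0; run < runs; run++) {
                // Accumulating (and printing) the counts keeps the JIT from discarding the
                // work; it is presumably where the {NONE=...} lines in the output come from.
                Map<Errors, Long> accumulated = new EnumMap<>(Errors.class);
                long start = System.nanoTime();
                for (int i = 0; i < times; i++) {
                    errorCounts.get().forEach((error, count) ->
                        accumulated.merge(error, count.longValue(), Long::sum));
                }
                long took = System.nanoTime() - start;
                System.out.println(accumulated);
                System.out.println("run " + run + ", times=" + times + " took " + took
                    + "ns, " + (times / (took / 1e9)) + "ops/s");
            }
        }
    }
    ```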
   
   Using the old code (`errorCounts(errors())`), I got this test run:
   
   ```
   {NONE=200000000}
   run 0, times=20000 took 50172790515ns, 398.62243647820753ops/s
   {NONE=200000000}
   run 1, times=20000 took 49210843555ns, 406.4144923191004ops/s
   {NONE=200000000}
   run 2, times=20000 took 49366208092ns, 405.1354311582437ops/s
   {NONE=200000000}
   run 3, times=20000 took 48628565963ns, 411.2808922890589ops/s
   {NONE=200000000}
   run 4, times=20000 took 48727847017ns, 410.4429237971969ops/s
   ```
   
   I aborted early because it was pretty slow. You can see the JIT is improving the performance a little over time.
   
   Using the streaming approach I got this test run:
   
   ```
   {NONE=200000000}
   run 0, times=20000 took 6984797524ns, 2863.36145482805ops/s
   {NONE=200000000}
   run 1, times=20000 took 6566254988ns, 3045.8762318171493ops/s
   {NONE=200000000}
   run 2, times=20000 took 6553362923ns, 3051.868214074797ops/s
   {NONE=200000000}
   run 3, times=20000 took 6259904961ns, 3194.936684279159ops/s
   {NONE=200000000}
   run 4, times=20000 took 6675450385ns, 2996.052527772626ops/s
   {NONE=200000000}
   run 5, times=20000 took 6949088789ns, 2878.075184714696ops/s
   {NONE=200000000}
   run 6, times=20000 took 6045899635ns, 3308.02712704972ops/s
   {NONE=200000000}
   run 7, times=20000 took 5845348664ns, 3421.5238730197325ops/s
   {NONE=200000000}
   run 8, times=20000 took 6370088159ns, 3139.6739732311135ops/s
   {NONE=200000000}
   run 9, times=20000 took 6799792822ns, 2941.2660831800854ops/s
   {NONE=200000000}
   run 10, times=20000 took 6641092713ns, 3011.5525959831602ops/s
   {NONE=200000000}
   run 11, times=20000 took 6621610314ns, 3020.4133211696576ops/s
   {NONE=200000000}
   run 12, times=20000 took 6339235087ns, 3154.9547738045576ops/s
   {NONE=200000000}
   run 13, times=20000 took 6461046814ns, 3095.473624593366ops/s
   {NONE=200000000}
   run 14, times=20000 took 6585386195ns, 3037.027656052296ops/s
   {NONE=200000000}
   run 15, times=20000 took 6565973868ns, 3046.00664000084ops/s
   {NONE=200000000}
   run 16, times=20000 took 6585253169ns, 3037.0890058031114ops/s
   {NONE=200000000}
   run 17, times=20000 took 6618664562ns, 3021.7576087518905ops/s
   {NONE=200000000}
   run 18, times=20000 took 6592603829ns, 3033.7026945290754ops/s
   {NONE=200000000}
   run 19, times=20000 took 6567525693ns, 3045.2869063484604ops/s
   ```
   
   This is about 7½ times faster.
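    
    For reference, the streaming variant of `TxnOffsetCommitResponse.errorCounts()` is essentially a `flatMap` over the nested topics/partitions feeding the new `errorCounts(Stream)` helper. A rough sketch (accessor names follow the generated `TxnOffsetCommitResponseData` classes and may not match the benchmarked code exactly):
    
    ```java
    // Sketch only: stream the nested topic/partition structure straight into the helper,
    // without materialising an intermediate collection of Errors.
    @Override
    public Map<Errors, Integer> errorCounts() {
        return errorCounts(data.topics().stream()
            .flatMap(topic -> topic.partitions().stream())
            .map(partition -> Errors.forCode(partition.errorCode())));
    }
    ```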
   
    Out of interest I also rewrote `TxnOffsetCommitResponse.errorCounts()` to use a `forEach()`:
   
   ```
   {NONE=200000000}
   run 0, times=20000 took 6038137472ns, 3312.279671131012ops/s
   {NONE=200000000}
   run 1, times=20000 took 5642135982ns, 3544.7568197231726ops/s
   {NONE=200000000}
   run 2, times=20000 took 5551109425ns, 3602.883400195268ops/s
   {NONE=200000000}
   run 3, times=20000 took 5511950192ns, 3628.4798126492215ops/s
   {NONE=200000000}
   run 4, times=20000 took 5180664883ns, 3860.5083423999577ops/s
   {NONE=200000000}
   run 5, times=20000 took 4571569172ns, 4374.865444997799ops/s
   {NONE=200000000}
   run 6, times=20000 took 5472660241ns, 3654.529811692726ops/s
   {NONE=200000000}
   run 7, times=20000 took 5499370051ns, 3636.780179279483ops/s
   {NONE=200000000}
   run 8, times=20000 took 5523721146ns, 3620.7475850736946ops/s
   {NONE=200000000}
   run 9, times=20000 took 4691001711ns, 4263.481710761627ops/s
   {NONE=200000000}
   run 10, times=20000 took 5495174831ns, 3639.5566319698773ops/s
   {NONE=200000000}
   run 11, times=20000 took 5676661773ns, 3523.1974001210237ops/s
   {NONE=200000000}
   run 12, times=20000 took 5605106974ns, 3568.174540249194ops/s
   {NONE=200000000}
   run 13, times=20000 took 5577604479ns, 3585.768778568137ops/s
   {NONE=200000000}
   run 14, times=20000 took 5544332242ns, 3607.287429222572ops/s
   {NONE=200000000}
   run 15, times=20000 took 5502312660ns, 3634.835247621134ops/s
   {NONE=200000000}
   run 16, times=20000 took 5528323376ns, 3617.7333776865516ops/s
   {NONE=200000000}
   run 17, times=20000 took 5528944581ns, 3617.3269069704934ops/s
   {NONE=200000000}
   run 18, times=20000 took 5496460628ns, 3638.705223888306ops/s
   {NONE=200000000}
   run 19, times=20000 took 5511532751ns, 3628.754632070588ops/s
   ```
   
    There was some variability between different runs of this benchmark; the above is one of the better ones, and the less good ones had performance similar to the Stream case.
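    
    For completeness, the `forEach()` variant measured above boils down to updating one map in place while walking the same nested structure, along these lines (again only a sketch, not necessarily the exact code that was benchmarked):
    
    ```java
    // Sketch only: a single HashMap updated in place; no Stream or intermediate collection.
    @Override
    public Map<Errors, Integer> errorCounts() {
        Map<Errors, Integer> counts = new HashMap<>();
        data.topics().forEach(topic ->
            topic.partitions().forEach(partition ->
                counts.merge(Errors.forCode(partition.errorCode()), 1, Integer::sum)));
        return counts;
    }
    ```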
   
   Conclusions: 
   
   1. `errorCounts(Stream)` is an improvement over `errorCounts(Collection)`.
    2. If we're really interested in getting the best performance then we should probably drop `errorCounts(Stream)` and use `forEach` everywhere. The drawback is that it's a little more verbose than the `errorCounts(Stream)` approach.
   
   Thoughts?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org