Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2018/03/10 20:25:32 UTC

[GitHub] merlimat opened a new pull request #1245: Collect Prometheus latency stats using DataSketches

URL: https://github.com/apache/bookkeeper/pull/1245
 
 
   The implementation for collecting and estimating latency quantiles in the Prometheus Java client library is very slow, and it is hurting bookie performance.
   
   I have added a micro-benchmark that tests our various stats providers. Each test simulates 16 concurrent threads updating the stats.
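   For readers without the PR checked out, the sketch below approximates that setup outside JMH: a hypothetical `ThroughputHarness.measure` helper (not part of the PR or of BookKeeper) runs one stats operation from N threads for a fixed window and reports total ops/us. It is far cruder than the real `StatsLoggerBenchmark` (no warm-up, no forks, and a volatile stop flag is read on every iteration), so treat its numbers as rough.

   ```java
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.atomic.AtomicBoolean;
   import java.util.concurrent.atomic.LongAdder;

   // Hypothetical, JMH-free approximation of the benchmark setup: N threads hammer
   // one shared stats operation for a fixed wall-clock window; we report ops/us.
   public class ThroughputHarness {

       /** Runs `op` from `threads` threads for about `millis` ms; returns total ops per microsecond. */
       public static double measure(Runnable op, int threads, long millis) throws InterruptedException {
           AtomicBoolean running = new AtomicBoolean(true);
           LongAdder totalOps = new LongAdder();
           CountDownLatch ready = new CountDownLatch(threads);
           CountDownLatch start = new CountDownLatch(1);
           Thread[] workers = new Thread[threads];
           for (int i = 0; i < threads; i++) {
               workers[i] = new Thread(() -> {
                   ready.countDown();
                   try {
                       start.await(); // all workers begin together
                   } catch (InterruptedException e) {
                       Thread.currentThread().interrupt();
                       return;
                   }
                   long local = 0;
                   while (running.get()) {
                       op.run();
                       local++;
                   }
                   totalOps.add(local); // publish the per-thread count once, at the end
               });
               workers[i].start();
           }
           ready.await();
           long t0 = System.nanoTime();
           start.countDown();
           Thread.sleep(millis);
           running.set(false);
           for (Thread w : workers) {
               w.join();
           }
           // Elapsed time runs from the start signal until every worker has stopped.
           long elapsedNanos = System.nanoTime() - t0;
           return totalOps.sum() / (elapsedNanos / 1_000.0);
       }

       public static void main(String[] args) throws InterruptedException {
           LongAdder counter = new LongAdder();
           double opsPerUs = measure(counter::increment, 16, 500);
           System.out.printf("LongAdder.increment: %.3f ops/us%n", opsPerUs);
       }
   }
   ```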
   
   #### Counter increment
   ```
   Benchmark                              (statsProvider)   Mode  Cnt    Score      Error   Units
   StatsLoggerBenchmark.counterIncrement       Prometheus  thrpt    3  391.882 ±  786.987  ops/us
   StatsLoggerBenchmark.counterIncrement         Codahale  thrpt    3  449.341 ± 1337.736  ops/us
   StatsLoggerBenchmark.counterIncrement          Twitter  thrpt    3   43.354 ±    9.331  ops/us
   StatsLoggerBenchmark.counterIncrement          Ostrich  thrpt    3   43.790 ±    1.332  ops/us
   ```
   
   Here Prometheus is fast, though not as fast as a plain `LongAdder`, which can reach ~500M ops/sec (~500 ops/us).
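   As a point of reference, a counter backed directly by `java.util.concurrent.atomic.LongAdder` can look like this. `AdderCounter` is a hypothetical name used only for illustration; the design point is that `LongAdder` stripes increments across internal cells, so 16 writers rarely collide on one memory location, and `sum()` pays the merging cost only on read.

   ```java
   import java.util.concurrent.atomic.LongAdder;

   // Hypothetical counter sketch backed by LongAdder (not a BookKeeper API).
   // Concurrent increments land in separate internal cells; sum() folds them on read.
   public class AdderCounter {
       private final LongAdder adder = new LongAdder();

       public void inc() { adder.increment(); }

       public void add(long delta) { adder.add(delta); }

       /** Not an atomic snapshot while updates are in flight, which is fine for metrics scraping. */
       public long get() { return adder.sum(); }

       public static void main(String[] args) throws InterruptedException {
           AdderCounter counter = new AdderCounter();
           Thread[] threads = new Thread[16];
           for (int i = 0; i < threads.length; i++) {
               threads[i] = new Thread(() -> {
                   for (int j = 0; j < 100_000; j++) counter.inc();
               });
               threads[i].start();
           }
           for (Thread t : threads) t.join();
           System.out.println(counter.get()); // prints 1600000
       }
   }
   ```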
   
   #### Latency quantiles
   
   ```
   Benchmark                              (statsProvider)   Mode  Cnt    Score      Error   Units
   StatsLoggerBenchmark.recordLatency          Prometheus  thrpt    3    0.255 ±    0.667  ops/us
   StatsLoggerBenchmark.recordLatency            Codahale  thrpt    3    4.963 ±    1.671  ops/us
   StatsLoggerBenchmark.recordLatency             Twitter  thrpt    3    4.793 ±    0.766  ops/us
   StatsLoggerBenchmark.recordLatency             Ostrich  thrpt    3    2.473 ±    6.394  ops/us
   ```
   
   This is where Prometheus is very slow: about 250K ops/second at most (0.255 ops/us), mostly due to contention and GC pressure.
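   To make "contention and GC pressure" concrete, here is a deliberately naive collector of the kind that suffers from both. It is a hypothetical illustration, not the Prometheus client's actual implementation: every thread funnels through one lock, and every sample is boxed into a `Long` and retained, so the write path serializes threads and produces garbage.

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;

   // Illustrative anti-pattern (hypothetical; NOT the Prometheus client's code).
   // One lock for all writers, plus one boxed Long per sample kept in memory.
   public class LockedSampleCollector {
       private final List<Long> samples = new ArrayList<>();

       public synchronized void record(long latencyNanos) {
           // Autoboxing allocates a Long unless the value is in the -128..127 cache.
           samples.add(latencyNanos);
       }

       /** Returns the q-quantile (0 <= q <= 1) by sorting a copy: O(n log n) per query. */
       public synchronized long quantile(double q) {
           if (samples.isEmpty()) throw new IllegalStateException("no samples");
           List<Long> sorted = new ArrayList<>(samples);
           Collections.sort(sorted);
           int idx = (int) Math.min(sorted.size() - 1, Math.floor(q * sorted.size()));
           return sorted.get(idx);
       }

       public synchronized int count() { return samples.size(); }
   }
   ```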
   
   ## Modification
   
   I have adapted a stats collector I originally wrote for the Yahoo branch:
   https://github.com/yahoo/bookkeeper/tree/yahoo-4.3/bookkeeper-stats-providers/datasketches-metrics-provider/src/main/java/org/apache/bokkeeper/stats/datasketches
   
   It is based on the [DataSketches](https://datasketches.github.io/) library, which provides very fast and lightweight quantile estimates (along with a number of other operations). To avoid concurrency issues, it records into thread-local collectors that are aggregated in the background when needed.
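   The thread-local pattern can be sketched as follows. To stay dependency-free, this sketch swaps the DataSketches quantile sketch for a simple power-of-two bucketed histogram, so its quantiles are only accurate to within a factor of two; `ThreadLocalLatencyCollector` and its methods are hypothetical names, not the classes added by the PR. Each thread records into its own histogram without locks, and `aggregate()` merges all of them.

   ```java
   import java.util.concurrent.ConcurrentLinkedQueue;
   import java.util.concurrent.atomic.AtomicLongArray;

   // Hypothetical sketch: a bucketed histogram stands in for a DataSketches sketch.
   // Bucket 0 holds values <= 0; bucket i >= 1 holds [2^(i-1), 2^i - 1].
   public class ThreadLocalLatencyCollector {
       static final int BUCKETS = 64;

       /** Per-thread histogram: only the owning thread writes, the aggregator only reads. */
       static final class LocalHistogram {
           final AtomicLongArray counts = new AtomicLongArray(BUCKETS);

           void record(long value) {
               int b = bucketOf(value);
               counts.lazySet(b, counts.get(b) + 1); // single writer, so no CAS is needed
           }
       }

       // Registry of every thread's histogram so the aggregator can find them.
       private final ConcurrentLinkedQueue<LocalHistogram> all = new ConcurrentLinkedQueue<>();
       private final ThreadLocal<LocalHistogram> local = ThreadLocal.withInitial(() -> {
           LocalHistogram h = new LocalHistogram();
           all.add(h);
           return h;
       });

       static int bucketOf(long value) {
           if (value <= 0) return 0;
           return 64 - Long.numberOfLeadingZeros(value); // 1 -> 1, 2..3 -> 2, 4..7 -> 3, ...
       }

       static long bucketUpperBound(int i) {
           if (i == 0) return 0;
           if (i >= 63) return Long.MAX_VALUE;
           return (1L << i) - 1;
       }

       /** Hot path: no locks, and no allocation after the thread's first call. */
       public void record(long latencyNanos) {
           local.get().record(latencyNanos);
       }

       /** Merges every thread's histogram; meant to run periodically in the background. */
       public long[] aggregate() {
           long[] merged = new long[BUCKETS];
           for (LocalHistogram h : all) {
               for (int i = 0; i < BUCKETS; i++) merged[i] += h.counts.get(i);
           }
           return merged;
       }

       /** Upper bound of the bucket that contains the q-quantile of the merged counts. */
       public static long quantileUpperBound(long[] merged, double q) {
           long total = 0;
           for (long c : merged) total += c;
           if (total == 0) throw new IllegalStateException("no samples");
           long rank = Math.max(1, (long) Math.ceil(q * total));
           long seen = 0;
           for (int i = 0; i < merged.length; i++) {
               seen += merged[i];
               if (seen >= rank) return bucketUpperBound(i);
           }
           return Long.MAX_VALUE;
       }
   }
   ```

   In a real provider, `aggregate()` would run on a background schedule (for example from a `ScheduledExecutorService`) and feed the exporter. This sketch also never unregisters the histograms of threads that have exited, which a production version would need to handle.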
   
   After the change, `recordLatency` throughput with the Prometheus provider goes from 0.255 to 27.538 ops/us, more than 100x the original Prometheus collector:
   
   ```
   Benchmark                              (statsProvider)   Mode  Cnt    Score     Error   Units
   StatsLoggerBenchmark.counterIncrement       Prometheus  thrpt    3  531.906 ± 129.602  ops/us
   StatsLoggerBenchmark.recordLatency          Prometheus  thrpt    3   27.538 ±   5.893  ops/us
   ```
   
   It is worth noting that the main bottleneck in the `recordLatency` test is now the `System.nanoTime()` call used to generate varying samples for the stats logger.
   
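   `System.nanoTime()` is not cheap: in the same benchmark it reaches only about 33 ops/us, close to the 27.538 ops/us measured for `recordLatency`, while `System.currentTimeMillis()` reaches about 160 ops/us: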
   
   ```
   Benchmark                               (statsProvider)   Mode  Cnt    Score     Error   Units
   StatsLoggerBenchmark.currentTimeMillis              N/A  thrpt    3  161.502 ± 267.238  ops/us
   StatsLoggerBenchmark.nanoTime                       N/A  thrpt    3   32.822 ±   2.256  ops/us
   ```
   
   With the `System.nanoTime()` call removed from the benchmark, the Prometheus+DataSketches collector reaches:
   
   ```
   Benchmark                               (statsProvider)   Mode  Cnt    Score     Error   Units
   StatsLoggerBenchmark.recordLatency           Prometheus  thrpt    3  108.542 ±  31.848  ops/us
   ```
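   One way to keep the clock out of the measured loop is to draw samples from an array filled before the benchmark starts. The sketch below is hypothetical (the PR's benchmark may do this differently) and assumes the stats sink can be modelled as a `LongConsumer`; it shows both variants side by side. A `LatencySampleSource` instance is not thread-safe, so each benchmark thread would get its own, as JMH's `@State(Scope.Thread)` provides.

   ```java
   import java.util.SplittableRandom;
   import java.util.function.LongConsumer;

   // Hypothetical benchmark bodies, assuming the stats sink is a LongConsumer.
   // withNanoTime() pays for a System.nanoTime() call on every op, so nanoTime caps
   // its throughput; withPrecomputedSamples() reads varied samples from an array.
   public class LatencySampleSource {
       private static final int MASK = 1024 - 1; // power-of-two size: index wrap is a cheap AND
       private final long[] samples = new long[MASK + 1];
       private int next = 0;

       public LatencySampleSource(long seed) {
           SplittableRandom rnd = new SplittableRandom(seed);
           for (int i = 0; i < samples.length; i++) {
               samples[i] = rnd.nextLong(1_000, 10_000_000); // 1 us .. 10 ms, in nanos
           }
       }

       /** One op whose sample is derived from the clock. */
       public static void withNanoTime(LongConsumer sink, long startNanos) {
           sink.accept(System.nanoTime() - startNanos);
       }

       /** One op whose sample comes from the pre-generated array; no clock call. */
       public void withPrecomputedSamples(LongConsumer sink) {
           sink.accept(samples[next++ & MASK]);
       }
   }
   ```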
