Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/09/04 23:59:06 UTC

[GitHub] [incubator-druid] jon-wei opened a new pull request #8466: Speed up StringDimensionIndexer.estimateEncodedKeyComponentSize

URL: https://github.com/apache/incubator-druid/pull/8466
 
 
   This PR changes the implementation of `StringDimensionIndexer.estimateEncodedKeyComponentSize` to improve performance.
   
   Flame graphs captured from running ingestion tasks show that this method can account for a substantial share of the total ingestion workload, e.g. 15% of `IncrementalIndex.add` time.
   
   ![Screen Shot 2019-09-04 at 4 50 42 PM](https://user-images.githubusercontent.com/8729063/64301739-0ffd8500-cf35-11e9-94b9-01f47a9dab0e.png)
   
   Benchmarks for the new implementation are shown below (the benchmark class is included in the PR):
   
   ```
   Benchmark                                                         (cardinality)  (rowSize)  Mode  Cnt  Score   Error  Units
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize           10000          1  avgt   10  0.101 ± 0.003  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize2          10000          1  avgt   10  0.024 ± 0.001  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize           10000          4  avgt   10  0.234 ± 0.006  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize2          10000          4  avgt   10  0.092 ± 0.003  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize           10000          8  avgt   10  0.402 ± 0.003  us/op
   StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize2          10000          8  avgt   10  0.174 ± 0.002  us/op
   ```
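
To make the scores above easier to compare, here is a minimal sketch that computes the speedup ratio for each row size. It assumes that `estimateEncodedKeyComponentSize2` is the new implementation and the unsuffixed benchmark is the old one; the class name `SpeedupCalc` and the `speedup` helper are hypothetical and not part of the PR.

```java
import java.util.Locale;

public class SpeedupCalc {
    // Ratio of old to new average time per operation; a value above 1 means the new code is faster.
    static double speedup(double oldUsPerOp, double newUsPerOp) {
        return oldUsPerOp / newUsPerOp;
    }

    public static void main(String[] args) {
        // Each row is {rowSize, old score, new score} in us/op, copied from the benchmark output.
        // Assumption: the "2" suffix marks the new implementation.
        double[][] rows = {
            {1, 0.101, 0.024},
            {4, 0.234, 0.092},
            {8, 0.402, 0.174},
        };
        for (double[] r : rows) {
            System.out.printf(Locale.ROOT, "rowSize=%d speedup=%.1fx%n",
                    (int) r[0], speedup(r[1], r[2]));
        }
    }
}
```

Under that assumption the ratios come out to roughly 4.2x, 2.5x, and 2.3x for row sizes 1, 4, and 8, so the relative gain appears largest for single-valued rows.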
   
   
   This PR has:
   - [x] been self-reviewed.
   - [x] been tested in a test Druid cluster.
