Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/09/04 23:59:06 UTC
[GitHub] [incubator-druid] jon-wei opened a new pull request #8466: Speed up StringDimensionIndexer.estimateEncodedKeyComponentSize
URL: https://github.com/apache/incubator-druid/pull/8466
This PR changes the implementation of `StringDimensionIndexer.estimateEncodedKeyComponentSize` to make it faster.
Flame graphs captured from running ingestion tasks show that this method can be a substantial part of the total ingestion workload, e.g. 15% of `IncrementalIndex.add` time.
![Screen Shot 2019-09-04 at 4 50 42 PM](https://user-images.githubusercontent.com/8729063/64301739-0ffd8500-cf35-11e9-94b9-01f47a9dab0e.png)
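As a purely hypothetical illustration (not the code from this PR) of how a per-row size estimate like this can be made cheaper, the sketch below computes the same estimate two ways: with a stream pipeline that looks each value up twice, and with a plain loop that looks each value up once and creates no stream objects. The class name `EncodedKeySizeSketch`, the `List<String>` dictionary, and the size formula (4 bytes per dictionary id plus 2 bytes per character of each non-null value) are all assumptions, not Druid's actual implementation or necessarily the technique this PR uses.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: estimates the size of one row's encoded string-dimension key.
 * The dictionary type and the size formula are assumptions, not Druid's real code.
 */
class EncodedKeySizeSketch
{
  // Assumed dictionary: index = dictionary id, element = value (null models a null value).
  private final List<String> idToValue;

  EncodedKeySizeSketch(List<String> idToValue)
  {
    this.idToValue = idToValue;
  }

  /** Stream-based estimate: builds a stream pipeline and looks each id up twice. */
  long estimateWithStream(int[] key)
  {
    long estimatedSize = (long) key.length * Integer.BYTES;
    estimatedSize += Arrays.stream(key)
                           .filter(id -> idToValue.get(id) != null)
                           .mapToLong(id -> (long) idToValue.get(id).length() * Character.BYTES)
                           .sum();
    return estimatedSize;
  }

  /** Loop-based estimate: same result, one lookup per id, no stream objects. */
  long estimateWithLoop(int[] key)
  {
    long estimatedSize = (long) key.length * Integer.BYTES;
    for (int id : key) {
      String value = idToValue.get(id);
      if (value != null) {
        estimatedSize += (long) value.length() * Character.BYTES;
      }
    }
    return estimatedSize;
  }

  public static void main(String[] args)
  {
    EncodedKeySizeSketch sketch = new EncodedKeySizeSketch(Arrays.asList("a", "bcd", null));
    int[] key = {0, 1, 2};
    System.out.println(sketch.estimateWithStream(key));
    System.out.println(sketch.estimateWithLoop(key));
  }
}
```

With the three-entry dictionary in `main`, both methods print 20: 3 ids at 4 bytes each plus 4 characters ("a" and "bcd") at 2 bytes each. Because the two methods return identical values, a change of this kind affects only speed, not the memory estimate itself.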
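Benchmark results comparing the original implementation (`estimateEncodedKeyComponentSize`) with the new one (`estimateEncodedKeyComponentSize2`) are shown below; scores are average time per operation, so lower is better. The benchmark class, `StringDimensionIndexerBenchmark`, is included in the PR: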
```
Benchmark                                                         (cardinality)  (rowSize)  Mode  Cnt  Score   Error  Units
StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize           10000          1  avgt   10  0.101 ± 0.003  us/op
StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize2          10000          1  avgt   10  0.024 ± 0.001  us/op
StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize           10000          4  avgt   10  0.234 ± 0.006  us/op
StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize2          10000          4  avgt   10  0.092 ± 0.003  us/op
StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize           10000          8  avgt   10  0.402 ± 0.003  us/op
StringDimensionIndexerBenchmark.estimateEncodedKeyComponentSize2          10000          8  avgt   10  0.174 ± 0.002  us/op
```
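The speedups implied by these results can be checked with a few lines of arithmetic. This small Java sketch (the class and method names are hypothetical, introduced only for illustration) divides each original average time by the corresponding new one, using the `us/op` scores copied from the benchmark output above:

```java
import java.util.Locale;

/** Hypothetical helper: speedup of the new implementation over the original one. */
class BenchmarkSpeedup
{
  // {rowSize, original us/op, new us/op}, copied from the JMH results above.
  static final double[][] RESULTS = {
      {1, 0.101, 0.024},
      {4, 0.234, 0.092},
      {8, 0.402, 0.174},
  };

  /** Ratio of average times; a value above 1 means the new implementation is faster. */
  static double speedup(double originalUsPerOp, double newUsPerOp)
  {
    return originalUsPerOp / newUsPerOp;
  }

  public static void main(String[] args)
  {
    for (double[] row : RESULTS) {
      // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's default locale.
      System.out.printf(Locale.ROOT, "rowSize=%d speedup=%.1fx%n", (int) row[0], speedup(row[1], row[2]));
    }
  }
}
```

It prints speedups of about 4.2x for `rowSize` 1, 2.5x for `rowSize` 4, and 2.3x for `rowSize` 8: the relative gain is largest at `rowSize` 1, and the new implementation is faster at every row size measured.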
This PR has:
- [x] been self-reviewed.
- [x] been tested in a test Druid cluster.