Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/02 00:58:28 UTC
[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled
URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-409771369
I did some digging, and the more straightforward solution to this problem is to store the rows sorted in `PlainFactsHolder`, similar to what `RollupFactsHolder` currently does: change the facts map from `ConcurrentMap<Long, Deque<IncrementalIndexRow>>` to `ConcurrentMap<IncrementalIndexRow, Deque<IncrementalIndexRow>>` and order it with the `IncrementalIndexRowComparator`. I think the alternative is to find all calls to `FactsHolder.iterator` and `FactsHolder.keySet` used for persist and replace them with a version that sorts the `IncrementalIndexRow` entries with the comparator (unless I'm missing another, simpler place where this could be done).
I did some benchmarking and there does appear to be some cost to doing the sorting up front; I'm not sure how much overhead is acceptable.
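To make the proposed map change concrete, here is a minimal sketch of the idea, not the actual Druid code. `Row`, `ROW_COMPARATOR`, and `SortedPlainFactsSketch` are hypothetical stand-ins for `IncrementalIndexRow`, `IncrementalIndexRowComparator`, and `PlainFactsHolder`, and using a `ConcurrentSkipListMap` as the sorted `ConcurrentMap` is my assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SortedPlainFactsSketch {
    // Hypothetical stand-in for IncrementalIndexRow: a timestamp plus dimension values.
    static final class Row {
        final long timestamp;
        final String[] dims;

        Row(long timestamp, String... dims) {
            this.timestamp = timestamp;
            this.dims = dims;
        }
    }

    // Hypothetical stand-in for the row comparator: timestamp first, then dimensions.
    static final Comparator<Row> ROW_COMPARATOR = (a, b) -> {
        int c = Long.compare(a.timestamp, b.timestamp);
        if (c != 0) {
            return c;
        }
        int n = Math.min(a.dims.length, b.dims.length);
        for (int i = 0; i < n; i++) {
            c = a.dims[i].compareTo(b.dims[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.dims.length, b.dims.length);
    };

    // Keyed by the row itself (not just the timestamp), so iteration is fully sorted.
    private final ConcurrentNavigableMap<Row, Deque<Row>> facts =
        new ConcurrentSkipListMap<>(ROW_COMPARATOR);

    // No rollup: rows that compare equal are all kept, queued under the same key.
    void add(Row row) {
        facts.computeIfAbsent(row, k -> new ConcurrentLinkedDeque<>()).add(row);
    }

    // Rows come back in comparator order without a separate sort at persist time.
    List<Row> rowsInOrder() {
        List<Row> out = new ArrayList<>();
        for (Deque<Row> rows : facts.values()) {
            out.addAll(rows);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedPlainFactsSketch holder = new SortedPlainFactsSketch();
        holder.add(new Row(2L, "b"));
        holder.add(new Row(1L, "z"));
        holder.add(new Row(1L, "a"));
        for (Row r : holder.rowsInOrder()) {
            System.out.println(r.timestamp + " " + String.join(",", r.dims));
        }
    }
}
```

The trade-off in this sketch is that every insert pays for a comparator-ordered lookup instead of a lookup by timestamp alone.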
```
Benchmark                                          (rollupOption)  Mode  Cnt   Score    Error  Units
IncrementalIndexRowTypeBenchmark.normalFloats              rollup  avgt   40  19.965 ± 1.780  us/op
IncrementalIndexRowTypeBenchmark.normalFloats           no-rollup  avgt   40  10.596 ± 1.055  us/op
IncrementalIndexRowTypeBenchmark.normalFloats   ordered-no-rollup  avgt   40  21.925 ± 1.323  us/op
IncrementalIndexRowTypeBenchmark.normalLongs               rollup  avgt   40  19.108 ± 1.286  us/op
IncrementalIndexRowTypeBenchmark.normalLongs            no-rollup  avgt   40  10.107 ± 1.042  us/op
IncrementalIndexRowTypeBenchmark.normalLongs    ordered-no-rollup  avgt   40  21.967 ± 1.406  us/op
IncrementalIndexRowTypeBenchmark.normalStrings             rollup  avgt   40  20.489 ± 2.442  us/op
IncrementalIndexRowTypeBenchmark.normalStrings          no-rollup  avgt   40   9.352 ± 0.105  us/op
IncrementalIndexRowTypeBenchmark.normalStrings  ordered-no-rollup  avgt   40  20.986 ± 0.508  us/op
```
```
Benchmark                                  (numSegments)     (rollupSchema)  (rowsPerSegment)        (schemaAndQuery)  (threshold)  Mode  Cnt        Score        Error  Units
TopNBenchmark.querySingleIncrementalIndex              1          no-rollup            750000                 basic.A           10  avgt   25   950647.742 ± 20969.530  us/op
TopNBenchmark.querySingleIncrementalIndex              1          no-rollup            750000       basic.numericSort           10  avgt   25   230487.526 ± 29340.615  us/op
TopNBenchmark.querySingleIncrementalIndex              1          no-rollup            750000  basic.alphanumericSort           10  avgt   25   218782.138 ±  6203.484  us/op
TopNBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000                 basic.A           10  avgt   25   945842.924 ± 12074.015  us/op
TopNBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000       basic.numericSort           10  avgt   25   222019.610 ±  3486.365  us/op
TopNBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000  basic.alphanumericSort           10  avgt   25   223015.114 ±  3130.184  us/op
TopNBenchmark.querySingleIncrementalIndex              1             rollup            750000                 basic.A           10  avgt   25  1347085.823 ± 12655.001  us/op
TopNBenchmark.querySingleIncrementalIndex              1             rollup            750000       basic.numericSort           10  avgt   25   204926.129 ±  4846.150  us/op
TopNBenchmark.querySingleIncrementalIndex              1             rollup            750000  basic.alphanumericSort           10  avgt   25   201050.213 ±  6559.034  us/op
```
```
Benchmark                                        (numSegments)     (rollupSchema)  (rowsPerSegment)              (schemaAndQuery)  Mode  Cnt        Score        Error  Units
TimeseriesBenchmark.querySingleIncrementalIndex              1          no-rollup            750000                       basic.A  avgt   25   921919.453 ± 25357.840  us/op
TimeseriesBenchmark.querySingleIncrementalIndex              1          no-rollup            750000       basic.timeFilterNumeric  avgt   25    69240.969 ±  1403.393  us/op
TimeseriesBenchmark.querySingleIncrementalIndex              1          no-rollup            750000  basic.timeFilterAlphanumeric  avgt   25   152974.422 ±  2181.950  us/op
TimeseriesBenchmark.querySingleIncrementalIndex              1          no-rollup            750000    basic.timeFilterByInterval  avgt   25    16752.936 ±   406.768  us/op
TimeseriesBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000                       basic.A  avgt   25   906129.041 ± 19497.575  us/op
TimeseriesBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000       basic.timeFilterNumeric  avgt   25    66989.537 ±  1249.002  us/op
TimeseriesBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000  basic.timeFilterAlphanumeric  avgt   25   153816.935 ±  2080.406  us/op
TimeseriesBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000    basic.timeFilterByInterval  avgt   25    16650.825 ±   271.827  us/op
TimeseriesBenchmark.querySingleIncrementalIndex              1             rollup            750000                       basic.A  avgt   25  1410127.820 ± 19685.994  us/op
TimeseriesBenchmark.querySingleIncrementalIndex              1             rollup            750000       basic.timeFilterNumeric  avgt   25    48694.028 ±   865.701  us/op
TimeseriesBenchmark.querySingleIncrementalIndex              1             rollup            750000  basic.timeFilterAlphanumeric  avgt   25   138381.904 ±  1284.566  us/op
TimeseriesBenchmark.querySingleIncrementalIndex              1             rollup            750000    basic.timeFilterByInterval  avgt   25    14172.496 ±  1020.567  us/op
```
```
Benchmark                                     (defaultStrategy)  (initialBuckets)  (numProcessingThreads)  (numSegments)  (queryGranularity)     (rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt      Score      Error  Units
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all          no-rollup            100000           basic.A  avgt   25  53761.212 ± 2854.750  us/op
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all          no-rollup            100000      basic.nested  avgt   25  71957.267 ± 3881.153  us/op
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all  ordered-no-rollup            100000           basic.A  avgt   25  63418.312 ± 9735.523  us/op
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all  ordered-no-rollup            100000      basic.nested  avgt   25  79107.369 ± 3597.208  us/op
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all             rollup            100000           basic.A  avgt   25  57728.209 ± 3683.978  us/op
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all             rollup            100000      basic.nested  avgt   25  77225.014 ± 4820.121  us/op
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day          no-rollup            100000           basic.A  avgt   25  60686.368 ± 3545.676  us/op
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day          no-rollup            100000      basic.nested  avgt   25  73173.081 ± 3365.438  us/op
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day  ordered-no-rollup            100000           basic.A  avgt   25  67065.055 ± 2742.212  us/op
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day  ordered-no-rollup            100000      basic.nested  avgt   25  78658.129 ± 4969.871  us/op
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day             rollup            100000           basic.A  avgt   25  62835.895 ± 3471.298  us/op
GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day             rollup            100000      basic.nested  avgt   25  78779.644 ± 5321.856  us/op
```
The difference is most apparent when adding rows, where performance is similar to that with rollup enabled, and in group by queries, where performance is slightly slower than with rollup enabled. I would also expect slightly increased memory usage with this approach, since a larger number of key entries in the facts holder map means more, shorter `Deque` objects.
I also haven't measured size differences of persisted segments yet.
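To put the add-row cost in relative terms, the small sketch below divides the `ordered-no-rollup` score by the `no-rollup` score for each `IncrementalIndexRowTypeBenchmark` case, using the mean scores from the first table and ignoring the error bars. The class and method names are just for illustration. Each ratio comes out at a little over 2, so adding a row to the sorted holder takes roughly twice as long as adding it to the unsorted one.

```java
public class AddRowOverhead {
    // Ratio of ordered-no-rollup to no-rollup average time per operation.
    static double ratio(double orderedNoRollup, double noRollup) {
        return orderedNoRollup / noRollup;
    }

    public static void main(String[] args) {
        // Mean scores in us/op, copied from the IncrementalIndexRowTypeBenchmark table.
        String[] names = {"normalFloats", "normalLongs", "normalStrings"};
        double[] noRollup = {10.596, 10.107, 9.352};
        double[] orderedNoRollup = {21.925, 21.967, 20.986};

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%s: %.2fx%n", names[i], ratio(orderedNoRollup[i], noRollup[i]));
        }
    }
}
```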