You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/02 00:58:28 UTC

[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled

clintropolis commented on issue #6066: Sorting rows when rollup is disabled
URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-409771369
 
 
   I did some digging and the more straightforward solution to this problem is to just store the rows sorted in `PlainFactsHolder` similar to how `RollupFactsHolder` is currently doing, changing from `ConcurrentMap<Long, Deque<IncrementalIndexRow>>` to `ConcurrentMap<IncrementalIndexRow, Deque<IncrementalIndexRow>>` and using the `IncrementalIndexRowComparator`. I think the alternative is finding all calls to `FactsHolder.iterator` and `FactsHolder.keySet` used for persist and replacing those with a version that sorts `IncrementalIndexRow` with the comparator (unless I'm missing another simpler place where this could be done). 
   
   I did some benchmarking and there does appear to be some cost to doing the sorting up front, I'm not sure on the range of what is an acceptable amount of overhead.
   
   ```
   Benchmark                                          (rollupOption)  Mode  Cnt   Score   Error  Units
   IncrementalIndexRowTypeBenchmark.normalFloats              rollup  avgt   40  19.965 ± 1.780  us/op
   IncrementalIndexRowTypeBenchmark.normalFloats           no-rollup  avgt   40  10.596 ± 1.055  us/op
   IncrementalIndexRowTypeBenchmark.normalFloats   ordered-no-rollup  avgt   40  21.925 ± 1.323  us/op
   IncrementalIndexRowTypeBenchmark.normalLongs               rollup  avgt   40  19.108 ± 1.286  us/op
   IncrementalIndexRowTypeBenchmark.normalLongs            no-rollup  avgt   40  10.107 ± 1.042  us/op
   IncrementalIndexRowTypeBenchmark.normalLongs    ordered-no-rollup  avgt   40  21.967 ± 1.406  us/op
   IncrementalIndexRowTypeBenchmark.normalStrings             rollup  avgt   40  20.489 ± 2.442  us/op
   IncrementalIndexRowTypeBenchmark.normalStrings          no-rollup  avgt   40   9.352 ± 0.105  us/op
   IncrementalIndexRowTypeBenchmark.normalStrings  ordered-no-rollup  avgt   40  20.986 ± 0.508  us/op
   ```
   
   ```
   Benchmark                                  (numSegments)     (rollupSchema)  (rowsPerSegment)        (schemaAndQuery)  (threshold)  Mode  Cnt        Score       Error  Units
   TopNBenchmark.querySingleIncrementalIndex              1          no-rollup            750000                 basic.A           10  avgt   25   950647.742 ± 20969.530  us/op
   TopNBenchmark.querySingleIncrementalIndex              1          no-rollup            750000       basic.numericSort           10  avgt   25   230487.526 ± 29340.615  us/op
   TopNBenchmark.querySingleIncrementalIndex              1          no-rollup            750000  basic.alphanumericSort           10  avgt   25   218782.138 ±  6203.484  us/op
   TopNBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000                 basic.A           10  avgt   25   945842.924 ± 12074.015  us/op
   TopNBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000       basic.numericSort           10  avgt   25   222019.610 ±  3486.365  us/op
   TopNBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000  basic.alphanumericSort           10  avgt   25   223015.114 ±  3130.184  us/op
   TopNBenchmark.querySingleIncrementalIndex              1             rollup            750000                 basic.A           10  avgt   25  1347085.823 ± 12655.001  us/op
   TopNBenchmark.querySingleIncrementalIndex              1             rollup            750000       basic.numericSort           10  avgt   25   204926.129 ±  4846.150  us/op
   TopNBenchmark.querySingleIncrementalIndex              1             rollup            750000  basic.alphanumericSort           10  avgt   25   201050.213 ±  6559.034  us/op
   ```
   ```
   Benchmark                                        (numSegments)     (rollupSchema)  (rowsPerSegment)              (schemaAndQuery)  Mode  Cnt        Score       Error  Units
   TimeseriesBenchmark.querySingleIncrementalIndex              1          no-rollup            750000                       basic.A  avgt   25   921919.453 ± 25357.840  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1          no-rollup            750000       basic.timeFilterNumeric  avgt   25    69240.969 ±  1403.393  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1          no-rollup            750000  basic.timeFilterAlphanumeric  avgt   25   152974.422 ±  2181.950  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1          no-rollup            750000    basic.timeFilterByInterval  avgt   25    16752.936 ±   406.768  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000                       basic.A  avgt   25   906129.041 ± 19497.575  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000       basic.timeFilterNumeric  avgt   25    66989.537 ±  1249.002  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000  basic.timeFilterAlphanumeric  avgt   25   153816.935 ±  2080.406  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1  ordered-no-rollup            750000    basic.timeFilterByInterval  avgt   25    16650.825 ±   271.827  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1             rollup            750000                       basic.A  avgt   25  1410127.820 ± 19685.994  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1             rollup            750000       basic.timeFilterNumeric  avgt   25    48694.028 ±   865.701  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1             rollup            750000  basic.timeFilterAlphanumeric  avgt   25   138381.904 ±  1284.566  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex              1             rollup            750000    basic.timeFilterByInterval  avgt   25    14172.496 ±  1020.567  us/op
   ```
   ```
   Benchmark                                     (defaultStrategy)  (initialBuckets)  (numProcessingThreads)  (numSegments)  (queryGranularity)     (rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt      Score      Error  Units
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all          no-rollup            100000           basic.A  avgt   25  53761.212 ± 2854.750  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all          no-rollup            100000      basic.nested  avgt   25  71957.267 ± 3881.153  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all  ordered-no-rollup            100000           basic.A  avgt   25  63418.312 ± 9735.523  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all  ordered-no-rollup            100000      basic.nested  avgt   25  79107.369 ± 3597.208  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all             rollup            100000           basic.A  avgt   25  57728.209 ± 3683.978  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 all             rollup            100000      basic.nested  avgt   25  77225.014 ± 4820.121  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day          no-rollup            100000           basic.A  avgt   25  60686.368 ± 3545.676  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day          no-rollup            100000      basic.nested  avgt   25  73173.081 ± 3365.438  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day  ordered-no-rollup            100000           basic.A  avgt   25  67065.055 ± 2742.212  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day  ordered-no-rollup            100000      basic.nested  avgt   25  78658.129 ± 4969.871  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day             rollup            100000           basic.A  avgt   25  62835.895 ± 3471.298  us/op
   GroupByBenchmark.querySingleIncrementalIndex                 v2                -1                       4              4                 day             rollup            100000      basic.nested  avgt   25  78779.644 ± 5321.856  us/op
   ```
   The difference is most apparent on adding rows where performance is similar to performance of rollup enabled, and in group by queries, where performance is slightly slower than if rollup were enabled. I would also expect some slight increased memory usage with this approach due to increased number of shorter length `Deque` objects from a larger number of key entries in the facts holder map.
   
   I also haven't done any measurement on size differences of persisted segments yet.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org