Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/04 05:12:34 UTC

[GitHub] clintropolis opened a new pull request #6107: Order rows during incremental index persist when rollup is disabled.

URL: https://github.com/apache/incubator-druid/pull/6107
 
 
   Resolves #6066 by adding a new method, `Iterable<IncrementalIndexRow> getPersistIterable()`, to the `FactsHolder` interface and using it when persisting incremental indexes. Also adds a benchmark generator schema with 4 low-cardinality dimensions to test this scenario.
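   A minimal sketch of the idea, assuming a heavily simplified model: a fact holder for rollup-disabled ingestion keeps rows in arrival order and exposes a persist-time iterable that sorts them by timestamp, then by dimension values. Only the `getPersistIterable()` name comes from this PR; `Row`, `SortOnPersistFactsHolder`, `addRow`, and `TIME_THEN_DIMS` are hypothetical stand-ins, and the real Druid `FactsHolder` and `IncrementalIndexRow` types look different. It uses records, so it needs Java 16 or later.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Comparator;
   import java.util.List;

   public class PersistOrderSketch {

     // Hypothetical row: a timestamp plus dictionary-encoded dimension values.
     record Row(long timestamp, int[] dims) {}

     // Orders rows by timestamp first, then lexicographically by dimension values.
     static final Comparator<Row> TIME_THEN_DIMS = (a, b) -> {
       int byTime = Long.compare(a.timestamp(), b.timestamp());
       return byTime != 0 ? byTime : Arrays.compare(a.dims(), b.dims());
     };

     // Simplified, hypothetical stand-in for the FactsHolder interface.
     interface FactsHolder {
       void addRow(Row row);

       // Rows in the order they should be written to the persisted segment.
       Iterable<Row> getPersistIterable();
     }

     // Rollup disabled: rows are stored in arrival order and sorted only at persist time.
     static class SortOnPersistFactsHolder implements FactsHolder {
       private final List<Row> rows = new ArrayList<>();

       @Override
       public void addRow(Row row) {
         rows.add(row);
       }

       @Override
       public Iterable<Row> getPersistIterable() {
         // Sort a copy so the arrival-order list itself is left untouched.
         List<Row> sorted = new ArrayList<>(rows);
         sorted.sort(TIME_THEN_DIMS);
         return sorted;
       }
     }

     public static void main(String[] args) {
       FactsHolder facts = new SortOnPersistFactsHolder();
       facts.addRow(new Row(1000L, new int[] {2, 0}));
       facts.addRow(new Row(1000L, new int[] {0, 1}));
       facts.addRow(new Row(1000L, new int[] {2, 0}));
       facts.addRow(new Row(1000L, new int[] {0, 0}));
       for (Row row : facts.getPersistIterable()) {
         System.out.println(row.timestamp() + " " + Arrays.toString(row.dims()));
       }
     }
   }
   ```

   Running `main` prints `1000 [0, 0]`, `1000 [0, 1]`, `1000 [2, 0]`, `1000 [2, 0]`, one per line. Because the sort happens on the persist path, its O(n log n) cost is paid there, which is consistent with the slower rollup-disabled persist times shown below.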
   
   Before this patch:
   ```
   Benchmark                        (rollup)  (rollupOpportunity)  (rowsPerSegment)  (schema)  Mode  Cnt       Score       Error  Units
   IndexPersistBenchmark.persistV9      true                 none             75000     rollo  avgt   25  429409.821 ± 17771.526  us/op
   IndexPersistBenchmark.persistV9      true             moderate             75000     rollo  avgt   25   57578.929 ±  2650.596  us/op
   IndexPersistBenchmark.persistV9      true                 high             75000     rollo  avgt   25   11023.976 ±   461.142  us/op
   IndexPersistBenchmark.persistV9     false                 none             75000     rollo  avgt   25  414289.365 ± 16384.902  us/op
   IndexPersistBenchmark.persistV9     false             moderate             75000     rollo  avgt   25  407060.720 ± 16965.695  us/op
   IndexPersistBenchmark.persistV9     false                 high             75000     rollo  avgt   25  400008.825 ± 19613.728  us/op
   
   size [2262258] bytes.
   size [276631] bytes.
   size [47597] bytes.
   size [2280590] bytes.
   size [2095354] bytes.
   size [2094972] bytes.
   ```
   
   After this patch:
   ```
   Benchmark                        (rollup)  (rollupOpportunity)  (rowsPerSegment)  (schema)  Mode  Cnt       Score       Error  Units
   IndexPersistBenchmark.persistV9      true                 none             75000     rollo  avgt   25  436966.463 ± 45936.358  us/op
   IndexPersistBenchmark.persistV9      true             moderate             75000     rollo  avgt   25   54724.237 ±  7500.566  us/op
   IndexPersistBenchmark.persistV9      true                 high             75000     rollo  avgt   25   11010.033 ±   718.345  us/op
   IndexPersistBenchmark.persistV9     false                 none             75000     rollo  avgt   25  464730.668 ± 30413.613  us/op
   IndexPersistBenchmark.persistV9     false             moderate             75000     rollo  avgt   25  523597.179 ± 43443.648  us/op
   IndexPersistBenchmark.persistV9     false                 high             75000     rollo  avgt   25  535282.839 ± 46529.297  us/op
   
   size [2262258] bytes.
   size [276631] bytes.
   size [47597] bytes.
   size [2269144] bytes.
   size [1475402] bytes.
   size [1357298] bytes.
   ```
   
   The actual difference in segment size will vary considerably from this contrived scenario, but with rollup disabled, segments should generally be smaller, at the cost of slower index persists.
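   One intuition for why ordering shrinks segments: sorting rows groups equal dimension values together, which compressors exploit. The sketch below illustrates that effect only; it uses `java.util.zip.Deflater` as a generic stand-in, which is not the compression Druid's segment format uses, and it assumes 4 dimensions of cardinality 10 with all rows sharing one timestamp. `toColumns`, `compressedSize`, and `compareSizes` are hypothetical helpers, and the resulting numbers show direction, not Druid-realistic magnitudes.

   ```java
   import java.util.Arrays;
   import java.util.Comparator;
   import java.util.Random;
   import java.util.zip.Deflater;

   public class SortedCompressionSketch {

     // Lays rows out column by column, one byte per low-cardinality dimension value.
     static byte[] toColumns(int[][] rows, int numDims) {
       byte[] out = new byte[rows.length * numDims];
       int i = 0;
       for (int d = 0; d < numDims; d++) {
         for (int[] row : rows) {
           out[i++] = (byte) row[d];
         }
       }
       return out;
     }

     // Returns the number of bytes DEFLATE produces for the input.
     static int compressedSize(byte[] input) {
       Deflater deflater = new Deflater();
       deflater.setInput(input);
       deflater.finish();
       byte[] buf = new byte[8192];
       int total = 0;
       while (!deflater.finished()) {
         total += deflater.deflate(buf);
       }
       deflater.end();
       return total;
     }

     // Returns {unsortedSize, sortedSize} for the same randomly generated rows.
     static int[] compareSizes(int numRows, int numDims, int cardinality, long seed) {
       Random random = new Random(seed);
       int[][] rows = new int[numRows][numDims];
       for (int[] row : rows) {
         for (int d = 0; d < numDims; d++) {
           row[d] = random.nextInt(cardinality);
         }
       }
       int unsorted = compressedSize(toColumns(rows, numDims));

       // Sort a copy lexicographically by dimension values, as a persist-time ordering would.
       int[][] sorted = rows.clone();
       Comparator<int[]> byDims = Arrays::compare;
       Arrays.sort(sorted, byDims);
       int sortedSize = compressedSize(toColumns(sorted, numDims));
       return new int[] {unsorted, sortedSize};
     }

     public static void main(String[] args) {
       int[] sizes = compareSizes(75_000, 4, 10, 42L);
       System.out.println("unsorted: " + sizes[0] + " bytes, sorted: " + sizes[1] + " bytes");
     }
   }
   ```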
   
   Query performance should be unaffected. See #6066 for additional benchmarks and discussion.
