Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/23 23:00:47 UTC

[GitHub] [druid] gianm opened a new pull request #10430: RowBasedIndexedTable: Add specialized index types for long keys.

gianm opened a new pull request #10430:
URL: https://github.com/apache/druid/pull/10430


   Two new index types are added for long keys:
   
   1) An int-array-based index, used when the keys are unique and the
      difference between the min and max key values isn't too large.
   
   2) A Long2ObjectOpenHashMap-based index (instead of the prior Java
      HashMap), used in all other long-key cases.
   
   In addition:
   
   1) RowBasedIndexBuilder, a new class, is responsible for picking which
      index implementation to use.
   
   2) The IndexedTable.Index interface is extended to support using
      unboxed primitives in the unique-long-keys case, and callers are
      updated to use the new functionality.
   
   Other key types continue to use indexes backed by Java HashMaps.
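
For readers who want a picture of the selection logic described above, here is a minimal sketch. It is not the code in this patch: every class, method, and constant name is hypothetical, the range cutoff is an assumed value, and a plain java.util.HashMap stands in for fastutil's Long2ObjectOpenHashMap so the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; all names here are hypothetical, not Druid's API.
final class LongKeyIndexSketch {
  static final int NOT_FOUND = -1;
  // Assumed cutoff for "range isn't too large"; the real threshold is not stated above.
  static final long MAX_ARRAY_RANGE = 1_000_000L;

  interface Index {
    boolean areKeysUnique();
    List<Integer> find(long key);  // row numbers for a key (possibly empty)
    int findUniqueLong(long key);  // unboxed lookup; only valid when keys are unique
  }

  // Dense index: the row number for a key is stored at position (key - minKey).
  static final class ArrayIndex implements Index {
    private final long minKey;
    private final long maxKey;
    private final int[] rows;

    ArrayIndex(long[] keys, long minKey, long maxKey) {
      this.minKey = minKey;
      this.maxKey = maxKey;
      this.rows = new int[(int) (maxKey - minKey + 1)];
      Arrays.fill(rows, NOT_FOUND);
      for (int row = 0; row < keys.length; row++) {
        rows[(int) (keys[row] - minKey)] = row;
      }
    }

    public boolean areKeysUnique() { return true; }

    public int findUniqueLong(long key) {
      // Compare before subtracting so far-away keys cannot overflow.
      return (key < minKey || key > maxKey) ? NOT_FOUND : rows[(int) (key - minKey)];
    }

    public List<Integer> find(long key) {
      int row = findUniqueLong(key);
      return row == NOT_FOUND ? Collections.<Integer>emptyList() : Collections.singletonList(row);
    }
  }

  // Fallback index; a HashMap stands in for Long2ObjectOpenHashMap here.
  static final class MapIndex implements Index {
    private final Map<Long, List<Integer>> map;
    private final boolean unique;

    MapIndex(Map<Long, List<Integer>> map, boolean unique) {
      this.map = map;
      this.unique = unique;
    }

    public boolean areKeysUnique() { return unique; }

    public List<Integer> find(long key) {
      return map.getOrDefault(key, Collections.<Integer>emptyList());
    }

    public int findUniqueLong(long key) {
      if (!unique) {
        throw new UnsupportedOperationException("keys are not unique");
      }
      List<Integer> rows = map.get(key);
      return rows == null ? NOT_FOUND : rows.get(0);
    }
  }

  // Picks the index type: dense array if keys are unique and the range is small, else a map.
  static Index build(long[] keys) {
    Map<Long, List<Integer>> map = new HashMap<>();
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (int row = 0; row < keys.length; row++) {
      map.computeIfAbsent(keys[row], k -> new ArrayList<>()).add(row);
      min = Math.min(min, keys[row]);
      max = Math.max(max, keys[row]);
    }
    boolean unique = map.size() == keys.length;
    long range = max - min;  // a negative value here means the subtraction overflowed
    if (keys.length > 0 && unique && range >= 0 && range < MAX_ARRAY_RANGE) {
      return new ArrayIndex(keys, min, max);
    }
    return new MapIndex(map, unique);
  }

  public static void main(String[] args) {
    Index dense = build(new long[] {10L, 12L, 11L});
    System.out.println(dense.getClass().getSimpleName() + " -> row " + dense.findUniqueLong(12L));
    Index dup = build(new long[] {5L, 5L, 7L});
    System.out.println(dup.getClass().getSimpleName() + " -> rows " + dup.find(5L));
  }
}
```

The point of the primitive findUniqueLong-style method in this sketch is that a caller joining on a unique long key can go from key to row number without allocating a boxed Long or a List on each lookup.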


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm edited a comment on pull request #10430: RowBasedIndexedTable: Add specialized index types for long keys.

Posted by GitBox <gi...@apache.org>.
gianm edited a comment on pull request #10430:
URL: https://github.com/apache/druid/pull/10430#issuecomment-698026349


   Btw, the coverage here is decent from existing tests (most of the new lines are getting called), but I think it needs some more explicit coverage so I marked it as a draft. I'll add some.
   
   I did some benchmarks: in IndexedTableJoinCursorBenchmark with long keys, it's about a 2x speedup for the first four projection types. (I didn't test the others.)
   
   ```
   master
   
   Benchmark                                                         (indexedTableType)    (joinColumns)  (projection)  (rowsPerSegment)  (rowsPerTableSegment)  Mode  Cnt  Score   Error  Units
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             0             50000                5000000  avgt   10  2.483 ± 0.057  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             1             50000                5000000  avgt   10  2.440 ± 0.155  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             2             50000                5000000  avgt   10  2.975 ± 0.055  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             3             50000                5000000  avgt   10  3.910 ± 0.426  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             0             50000                5000000  avgt   10  3.570 ± 0.420  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             1             50000                5000000  avgt   10  3.354 ± 0.122  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             2             50000                5000000  avgt   10  3.737 ± 0.098  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             3             50000                5000000  avgt   10  4.534 ± 0.322  ms/op
   
   patch
   
   Benchmark                                                         (indexedTableType)    (joinColumns)  (projection)  (rowsPerSegment)  (rowsPerTableSegment)  Mode  Cnt  Score   Error  Units
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             0             50000                5000000  avgt   10  1.247 ± 0.033  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             1             50000                5000000  avgt   10  1.226 ± 0.067  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             2             50000                5000000  avgt   10  1.617 ± 0.088  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             3             50000                5000000  avgt   10  2.174 ± 0.024  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             0             50000                5000000  avgt   10  1.479 ± 0.070  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             1             50000                5000000  avgt   10  1.599 ± 0.145  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             2             50000                5000000  avgt   10  1.963 ± 0.105  ms/op
   IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             3             50000                5000000  avgt   10  2.268 ± 0.161  ms/op
   ```
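
As a rough check on the "about 2x" figure, the per-row ratios can be computed directly from the Score columns above. The class and method names below are made up for this sketch; the numbers are copied from the two tables in row order.

```java
// Sketch: per-row speedup = master score / patch score (both in ms/op, lower is better).
final class JoinBenchmarkSpeedupSketch {
  // Scores copied from the "master" table, in row order.
  static final double[] MASTER = {2.483, 2.440, 2.975, 3.910, 3.570, 3.354, 3.737, 4.534};
  // Scores copied from the "patch" table, in the same row order.
  static final double[] PATCH = {1.247, 1.226, 1.617, 2.174, 1.479, 1.599, 1.963, 2.268};

  static double[] speedups() {
    double[] result = new double[MASTER.length];
    for (int i = 0; i < MASTER.length; i++) {
      result[i] = MASTER[i] / PATCH[i];
    }
    return result;
  }

  public static void main(String[] args) {
    for (double s : speedups()) {
      System.out.printf("%.2fx%n", s);
    }
  }
}
```

With these numbers the ratios fall between roughly 1.8x and 2.4x. This ignores the reported error margins.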






[GitHub] [druid] gianm commented on pull request #10430: RowBasedIndexedTable: Add specialized index types for long keys.

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10430:
URL: https://github.com/apache/druid/pull/10430#issuecomment-700429082


   I've added some more tests and have marked this patch as ready.




[GitHub] [druid] gianm merged pull request #10430: RowBasedIndexedTable: Add specialized index types for long keys.

Posted by GitBox <gi...@apache.org>.
gianm merged pull request #10430:
URL: https://github.com/apache/druid/pull/10430


   

