Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/23 23:00:47 UTC
[GitHub] [druid] gianm opened a new pull request #10430: RowBasedIndexedTable: Add specialized index types for long keys.
gianm opened a new pull request #10430:
URL: https://github.com/apache/druid/pull/10430
Two new index types are added:
1) Use an int-array-based index when keys are unique and the difference
between the min and max key values isn't too large.
2) Use a Long2ObjectOpenHashMap (instead of the prior Java HashMap) in
all other cases.
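A minimal sketch of the first index type, assuming the key column is given as a `long[]` with one entry per row. The class name `IntArrayLongIndex` and the `NOT_FOUND = -1` sentinel are illustrative assumptions, not the patch's actual code. Slot `key - min` of an `int[]` holds the row number, so a lookup is a bounds check plus an array read with no boxing. Memory grows with the key range rather than the row count, which is why this layout only pays off when the range isn't too large.

```java
import java.util.Arrays;

/** Hypothetical sketch of an int-array-based index for unique long keys. */
final class IntArrayLongIndex {
  static final int NOT_FOUND = -1;

  private final long minKey;
  private final int[] rowByOffset; // rowByOffset[key - minKey] = row number, or NOT_FOUND

  private IntArrayLongIndex(long minKey, int[] rowByOffset) {
    this.minKey = minKey;
    this.rowByOffset = rowByOffset;
  }

  /** Builds the index from one key per row; rejects duplicate keys and oversized ranges. */
  static IntArrayLongIndex build(long[] keys) {
    if (keys.length == 0) {
      return new IntArrayLongIndex(0L, new int[0]);
    }
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long key : keys) {
      min = Math.min(min, key);
      max = Math.max(max, key);
    }
    long span = Math.subtractExact(max, min); // throws ArithmeticException if max - min overflows
    if (span >= Integer.MAX_VALUE - 8) {
      throw new IllegalArgumentException("key range too large for an array: " + span);
    }
    int[] rows = new int[(int) span + 1];
    Arrays.fill(rows, NOT_FOUND);
    for (int row = 0; row < keys.length; row++) {
      int slot = (int) (keys[row] - min); // in [0, span], so no overflow
      if (rows[slot] != NOT_FOUND) {
        throw new IllegalArgumentException("duplicate key: " + keys[row]);
      }
      rows[slot] = row;
    }
    return new IntArrayLongIndex(min, rows);
  }

  /** Unboxed lookup: returns the row holding key, or NOT_FOUND. */
  int findUniqueLong(long key) {
    if (key < minKey) {
      return NOT_FOUND; // checked first because key - minKey could wrap around
    }
    long offset = key - minKey; // negative here only if the subtraction overflowed
    if (offset < 0 || offset >= rowByOffset.length) {
      return NOT_FOUND;
    }
    return rowByOffset[(int) offset];
  }
}
```

For example, keys `{105, 101, 103}` produce a 5-slot array starting at 101; `findUniqueLong(101)` returns row 1 and `findUniqueLong(102)` returns `NOT_FOUND`.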
In addition:
1) RowBasedIndexBuilder, a new class, is responsible for picking which
index implementation to use.
2) The IndexedTable.Index interface is extended to support using
unboxed primitives in the unique-long-keys case, and callers are
updated to use the new functionality.
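The selection step can be sketched as below. This is a hypothetical illustration, not RowBasedIndexBuilder's real logic: the `MAX_SLOTS_PER_KEY` threshold is an invented tuning knob standing in for "isn't too large", and the class and enum names are placeholders. It sorts a copy of the keys so uniqueness is checked on primitives without boxing, then picks the array layout only when the key span is small relative to the row count.

```java
import java.util.Arrays;

/** Hypothetical sketch: choose an index layout for a column of long keys. */
final class LongIndexChooser {
  enum IndexType { INT_ARRAY, LONG_HASH_MAP }

  // Assumption: allow at most this many array slots per row before falling back to a hash map.
  static final long MAX_SLOTS_PER_KEY = 4;

  static IndexType choose(long[] keys) {
    if (keys.length == 0) {
      return IndexType.LONG_HASH_MAP;
    }
    // Sort a copy so uniqueness can be checked on primitives, without boxing into a Set<Long>.
    long[] sorted = keys.clone();
    Arrays.sort(sorted);
    for (int i = 1; i < sorted.length; i++) {
      if (sorted[i] == sorted[i - 1]) {
        return IndexType.LONG_HASH_MAP; // duplicate keys: one row per slot cannot hold them
      }
    }
    long span;
    try {
      span = Math.subtractExact(sorted[sorted.length - 1], sorted[0]);
    } catch (ArithmeticException e) {
      return IndexType.LONG_HASH_MAP; // max - min does not even fit in a long
    }
    // The array needs span + 1 slots: keep that within MAX_SLOTS_PER_KEY * rows
    // and under typical JVM array size limits.
    boolean smallRange = span < keys.length * MAX_SLOTS_PER_KEY && span < Integer.MAX_VALUE - 8;
    return smallRange ? IndexType.INT_ARRAY : IndexType.LONG_HASH_MAP;
  }
}
```

Non-long key types would skip this check entirely and go straight to a HashMap-backed index.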
Other key types continue to use indexes backed by Java HashMaps.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] gianm edited a comment on pull request #10430: RowBasedIndexedTable: Add specialized index types for long keys.
Posted by GitBox <gi...@apache.org>.
gianm edited a comment on pull request #10430:
URL: https://github.com/apache/druid/pull/10430#issuecomment-698026349
Btw, existing tests give decent coverage here (most of the new lines are exercised), but I think it needs more explicit coverage, so I marked it as a draft. I'll add some.
I ran some benchmarks: in IndexedTableJoinCursorBenchmark with long keys, the patch is about 2x faster (1.8x to 2.4x in average time per op) for the first four projection types. (I didn't test the others.)
```
master
Benchmark                                                         (indexedTableType)    (joinColumns)  (projection)  (rowsPerSegment)  (rowsPerTableSegment)  Mode  Cnt  Score   Error  Units
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             0             50000                5000000  avgt   10  2.483 ± 0.057  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             1             50000                5000000  avgt   10  2.440 ± 0.155  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             2             50000                5000000  avgt   10  2.975 ± 0.055  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             3             50000                5000000  avgt   10  3.910 ± 0.426  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             0             50000                5000000  avgt   10  3.570 ± 0.420  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             1             50000                5000000  avgt   10  3.354 ± 0.122  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             2             50000                5000000  avgt   10  3.737 ± 0.098  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             3             50000                5000000  avgt   10  4.534 ± 0.322  ms/op

patch
Benchmark                                                         (indexedTableType)    (joinColumns)  (projection)  (rowsPerSegment)  (rowsPerTableSegment)  Mode  Cnt  Score   Error  Units
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             0             50000                5000000  avgt   10  1.247 ± 0.033  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             1             50000                5000000  avgt   10  1.226 ± 0.067  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             2             50000                5000000  avgt   10  1.617 ± 0.088  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment    long3,longKey             3             50000                5000000  avgt   10  2.174 ± 0.024  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             0             50000                5000000  avgt   10  1.479 ± 0.070  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             1             50000                5000000  avgt   10  1.599 ± 0.145  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             2             50000                5000000  avgt   10  1.963 ± 0.105  ms/op
IndexedTableJoinCursorBenchmark.hashJoinCursorDimensionSelectors             segment  longKey,longKey             3             50000                5000000  avgt   10  2.268 ± 0.161  ms/op
```
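The per-row speedup behind the "about 2x" figure can be recomputed directly from the Score column. This small sketch (the class name `BenchmarkSpeedup` is hypothetical) divides each master average time by the matching patch average time; it ignores the ± error bounds, so treat the ratios as rough.

```java
/** Computes master/patch speedup ratios from the avgt scores (ms/op) in the benchmark tables. */
final class BenchmarkSpeedup {
  // Rows in table order: long3,longKey projections 0-3, then longKey,longKey projections 0-3.
  static final double[] MASTER = {2.483, 2.440, 2.975, 3.910, 3.570, 3.354, 3.737, 4.534};
  static final double[] PATCH = {1.247, 1.226, 1.617, 2.174, 1.479, 1.599, 1.963, 2.268};

  /** Speedup = old time / new time; values above 1 mean the patch is faster. */
  static double[] speedups() {
    double[] out = new double[MASTER.length];
    for (int i = 0; i < MASTER.length; i++) {
      out[i] = MASTER[i] / PATCH[i];
    }
    return out;
  }

  public static void main(String[] args) {
    for (double s : speedups()) {
      System.out.printf("%.2fx%n", s);
    }
  }
}
```

The ratios range from about 1.80x (long3,longKey, projection 3) to about 2.41x (longKey,longKey, projection 0).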
[GitHub] [druid] gianm commented on pull request #10430: RowBasedIndexedTable: Add specialized index types for long keys.
Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10430:
URL: https://github.com/apache/druid/pull/10430#issuecomment-698026349
Btw, the coverage here is OK (most of the new methods are getting called by existing tests), but I think it needs more explicit coverage, so I marked it as a draft. I'll add some.
[GitHub] [druid] gianm commented on pull request #10430: RowBasedIndexedTable: Add specialized index types for long keys.
Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10430:
URL: https://github.com/apache/druid/pull/10430#issuecomment-700429082
I've added some more tests and have marked this patch as ready.
[GitHub] [druid] gianm merged pull request #10430: RowBasedIndexedTable: Add specialized index types for long keys.
Posted by GitBox <gi...@apache.org>.
gianm merged pull request #10430:
URL: https://github.com/apache/druid/pull/10430