You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/05 16:55:02 UTC
[GitHub] [druid] gianm opened a new pull request #9314: Add
HashVectorGrouper based on MemoryOpenHashTable.
gianm opened a new pull request #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314
This causes vectorized hash-based groupBys to use MemoryOpenHashTable
instead of ByteBufferHashTable (see #9308).
Additional supporting changes:
1) Modifies VectorGrouper interface to use Memory instead of ByteBuffers.
2) Modifies BufferArrayGrouper to match the new VectorGrouper interface.
3) Removes "implements VectorGrouper" from BufferHashGrouper.
Benchmarks:
```
master @ a085685182d62e5dd1b716f1bbb9bcbbaeb1c661
Benchmark (query) (rowsPerSegment) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 10 5000000 force avgt 25 683.263 ± 8.382 ms/op
SqlBenchmark.querySql 11 5000000 force avgt 25 691.823 ± 7.979 ms/op
SqlBenchmark.querySql 12 5000000 force avgt 25 34.977 ± 0.842 ms/op
SqlBenchmark.querySql 13 5000000 force avgt 25 42.408 ± 0.984 ms/op
SqlBenchmark.querySql 14 5000000 force avgt 25 98.015 ± 0.887 ms/op
SqlBenchmark.querySql 15 5000000 force avgt 25 449.379 ± 6.541 ms/op
SqlBenchmark.querySql 16 5000000 force avgt 25 519.083 ± 5.115 ms/op
SqlBenchmark.querySql 17 5000000 force avgt 25 456.992 ± 4.015 ms/op
SqlBenchmark.querySql 18 5000000 force avgt 25 534.258 ± 8.832 ms/op
groupby-hvg @ 9fdc4bcf8e6dd6896a5fe687145e5ea0f1c6a709
Benchmark (query) (rowsPerSegment) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.querySql 10 5000000 force avgt 25 450.345 ± 6.428 ms/op
SqlBenchmark.querySql 11 5000000 force avgt 25 473.163 ± 4.918 ms/op
SqlBenchmark.querySql 12 5000000 force avgt 25 32.281 ± 0.566 ms/op
SqlBenchmark.querySql 13 5000000 force avgt 25 39.259 ± 1.300 ms/op
SqlBenchmark.querySql 14 5000000 force avgt 25 91.553 ± 0.892 ms/op
SqlBenchmark.querySql 15 5000000 force avgt 25 85.417 ± 0.670 ms/op
SqlBenchmark.querySql 16 5000000 force avgt 25 158.197 ± 1.458 ms/op
SqlBenchmark.querySql 17 5000000 force avgt 25 103.445 ± 0.872 ms/op
SqlBenchmark.querySql 18 5000000 force avgt 25 175.661 ± 1.859 ms/op
```
The largest improvements are on queries 10 and 11 (~50%) and queries 15–18 (3–5x). Queries 12–14 look very slightly faster or maybe unchanged.
Queries 10 & 11 are groupBys on two strings and would use hash-based grouping.
Queries 12–14 are groupBys on one string and would use array-based grouping.
Queries 15–18 are groupBys on one long column and would use hash-based grouping.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] jihoonson commented on issue #9314: Add HashVectorGrouper
based on MemoryOpenHashTable.
Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314#issuecomment-583108499
Please check the LGTM warning.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] lgtm-com[bot] commented on issue #9314: Add
HashVectorGrouper based on MemoryOpenHashTable.
Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314#issuecomment-582557402
This pull request **introduces 1 alert** when merging 672fa2dd049ef64de6c23f0381bad8e6af0ab620 into 54bc4dc71b3a55d8fbb4bbd4aa0e1bf12762c2fc - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-d1820fe72aa599efe958d9f29125cb26d97dbb38)
**new alerts:**
* 1 for Result of multiplication cast to wider type
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] lgtm-com[bot] commented on issue #9314: Add
HashVectorGrouper based on MemoryOpenHashTable.
Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314#issuecomment-582610503
This pull request **introduces 1 alert** when merging 2ce825f0bada99f68fed421bb1e9422978be91b4 into 54bc4dc71b3a55d8fbb4bbd4aa0e1bf12762c2fc - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-898bbc9f8402086d40a7f91b0f43b6646c62bcb7)
**new alerts:**
* 1 for Result of multiplication cast to wider type
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] gianm commented on issue #9314: Add HashVectorGrouper based
on MemoryOpenHashTable.
Posted by GitBox <gi...@apache.org>.
gianm commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314#issuecomment-583122340
I just looked at it. It's complaining that `keySpace.getInt(i * Integer.BYTES)` accepts a long and in theory `i * Integer.BYTES` could overflow. It's not very plausible: it would only happen if the vector size was crazy huge. I added a check to the start of the method that will detect this and block it. I'm not sure if LGTM will be smart enough to understand it, but it means the issue isn't real.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] lgtm-com[bot] commented on issue #9314: Add
HashVectorGrouper based on MemoryOpenHashTable.
Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314#issuecomment-583143529
This pull request **introduces 1 alert** when merging 78bc768e6a8c0a487ec3fcdd529963fb94d08967 into 2e1dbe598ce26a4668dffcbd996ec2b57cf3761a - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-96ec659add4e16e2400d8fddbc01a245bbb33a71)
**new alerts:**
* 1 for Result of multiplication cast to wider type
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] jihoonson merged pull request #9314: Add HashVectorGrouper
based on MemoryOpenHashTable.
Posted by GitBox <gi...@apache.org>.
jihoonson merged pull request #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org