You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/05 16:55:02 UTC

[GitHub] [druid] gianm opened a new pull request #9314: Add HashVectorGrouper based on MemoryOpenHashTable.

gianm opened a new pull request #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314
 
 
   This causes vectorized hash-based groupBys to use MemoryOpenHashTable
   instead of ByteBufferHashTable (see #9308).
   
   Additional supporting changes:
   
   1) Modifies VectorGrouper interface to use Memory instead of ByteBuffers.
   2) Modifies BufferArrayGrouper to match the new VectorGrouper interface.
   3) Removes "implements VectorGrouper" from BufferHashGrouper.
   
   Benchmarks:
   
   ```
   master @ a085685182d62e5dd1b716f1bbb9bcbbaeb1c661
   
   Benchmark              (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score   Error  Units
   SqlBenchmark.querySql       10           5000000        force  avgt   25  683.263 ± 8.382  ms/op
   SqlBenchmark.querySql       11           5000000        force  avgt   25  691.823 ± 7.979  ms/op
   SqlBenchmark.querySql       12           5000000        force  avgt   25   34.977 ± 0.842  ms/op
   SqlBenchmark.querySql       13           5000000        force  avgt   25   42.408 ± 0.984  ms/op
   SqlBenchmark.querySql       14           5000000        force  avgt   25   98.015 ± 0.887  ms/op
   SqlBenchmark.querySql       15           5000000        force  avgt   25  449.379 ± 6.541  ms/op
   SqlBenchmark.querySql       16           5000000        force  avgt   25  519.083 ± 5.115  ms/op
   SqlBenchmark.querySql       17           5000000        force  avgt   25  456.992 ± 4.015  ms/op
   SqlBenchmark.querySql       18           5000000        force  avgt   25  534.258 ± 8.832  ms/op
   
   groupby-hvg @ 9fdc4bcf8e6dd6896a5fe687145e5ea0f1c6a709
   
   Benchmark              (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score   Error  Units
   SqlBenchmark.querySql       10           5000000        force  avgt   25  450.345 ± 6.428  ms/op
   SqlBenchmark.querySql       11           5000000        force  avgt   25  473.163 ± 4.918  ms/op
   SqlBenchmark.querySql       12           5000000        force  avgt   25   32.281 ± 0.566  ms/op
   SqlBenchmark.querySql       13           5000000        force  avgt   25   39.259 ± 1.300  ms/op
   SqlBenchmark.querySql       14           5000000        force  avgt   25   91.553 ± 0.892  ms/op
   SqlBenchmark.querySql       15           5000000        force  avgt   25   85.417 ± 0.670  ms/op
   SqlBenchmark.querySql       16           5000000        force  avgt   25  158.197 ± 1.458  ms/op
   SqlBenchmark.querySql       17           5000000        force  avgt   25  103.445 ± 0.872  ms/op
   SqlBenchmark.querySql       18           5000000        force  avgt   25  175.661 ± 1.859  ms/op
   ```
   
   The largest improvements are on queries 10 and 11 (~50%) and queries 15–18 (3–5x). Queries 12–14 look very slightly faster or maybe unchanged.
   
   Queries 10 & 11 are groupBys on two strings and would use hash-based grouping.
   
   Queries 12–14 are groupBys on one string and would use array-based grouping.
   
   Queries 15–18 are groupBys on one long column and would use hash-based grouping.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314#issuecomment-583108499
 
 
   Please check the LGTM warning.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314#issuecomment-582557402
 
 
   This pull request **introduces 1 alert** when merging 672fa2dd049ef64de6c23f0381bad8e6af0ab620 into 54bc4dc71b3a55d8fbb4bbd4aa0e1bf12762c2fc - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-d1820fe72aa599efe958d9f29125cb26d97dbb38)
   
   **new alerts:**
   
   * 1 for Result of multiplication cast to wider type

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314#issuecomment-582610503
 
 
   This pull request **introduces 1 alert** when merging 2ce825f0bada99f68fed421bb1e9422978be91b4 into 54bc4dc71b3a55d8fbb4bbd4aa0e1bf12762c2fc - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-898bbc9f8402086d40a7f91b0f43b6646c62bcb7)
   
   **new alerts:**
   
   * 1 for Result of multiplication cast to wider type

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.

Posted by GitBox <gi...@apache.org>.
gianm commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314#issuecomment-583122340
 
 
   I just looked at it. It's complaining that `keySpace.getInt(i * Integer.BYTES)` accepts a long and in theory `i * Integer.BYTES` could overflow. It's not very plausible: it would only happen if the vector size was crazy huge. I added a check to the start of the method that will detect this and block it. I'm not sure if LGTM will be smart enough to understand it, but it means the issue isn't real.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on issue #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314#issuecomment-583143529
 
 
   This pull request **introduces 1 alert** when merging 78bc768e6a8c0a487ec3fcdd529963fb94d08967 into 2e1dbe598ce26a4668dffcbd996ec2b57cf3761a - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-96ec659add4e16e2400d8fddbc01a245bbb33a71)
   
   **new alerts:**
   
   * 1 for Result of multiplication cast to wider type

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson merged pull request #9314: Add HashVectorGrouper based on MemoryOpenHashTable.

Posted by GitBox <gi...@apache.org>.
jihoonson merged pull request #9314: Add HashVectorGrouper based on MemoryOpenHashTable.
URL: https://github.com/apache/druid/pull/9314
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org