Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/16 10:16:32 UTC

[GitHub] [arrow] Dandandan opened a new pull request #9213: ARROW-11266: [Rust][DataFusion] Implement vectorized hashing for hash aggregate [WIP]

Dandandan opened a new pull request #9213:
URL: https://github.com/apache/arrow/pull/9213


   This is a WIP PR that applies to the hash aggregate an approach to hashing similar to the one used in the hash join.
   For TPC-H query 1, which is hash-aggregate heavy, this speeds up execution by ~30%.
   
   TODO:
   - [ ] Implement collision checking
   - [ ] Add test for collisions
   - [ ] Move some code to hash utils
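   
   As a rough illustration of the idea (not the code in this PR), the sketch below hashes the group-by keys one column at a time, producing one hash per row, and then aggregates with a map keyed on that hash. It is a minimal sketch under stated assumptions: plain `Vec<i64>` columns stand in for Arrow arrays, `DefaultHasher` stands in for the real hash function, and the names `hash_one`, `combine`, `create_hashes` and `sum_group_by` are hypothetical. The bucket lookup also shows what the collision check from the TODO list could look like.
   
   ```rust
   use std::collections::hash_map::DefaultHasher;
   use std::collections::HashMap;
   use std::hash::{Hash, Hasher};

   /// Hash a single value (assumption: stand-in for the real hash function).
   fn hash_one(v: i64) -> u64 {
       let mut h = DefaultHasher::new();
       v.hash(&mut h);
       h.finish()
   }

   /// Combine two hashes; any reasonable mixing function would do here.
   fn combine(l: u64, r: u64) -> u64 {
       l.wrapping_mul(31).wrapping_add(r)
   }

   /// Vectorized step: walk the key columns one at a time and fold each
   /// column's values into a single hash per row.
   fn create_hashes(key_columns: &[Vec<i64>], num_rows: usize) -> Vec<u64> {
       let mut hashes = vec![0u64; num_rows];
       for column in key_columns {
           for (hash, value) in hashes.iter_mut().zip(column) {
               *hash = combine(*hash, hash_one(*value));
           }
       }
       hashes
   }

   /// SUM(values) GROUP BY key_columns. The map is keyed on the precomputed
   /// hash; each bucket holds (first row index of the group, running sum), so
   /// rows with equal hashes but different keys stay in separate groups.
   fn sum_group_by(key_columns: &[Vec<i64>], values: &[i64]) -> Vec<(Vec<i64>, i64)> {
       let hashes = create_hashes(key_columns, values.len());
       let same_key = |a: usize, b: usize| key_columns.iter().all(|c| c[a] == c[b]);
       let mut groups: HashMap<u64, Vec<(usize, i64)>> = HashMap::new();
       for (row, hash) in hashes.iter().enumerate() {
           let bucket = groups.entry(*hash).or_default();
           // Collision check: compare the actual key values, not just the hash.
           match bucket.iter().position(|&(first, _)| same_key(first, row)) {
               Some(i) => bucket[i].1 += values[row],
               None => bucket.push((row, values[row])),
           }
       }
       // Materialize the group keys from the first row of each group.
       let mut out: Vec<(Vec<i64>, i64)> = groups
           .into_values()
           .flatten()
           .map(|(first, sum)| (key_columns.iter().map(|c| c[first]).collect(), sum))
           .collect();
       out.sort();
       out
   }
   ```
   
   Calling `sum_group_by` with two key columns and one value column returns one `(key, sum)` pair per distinct key combination, sorted by key. The inner loop of `create_hashes` touches one column at a time, which is the part a columnar engine can run over contiguous memory.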
   
   Benchmark results
   PR
   ```
   Query 1 iteration 0 took 457.0 ms
   Query 1 iteration 1 took 459.7 ms
   Query 1 iteration 2 took 459.3 ms
   Query 1 iteration 3 took 461.1 ms
   Query 1 iteration 4 took 456.8 ms
   Query 1 iteration 5 took 460.6 ms
   Query 1 iteration 6 took 462.0 ms
   Query 1 iteration 7 took 462.3 ms
   Query 1 iteration 8 took 461.0 ms
   Query 1 iteration 9 took 466.4 ms
   Query 1 avg time: 460.63 ms
   ```
   
   Baseline (without this PR):
   ```
   Query 1 iteration 0 took 650.0 ms
   Query 1 iteration 1 took 648.5 ms
   Query 1 iteration 2 took 646.8 ms
   Query 1 iteration 3 took 646.2 ms
   Query 1 iteration 4 took 645.7 ms
   Query 1 iteration 5 took 643.0 ms
   Query 1 iteration 6 took 649.5 ms
   Query 1 iteration 7 took 649.5 ms
   Query 1 iteration 8 took 643.4 ms
   Query 1 iteration 9 took 643.6 ms
   Query 1 avg time: 646.63 ms
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #9213: ARROW-11266: [Rust][DataFusion] Implement vectorized hashing for hash aggregate [WIP]

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #9213:
URL: https://github.com/apache/arrow/pull/9213#issuecomment-761597501


   Really great work, @Dandandan!





[GitHub] [arrow] Dandandan commented on pull request #9213: ARROW-11266: [Rust][DataFusion] Implement vectorized hashing for hash aggregate [WIP]

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9213:
URL: https://github.com/apache/arrow/pull/9213#issuecomment-789665919


   @alamb thanks, I will probably reopen this or open a new one if/when I continue with it





[GitHub] [arrow] alamb commented on pull request #9213: ARROW-11266: [Rust][DataFusion] Implement vectorized hashing for hash aggregate [WIP]

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9213:
URL: https://github.com/apache/arrow/pull/9213#issuecomment-789733069


   Closing this PR for now; we can reopen it if need be.





[GitHub] [arrow] github-actions[bot] commented on pull request #9213: ARROW-11266: [Rust][DataFusion] Implement vectorized hashing for hash aggregate [WIP]

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9213:
URL: https://github.com/apache/arrow/pull/9213#issuecomment-761539134


   https://issues.apache.org/jira/browse/ARROW-11266





[GitHub] [arrow] alamb closed pull request #9213: ARROW-11266: [Rust][DataFusion] Implement vectorized hashing for hash aggregate [WIP]

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9213:
URL: https://github.com/apache/arrow/pull/9213


   





[GitHub] [arrow] alamb commented on pull request #9213: ARROW-11266: [Rust][DataFusion] Implement vectorized hashing for hash aggregate [WIP]

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9213:
URL: https://github.com/apache/arrow/pull/9213#issuecomment-789646626


   @Dandandan I am closing this PR for the time being to clean up the Rust/Arrow PR backlog. Please let me know if this is a mistake.

