Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/04 19:17:43 UTC

[GitHub] [arrow-datafusion] houqp opened a new issue #677: Use a single HashMap implementation consistently across the code base

houqp opened a new issue #677:
URL: https://github.com/apache/arrow-datafusion/issues/677


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   We are using both `std::collections::HashMap` and `hashbrown::HashMap` in DataFusion; it would be better to consistently use only one of them.
   
   hashbrown's official README contains the following statement:
   
   > Since Rust 1.36, this is now the HashMap implementation for the Rust standard library. However you may still want to use this crate instead since it works in environments without std, such as embedded systems and kernels.
   
   However, it's unclear whether Rust std also uses aHash as its default hasher. If it does not, hashbrown would still seem to be the better choice for performance.
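   For context, the hasher is a type parameter separate from the table itself: `std::collections::HashMap<K, V, S>` defaults `S` to `RandomState`, which the std documentation describes as currently SipHash 1-3 (subject to change). The std-only sketch below plugs in a hand-written 64-bit FNV-1a hasher, a hypothetical stand-in chosen only for brevity (it is not aHash and is not HashDoS-resistant), to show that swapping the hasher does not require swapping the map implementation.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical stand-in hasher (64-bit FNV-1a). NOT aHash and NOT resistant
/// to HashDoS; it only shows that the hashing algorithm is a type parameter
/// of `std::collections::HashMap`, independent of the table implementation.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        // 64-bit FNV offset basis.
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            // FNV-1a: XOR the byte in, then multiply by the 64-bit FNV prime.
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Same std table, different hasher.
type FnvHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    // Default hasher: `RandomState` (documented as SipHash 1-3 at present).
    let mut with_default: HashMap<&str, i32> = HashMap::new();
    with_default.insert("a", 1);

    // `HashMap::new` only exists for `RandomState`; other hashers use `default()`.
    let mut with_fnv: FnvHashMap<&str, i32> = FnvHashMap::default();
    with_fnv.insert("a", 1);

    assert_eq!(with_default.get("a"), with_fnv.get("a"));
}
```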
   
   **Additional context**
   
   Follow-up from https://github.com/apache/arrow-datafusion/pull/676#discussion_r663544711.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #677: Use a single HashMap implementation consistently across the code base

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #677:
URL: https://github.com/apache/arrow-datafusion/issues/677#issuecomment-874046165


   `hashbrown::HashMap` is a good choice 👍 in my opinion






[GitHub] [arrow-datafusion] Dandandan commented on issue #677: Use a single HashMap implementation consistently across the code base

Dandandan commented on issue #677:
URL: https://github.com/apache/arrow-datafusion/issues/677#issuecomment-873671789


   The hashbrown crate is used for these reasons:
   
   * It exposes additional APIs, such as `RawEntry`, that can avoid hashing or looking up an element twice.
   * It is slightly faster than the version included in Rust std, not only because it uses aHash (vs. SipHash in std) but also because of further optimizations and inlining.
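   A std-only sketch of the double-lookup cost mentioned in the first point: a `get_mut` followed by `insert` hashes and probes the same key twice on a miss, while std's stable `Entry` API needs only one lookup. This is not the `RawEntry` API itself, which additionally allows lookups with a borrowed key or a precomputed hash; the function names are illustrative, and the single-lookup behaviour of `entry` is described here as an implementation property, not a documented guarantee.

```rust
use std::collections::HashMap;

/// Two-step pattern: on a miss, `get_mut` hashes and probes once,
/// then `insert` hashes and probes again for the same key.
fn count_with_get_then_insert<'a>(words: &[&'a str]) -> HashMap<&'a str, usize> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for &w in words {
        if let Some(c) = counts.get_mut(w) {
            *c += 1;
        } else {
            counts.insert(w, 1);
        }
    }
    counts
}

/// Entry pattern: one lookup per word; on a miss, the vacant entry is
/// filled without looking the key up a second time.
fn count_with_entry<'a>(words: &[&'a str]) -> HashMap<&'a str, usize> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for &w in words {
        *counts.entry(w).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let words = ["scan", "filter", "scan", "join", "scan"];
    let two_step = count_with_get_then_insert(&words);
    let one_step = count_with_entry(&words);
    // Both approaches produce identical counts; only the lookup cost differs.
    assert_eq!(two_step, one_step);
    assert_eq!(one_step.get("scan"), Some(&3));
}
```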
   
   I think the best approach would be to define a `HashMap` type alias, so that the whole project uses whichever implementation is set as the default.
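   A minimal sketch of such an alias, assuming a hypothetical `hash_utils` module. It points at std only so that it compiles without extra crates; pointing the right-hand side at `hashbrown::HashMap` instead would be a one-line change, provided call sites only use methods both implementations offer.

```rust
/// Hypothetical module: the concrete map type is named in exactly one place,
/// and every call site imports the alias instead.
mod hash_utils {
    // std is used so the sketch builds without extra crates. Switching these
    // to `hashbrown::HashMap` / `hashbrown::HashSet` would need the `hashbrown`
    // dependency but, ideally, no call-site changes.
    pub type HashMap<K, V> = std::collections::HashMap<K, V>;
    pub type HashSet<T> = std::collections::HashSet<T>;
}

use hash_utils::{HashMap, HashSet};

/// Call sites never mention std or hashbrown directly.
fn column_index<'a>(names: &[&'a str]) -> HashMap<&'a str, usize> {
    names.iter().enumerate().map(|(i, &name)| (name, i)).collect()
}

fn distinct(values: &[i64]) -> HashSet<i64> {
    values.iter().copied().collect()
}

fn main() {
    let idx = column_index(&["id", "name", "price"]);
    assert_eq!(idx.get("price"), Some(&2));
    assert_eq!(distinct(&[1, 2, 2, 3]).len(), 3);
}
```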





[GitHub] [arrow-datafusion] houqp commented on issue #677: Use a single HashMap implementation consistently across the code base

houqp commented on issue #677:
URL: https://github.com/apache/arrow-datafusion/issues/677#issuecomment-873871918


   Agreed. I think we should move all `HashMap` imports to our own pluggable type alias and set `hashbrown::HashMap` as the default.
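   One way to make the alias pluggable is a compile-time switch, sketched below. The feature name `hashbrown` is an assumption: in Cargo, declaring `hashbrown` as an optional dependency implicitly creates a feature of that name, which could then be listed under `[features] default = [...]` to make hashbrown the default. A plain `rustc` build with no features takes the std branch, so the sketch compiles without the crate.

```rust
mod hash_utils {
    // Hypothetical feature name `hashbrown`, backed by an optional dependency.
    // Items behind a disabled `cfg` are removed before name resolution, so this
    // line does not require the crate when the feature is off.
    #[cfg(feature = "hashbrown")]
    pub type HashMap<K, V> = hashbrown::HashMap<K, V>;

    // Fallback when the feature is off.
    #[cfg(not(feature = "hashbrown"))]
    pub type HashMap<K, V> = std::collections::HashMap<K, V>;
}

use hash_utils::HashMap;

fn main() {
    // Call sites look the same whichever branch is compiled in.
    let mut stats: HashMap<&str, u32> = HashMap::new();
    stats.insert("rows", 42);
    assert_eq!(stats.get("rows"), Some(&42));
}
```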





[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #677: Use a single HashMap implementation consistently across the code base

Dandandan edited a comment on issue #677:
URL: https://github.com/apache/arrow-datafusion/issues/677#issuecomment-873671789


   The hashbrown crate is used for these reasons:
   
   * It exposes additional APIs, such as `RawEntry`, that can avoid hashing or looking up an element twice.
   * It is slightly faster than the version included in Rust std, not only because it uses aHash (vs. SipHash in std) but also because of further optimizations and inlining.
   
   I think the best approach would be to define a `HashMap` type alias, so that the whole project uses whichever implementation is set as the default.

