You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/02 12:45:10 UTC

[GitHub] [arrow-datafusion] jhorstmann commented on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

jhorstmann commented on issue #790:
URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-890995232


   > Collisions are handled by the fact that the entry in the hash table is a _list_ of indicies into the mutable area -- if there are multiple values in that list each entry in the mutable area needs to be checked for equality to find the correct one.
   
   I don't really see how we can to this equality checks faster than the hashmap itself could do it, but I would be very happy to be proven wrong.
   
   For optimized eq and hash handling, hashbrown has some more low-level functionality that we are not yet using:
   
    - [`from_hash(hash, eq_fn)`](https://docs.rs/hashbrown/0.11.2/hashbrown/hash_map/struct.RawEntryBuilderMut.html#method.from_hash): allows looking up an entry with a given hash and equals function, bypassing the hashbuilder of the map. This allows implementing eq on values that are not directly stored in the map, for example if only an index into another structure is part of the key.
    - [`insert_with_hasher(hash, key, value, hasher)`](https://docs.rs/hashbrown/0.11.2/hashbrown/hash_map/struct.RawVacantEntryMut.html#method.insert_with_hasher): Again allows bypassing the hashbuilder and looking up the hash value in some other structure. The `hasher` parameter is only called if the table needs rehashing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org