Posted to issues@arrow.apache.org by "Micah Kornfield (JIRA)" <ji...@apache.org> on 2019/08/10 05:57:00 UTC

[jira] [Updated] (ARROW-6024) [Java] Provide more hash algorithms

     [ https://issues.apache.org/jira/browse/ARROW-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Kornfield updated ARROW-6024:
-----------------------------------
    Description: 
Provide more hash algorithms to choose from for different scenarios. In particular, we provide the following hash algorithms:
 * Simple hasher: A hasher that uses the integer value as the hash code as is, and does not perform any finalization. The computation is therefore extremely efficient, but the quality of the produced hash codes may be poor.

 * Murmur finalizing hasher: Finalizes the hash code with the Murmur hashing algorithm. Details of the algorithm can be found at [https://en.wikipedia.org/wiki/MurmurHash]. Murmur hashing is computationally expensive, as it involves several integer multiplications. However, the produced hash codes are of good quality in the sense that they are uniformly distributed over the range of hash values.
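
To make the trade-off between the two strategies concrete, here is a minimal sketch. The class and method names are hypothetical and are not taken from the Arrow code base; the mixing steps are the 32-bit finalizer (fmix32) of MurmurHash3, which is assumed here to be what "Murmur finalizing" refers to.

```java
public class FinalizerSketch {

    // Hypothetical helper: the "simple" strategy uses the integer as its own hash code.
    static int simpleHash(int value) {
        return value;
    }

    // 32-bit finalization mix (fmix32) from MurmurHash3:
    // three xor-shifts interleaved with two integer multiplications.
    static int murmurFinalize(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Print both hash codes for a few keys that are multiples of 16.
        for (int key = 0; key < 64; key += 16) {
            System.out.printf("key=%d simple=%08x murmur=%08x%n",
                    key, simpleHash(key), murmurFinalize(key));
        }
    }
}
```

With the simple strategy, keys that are all multiples of 16 share the same low four bits, so a 16-bucket table indexed by those bits would place them all in one bucket. The finalizer mixes the high bits into the low bits, at the cost of the two multiplications.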

  was:
Provide more hash algorithms to choose from for different scenarios. In particular, we provide the following hash algorithms:

* Simple hasher: A hasher that uses the integer value as the hash code as is, and does not perform any finalization. The computation is therefore extremely efficient, but the quality of the produced hash codes may be poor.

* Murmur finalizing hasher: Finalizes the hash code with the Murmur hashing algorithm. Details of the algorithm can be found at https://en.wikipedia.org/wiki/MurmurHash. Murmur hashing is computationally expensive, as it involves several integer multiplications. However, the produced hash codes are of good quality in the sense that they are uniformly distributed over the range of hash values.

* Jenkins finalizing hasher: Finalizes the hash code with Bob Jenkins' algorithm. Details of this algorithm can be found at http://www.burtleburtle.net/bob/hash/integer.html. Jenkins hashing is less computationally expensive than Murmur hashing, as it involves no integer multiplication. Nevertheless, the produced hash codes are also of good quality in the sense that they are uniformly distributed over the range of hash values.

* Non-negative hasher: A wrapper around another hasher that makes the generated hash code non-negative. This is useful in scenarios such as hash tables, where the hash code is used to compute a bucket index.
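
The last two items can be sketched as follows. All names are hypothetical and not taken from the Arrow code base. The mixing function is one of the multiplication-free integer hashes attributed to Bob Jenkins; its constants are reproduced here as an assumption and should be checked against the page linked above. Clearing the sign bit is likewise only one possible way to build a non-negative wrapper.

```java
public class JenkinsSketch {

    // Multiplication-free integer mix: only additions, xors and shifts.
    // Constants are assumed; verify against the referenced page before relying on them.
    static int jenkinsFinalize(int a) {
        a = (a + 0x7ed55d16) + (a << 12);
        a = (a ^ 0xc761c23c) ^ (a >>> 19);
        a = (a + 0x165667b1) + (a << 5);
        a = (a + 0xd3a2646c) ^ (a << 9);
        a = (a + 0xfd7046c5) + (a << 3);
        a = (a ^ 0xb55a4f09) ^ (a >>> 16);
        return a;
    }

    // Hypothetical wrapper step: clear the sign bit so the result is never negative.
    // Math.abs is avoided because Math.abs(Integer.MIN_VALUE) is still negative.
    static int nonNegative(int hash) {
        return hash & 0x7fffffff;
    }

    // Typical hash-table use: a non-negative hash gives a valid bucket index.
    static int bucketFor(int key, int bucketCount) {
        return nonNegative(jenkinsFinalize(key)) % bucketCount;
    }

    public static void main(String[] args) {
        for (int key = -2; key <= 2; key++) {
            System.out.printf("key=%d bucket=%d%n", key, bucketFor(key, 16));
        }
    }
}
```

Without the wrapper, a negative hash code would produce a negative remainder in Java and therefore an invalid array index.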


> [Java] Provide more hash algorithms 
> ------------------------------------
>
>                 Key: ARROW-6024
>                 URL: https://issues.apache.org/jira/browse/ARROW-6024
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Java
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Provide more hash algorithms to choose from for different scenarios. In particular, we provide the following hash algorithms:
>  * Simple hasher: A hasher that uses the integer value as the hash code as is, and does not perform any finalization. The computation is therefore extremely efficient, but the quality of the produced hash codes may be poor.
>  * Murmur finalizing hasher: Finalizes the hash code with the Murmur hashing algorithm. Details of the algorithm can be found at [https://en.wikipedia.org/wiki/MurmurHash]. Murmur hashing is computationally expensive, as it involves several integer multiplications. However, the produced hash codes are of good quality in the sense that they are uniformly distributed over the range of hash values.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)