Posted to issues@hive.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2018/11/08 17:29:01 UTC

[jira] [Comment Edited] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision

    [ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680050#comment-16680050 ] 

slim bouguerra edited comment on HIVE-20873 at 11/8/18 5:28 PM:
----------------------------------------------------------------

It is still unclear to me why we are using Murmur. There are a dozen other hash algorithms, including XXHash, that are much faster and of good quality: https://cyan4973.github.io/xxHash/
Anyway, I will try to take a look at benchmarking this; I have created [a task|https://issues.apache.org/jira/browse/HIVE-20892]. FYI, XXHash is widely used by a lot of MPP-style engines.


was (Author: bslim):
It is still unclear to me why we are using Murmur. There are a dozen other hash algorithms, including XXHash, that are much faster and of good quality: https://cyan4973.github.io/xxHash/
Anyway, I will try to take a look at benchmarking this; I have created a sub-task. FYI, XXHash is widely used by a lot of MPP-style engines.

> Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
> ------------------------------------------------------------------------
>
>                 Key: HIVE-20873
>                 URL: https://issues.apache.org/jira/browse/HIVE-20873
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-20873.1.patch, HIVE-20873.2.patch, HIVE-20873.3.patch
>
>
> VectorHashKeyWrapperTwoLong is implemented with a few bit-shift and XOR operators, which keeps computation time short but causes more hash collisions. As a result, group-by operations become very slow on large data sets. It needs Murmur hash or a better hash function to reduce hash collisions.
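
To illustrate the trade-off described above, here is a minimal sketch, not Hive's actual code. The class TwoLongHashSketch, its TwoLongHash interface, and both hash functions are hypothetical names made up for this example. shiftXorHash stands in for a cheap shift/XOR hash of two longs, and murmurStyleHash applies the 64-bit finalizer from MurmurHash3 (fmix64) twice, which is Murmur-style mixing rather than the full Murmur algorithm. The sketch counts how many distinct 32-bit hash codes each function yields over a small grid of key pairs.

```java
import java.util.HashSet;
import java.util.Set;

public class TwoLongHashSketch {

    /** Hypothetical interface for hashing a key made of two longs. */
    interface TwoLongHash {
        int hash(long a, long b);
    }

    /** Cheap shift/XOR hash (illustrative stand-in, not Hive's code). */
    static int shiftXorHash(long a, long b) {
        int ha = (int) (a ^ (a >>> 32)); // fold the high half into the low half
        int hb = (int) (b ^ (b >>> 32));
        return ha ^ hb;                  // symmetric: (a, b) and (b, a) collide
    }

    /** 64-bit finalizer from MurmurHash3; a bijection on 64-bit values. */
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** Murmur-style mix of two longs (an assumption, not the full Murmur3). */
    static int murmurStyleHash(long a, long b) {
        long h = fmix64(fmix64(a) + b);  // mix a, add b, then mix again
        return (int) (h ^ (h >>> 32));   // fold 64 bits down to 32
    }

    /** Counts distinct hash codes over all pairs (a, b) with 0 <= a, b < n. */
    static int countDistinct(TwoLongHash f, int n) {
        Set<Integer> seen = new HashSet<>();
        for (long a = 0; a < n; a++) {
            for (long b = 0; b < n; b++) {
                seen.add(f.hash(a, b));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 256;
        System.out.println("key pairs:             " + (n * n));
        System.out.println("shift/XOR distinct:    "
                + countDistinct(TwoLongHashSketch::shiftXorHash, n));
        System.out.println("murmur-style distinct: "
                + countDistinct(TwoLongHashSketch::murmurStyleHash, n));
    }
}
```

For keys in 0..255 the shift/XOR variant reduces to a ^ b, so the 65,536 pairs share only 256 distinct hash codes, while the Murmur-style variant should yield close to one code per pair. This says nothing about throughput; comparing speed against Murmur or XXHash would need a proper benchmark.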



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)