You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2008/09/06 12:22:44 UTC

[jira] Created: (HBASE-875) Use MurmurHash instead of JenkinsHash

Use MurmurHash instead of JenkinsHash
-------------------------------------

                 Key: HBASE-875
                 URL: https://issues.apache.org/jira/browse/HBASE-875
             Project: Hadoop HBase
          Issue Type: Improvement
          Components: util
    Affects Versions: 0.19.0
            Reporter: Andrzej Bialecki 


I recently ported the MurmurHash (http://murmurhash.googlepages.com/) to Java, and according to my tests it's roughly 5 times faster than the current version of JenkinsHash in the trunk/ . According to the author (and other analysts at comp.sci.crypt) this hash has an excellent avalanche behavior, and low collision rate. I propose to either replace the JenkinsHash or add this hash as an option to be used in BloomFilter-s and related classes.

If your opinion is positive, I'll prepare a patch. The Java implementation of the hash can be found here: http://www.getopt.org/murmur/MurmurHash.java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-875) Use MurmurHash instead of JenkinsHash

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629229#action_12629229 ] 

stack commented on HBASE-875:
-----------------------------

Sound great Andrzej.

Would suggest that it not replace jenkins but can it be offered as an option?   We can't just replace the jenkins hashing since the hash is used in filenames out on the filesystem; we'd have to run a migration to move from one hash type to the other.

> Use MurmurHash instead of JenkinsHash
> -------------------------------------
>
>                 Key: HBASE-875
>                 URL: https://issues.apache.org/jira/browse/HBASE-875
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.19.0
>            Reporter: Andrzej Bialecki 
>
> I recently ported the MurmurHash (http://murmurhash.googlepages.com/) to Java, and according to my tests it's roughly 5 times faster than the current version of JenkinsHash in the trunk/ . According to the author (and other analysts at comp.sci.crypt) this hash has an excellent avalanche behavior, and low collision rate. I propose to either replace the JenkinsHash or add this hash as an option to be used in BloomFilter-s and related classes.
> If your opinion is positive, I'll prepare a patch. The Java implementation of the hash can be found here: http://www.getopt.org/murmur/MurmurHash.java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-875) Use MurmurHash instead of JenkinsHash

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated HBASE-875:
------------------------------------

    Attachment: murmur.patch

This patch adds MurmurHash as an option (used by default) alongside with JenkinsHash, and provides for backwards-compatibility with data created before this patch.

> Use MurmurHash instead of JenkinsHash
> -------------------------------------
>
>                 Key: HBASE-875
>                 URL: https://issues.apache.org/jira/browse/HBASE-875
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.19.0
>            Reporter: Andrzej Bialecki 
>         Attachments: murmur.patch
>
>
> I recently ported the MurmurHash (http://murmurhash.googlepages.com/) to Java, and according to my tests it's roughly 5 times faster than the current version of JenkinsHash in the trunk/ . According to the author (and other analysts at comp.sci.crypt) this hash has an excellent avalanche behavior, and low collision rate. I propose to either replace the JenkinsHash or add this hash as an option to be used in BloomFilter-s and related classes.
> If your opinion is positive, I'll prepare a patch. The Java implementation of the hash can be found here: http://www.getopt.org/murmur/MurmurHash.java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-875) Use MurmurHash instead of JenkinsHash

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633391#action_12633391 ] 

stack commented on HBASE-875:
-----------------------------

Thanks for the patch Andrzej.  It looks great.  One comment. In  src/java/org/onelab/filter/Filter.java, you add in versioning.  Deserializing, if first int is > 0, then you assume Filter is of an older version.  Is this test safe?  Its not possible for a hash to be negative?

Otherwise, the configuration of which hash to use applies to BloomFilters only it seems?  If so, that seems right; it shouldn''t break hbase finding encoded region names in the filesystem.


> Use MurmurHash instead of JenkinsHash
> -------------------------------------
>
>                 Key: HBASE-875
>                 URL: https://issues.apache.org/jira/browse/HBASE-875
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.19.0
>            Reporter: Andrzej Bialecki 
>         Attachments: murmur.patch
>
>
> I recently ported the MurmurHash (http://murmurhash.googlepages.com/) to Java, and according to my tests it's roughly 5 times faster than the current version of JenkinsHash in the trunk/ . According to the author (and other analysts at comp.sci.crypt) this hash has an excellent avalanche behavior, and low collision rate. I propose to either replace the JenkinsHash or add this hash as an option to be used in BloomFilter-s and related classes.
> If your opinion is positive, I'll prepare a patch. The Java implementation of the hash can be found here: http://www.getopt.org/murmur/MurmurHash.java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HBASE-875) Use MurmurHash instead of JenkinsHash

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-875.
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.19.0

You are right on the first value being count of hashes; my misreading.

Committed to TRUNK.  Thanks for the patch Andrzej.

> Use MurmurHash instead of JenkinsHash
> -------------------------------------
>
>                 Key: HBASE-875
>                 URL: https://issues.apache.org/jira/browse/HBASE-875
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.19.0
>            Reporter: Andrzej Bialecki 
>             Fix For: 0.19.0
>
>         Attachments: murmur.patch
>
>
> I recently ported the MurmurHash (http://murmurhash.googlepages.com/) to Java, and according to my tests it's roughly 5 times faster than the current version of JenkinsHash in the trunk/ . According to the author (and other analysts at comp.sci.crypt) this hash has an excellent avalanche behavior, and low collision rate. I propose to either replace the JenkinsHash or add this hash as an option to be used in BloomFilter-s and related classes.
> If your opinion is positive, I'll prepare a patch. The Java implementation of the hash can be found here: http://www.getopt.org/murmur/MurmurHash.java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-875) Use MurmurHash instead of JenkinsHash

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633657#action_12633657 ] 

Andrzej Bialecki  commented on HBASE-875:
-----------------------------------------

Re: deserialization. Sure, hash values can be anything. But the first parameter in the old format is the number of hash functions to use, not the hash value. so it can't be negative.

Re: configuration. I was of a split mind on this, but if we allowed configuring hash function in these cases, then we would have to persist this information somewhere in the data, which sounds kind of messy - so I decided against it. Perhaps the name of the property should indicate that it affects only BloomFilters ... OTOH some day we may want to use this conf. knob in other places too.

> Use MurmurHash instead of JenkinsHash
> -------------------------------------
>
>                 Key: HBASE-875
>                 URL: https://issues.apache.org/jira/browse/HBASE-875
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.19.0
>            Reporter: Andrzej Bialecki 
>         Attachments: murmur.patch
>
>
> I recently ported the MurmurHash (http://murmurhash.googlepages.com/) to Java, and according to my tests it's roughly 5 times faster than the current version of JenkinsHash in the trunk/ . According to the author (and other analysts at comp.sci.crypt) this hash has an excellent avalanche behavior, and low collision rate. I propose to either replace the JenkinsHash or add this hash as an option to be used in BloomFilter-s and related classes.
> If your opinion is positive, I'll prepare a patch. The Java implementation of the hash can be found here: http://www.getopt.org/murmur/MurmurHash.java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.