You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "David Allsopp (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/11/16 01:41:52 UTC

[jira] [Issue Comment Edited] (CASSANDRA-2975) Upgrade MurmurHash to version 3

    [ https://issues.apache.org/jira/browse/CASSANDRA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150931#comment-13150931 ] 

David Allsopp edited comment on CASSANDRA-2975 at 11/16/11 12:39 AM:
---------------------------------------------------------------------

Does anyone have any data on what the typical key size is (i.e. the average input size for the hash)?

I have a couple of optimisations for the MurmurHash3 implementation that I think give another 10-40% speedup, particularly for smaller values (e.g. 30% speedup for buffer lengths under 256 bytes) and no worse for large values (tens of KB). These results were on a AMD Phenom II X6 1055T @ 2.80 GHz, under 64-bit Windows 7, Java 1.6.0_27.

Firstly, inline the {{rotl64}} calls , e.g. so that
{noformat}
k1 = rotl64(k1, 31);
{noformat}
becomes
{noformat}
k1 = (k1 << 31) | (k1 >>> 33);
{noformat}

Secondly, rather than a large {{switch-case}} to handle the 'tail', use nested {{if-else}} to form a simple binary search. Particularly for relatively small inputs, handling the 'tail' is a significant part of the computation. E.g:

{noformat}
int ln = length & 15;
if (ln > 8)
  {
     if (ln > 12)
       {
          // etc for cases 13 - 15
       }
     else
       {
          // cases 11 and 12
       }

  }
else
  {
     // etc for cases 1-7
  }
{noformat}

Will try to post a proper benchmark when I've tidied it up (run out of time today!) so anyone interested can try it on other hardware...

This latter optimisation is pretty verbose and ugly to look at - it _might_ be just as fast, and much more concise, to lookup the offsets and shifts from an array, and deal with the special cases 1 and 9 as, well, special cases - but haven't benchmarked this alternative yet...
                
      was (Author: dallsopp):
    Does anyone have any data on what the typical key size is (i.e. the average input size for the hash)?

I have a couple of optimisations for the MurmurHash3 implementation that I think give another 10-40% speedup, particularly for smaller values (e.g. 30% speedup for buffer lengths under 256 bytes) and no worse for large values (tens of KB). These results were on a AMD Phenom II X6 1055T @ 2.80 GHz, under 64-bit Windows 7, Java 1.6.0_27.

Firstly, inline the {rotl64}} calls , e.g. so that
{noformat}
k1 = rotl64(k1, 31);
{noformat}
becomes
{noformat}
k1 = (k1 << 31) | (k1 >>> 33);
{noformat}

Secondly, rather than a large {{switch-case}} to handle the 'tail', use nested {{if-else}} to form a simple binary search. Particularly for relatively small inputs, handling the 'tail' is a significant part of the computation. E.g:

{noformat}
int ln = length & 15;
if (ln > 8)
  {
     if (ln > 12)
       {
          // etc for cases 13 - 15
       }
     else
       {
          // cases 11 and 12
       }

  }
else
  {
     // etc for cases 1-7
  }
{noformat}

Will try to post a proper benchmark when I've tidied it up (run out of time today!) so anyone interested can try it on other hardware...

This latter optimisation is pretty verbose and ugly to look at - it _might_ be just as fast, and much more concise, to lookup the offsets and shifts from an array, and deal with the special cases 1 and 9 as, well, special cases - but haven't benchmarked this alternative yet...
                  
> Upgrade MurmurHash to version 3
> -------------------------------
>
>                 Key: CASSANDRA-2975
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2975
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Brian Lindauer
>            Assignee: Brian Lindauer
>            Priority: Trivial
>              Labels: lhf
>             Fix For: 1.1
>
>         Attachments: 0001-Convert-BloomFilter-to-use-MurmurHash-v3-instead-of-.patch, 0002-Backwards-compatibility-with-files-using-Murmur2-blo.patch
>
>
> MurmurHash version 3 was finalized on June 3. It provides an enormous speedup and increased robustness over version 2, which is implemented in Cassandra. Information here:
> http://code.google.com/p/smhasher/
> The reference implementation is here:
> http://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?spec=svn136&r=136
> I have already done the work to port the (public domain) reference implementation to Java in the MurmurHash class and updated the BloomFilter class to use the new implementation:
> https://github.com/lindauer/cassandra/commit/cea6068a4a3e5d7d9509335394f9ef3350d37e93
> Apart from the faster hash time, the new version only requires one call to hash() rather than 2, since it returns 128 bits of hash instead of 64.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira