Posted to commits@cassandra.apache.org by "Jonathan Ellis (Created) (JIRA)" <ji...@apache.org> on 2012/01/23 20:11:41 UTC

[jira] [Created] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Evaluate Murmur3-based partitioner
----------------------------------

                 Key: CASSANDRA-3772
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Jonathan Ellis
             Fix For: 1.2


MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Drew Kutcharian (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195960#comment-13195960 ] 

Drew Kutcharian commented on CASSANDRA-3772:
--------------------------------------------

It would probably also be worth evaluating CityHash, which is expected to be even faster than Murmur.

http://code.google.com/p/cityhash/
http://en.wikipedia.org/wiki/CityHash

                


[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3772:
-----------------------------

    Attachment: 0001-CASSANDRA-3772-Test.patch

A micro benchmark shows much better performance with Murmur3:

testing size of: 200000
Test MD5
MD5 test completed @ 1506
Test Murmur3
Murmur3 test completed @ 781

Hi Dave, while reviewing the patch, it looks like this code in Murmur3Partitioner.hash

{code}
hashBytes[1] = (byte) (bufferLong >> 48);
...
{code}

is somewhat redundant with

{code}
case 15: k2 ^= ((long) key.get(offset+14)) << 48
... 
{code}

Though I don't think it is going to cause any additional latency :)


                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-3772-Test.patch, MumPartitionerTest.docx, hashed_partitioner.diff, hashed_partitioner_3.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207132#comment-13207132 ] 

Yuki Morishita commented on CASSANDRA-3772:
-------------------------------------------

Dave,

Patch needs rebase, but looking at the patch, I noticed the following:

{code}
private static byte[] hashMurmur3(ByteBuffer... data)
{
    HashFunction hashFunction = murmur3HF.get();
    Hasher hasher = hashFunction.newHasher();
    // snip
}
{code}

Isn't it slow to instantiate a new Hasher on every call? I looked at the Guava source code but saw no way to "reset" one, so I guess the above is the only thing you could do...

I also note that CASSANDRA-2975 will implement MurmurHash3, so I think it is better not to introduce an external library. What do you think?
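
For contrast, the JDK's MessageDigest does expose reset(), which is what makes it possible to cache one instance per thread for MD5. The following is only a minimal sketch of that reuse pattern. The class name DigestCache and the method hashMd5 are hypothetical and are not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not from the patch): keeps one MD5 digest per thread
// so the hot path does not allocate a new hasher on every call.
final class DigestCache
{
    private static final ThreadLocal<MessageDigest> MD5 = ThreadLocal.withInitial(() -> {
        try
        {
            return MessageDigest.getInstance("MD5");
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new IllegalStateException(e);
        }
    });

    static byte[] hashMd5(ByteBuffer... data)
    {
        MessageDigest digest = MD5.get();
        digest.reset(); // drop any state left behind by an earlier, interrupted call
        for (ByteBuffer block : data)
            digest.update(block.duplicate()); // duplicate() keeps the caller's position unchanged
        return digest.digest(); // digest() also resets the instance for the next call
    }
}
```

A hasher API without an equivalent of reset() cannot be cached this way, which is the concern raised above.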
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207159#comment-13207159 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

bq. I looked up guava source code but I saw no way to "reset", so I guess the above is the only thing you could do

It looks like you're right: http://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/hash/MessageDigestHashFunction.java

So using the standalone MH3 library is probably the way to go.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239649#comment-13239649 ] 

Vijay commented on CASSANDRA-3772:
----------------------------------

>>>  Added back in a deprecated RandomPartitioner class that identity-derives from MD5Partitioner for backwards compatibility.
Not sure about this:
Maybe we should keep calling the MD5Partitioner a random partitioner and not deprecate it, because we might not be able to remove this class for a long, long time (similar to OldNetworkTopologyStrategy)?

BTW: Jeremy, I was not able to see much of a difference using the stress tool.


                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: hashed_partitioner.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3772:
-----------------------------

    Attachment: 0001-CASSANDRA-3772.patch

Made minor formatting changes, added comments to the yaml, and modified Murmur3Partitioner to add

{code}
public static void writeLong(long src, byte[] dest, int offset)
{code}

I do see a bigger gain with faster disks.
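
The body of writeLong is not shown in this message. As a rough sketch only, a helper with that signature could look like the following, assuming big-endian byte order (which matches the `hashBytes[1] = (byte) (bufferLong >> 48)` shift quoted earlier in the thread). The holder class name LongWriter is hypothetical.

```java
// Hypothetical holder class; only the method signature comes from the patch note.
final class LongWriter
{
    // Writes src into dest[offset .. offset + 7], most significant byte first.
    public static void writeLong(long src, byte[] dest, int offset)
    {
        for (int i = 0; i < 8; i++)
            dest[offset + i] = (byte) (src >>> (56 - 8 * i));
    }
}
```

Writing both 64-bit halves of a 128-bit hash straight into one 16-byte array avoids intermediate buffers.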
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-3772-Test.patch, 0001-CASSANDRA-3772.patch, MumPartitionerTest.docx, hashed_partitioner.diff, hashed_partitioner_3.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439261#comment-13439261 ] 

Vijay commented on CASSANDRA-3772:
----------------------------------

+1 for this too!
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, CASSANDRA-3772-v3.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199659#comment-13199659 ] 

Dave Brosius commented on CASSANDRA-3772:
-----------------------------------------

Interestingly, timing just the hashing function itself shows very little difference (statistically insignificant).
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439624#comment-13439624 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Looks like it's worth prototyping; a LongPartitioner should be very straightforward.

If there's no performance advantage, though, we can leave it out.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, CASSANDRA-3772-v3.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236329#comment-13236329 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

(2975 did get committed.)
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439602#comment-13439602 ] 

Pavel Yaskevich commented on CASSANDRA-3772:
--------------------------------------------

Would we have to re-implement all IPartitioner methods to make M3P truly separate from RandomPartitioner, given that we currently share the BigInteger interface with it?
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, CASSANDRA-3772-v3.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238690#comment-13238690 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Why not just make a separate IPartitioner class so we don't need to add any special cases for the config?
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239657#comment-13239657 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Seriously bummed that we're not seeing stress wins.  Didn't someone post profiler results fingering MD5 as a bottleneck?  Maybe I am misremembering.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: hashed_partitioner.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207416#comment-13207416 ] 

Dave Brosius commented on CASSANDRA-3772:
-----------------------------------------

I'd be glad to rework it, but I didn't find this mysterious standalone MH3 library when I first looked; I only found the Guava version. Can you post a link to where I should get it from?
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439004#comment-13439004 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

That sounds worth it to me.  Any downsides if we make MP the default for new clusters?
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.3
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Issue Comment Edited] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199516#comment-13199516 ] 

Dave Brosius edited comment on CASSANDRA-3772 at 2/3/12 4:56 AM:
-----------------------------------------------------------------

Doing 1000 inserts of a 5-column CF on a single-node cluster on a really lousy machine seems to show that the Guava Murmur hash is significantly slower than MD5, perhaps 5x. Perhaps it's just the Guava implementation, as opposed to the Murmur3 algorithm.
                
      was (Author: dbrosius@apache.org):
    Doing 1000 inserts of a 5 column CF on a single node cluster on a really lousy machine seems to show that the guava murmur hash is significantly slower than MD5. 5x? perhaps. Perhaps it's just the guava implementation, as opposed to the Murmur3 implementation.
                  
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288706#comment-13288706 ] 

Sylvain Lebresne commented on CASSANDRA-3772:
---------------------------------------------

bq. Is that something we can address in the index code then, instead of creating a new partitioner?

We could change the index code to write the token along with the key in the index rows, but it's unclear that it would be a straight win, since it means we'll do more I/O. It would also only work for new indexes, so we would have to keep compatibility with the old code, and hence that's not hassle-free either. IMHO, if we can find cases where MD5 is a bottleneck, transitioning to a new Murmur3 partitioner over time is cleaner.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-3772-Test.patch, MumPartitionerTest.docx, hashed_partitioner.diff, hashed_partitioner_3.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Issue Comment Edited] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207436#comment-13207436 ] 

Vijay edited comment on CASSANDRA-3772 at 2/14/12 2:26 AM:
-----------------------------------------------------------

If CASSANDRA-2975 gets committed you should be able to use that.

Edit: you can use MurmurHash.hash3_x64_128 function from 2975.
                
      was (Author: vijay2win@yahoo.com):
    If CASSANDRA-2975 gets committed you should be able to use that.
                  
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239671#comment-13239671 ] 

Sylvain Lebresne commented on CASSANDRA-3772:
---------------------------------------------

Did you guys test on secondary index reads (with reasonably high cardinality)? That's when MD5 was a bottleneck, because the column comparator then ends up redecorating keys over and over again. The normal read/write path probably does only a handful of MD5 computations, so I wouldn't be surprised if this doesn't make much difference.
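
The cost of redecorating inside a comparator can be illustrated outside Cassandra. The following is a rough sketch, not the index code itself: the class RedecorateDemo is hypothetical, and an MD5-derived BigInteger is used as a stand-in for a decorated key's token. It counts hash computations when a sort re-hashes keys on every comparison versus hashing each key once up front.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; none of these names come from Cassandra.
final class RedecorateDemo
{
    static final AtomicLong HASH_CALLS = new AtomicLong();

    // Stand-in for decorating a key: the MD5 of the key read as a non-negative integer.
    static BigInteger token(String key)
    {
        HASH_CALLS.incrementAndGet();
        try
        {
            byte[] md5 = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, md5);
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new IllegalStateException(e);
        }
    }

    // Sorts by token with the hash computed inside the comparator (two hashes per comparison).
    // Returns the number of hash computations performed.
    static long sortRehashing(List<String> keys)
    {
        HASH_CALLS.set(0);
        List<String> copy = new ArrayList<>(keys);
        copy.sort(Comparator.comparing(RedecorateDemo::token));
        return HASH_CALLS.get();
    }

    // Hashes every key exactly once, then sorts the precomputed tokens.
    // Returns the number of hash computations performed.
    static long sortDecoratedOnce(List<String> keys)
    {
        HASH_CALLS.set(0);
        List<BigInteger> tokens = new ArrayList<>();
        for (String key : keys)
            tokens.add(token(key));
        tokens.sort(Comparator.naturalOrder());
        return HASH_CALLS.get();
    }
}
```

With n keys the second variant performs n hash computations, while the first performs two per comparison, so the cost of the hash function is multiplied by the number of comparisons the sort needs.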
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: hashed_partitioner.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440374#comment-13440374 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Might be able to share a describeOwnership implementation between Murmur3Partitioner (M3P) and RandomPartitioner (RP) that deals with Token<Number>, but +1 from me.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, CASSANDRA-3772-v3.patch, CASSANDRA-3772-v4.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3772:
--------------------------------------

    Reviewer: vijay  (was: yukim)
    
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-3772:
---------------------------------------

    Attachment: CASSANDRA-3772-v3.patch

Attached a patch that uses the first part of hash3_x64_128 (no copies into a byte array), which shows better results than hash2_64. This approach is ~18 op points better than the previous one.
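
For readers unfamiliar with the idea, here is a minimal sketch of "take the first 64 bits of MurmurHash3 x64_128 and return them as a primitive long, without packing the result into a byte array". It is written from the published reference algorithm and is not the code in the attached patch; Cassandra's own MurmurHash class may differ in detail. The class name Murmur3FirstHalf and the method hash64 are hypothetical, and the sketch should be checked against reference test vectors before anyone relies on it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class Murmur3FirstHalf {
    private static final long C1 = 0x87c37b91114253d5L;
    private static final long C2 = 0x4cf5ad432745937fL;

    // Final avalanche step of the reference algorithm.
    private static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Returns the first 64 bits of the 128-bit hash as a long (no byte[] result).
    static long hash64(byte[] data, long seed) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        long h1 = seed;
        long h2 = seed;
        int nblocks = data.length >>> 4;

        // Body: mix each full 16-byte block.
        for (int i = 0; i < nblocks; i++) {
            long k1 = buf.getLong(i * 16);
            long k2 = buf.getLong(i * 16 + 8);
            k1 *= C1; k1 = Long.rotateLeft(k1, 31); k1 *= C2; h1 ^= k1;
            h1 = Long.rotateLeft(h1, 27); h1 += h2; h1 = h1 * 5 + 0x52dce729L;
            k2 *= C2; k2 = Long.rotateLeft(k2, 33); k2 *= C1; h2 ^= k2;
            h2 = Long.rotateLeft(h2, 31); h2 += h1; h2 = h2 * 5 + 0x38495ab5L;
        }

        // Tail: up to 15 remaining bytes, each treated as unsigned.
        int tail = nblocks * 16;
        int rem = data.length & 15;
        long k1 = 0;
        long k2 = 0;
        for (int i = rem - 1; i >= 8; i--) {
            k2 ^= (data[tail + i] & 0xFFL) << ((i - 8) * 8);
        }
        if (rem > 8) {
            k2 *= C2; k2 = Long.rotateLeft(k2, 33); k2 *= C1; h2 ^= k2;
        }
        for (int i = Math.min(rem, 8) - 1; i >= 0; i--) {
            k1 ^= (data[tail + i] & 0xFFL) << (i * 8);
        }
        if (rem > 0) {
            k1 *= C1; k1 = Long.rotateLeft(k1, 31); k1 *= C2; h1 ^= k1;
        }

        // Finalization; only the first half of the 128-bit result is returned.
        h1 ^= data.length;
        h2 ^= data.length;
        h1 += h2;
        h2 += h1;
        h1 = fmix64(h1);
        h2 = fmix64(h2);
        return h1 + h2;
    }

    public static void main(String[] args) {
        byte[] key = "key".getBytes(StandardCharsets.UTF_8);
        System.out.println(hash64(key, 0L));
    }
}
```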
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, CASSANDRA-3772-v3.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Reopened] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reopened CASSANDRA-3772:
---------------------------------------

    
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439627#comment-13439627 ] 

Pavel Yaskevich commented on CASSANDRA-3772:
--------------------------------------------

What is the point of doing that? Making a BigInteger from a long is very straightforward and doesn't add more overhead than converting to a Long.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, CASSANDRA-3772-v3.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-3772:
---------------------------------------

    Fix Version/s:     (was: 1.3)
                   1.2.0
    
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199308#comment-13199308 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Any preliminary results on performance improvements?  (Even single node numbers would be interesting.)
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438143#comment-13438143 ] 

Vijay commented on CASSANDRA-3772:
----------------------------------

+1
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.3
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238684#comment-13238684 ] 

Vijay commented on CASSANDRA-3772:
----------------------------------

Well, if we are worried about users misconfiguring it:
Option 1) We should probably gossip about this version and fail the server at startup.
Option 2) We should make this a keyspace setting and treat it like a comparator (cannot be changed once the KS is created).
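
Either option amounts to refusing to run when the configured hash differs from the one the existing data was written with. A minimal sketch of such a guard follows, assuming the previously used partitioner name is available from somewhere (saved locally or learned from peers); PartitionerGuard and checkPartitioner are hypothetical names, not Cassandra APIs.

```java
final class PartitionerGuard {
    // Throws if the configured partitioner differs from the one recorded for existing data.
    // A null savedName is taken to mean "fresh node, nothing recorded yet".
    static void checkPartitioner(String savedName, String configuredName) {
        if (savedName != null && !savedName.equals(configuredName)) {
            throw new IllegalStateException(
                "Configured partitioner " + configuredName
                + " does not match the partitioner " + savedName
                + " used by existing data; refusing to start");
        }
    }

    public static void main(String[] args) {
        checkPartitioner(null, "Murmur3Partitioner");                  // fresh node: accepted
        checkPartitioner("RandomPartitioner", "RandomPartitioner");    // unchanged: accepted
        try {
            checkPartitioner("RandomPartitioner", "Murmur3Partitioner");
        } catch (IllegalStateException e) {
            System.out.println("startup refused: " + e.getMessage());
        }
    }
}
```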
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439630#comment-13439630 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Have a look at BigInteger source, it's about 3x as large as a Long.  compareTo is also more complex.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, CASSANDRA-3772-v3.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199679#comment-13199679 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Hmm.  You probably need to run at least 10k inserts through it first to make sure the JIT is warm, before timing things.  Otherwise I would expect the (calling out to C) MD5 code to do better than (interpreted) Murmur3.
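
The warm-up advice above can be sketched as a tiny timing harness: run an untimed batch first so the JIT has a chance to compile the hot path, then time a second batch. This is only an illustration; WarmupBench and its methods are hypothetical names, MD5 is used as the workload because it ships with the JDK, and a proper harness such as JMH would be a better choice for real measurements.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.ToLongFunction;

final class WarmupBench {
    // Builds a distinct key for iteration i.
    static byte[] key(int i) {
        return ("key" + i).getBytes(StandardCharsets.UTF_8);
    }

    // Workload to time: MD5 of the key, folded into a long from its first 8 bytes.
    static long md5AsLong(byte[] key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key);
            long v = 0;
            for (int i = 0; i < 8; i++) {
                v = (v << 8) | (d[i] & 0xFF);
            }
            return v;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs `warmup` untimed iterations, then returns elapsed nanoseconds for `timed` iterations.
    static long timeNanos(ToLongFunction<byte[]> hash, int warmup, int timed) {
        long sink = 0;
        for (int i = 0; i < warmup; i++) {
            sink += hash.applyAsLong(key(i));
        }
        long start = System.nanoTime();
        for (int i = 0; i < timed; i++) {
            sink += hash.applyAsLong(key(i));
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.print(""); // use the result so the work is not optimized away
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long cold = timeNanos(WarmupBench::md5AsLong, 0, 10_000);
        long warm = timeNanos(WarmupBench::md5AsLong, 10_000, 10_000);
        System.out.println("no warm-up: " + cold + " ns, after warm-up: " + warm + " ns");
    }
}
```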
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3772:
------------------------------------

    Attachment: hashed_partitioner.diff

Moved existing RandomPartitioner code to AbstractHashedPartitioner, and added abstract hash method. Created two subclasses, MD5Partitioner and Murmur3Partitioner that extend and implement this method. Added back in a deprecated RandomPartitioner class that identity-derives from MD5Partitioner for backwards compatibility.

patch against trunk
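
The shape of the refactoring described above might look roughly like the following. This is a simplified sketch, not the patch itself: the real classes implement Cassandra's IPartitioner and carry much more logic, and the "Sketch" suffix marks every name here as hypothetical. A Murmur3 subclass would override hash() in the same way as the MD5 one.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Base class holding the shared logic; subclasses supply only the hash function.
abstract class AbstractHashedPartitionerSketch {
    protected abstract BigInteger hash(byte[] key);

    // Shared token derivation: non-negative value of whatever hash() returns.
    public final BigInteger getToken(byte[] key) {
        return hash(key).abs();
    }
}

// MD5-backed subclass.
class MD5PartitionerSketch extends AbstractHashedPartitionerSketch {
    @Override
    protected BigInteger hash(byte[] key) {
        try {
            return new BigInteger(MessageDigest.getInstance("MD5").digest(key));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Deprecated alias kept so that existing configurations naming the old class keep working.
@Deprecated
class RandomPartitionerSketch extends MD5PartitionerSketch {
}

final class HashedPartitionerDemo {
    public static void main(String[] args) {
        byte[] key = "key".getBytes(StandardCharsets.UTF_8);
        System.out.println(new RandomPartitionerSketch().getToken(key));
    }
}
```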
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: hashed_partitioner.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3772:
-----------------------------

    Attachment: MumPartitionerTest.docx

>>> Didn't someone post profiler results fingering MD5 as a bottleneck?
I do have the profile where the md5 was a bottleneck earlier but after changing the bloom filter I am not sure.

Hi Sylvain, please find the attachment. It is a 3-node setup and I am doing very basic things (including an index scan). I can let the secondary index test run longer if you want.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: MumPartitionerTest.docx, hashed_partitioner.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Assigned] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-3772:
-----------------------------------------

    Assignee: Pavel Yaskevich  (was: Dave Brosius)

Pavel, can you see if you can reproduce md5 bottlenecks in 2I reads, and see if this partitioner makes a meaningful difference?
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-3772-Test.patch, MumPartitionerTest.docx, hashed_partitioner.diff, hashed_partitioner_3.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239654#comment-13239654 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Agreed, I'd rather leave RandomPartitioner class name alone since people have been using it for 2+ years now, not worth the headache of renaming things at this point IMO.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: hashed_partitioner.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Issue Comment Edited] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237446#comment-13237446 ] 

Dave Brosius edited comment on CASSANDRA-3772 at 3/24/12 5:31 AM:
------------------------------------------------------------------

new patch against trunk using MurmurHash.hash3_x64_128.

Preliminary testing shows murmur3 hash to be marginally faster than md5, although not significantly. (This is on very pedestrian hardware though, so that might mask differences.) Running longer tests now to see if the JIT has had a fair chance to do its magic.
                
      was (Author: dbrosius@apache.org):
    new patch against trunk using MurmurHash.hash3_x64_128.

Preliminary testing show murmur3 hash to be marginally faster than md5, altho not significantly. (this is one very pedestrian hardware tho, so that might mask differences). Running longer tests now to see if jit has had a fair chance to do it's magic.
                  
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3772:
--------------------------------------

    Reviewer: vijay2win@yahoo.com  (was: vijay)
    
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238607#comment-13238607 ] 

Vijay commented on CASSANDRA-3772:
----------------------------------

+1 running the test now and will commit it once successful (Without spaces in Config).
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439030#comment-13439030 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Sorry, didn't look at the code until commit...

Can you test making it hash to a Long or an 8-byte ByteBuffer?  A 16-byte BigInteger is overkill; all we need is a reasonable distribution (now that Tokens don't need to be unique), and 64 or even 32 bits is plenty for that.
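
To make the width difference concrete, here is a small sketch that derives both a 16-byte BigInteger token and an 8-byte long token from the same digest. MD5 is used only because it is available in the JDK; the ticket's hash would be Murmur3. TokenWidthDemo and its methods are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class TokenWidthDemo {
    // 16-byte digest of the key (MD5 as a stand-in hash).
    static byte[] digest(String key) {
        try {
            return MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Wide token: all 16 bytes as a non-negative BigInteger (a heap object with its own int[]).
    static BigInteger bigToken(String key) {
        return new BigInteger(digest(key)).abs();
    }

    // Narrow token: only the first 8 bytes, read as a signed primitive long.
    static long longToken(String key) {
        return ByteBuffer.wrap(digest(key)).getLong();
    }

    public static void main(String[] args) {
        System.out.println("BigInteger token bits: " + bigToken("a").bitLength());
        // Comparing long tokens is a single primitive comparison.
        System.out.println("long compare: " + Long.compare(longToken("a"), longToken("b")));
        // Comparing BigInteger tokens goes through BigInteger.compareTo.
        System.out.println("BigInteger compare: " + bigToken("a").compareTo(bigToken("b")));
    }
}
```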
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3772:
------------------------------------

    Attachment: try_murmur3.diff

Add an option
hash_algorithm: murmur3|md5

in the yaml, to specify the hash function.

This requires upgrading guava to version 11.0.1. I changed the build.xml, but didn't include the jar in this patch.



                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Drew Kutcharian (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196363#comment-13196363 ] 

Drew Kutcharian commented on CASSANDRA-3772:
--------------------------------------------

True. I'm not sure a Java implementation would be as fast, since the C++ code uses the CPU's CRC instructions to speed things up, and I don't know if that can be done in Java alone. If anything, this should probably be used via JNA.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238695#comment-13238695 ] 

Vijay commented on CASSANDRA-3772:
----------------------------------

But implementing an IPartitioner will have the same problem as the original approach of leaving it as a config option: the user has to make sure to change it on all the servers at once (and shouldn't change it on existing clusters that already have data in them). I am OK with that too...
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438081#comment-13438081 ] 

Pavel Yaskevich commented on CASSANDRA-3772:
--------------------------------------------

The Java Cryptography Architecture doesn't disclose that, but from the tests it looks like it is.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.3
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239027#comment-13239027 ] 

Dave Brosius commented on CASSANDRA-3772:
-----------------------------------------

Sure... I will push the current RandomPartitioner code to an AbstractRandomPartitioner class and then subclass a new RandomPartitioner that uses md5 and a second subclass that does murmur3.
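
For illustration only (this is not the code from the attached patches), the split described above could look roughly like the sketch below. All class names are hypothetical, and the use of abs() on the hash bytes to obtain a non-negative token is an assumption about the shared logic. Only the MD5 subclass is shown; a Murmur3 subclass would override hash() in the same place.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical names throughout; not the classes from the patch.
abstract class AbstractHashedPartitionerSketch {
    /** Subclasses supply only the hash function. */
    protected abstract byte[] hash(byte[] key);

    /** Shared logic (assumed): turn the hash bytes into a non-negative token. */
    public final BigInteger getToken(byte[] key) {
        return new BigInteger(hash(key)).abs();
    }
}

/** MD5-backed variant, standing in for the existing RandomPartitioner behaviour. */
final class Md5PartitionerSketch extends AbstractHashedPartitionerSketch {
    @Override
    protected byte[] hash(byte[] key) {
        try {
            // A fresh MessageDigest per call keeps the sketch thread-safe.
            return MessageDigest.getInstance("MD5").digest(key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

With this shape, choosing the hash is just a matter of which subclass is instantiated, which is the same cluster-wide decision as choosing a partitioner today.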
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Radim Kolar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438044#comment-13438044 ] 

Radim Kolar commented on CASSANDRA-3772:
----------------------------------------

Is md5 implemented using a native call in Java?
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.3
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238658#comment-13238658 ] 

Dave Brosius commented on CASSANDRA-3772:
-----------------------------------------

This patch was only for evaluation... I think we would need to extend it to save the setting in the system table before committing it.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196134#comment-13196134 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

I don't think there is a Java CityHash implementation yet.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288698#comment-13288698 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

bq. Did you guys test secondary index reads (with reasonably high cardinality)? That's when MD5 was a bottleneck, because then the column comparator ends up redecorating keys over and over again.

Is that something we can address in the index code then, instead of creating a new partitioner?


                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-3772-Test.patch, MumPartitionerTest.docx, hashed_partitioner.diff, hashed_partitioner_3.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-3772:
---------------------------------------

    Attachment: CASSANDRA-3772-v2.patch

I have removed the ThreadLocal declaration from M3P, which was the bottleneck (and cleaned up whitespace errors). After re-running the tests with that modification, M3P beats RP, 903 to 847.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.3
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239839#comment-13239839 ] 

Sylvain Lebresne commented on CASSANDRA-3772:
---------------------------------------------

I suppose that we'd want to tweak the stress params so that each indexed_slice call returns more results (so that each call ends up doing more hash computation). However, if we have to tune too much to get even a slight improvement, I wonder if it's worth bothering with that.

Btw, did we micro-benchmark our version of murmur3 against md5? How much faster is it?
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: MumPartitionerTest.docx, hashed_partitioner.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439016#comment-13439016 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Let's do it.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.3
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3772:
------------------------------------

    Attachment: try_murmur3_2.diff
    
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3772:
------------------------------------

    Attachment: hashed_partitioner_3.diff

Reworked to remove MD5Partitioner and just have Random and Murmur3, as the comments suggested. Reworked Murmur3 to remove as many allocations as possible. Also just flipped the top bit (rather than abs) to avoid object allocs. This is different than doing abs, but shouldn't affect distribution.

There still isn't a significant difference from MD5, though.
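
To make the abs-versus-top-bit point concrete, here is a small sketch. It is one plausible reading of "flipped the top bit" (clearing the sign bit of the first hash byte so the value is already non-negative), not the code from hashed_partitioner_3.diff, and the class and method names are made up for illustration.

```java
import java.math.BigInteger;

// Hypothetical helper; assumes "flipping the top bit" means clearing the sign bit.
final class SignBitSketch {
    /** Baseline: read the bytes as a signed big-endian integer, then take abs(). */
    static BigInteger viaAbs(byte[] hash) {
        // abs() allocates a second BigInteger whenever the value is negative.
        return new BigInteger(hash).abs();
    }

    /** Alternative: clear the sign bit first, so no abs() call is needed. */
    static BigInteger viaClearedTopBit(byte[] hash) {
        byte[] copy = hash.clone(); // cloned only so this demo leaves the caller's array untouched
        copy[0] &= 0x7F;            // top bit of the most significant byte is the sign bit
        return new BigInteger(copy);
    }
}
```

Both methods always return a non-negative value, but they map a negative input to different tokens: for the bytes 0xFF 0x01, abs() gives 255 while clearing the top bit gives 32513. That is why the two schemes are not interchangeable on a cluster that already holds data, even if the distribution is equally good.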
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: MumPartitionerTest.docx, hashed_partitioner.diff, hashed_partitioner_3.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3772:
--------------------------------------

       Reviewer: yukim
    Component/s: Core
       Assignee: Dave Brosius

Yuki, can you take a look?
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Updated] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-3772:
---------------------------------------

    Attachment: CASSANDRA-3772-v4.patch

v4 includes the following changes:

  - LongToken is added to o.a.c.dht
  - M3P is changed to use LongToken
  - AbstractHashedPartitioner is removed as no longer needed
  - RP was reverted to its previous code state (before the split into AHP and M3P)
  - NEWS.txt is updated to reflect the change in default partitioner
  - added Murmur3PartitionerTest

The change to use longs in M3P buys us another ~6-7 op/s compared to BigIntegerToken.
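
As a rough illustration of why a long-based token is cheaper, a token type wrapping a primitive long could look like the sketch below. This is a hypothetical stand-in, not the LongToken class from the v4 patch, whose actual shape may differ.

```java
// Hypothetical sketch; the real o.a.c.dht.LongToken may be structured differently.
final class LongTokenSketch implements Comparable<LongTokenSketch> {
    final long token;

    LongTokenSketch(long token) {
        this.token = token;
    }

    @Override
    public int compareTo(LongTokenSketch other) {
        // Long.compare avoids the overflow a naive (token - other.token) would have.
        return Long.compare(token, other.token);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof LongTokenSketch && ((LongTokenSketch) o).token == token;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(token);
    }
}
```

Comparing two such tokens is a single primitive comparison, whereas BigInteger comparison has to walk a magnitude array on a separately allocated object.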
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, CASSANDRA-3772-v3.patch, CASSANDRA-3772-v4.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238725#comment-13238725 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Right, "upgradeable" isn't a design goal here.  Neither is "make partitioner changes without downtime."
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438002#comment-13438002 ] 

Pavel Yaskevich commented on CASSANDRA-3772:
--------------------------------------------

My tests show that Murmur3Partitioner is actually worse than MD5 with high-cardinality indexes. Here is what I did (kernel 3.0.0-19, 2.2GHz quad-core Opteron, 2GB RAM):

For each test:

 - wiped all of the data directories and re-compiled with 'clean'
 - ran stress with -c 50 -C 500 -S 512 -n 50000 (where -c is number of columns, -C values cardinality and -S is value size in bytes) 4 times (to make it hot)

RandomPartitioner:  average op rate is 845.
Murmur3Partitioner: average op rate is 721.



                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.3
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439601#comment-13439601 ] 

Jonathan Ellis commented on CASSANDRA-3772:
-------------------------------------------

Why not just use a Long instead of BigInteger?
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, CASSANDRA-3772-v3.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Issue Comment Edited] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238658#comment-13238658 ] 

Dave Brosius edited comment on CASSANDRA-3772 at 3/26/12 6:48 PM:
------------------------------------------------------------------

This patch was only for evaluation... I think we would need to extend it to save the setting in the system table before committing it. Hmm, then again, it's a chicken-and-egg thing: you can't read the setting from the system table if the setting in the yaml is wrong, which is kind of bad. Perhaps system tables should always be hashed with MD5, with this setting applying only to user keyspaces.
                
      was (Author: dbrosius@apache.org):
    This patch was only for evaluation... I think we would need to extend it to save the setting in the system table before committing it.
                  
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206830#comment-13206830 ] 

Dave Brosius commented on CASSANDRA-3772:
-----------------------------------------

With 10,000 inserts I'm seeing the same ratios, which I'm having a hard time explaining, since again the hash function itself takes about the same time.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237446#comment-13237446 ] 

Dave Brosius commented on CASSANDRA-3772:
-----------------------------------------

New patch against trunk using MurmurHash.hash3_x64_128.

Preliminary testing shows the Murmur3 hash to be marginally faster than MD5, although not significantly. (This is on very pedestrian hardware, though, so that might mask differences.) Running longer tests now to see whether the JIT has had a fair chance to do its magic.
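
To illustrate why the JIT matters for a comparison like this, here is a minimal, hedged sketch of a timing loop with a warm-up phase. It is not the attached patch: the class and method names (HashBench, murmur3_32, md5Fold, timeNanos) are made up for illustration, the Murmur3 shown is the 32-bit x86 variant rather than the 128-bit hash3_x64_128 used by the patch, and MD5 is obtained through a fresh MessageDigest per call, which is only an assumption about how a caller might do it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.ToIntFunction;

class HashBench {
    // 32-bit MurmurHash3 (x86_32 variant). Illustration only: the patch
    // discussed above uses the 128-bit x64 variant instead.
    static int murmur3_32(byte[] data, int seed) {
        int h = seed;
        int nblocks = data.length / 4;
        for (int i = 0; i < nblocks; i++) {
            int p = i * 4;
            // Read one little-endian 32-bit block and mix it into the state.
            int k = (data[p] & 0xff) | ((data[p + 1] & 0xff) << 8)
                  | ((data[p + 2] & 0xff) << 16) | ((data[p + 3] & 0xff) << 24);
            k *= 0xcc9e2d51;
            k = Integer.rotateLeft(k, 15);
            k *= 0x1b873593;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        // Mix in the 1 to 3 trailing bytes (the cases fall through on purpose).
        int k = 0;
        int tail = nblocks * 4;
        switch (data.length & 3) {
            case 3: k ^= (data[tail + 2] & 0xff) << 16;
            case 2: k ^= (data[tail + 1] & 0xff) << 8;
            case 1: k ^= (data[tail] & 0xff);
                    k *= 0xcc9e2d51;
                    k = Integer.rotateLeft(k, 15);
                    k *= 0x1b873593;
                    h ^= k;
        }
        // Finalization: force all bits of the state to avalanche.
        h ^= data.length;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // MD5 of the input, folded to the first 4 digest bytes so that both
    // hashes fit the same int-returning interface.
    static int md5Fold(byte[] data) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(data);
            return (d[0] & 0xff) | ((d[1] & 0xff) << 8)
                 | ((d[2] & 0xff) << 16) | ((d[3] & 0xff) << 24);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs unmeasured warm-up rounds first so the JIT can compile the hot
    // path, then returns the wall-clock nanoseconds of the measured rounds.
    static long timeNanos(ToIntFunction<byte[]> hash, byte[][] keys,
                          int warmupRounds, int measuredRounds) {
        int sink = 0;
        for (int r = 0; r < warmupRounds; r++)
            for (byte[] key : keys) sink ^= hash.applyAsInt(key);
        long start = System.nanoTime();
        for (int r = 0; r < measuredRounds; r++)
            for (byte[] key : keys) sink ^= hash.applyAsInt(key);
        long elapsed = System.nanoTime() - start;
        // Use the result so the hashing loops cannot be optimized away.
        if (sink == 42) System.out.print("");
        return elapsed;
    }

    public static void main(String[] args) {
        byte[][] keys = new byte[10_000][];
        for (int i = 0; i < keys.length; i++)
            keys[i] = ("key" + i).getBytes(StandardCharsets.UTF_8);
        long md5 = timeNanos(HashBench::md5Fold, keys, 20, 20);
        long murmur = timeNanos(k -> murmur3_32(k, 0), keys, 20, 20);
        System.out.printf("md5: %d ms, murmur3_32: %d ms%n",
                          md5 / 1_000_000, murmur / 1_000_000);
    }
}
```

A loop like this isolates the hash cost only. It says nothing about end-to-end insert latency, where the hash may be a small fraction of the total work.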
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Radim Kolar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415136#comment-13415136 ] 

Radim Kolar commented on CASSANDRA-3772:
----------------------------------------

Did you measure the CPU time used during the MD5 vs. Murmur3 tests, and not only wall-clock time?
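
As a hedged sketch of the distinction being asked about: on a JVM, per-thread CPU time can be read through the standard ThreadMXBean and compared with System.nanoTime(). The class and method names below (CpuVsWall, measure, hashMany) are hypothetical, and the sketch only covers work done on the calling thread, so it suits a single-threaded microbenchmark rather than a whole Cassandra node.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class CpuVsWall {
    // Returns {wallNanos, cpuNanos} for running the task on the current
    // thread. cpuNanos is -1 when the JVM cannot report thread CPU time.
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuOk = threads.isCurrentThreadCpuTimeSupported()
                     && threads.isThreadCpuTimeEnabled();
        long cpuStart = cpuOk ? threads.getCurrentThreadCpuTime() : 0L;
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = cpuOk ? threads.getCurrentThreadCpuTime() - cpuStart : -1L;
        return new long[] { wall, cpu };
    }

    // Example workload: MD5-hash `count` small keys on the calling thread.
    static void hashMany(int count) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (int i = 0; i < count; i++) {
                // digest(byte[]) also resets the digest for the next key.
                md5.digest(("key" + i).getBytes(StandardCharsets.UTF_8));
            }
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = measure(() -> hashMany(100_000));
        System.out.printf("wall: %d us, cpu: %d us%n", r[0] / 1_000, r[1] / 1_000);
    }
}
```

If wall-clock time is dominated by I/O or scheduling, two hashes can look equally fast on the clock while differing in CPU time, which is the case the question is probing.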
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-3772-Test.patch, MumPartitionerTest.docx, hashed_partitioner.diff, hashed_partitioner_3.diff, try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439012#comment-13439012 ] 

Pavel Yaskevich commented on CASSANDRA-3772:
--------------------------------------------

I don't see any, as it has both good collision resistance and distribution.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.3
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238872#comment-13238872 ] 

Vijay commented on CASSANDRA-3772:
----------------------------------

Hi Dave, mind writing a partitioner?
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Jeremy Hanna (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239054#comment-13239054 ] 

Jeremy Hanna commented on CASSANDRA-3772:
-----------------------------------------

It will be nice to see more benchmarks as people try this out.  Changing the partitioner, as has been mentioned, is a significant change.  So it will be nice to find out whether it is worth changing or whether it's just good for greenfield applications.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff, try_murmur3_2.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439031#comment-13439031 ] 

Pavel Yaskevich commented on CASSANDRA-3772:
--------------------------------------------

Sure
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.2.0
>
>         Attachments: 0001-CASSANDRA-3772.patch, 0001-CASSANDRA-3772-Test.patch, CASSANDRA-3772-v2.patch, hashed_partitioner_3.diff, hashed_partitioner.diff, MumPartitionerTest.docx, try_murmur3_2.diff, try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Dave Brosius (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199516#comment-13199516 ] 

Dave Brosius commented on CASSANDRA-3772:
-----------------------------------------

Doing 1000 inserts of a 5-column CF on a single-node cluster on a really lousy machine seems to show that the Guava Murmur hash is significantly slower than MD5, perhaps 5x. Perhaps it's just the Guava implementation, as opposed to the Murmur3 algorithm itself.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.


        

[jira] [Commented] (CASSANDRA-3772) Evaluate Murmur3-based partitioner

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207436#comment-13207436 ] 

Vijay commented on CASSANDRA-3772:
----------------------------------

If CASSANDRA-2975 gets committed you should be able to use that.
                
> Evaluate Murmur3-based partitioner
> ----------------------------------
>
>                 Key: CASSANDRA-3772
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>             Fix For: 1.2
>
>         Attachments: try_murmur3.diff
>
>
> MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution.  Let's see how much overhead we can save by using Murmur3 instead.
