Posted to dev@mahout.apache.org by "Thomas Sauzedde (JIRA)" <ji...@apache.org> on 2010/09/16 10:29:33 UTC
[jira] Created: (MAHOUT-503) Bad murmur hash implementation ?!?
Bad murmur hash implementation ?!?
----------------------------------
Key: MAHOUT-503
URL: https://issues.apache.org/jira/browse/MAHOUT-503
Project: Mahout
Issue Type: Improvement
Components: Classification
Reporter: Thomas Sauzedde
It looks like the murmur hash implementation comes from the original C-to-Java port (see http://www.getopt.org/murmur/MurmurHash.java).
According to http://dmy999.com/article/50/murmurhash-2-java-port (not verified myself), this port doesn't produce the same results as the original C code.
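The report does not say where the suspected divergence would come from. One common way a C-to-Java port of a byte-oriented hash can drift is that Java's `byte` is signed while the C code reads `unsigned char`. The sketch below only illustrates that pitfall; it is not a diagnosis of the linked port, and the class and method names are hypothetical.

```java
final class LittleEndianRead {
    /** Assembles 4 bytes into a little-endian word, treating each byte as unsigned (as C's unsigned char). */
    static int masked(byte[] data, int offset) {
        return (data[offset] & 0xff)
            | ((data[offset + 1] & 0xff) << 8)
            | ((data[offset + 2] & 0xff) << 16)
            | ((data[offset + 3] & 0xff) << 24);
    }

    /** Same assembly without masking: a byte >= 0x80 is sign-extended and sets all higher bits. */
    static int unmasked(byte[] data, int offset) {
        return data[offset]
            | (data[offset + 1] << 8)
            | (data[offset + 2] << 16)
            | (data[offset + 3] << 24);
    }
}
```

For pure ASCII input both methods agree, so a test suite built only from ASCII strings would not expose the difference; an input containing a byte of 0x80 or above would.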
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: [jira] Resolved: (MAHOUT-503) Bad murmur hash implementation ?!?
Posted by Ted Dunning <te...@gmail.com>.
Thanks for the warning anyway!
Btw... since MurmurHash is beginning to look like a critical path on some
modeling, I think I am going to go back and port the 32 bit version.
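For readers unfamiliar with the 32 bit variant mentioned here, the following is a minimal sketch of the 32 bit MurmurHash2 algorithm in Java. It is not the code that went into Mahout; the class name `Murmur2Sketch` and the method `hash32` are hypothetical, and the sketch assumes the input is read as little-endian words, which matches the C code only on little-endian machines.

```java
final class Murmur2Sketch {
    private static final int M = 0x5bd1e995; // mixing constant of MurmurHash2
    private static final int R = 24;         // shift used when mixing each word

    static int hash32(byte[] data, int seed) {
        int len = data.length;
        int h = seed ^ len;
        int i = 0;
        // Mix the input four bytes at a time.
        while (len - i >= 4) {
            int k = (data[i] & 0xff)
                | ((data[i + 1] & 0xff) << 8)
                | ((data[i + 2] & 0xff) << 16)
                | ((data[i + 3] & 0xff) << 24);
            k *= M;
            k ^= k >>> R; // unsigned shift, since the C code uses uint32_t
            k *= M;
            h *= M;
            h ^= k;
            i += 4;
        }
        // Fold in the remaining 0 to 3 bytes; the cases fall through on purpose.
        switch (len - i) {
            case 3:
                h ^= (data[i + 2] & 0xff) << 16;
            case 2:
                h ^= (data[i + 1] & 0xff) << 8;
            case 1:
                h ^= data[i] & 0xff;
                h *= M;
                break;
            default:
                break;
        }
        // Final avalanche.
        h ^= h >>> 13;
        h *= M;
        h ^= h >>> 15;
        return h;
    }
}
```

The two places where a Java port most easily departs from C are the `& 0xff` masks and the use of `>>>` instead of `>>`.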
On Fri, Sep 17, 2010 at 12:20 AM, Thomas Sauzedde (JIRA) <ji...@apache.org> wrote:
>
> [
> https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Thomas Sauzedde resolved MAHOUT-503.
> ------------------------------------
>
> Resolution: Invalid
>
> Didn't notice the comment neither the diff between the code base and the
> original port :-(
> Sorry, but now I know you was aware of a potential issue ;-)
>
> Thanks
[jira] Resolved: (MAHOUT-503) Bad murmur hash implementation ?!?
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Dunning resolved MAHOUT-503.
--------------------------------
Resolution: Not A Problem
New tests checked in.
[jira] Commented: (MAHOUT-503) Bad murmur hash implementation ?!?
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910280#action_12910280 ]
Ted Dunning commented on MAHOUT-503:
------------------------------------
Feel free to suggest additional test vectors. Note that I only ported the 64 bit hash.
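As background on what "the 64 bit hash" refers to, here is a minimal sketch of the 64 bit MurmurHash2 variant (MurmurHash64A) in Java. It is written from the published algorithm, not copied from Mahout; the class name `Murmur64Sketch` and the method `hash64` are hypothetical, and little-endian word order is assumed.

```java
final class Murmur64Sketch {
    private static final long M = 0xc6a4a7935bd1e995L; // mixing constant of MurmurHash64A
    private static final int R = 47;

    static long hash64(byte[] data, long seed) {
        int len = data.length;
        long h = seed ^ (len * M);
        int i = 0;
        // Mix the input eight bytes at a time, assembled as a little-endian word.
        for (; len - i >= 8; i += 8) {
            long k = 0;
            for (int b = 7; b >= 0; b--) {
                k = (k << 8) | (data[i + b] & 0xffL);
            }
            k *= M;
            k ^= k >>> R;
            k *= M;
            h ^= k;
            h *= M;
        }
        // Fold in the remaining 0 to 7 bytes, each treated as unsigned.
        int tail = len - i;
        if (tail > 0) {
            for (int b = tail - 1; b >= 0; b--) {
                h ^= (data[i + b] & 0xffL) << (8 * b);
            }
            h *= M;
        }
        // Final avalanche.
        h ^= h >>> R;
        h *= M;
        h ^= h >>> R;
        return h;
    }
}
```

Additional test vectors for a port like this would typically include inputs whose length is not a multiple of 8 and inputs containing bytes of 0x80 or above, since those exercise the tail handling and the unsigned-byte masking.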
[jira] Commented: (MAHOUT-503) Bad murmur hash implementation ?!?
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910823#action_12910823 ]
Hudson commented on MAHOUT-503:
-------------------------------
Integrated in Mahout-Quality #288 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/288/])
[jira] Resolved: (MAHOUT-503) Bad murmur hash implementation ?!?
Posted by "Thomas Sauzedde (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Sauzedde resolved MAHOUT-503.
------------------------------------
Resolution: Invalid
I didn't notice the comment or the diff between the code base and the original port :-(
Sorry, but now I know you were aware of a potential issue ;-)
Thanks
[jira] Issue Comment Edited: (MAHOUT-503) Bad murmur hash implementation ?!?
Posted by "Thomas Sauzedde (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910490#action_12910490 ]
Thomas Sauzedde edited comment on MAHOUT-503 at 9/17/10 3:20 AM:
-----------------------------------------------------------------
I didn't notice the comment or the diff between the code base and the original port :-(
Sorry, but now I know you were aware of a potential issue ;-)
Thanks
was (Author: yaourt):
Didn't notice the comment neither the diff between the code base and the original port :-(
Sorry, but now I know you was aware of a potential issue ;-)
Thanks
[jira] Reopened: (MAHOUT-503) Bad murmur hash implementation ?!?
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Dunning reopened MAHOUT-503:
--------------------------------
Assignee: Ted Dunning
I just copied over the tests from the blog (all marked as public domain).
Our implementation passed.
Will check in shortly and resolve this issue as not a problem.
But the pointer to additional tests was *very* helpful.
[jira] Commented: (MAHOUT-503) Bad murmur hash implementation ?!?
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910279#action_12910279 ]
Ted Dunning commented on MAHOUT-503:
------------------------------------
Note this comment in the test cases for MurmurHash:
{code}
// test data generated by running MurmurHash2_64.cpp
{code}
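The approach described by that comment, generating expected values with the reference C++ program and asserting that the Java port reproduces them, can be sketched as a small harness. The thread does not reproduce any MurmurHash vectors, so none are invented here: the sketch uses `java.util.zip.CRC32` purely as a stand-in hash, because its standard check value for "123456789" (0xCBF43926) is well known. The class `VectorCheck` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;
import java.util.zip.CRC32;

final class VectorCheck {
    /** Returns the inputs whose hash under the port differs from the reference value. */
    static List<String> mismatches(Map<String, Long> expected, ToLongFunction<byte[]> port) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            long actual = port.applyAsLong(e.getKey().getBytes(StandardCharsets.UTF_8));
            if (actual != e.getValue()) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }

    /** Stand-in for the function under test; a real check would call the ported hash here. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    /** Stand-in for values produced by running the reference implementation. */
    static Map<String, Long> referenceVectors() {
        Map<String, Long> vectors = new LinkedHashMap<>();
        vectors.put("", 0L);
        vectors.put("123456789", 0xCBF43926L);
        return vectors;
    }
}
```

Calling `VectorCheck.mismatches(VectorCheck.referenceVectors(), VectorCheck::crc32)` returns an empty list when the function under test agrees with every reference value; any returned string names an input on which the port diverges.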
[jira] Issue Comment Edited: (MAHOUT-503) Bad murmur hash implementation ?!?
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910279#action_12910279 ]
Ted Dunning edited comment on MAHOUT-503 at 9/16/10 3:38 PM:
-------------------------------------------------------------
Note this comment in the test cases for MurmurHash:
{code}
// test data generated by running MurmurHash2_64.cpp
{code}
I changed Andrej's version of the code quite a bit in this port, and I generated test vectors in much the same way as the article you cite.
was (Author: tdunning):
Note this comment in the test cases for MurmurHash:
{code}
// test data generated by running MurmurHash2_64.cpp
{code}