Posted to dev@mahout.apache.org by "Thomas Sauzedde (JIRA)" <ji...@apache.org> on 2010/09/16 10:29:33 UTC

[jira] Created: (MAHOUT-503) Bad murmur hash implementation ?!?

Bad murmur hash implementation ?!?
----------------------------------

                 Key: MAHOUT-503
                 URL: https://issues.apache.org/jira/browse/MAHOUT-503
             Project: Mahout
          Issue Type: Improvement
          Components: Classification
            Reporter: Thomas Sauzedde


It looks like the MurmurHash implementation comes from the original C-to-Java port (see http://www.getopt.org/murmur/MurmurHash.java).
According to http://dmy999.com/article/50/murmurhash-2-java-port (I have not verified this myself), this port doesn't produce the same results as the original C code.
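The thread does not say what the cited article actually found. As a hedged illustration only, here are three Java behaviors that commonly make a C-to-Java hash port disagree with its C reference: signed bytes, arithmetic versus logical right shift, and byte order when reading multi-byte blocks. The class `PortPitfalls` and its methods are hypothetical, not code from either port.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical illustration of common C-to-Java porting pitfalls for hash functions. */
public final class PortPitfalls {
  private PortPitfalls() {}

  /** Java bytes are signed: without the mask, 0x80 widens to -128 (0xffffff80). */
  public static int unsignedByte(byte b) {
    return b & 0xff;
  }

  /** C's >> on a uint32_t is a logical shift; the Java equivalent is >>>, not >>. */
  public static int logicalShift(int x, int n) {
    return x >>> n;
  }

  /**
   * Reads 4 bytes starting at off as a little-endian int, which is what
   * *(uint32_t*)p yields on x86. Same result as
   * ByteBuffer.wrap(data, off, 4).order(ByteOrder.LITTLE_ENDIAN).getInt().
   */
  public static int littleEndianInt(byte[] data, int off) {
    return (data[off] & 0xff)
        | (data[off + 1] & 0xff) << 8
        | (data[off + 2] & 0xff) << 16
        | (data[off + 3] & 0xff) << 24;
  }

  public static void main(String[] args) {
    byte b = (byte) 0x80;
    System.out.println("sign-extended: " + (int) b + ", masked: " + unsignedByte(b));
    System.out.println("arithmetic: " + (-16 >> 28) + ", logical: " + logicalShift(-16, 28));

    byte[] data = {0x01, 0x02, 0x03, (byte) 0x84};
    int manual = littleEndianInt(data, 0);
    int viaBuffer = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt();
    System.out.printf("manual=0x%08x buffer=0x%08x%n", manual, viaBuffer);
  }
}
```

Any one of these, left unhandled, changes the hash for some inputs (typically those containing bytes of 0x80 or above), which is why matching the C reference needs test vectors that include such bytes.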



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Resolved: (MAHOUT-503) Bad murmur hash implementation ?!?

Posted by Ted Dunning <te...@gmail.com>.
Thanks for the warning anyway!

Btw... since MurmurHash is beginning to look like it is on the critical path
for some modeling, I think I am going to go back and port the 32-bit version.
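Porting the 32-bit MurmurHash2 mostly means mapping unsigned 32-bit C arithmetic onto Java's signed `int`. A minimal sketch, assuming the C reference reads 4-byte blocks in little-endian order (as on x86); the class and method names are hypothetical and this is not the code that was later checked in.

```java
/** Hypothetical sketch of a Java port of the 32-bit MurmurHash2. */
public final class MurmurHash2Sketch {
  private MurmurHash2Sketch() {}

  public static int hash32(byte[] data, int seed) {
    final int m = 0x5bd1e995;
    final int r = 24;
    int len = data.length;
    int h = seed ^ len;

    int i = 0;
    // Body: mix each 4-byte little-endian block into h.
    while (len - i >= 4) {
      int k = (data[i] & 0xff)
          | (data[i + 1] & 0xff) << 8
          | (data[i + 2] & 0xff) << 16
          | (data[i + 3] & 0xff) << 24;
      k *= m;          // int multiplication wraps mod 2^32, like uint32_t
      k ^= k >>> r;    // logical shift, matching >> on uint32_t in C
      k *= m;
      h *= m;
      h ^= k;
      i += 4;
    }

    // Tail: the C switch falls through, so case 3 also applies cases 2 and 1.
    switch (len - i) {
      case 3:
        h ^= (data[i + 2] & 0xff) << 16;
        // fall through
      case 2:
        h ^= (data[i + 1] & 0xff) << 8;
        // fall through
      case 1:
        h ^= data[i] & 0xff;
        h *= m;
        break;
      default:
        break;
    }

    // Final avalanche.
    h ^= h >>> 13;
    h *= m;
    h ^= h >>> 15;
    return h;
  }
}
```

Because multiplication and XOR give the same bits for signed and unsigned 32-bit values, only the shifts and the byte reads need special care in Java.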

On Fri, Sep 17, 2010 at 12:20 AM, Thomas Sauzedde (JIRA) <ji...@apache.org> wrote:

>
>     [ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Thomas Sauzedde resolved MAHOUT-503.
> ------------------------------------
>
>    Resolution: Invalid
>
> Didn't notice the comment or the diff between the code base and the
> original port  :-(
> Sorry, but now I know you were aware of a potential issue ;-)
>
> Thanks
>
>

[jira] Resolved: (MAHOUT-503) Bad murmur hash implementation ?!?

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning resolved MAHOUT-503.
--------------------------------

    Resolution: Not A Problem

New tests checked in.



[jira] Commented: (MAHOUT-503) Bad murmur hash implementation ?!?

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910280#action_12910280 ] 

Ted Dunning commented on MAHOUT-503:
------------------------------------

Feel free to suggest additional test vectors.  Note that I only ported the 64-bit hash.
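For reference, a minimal sketch of what a Java port of the 64-bit hash (`MurmurHash64A` in `MurmurHash2_64.cpp`) can look like, assuming little-endian 8-byte block reads as on x86-64. The class and method names are hypothetical and this is not the code in Mahout. The `main` method prints `length hash` lines for inputs of length 0 to 16, covering every tail length (`len & 7`) with and without a full block; diffing that output against the C reference run on the same bytes is one way to produce additional test vectors.

```java
/** Hypothetical sketch of a Java port of MurmurHash64A (64-bit MurmurHash2). */
public final class MurmurHash64Sketch {
  private MurmurHash64Sketch() {}

  public static long hash64A(byte[] data, long seed) {
    final long m = 0xc6a4a7935bd1e995L;
    final int r = 47;
    int len = data.length;
    long h = seed ^ (len * m);   // len is widened to long; product wraps mod 2^64

    int blocks = len / 8;
    for (int b = 0; b < blocks; b++) {
      int i = b * 8;
      // Assemble a little-endian 64-bit block, masking each byte to avoid sign extension.
      long k = 0;
      for (int j = 7; j >= 0; j--) {
        k = (k << 8) | (data[i + j] & 0xffL);
      }
      k *= m;
      k ^= k >>> r;
      k *= m;
      h ^= k;
      h *= m;
    }

    // Tail: fold in the remaining len % 8 bytes (the C switch falls through).
    int tail = blocks * 8;
    int rem = len & 7;
    if (rem > 0) {
      for (int j = rem - 1; j >= 0; j--) {
        h ^= (data[tail + j] & 0xffL) << (8 * j);
      }
      h *= m;
    }

    // Final avalanche.
    h ^= h >>> r;
    h *= m;
    h ^= h >>> r;
    return h;
  }

  public static void main(String[] args) {
    // Inputs 0, 1, ..., len-1 for every length 0..16, printed as "length hash".
    for (int len = 0; len <= 16; len++) {
      byte[] data = new byte[len];
      for (int i = 0; i < len; i++) {
        data[i] = (byte) i;
      }
      System.out.printf("%d %016x%n", len, hash64A(data, 0L));
    }
  }
}
```

Java has no unsigned `long`, but multiplication and XOR produce the same bits as `uint64_t` modulo 2^64; only the right shifts need `>>>` and the byte reads need masking with `0xffL`.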



[jira] Commented: (MAHOUT-503) Bad murmur hash implementation ?!?

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910823#action_12910823 ] 

Hudson commented on MAHOUT-503:
-------------------------------

Integrated in Mahout-Quality #288 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/288/])
    



[jira] Resolved: (MAHOUT-503) Bad murmur hash implementation ?!?

Posted by "Thomas Sauzedde (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Sauzedde resolved MAHOUT-503.
------------------------------------

    Resolution: Invalid

Didn't notice the comment or the diff between the code base and the original port  :-(
Sorry, but now I know you were aware of a potential issue ;-)

Thanks




[jira] Issue Comment Edited: (MAHOUT-503) Bad murmur hash implementation ?!?

Posted by "Thomas Sauzedde (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910490#action_12910490 ] 

Thomas Sauzedde edited comment on MAHOUT-503 at 9/17/10 3:20 AM:
-----------------------------------------------------------------

Didn't notice the comment or the diff between the code base and the original port  :-(
Sorry, but now I know you were aware of a potential issue ;-)

Thanks


      was (Author: yaourt):
    Didn't notice the comment neither the diff between the code base and the original port  :-(
Sorry, but now I know you was aware of a potential issue ;-)

Thanks

  


[jira] Reopened: (MAHOUT-503) Bad murmur hash implementation ?!?

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning reopened MAHOUT-503:
--------------------------------

      Assignee: Ted Dunning

I just copied over the tests from the blog (all marked as public domain).

Our implementation passed.

Will check in shortly and resolve this issue as not a problem.

But the pointer to additional tests was *very* helpful.



[jira] Commented: (MAHOUT-503) Bad murmur hash implementation ?!?

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910279#action_12910279 ] 

Ted Dunning commented on MAHOUT-503:
------------------------------------

Note this comment in the test cases for MurmurHash:
{code}
    // test data generated by running MurmurHash2_64.cpp
{code}




[jira] Issue Comment Edited: (MAHOUT-503) Bad murmur hash implementation ?!?

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910279#action_12910279 ] 

Ted Dunning edited comment on MAHOUT-503 at 9/16/10 3:38 PM:
-------------------------------------------------------------

Note this comment in the test cases for MurmurHash:
{code}
    // test data generated by running MurmurHash2_64.cpp
{code}

I changed Andrzej's version of the code quite a bit in this port, and I generated test vectors in much the same way as the article you cite.

      was (Author: tdunning):
    Note this comment in the test cases for MurmurHash:
{code}
    // test data generated by running MurmurHash2_64.cpp
{code}

  