You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/02/16 21:23:59 UTC

[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/20630

    [SPARK-23381][CORE] Murmur3 hash generates a different value from other implementations

    ## What changes were proposed in this pull request?
    Murmur3 hash generates a different value from the original and other implementations (like Scala standard library and Guava or so) when the length of a bytes array is not multiple of 4.
    
    ## How was this patch tested?
    Added a unit test.
    
    **Note:** When we merge this PR, please give all the credits to Shintaro Murakami. 
    
    Author: Shintaro Murakami <mr...@gmail.com>

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark pr-20568

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20630.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20630
    
----
commit c20cd97d7ce5690993b4490bb7cca955e7703d90
Author: Shintaro Murakami <mr...@...>
Date:   2018-02-10T14:29:48Z

    [SPARK-23381][CORE] Fix Murmur3 for byte arrays whose byte length is not a multiple of 4

commit 5fcd994554a82d809ac8547378776e2b95c37d93
Author: gatorsmile <ga...@...>
Date:   2018-02-16T21:05:32Z

    fix.

commit c406f9846f81ddf8030803cb918bb51506d4ae6b
Author: gatorsmile <ga...@...>
Date:   2018-02-16T21:19:28Z

    fix.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20630#discussion_r168885323
  
    --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/hash/Murmur3_x86_32Suite.java ---
    @@ -51,6 +53,23 @@ public void testKnownLongInputs() {
         Assert.assertEquals(-2106506049, hasher.hashLong(Long.MAX_VALUE));
       }
     
    +  // SPARK-23381 Check whether the hash of the byte array is the same as another implementations
    --- End diff --
    
    We could add a randomized (disabled) test here. That might increase the confidence we have in the new hash.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20630
  
    cc @hvanhovell @sameeragarwal @jkbradley @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20630#discussion_r168904164
  
    --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/hash/Murmur3_x86_32Suite.java ---
    @@ -51,6 +53,23 @@ public void testKnownLongInputs() {
         Assert.assertEquals(-2106506049, hasher.hashLong(Long.MAX_VALUE));
       }
     
    +  // SPARK-23381 Check whether the hash of the byte array is the same as another implementations
    --- End diff --
    
    Yeah, we can do it in the follow-up PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20630
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20630
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20630


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87514/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/20630
  
    The ML changes LGTM.  Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20630
  
    **[Test build #87514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87514/testReport)** for PR 20630 at commit [`c406f98`](https://github.com/apache/spark/commit/c406f9846f81ddf8030803cb918bb51506d4ae6b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20630
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/940/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20630#discussion_r168904179
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java ---
    @@ -60,6 +60,8 @@ public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, i
       }
     
       public static int hashUnsafeBytes(Object base, long offset, int lengthInBytes, int seed) {
    +    // This is not compatible with original and another implementations.
    +    // But remain it for backward compatibility for the components existing before 2.3.
    --- End diff --
    
    Will correct it in the follow-up PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20630
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20630
  
    **[Test build #87514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87514/testReport)** for PR 20630 at commit [`c406f98`](https://github.com/apache/spark/commit/c406f9846f81ddf8030803cb918bb51506d4ae6b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20630#discussion_r168883881
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java ---
    @@ -60,6 +60,8 @@ public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, i
       }
     
       public static int hashUnsafeBytes(Object base, long offset, int lengthInBytes, int seed) {
    +    // This is not compatible with original and another implementations.
    +    // But remain it for backward compatibility for the components existing before 2.3.
    --- End diff --
    
    NIT: touch this up. The sentence is a bit weird, i.e.: `However we retain this implementation for backwards compatibility with pre-existing (pre 2.3) components.`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20630
  
    Thanks! Merged to master/2.3


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org