You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/02/16 21:23:59 UTC
[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...
GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/20630
[SPARK-23381][CORE] Murmur3 hash generates a different value from other implementations
## What changes were proposed in this pull request?
Murmur3 hash generates a different value from the original and other implementations (like Scala standard library and Guava or so) when the length of a bytes array is not multiple of 4.
## How was this patch tested?
Added a unit test.
**Note:** When we merge this PR, please give all the credits to Shintaro Murakami.
Author: Shintaro Murakami <mr...@gmail.com>
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark pr-20568
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20630.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20630
----
commit c20cd97d7ce5690993b4490bb7cca955e7703d90
Author: Shintaro Murakami <mr...@...>
Date: 2018-02-10T14:29:48Z
[SPARK-23381][CORE] Fix Murmur3 for byte arrays whose byte length is not a multiple of 4
commit 5fcd994554a82d809ac8547378776e2b95c37d93
Author: gatorsmile <ga...@...>
Date: 2018-02-16T21:05:32Z
fix.
commit c406f9846f81ddf8030803cb918bb51506d4ae6b
Author: gatorsmile <ga...@...>
Date: 2018-02-16T21:19:28Z
fix.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/20630#discussion_r168885323
--- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/hash/Murmur3_x86_32Suite.java ---
@@ -51,6 +53,23 @@ public void testKnownLongInputs() {
Assert.assertEquals(-2106506049, hasher.hashLong(Long.MAX_VALUE));
}
+ // SPARK-23381 Check whether the hash of the byte array is the same as another implementations
--- End diff --
We could add a randomized (disabled) test here. That might increase the confidence we have in the new hash.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20630
cc @hvanhovell @sameeragarwal @jkbradley @cloud-fan
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20630#discussion_r168904164
--- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/hash/Murmur3_x86_32Suite.java ---
@@ -51,6 +53,23 @@ public void testKnownLongInputs() {
Assert.assertEquals(-2106506049, hasher.hashLong(Long.MAX_VALUE));
}
+ // SPARK-23381 Check whether the hash of the byte array is the same as another implementations
--- End diff --
Yeah, we can do it in the follow-up PR.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20630
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20630
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20630
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20630
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87514/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/20630
The ML changes LGTM. Thanks!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20630
**[Test build #87514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87514/testReport)** for PR 20630 at commit [`c406f98`](https://github.com/apache/spark/commit/c406f9846f81ddf8030803cb918bb51506d4ae6b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20630
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/940/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20630#discussion_r168904179
--- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java ---
@@ -60,6 +60,8 @@ public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, i
}
public static int hashUnsafeBytes(Object base, long offset, int lengthInBytes, int seed) {
+ // This is not compatible with original and another implementations.
+ // But remain it for backward compatibility for the components existing before 2.3.
--- End diff --
Will correct it in the follow-up PR.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20630
LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20630
**[Test build #87514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87514/testReport)** for PR 20630 at commit [`c406f98`](https://github.com/apache/spark/commit/c406f9846f81ddf8030803cb918bb51506d4ae6b).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20630: [SPARK-23381][CORE] Murmur3 hash generates a diff...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/20630#discussion_r168883881
--- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java ---
@@ -60,6 +60,8 @@ public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, i
}
public static int hashUnsafeBytes(Object base, long offset, int lengthInBytes, int seed) {
+ // This is not compatible with original and another implementations.
+ // But remain it for backward compatibility for the components existing before 2.3.
--- End diff --
NIT: touch this up. The sentence is a bit weird, i.e.: `However we retain this implementation for backwards compatibility with pre-existing (pre 2.3) components.`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20630: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20630
Thanks! Merged to master/2.3
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org