Posted to commits@spark.apache.org by do...@apache.org on 2020/11/24 03:43:49 UTC
[spark] branch branch-3.0 updated: [SPARK-33524][SQL][TESTS] Change
`InMemoryTable` not to use Tuple.hashCode for `BucketTransform`
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new 200417e [SPARK-33524][SQL][TESTS] Change `InMemoryTable` not to use Tuple.hashCode for `BucketTransform`
200417e is described below
commit 200417e47ac400a48af61a2ce119da0041b93712
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Mon Nov 23 19:35:58 2020 -0800
[SPARK-33524][SQL][TESTS] Change `InMemoryTable` not to use Tuple.hashCode for `BucketTransform`
This PR aims to change `InMemoryTable` not to use `Tuple.hashCode` for `BucketTransform`.
SPARK-32168 made `InMemoryTable` handle `BucketTransform` by hashing a `Tuple`, and `Tuple.hashCode` depends on the Scala version.
- https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala#L159
**Scala 2.12.10**
```scala
$ bin/scala
Welcome to Scala 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_272).
Type in expressions for evaluation. Or try :help.
scala> (1, 1).hashCode
res0: Int = -2074071657
```
**Scala 2.13.3**
```scala
Welcome to Scala 2.13.3 (OpenJDK 64-Bit Server VM, Java 1.8.0_272).
Type in expressions for evaluation. Or try :help.
scala> (1, 1).hashCode
val res0: Int = -1669302457
```
Yes. This is a correctness issue.
Pass the unit tests with both Scala 2.12 and 2.13.
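As a rough illustration of the approach, the sketch below computes a bucket id from the hash of the value combined with a hash for its type, instead of hashing a `Tuple`. The names `BucketSketch`, `bucketOf`, and `typeHash` are hypothetical and exist only for this example; `typeHash` stands in for `dataType.hashCode()` so the snippet runs without Spark on the classpath.

```scala
// Illustrative sketch only: BucketSketch, bucketOf, and typeHash are
// hypothetical names, not part of Spark's API.
object BucketSketch {
  /** Maps a value and a hash of its type to a bucket id in [0, numBuckets). */
  def bucketOf(value: Any, typeHash: Int, numBuckets: Int): Int = {
    // A null value gets a fixed hash, which avoids a NullPointerException.
    val valueHashCode = if (value == null) 0 else value.hashCode
    // Masking with Integer.MAX_VALUE clears the sign bit, so the remainder
    // is never negative.
    ((valueHashCode + 31 * typeHash) & Integer.MAX_VALUE) % numBuckets
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a String hash stands in for dataType.hashCode().
    val typeHash = "IntegerType".hashCode
    println(bucketOf(1, typeHash, 4))
    println(bucketOf(null, typeHash, 4))
  }
}
```

The point of the sketch is that no `Tuple.hashCode` call is involved, so the result depends only on the hash of the value and the hash of its type.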
Closes #30477 from dongjoon-hyun/SPARK-33524.
Authored-by: Dongjoon Hyun <do...@apache.org>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
(cherry picked from commit 8380e00419281cd1b1fc5706d23d5231356a3379)
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
.../src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
index 616fc72..98b6a3b 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
@@ -128,7 +128,9 @@ class InMemoryTable(
             ChronoUnit.HOURS.between(Instant.EPOCH, DateTimeUtils.microsToInstant(micros))
         }
       case BucketTransform(numBuckets, ref) =>
-        (extractor(ref.fieldNames, schema, row).hashCode() & Integer.MAX_VALUE) % numBuckets
+        val (value, dataType) = extractor(ref.fieldNames, schema, row)
+        val valueHashCode = if (value == null) 0 else value.hashCode
+        ((valueHashCode + 31 * dataType.hashCode()) & Integer.MAX_VALUE) % numBuckets
     }
   }
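
Both the old and the new expression mask the hash with `Integer.MAX_VALUE` before taking the remainder. A minimal sketch of why, using the hypothetical name `SignMaskSketch` and made-up input values: on the JVM, `%` keeps the sign of the dividend, so a negative hash would otherwise produce a negative bucket id.

```scala
// Illustrative sketch only: SignMaskSketch and the sample values are
// hypothetical and chosen for the example.
object SignMaskSketch {
  /** Remainder without masking; negative for negative hashes. */
  def plainRemainder(hash: Int, numBuckets: Int): Int = hash % numBuckets

  /** Remainder after clearing the sign bit; always in [0, numBuckets). */
  def maskedRemainder(hash: Int, numBuckets: Int): Int =
    (hash & Integer.MAX_VALUE) % numBuckets

  def main(args: Array[String]): Unit = {
    println(plainRemainder(-7, 4))  // -3
    println(maskedRemainder(-7, 4)) // 1
  }
}
```

Masking also sidesteps the edge case where `math.abs(Int.MinValue)` is still negative.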