Posted to commits@spark.apache.org by do...@apache.org on 2020/11/24 03:43:49 UTC
[spark] branch branch-3.0 updated: [SPARK-33524][SQL][TESTS] Change `InMemoryTable` not to use Tuple.hashCode for `BucketTransform`

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 200417e  [SPARK-33524][SQL][TESTS] Change `InMemoryTable` not to use Tuple.hashCode for `BucketTransform`
200417e is described below

commit 200417e47ac400a48af61a2ce119da0041b93712
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Mon Nov 23 19:35:58 2020 -0800

    [SPARK-33524][SQL][TESTS] Change `InMemoryTable` not to use Tuple.hashCode for `BucketTransform`
    
    This PR aims to change `InMemoryTable` not to use `Tuple.hashCode` for `BucketTransform`.
    
    SPARK-32168 made `InMemoryTable` handle `BucketTransform` as a hash of a `Tuple`, which depends on the Scala version.
    - https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala#L159
    
    **Scala 2.12.10**
    ```scala
    $ bin/scala
    Welcome to Scala 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_272).
    Type in expressions for evaluation. Or try :help.
    
    scala> (1, 1).hashCode
    res0: Int = -2074071657
    ```
    
    **Scala 2.13.3**
    ```scala
    Welcome to Scala 2.13.3 (OpenJDK 64-Bit Server VM, Java 1.8.0_272).
    Type in expressions for evaluation. Or try :help.
    
    scala> (1, 1).hashCode
    val res0: Int = -1669302457
    ```
    
    Yes. This is a correctness issue.
    
    Passes the unit tests with both Scala 2.12 and 2.13.
    
    Closes #30477 from dongjoon-hyun/SPARK-33524.
    
    Authored-by: Dongjoon Hyun <do...@apache.org>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
    (cherry picked from commit 8380e00419281cd1b1fc5706d23d5231356a3379)
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 .../src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
index 616fc72..98b6a3b 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
@@ -128,7 +128,9 @@ class InMemoryTable(
             ChronoUnit.HOURS.between(Instant.EPOCH, DateTimeUtils.microsToInstant(micros))
         }
       case BucketTransform(numBuckets, ref) =>
-        (extractor(ref.fieldNames, schema, row).hashCode() & Integer.MAX_VALUE) % numBuckets
+        val (value, dataType) = extractor(ref.fieldNames, schema, row)
+        val valueHashCode = if (value == null) 0 else value.hashCode
+        ((valueHashCode + 31 * dataType.hashCode()) & Integer.MAX_VALUE) % numBuckets
     }
   }
 
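
The arithmetic introduced by the diff above can be sketched outside Spark as follows. This is a minimal standalone illustration, not the code in `InMemoryTable`: `BucketSketch`, `bucketOf`, and the `dataTypeHash` parameter are hypothetical names, and `dataTypeHash` only stands in for `dataType.hashCode()`.

```scala
object BucketSketch {
  // Hypothetical helper mirroring the expression in the diff.
  // `dataTypeHash` is a stand-in for `dataType.hashCode()`; no Spark types are used here.
  def bucketOf(value: Any, dataTypeHash: Int, numBuckets: Int): Int = {
    // Null values contribute 0 instead of throwing a NullPointerException.
    val valueHashCode = if (value == null) 0 else value.hashCode
    // Masking with Integer.MAX_VALUE clears the sign bit, so the remainder is never negative.
    ((valueHashCode + 31 * dataTypeHash) & Integer.MAX_VALUE) % numBuckets
  }

  def main(args: Array[String]): Unit = {
    // Placeholder type hash chosen for the example; the real value comes from the Spark DataType.
    val typeHash = "integer".hashCode
    val numBuckets = 4
    Seq[Any](1, 2, -7, null).foreach { v =>
      println(s"$v -> bucket ${bucketOf(v, typeHash, numBuckets)}")
    }
  }
}
```

Because the value and the data type are hashed separately and then combined, `Tuple.hashCode` is no longer involved. For values such as boxed integers or strings, whose `hashCode` is specified by the JDK, the resulting bucket should not depend on the Scala version.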

