You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mn-mikke <gi...@git.apache.org> on 2018/08/15 08:23:24 UTC

[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

GitHub user mn-mikke opened a pull request:

    https://github.com/apache/spark/pull/22110

    [SPARK-25122][SQL] Deduplication of supports equals code

    ## What changes were proposed in this pull request?
    
    The method ```*supportEquals``` determining whether elements of a data type could be used as items in a hash set or as keys in a hash map is duplicated across multiple collection and higher-order functions.
    
    This PR suggests to deduplicate the method.
    
    ## How was this patch tested?
    
    Run tests in:
    - DataFrameFunctionsSuite
    - CollectionExpressionsSuite
    - HigherOrderExpressionsSuite


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mn-mikke/spark SPARK-25122

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22110.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22110
    
----
commit dd292e8cf3ed1788793e626da3a136e9acb9d81c
Author: Marek Novotny <mn...@...>
Date:   2018-08-15T08:18:05Z

    [SPARK-25122][SQL] Deduplication of supports equals code

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    **[Test build #94847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94847/testReport)** for PR 22110 at commit [`7d395d2`](https://github.com/apache/spark/commit/7d395d2392440201ec5da08e87e24013bb6e0add).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210493260
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala ---
    @@ -73,4 +73,14 @@ object TypeUtils {
         }
         x.length - y.length
       }
    +
    +  /**
    +   * Returns true if elements of the data type could be used as items of a hash set or as keys
    +   * of a hash map.
    +   */
    +  def typeCanBeHashed(dataType: DataType): Boolean = dataType match {
    --- End diff --
    
    I will change it :)
    
    Just one question to ```hashCode```. If ```case classes``` are used, ```equals``` and ```hashCode``` are generated by compiler. But if we define ```equals``` manually, shouldn't also hold ```a.equals(b) == true``` => ```a.hashCode == b.hashCode```?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94795/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22110


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94824/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    **[Test build #94795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94795/testReport)** for PR 22110 at commit [`dd292e8`](https://github.com/apache/spark/commit/dd292e8cf3ed1788793e626da3a136e9acb9d81c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210350632
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala ---
    @@ -73,4 +73,14 @@ object TypeUtils {
         }
         x.length - y.length
       }
    +
    +  /**
    +   * Returns true if elements of the data type could be used as items of a hash set or as keys
    --- End diff --
    
    I'm open to any changes :) But if you want to explicitly mention the ```equals``` method, I would also mention ```hashCode``` generally needed for usage in "hash" collections. But then this not 100% true for Spark's specialized ```OpenHashSets``` and ```OpenHashMaps``` since they calculate hash by themselves. WDYT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    **[Test build #94800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94800/testReport)** for PR 22110 at commit [`9ed65cf`](https://github.com/apache/spark/commit/9ed65cf5aa3f440e49db9613379593c23272737e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210213125
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
    @@ -115,6 +115,8 @@ protected[sql] abstract class AtomicType extends DataType {
       private[sql] type InternalType
       private[sql] val tag: TypeTag[InternalType]
       private[sql] val ordering: Ordering[InternalType]
    +
    +  private[spark] override def supportsEquals: Boolean = true
    --- End diff --
    
    SGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    **[Test build #94800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94800/testReport)** for PR 22110 at commit [`9ed65cf`](https://github.com/apache/spark/commit/9ed65cf5aa3f440e49db9613379593c23272737e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210357474
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala ---
    @@ -73,4 +73,14 @@ object TypeUtils {
         }
         x.length - y.length
       }
    +
    +  /**
    +   * Returns true if elements of the data type could be used as items of a hash set or as keys
    --- End diff --
    
    What about changing the name of the method to ```typeCanBeHashed```?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210455974
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala ---
    @@ -73,4 +73,14 @@ object TypeUtils {
         }
         x.length - y.length
       }
    +
    +  /**
    +   * Returns true if elements of the data type could be used as items of a hash set or as keys
    +   * of a hash map.
    +   */
    +  def typeCanBeHashed(dataType: DataType): Boolean = dataType match {
    --- End diff --
    
    hey, this is a weird name, `byte[]` can also be hashed. I'd rather call it `typeWithProperEquals`, and document it as @mgaido91 proposed. I don't think we need to consider `hashCode` here, it's a rule in java world that equals and hashCode should be defined in a coherent way.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210593726
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala ---
    @@ -73,4 +73,15 @@ object TypeUtils {
         }
         x.length - y.length
       }
    +
    +  /**
    +   * Returns true if the equals method of the elements of the data type is implemented properly.
    +   * This also means that they can be safely used in collections relying on the equals method,
    +   * as sets or maps.
    +   */
    +  def typeWithProperEquals(dataType: DataType): Boolean = dataType match {
    --- End diff --
    
    IMHO `typeSupportsEquals` sounded simpler. I'd not have changed. But I am fine with this too if others agree on it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210215352
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
    @@ -115,6 +115,8 @@ protected[sql] abstract class AtomicType extends DataType {
       private[sql] type InternalType
       private[sql] val tag: TypeTag[InternalType]
       private[sql] val ordering: Ordering[InternalType]
    +
    +  private[spark] override def supportsEquals: Boolean = true
    --- End diff --
    
    +1 too


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210353947
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala ---
    @@ -73,4 +73,14 @@ object TypeUtils {
         }
         x.length - y.length
       }
    +
    +  /**
    +   * Returns true if elements of the data type could be used as items of a hash set or as keys
    --- End diff --
    
    I think it is a common pattern for every well formed class to have equals and hashCode defined in a coherent way. I think what we are doing here is saying: this class has a meaningful equals method, so we can rely on it or this class has a meaningless equals method, like the default one comparing the pointers, so we cannot rely on it. I am open too to any suggestion, I'd only like to have a description which is coherent with the method name, otherwise I feel that either one or the other has to be changed in order to properly reflect what the method does.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    **[Test build #94795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94795/testReport)** for PR 22110 at commit [`dd292e8`](https://github.com/apache/spark/commit/dd292e8cf3ed1788793e626da3a136e9acb9d81c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    cc @ueshin @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94847/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210282476
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
    @@ -115,6 +115,8 @@ protected[sql] abstract class AtomicType extends DataType {
       private[sql] type InternalType
       private[sql] val tag: TypeTag[InternalType]
       private[sql] val ordering: Ordering[InternalType]
    +
    +  private[spark] override def supportsEquals: Boolean = true
    --- End diff --
    
    +1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    **[Test build #94824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94824/testReport)** for PR 22110 at commit [`d0b0552`](https://github.com/apache/spark/commit/d0b05521e26ec4d0628ac7c940b361d331a37560).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210343741
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala ---
    @@ -73,4 +73,14 @@ object TypeUtils {
         }
         x.length - y.length
       }
    +
    +  /**
    +   * Returns true if elements of the data type could be used as items of a hash set or as keys
    --- End diff --
    
    I think this comment is not very coherent with the method name. I think we can rephrase to something like:
    ```
    Returns true if the equal method of the elements of the data type is implemented properly.
    This also means that they can be safely used in collections relying on the equals method,
    as sets or maps.
    ```
    What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210525593
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala ---
    @@ -73,4 +73,14 @@ object TypeUtils {
         }
         x.length - y.length
       }
    +
    +  /**
    +   * Returns true if elements of the data type could be used as items of a hash set or as keys
    --- End diff --
    
    I don't really like this proposal...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94800/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210207729
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
    @@ -115,6 +115,8 @@ protected[sql] abstract class AtomicType extends DataType {
       private[sql] type InternalType
       private[sql] val tag: TypeTag[InternalType]
       private[sql] val ordering: Ordering[InternalType]
    +
    +  private[spark] override def supportsEquals: Boolean = true
    --- End diff --
    
    I don't think this should be a property of the data type. It's specific to the `OpenHashSet`. How about we add this method to `object OpenHashSet`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210561778
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala ---
    @@ -73,4 +73,14 @@ object TypeUtils {
         }
         x.length - y.length
       }
    +
    +  /**
    +   * Returns true if elements of the data type could be used as items of a hash set or as keys
    +   * of a hash map.
    +   */
    +  def typeCanBeHashed(dataType: DataType): Boolean = dataType match {
    --- End diff --
    
    we should, and I believe Spark has enforced it with style checker: when you override `equals`, you must override `hashCode` too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210212586
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ---
    @@ -115,6 +115,8 @@ protected[sql] abstract class AtomicType extends DataType {
       private[sql] type InternalType
       private[sql] val tag: TypeTag[InternalType]
       private[sql] val ordering: Ordering[InternalType]
    +
    +  private[spark] override def supportsEquals: Boolean = true
    --- End diff --
    
    Not all of the expressions utilize ```OpenHashSet``` or ```OpenHashMap```. What about ```TypeUtils``` that contains methods like ```getInterpretedOrdering```?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    **[Test build #94847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94847/testReport)** for PR 22110 at commit [`7d395d2`](https://github.com/apache/spark/commit/7d395d2392440201ec5da08e87e24013bb6e0add).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22110: [SPARK-25122][SQL] Deduplication of supports equals code

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22110
  
    **[Test build #94824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94824/testReport)** for PR 22110 at commit [`d0b0552`](https://github.com/apache/spark/commit/d0b05521e26ec4d0628ac7c940b361d331a37560).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

Posted by mn-mikke <gi...@git.apache.org>.
Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22110#discussion_r210563516
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala ---
    @@ -73,4 +73,14 @@ object TypeUtils {
         }
         x.length - y.length
       }
    +
    +  /**
    +   * Returns true if elements of the data type could be used as items of a hash set or as keys
    +   * of a hash map.
    +   */
    +  def typeCanBeHashed(dataType: DataType): Boolean = dataType match {
    --- End diff --
    
    Ok, thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org