You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jiangxb1987 <gi...@git.apache.org> on 2018/04/15 15:27:14 UTC

[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

GitHub user jiangxb1987 opened a pull request:

    https://github.com/apache/spark/pull/21074

    [SPARK-21811][SQL] Fix the inconsistency behavior when finding the widest common type

    ## What changes were proposed in this pull request?
    
    Currently we find the wider common type by comparing the two types from left to right, this can be a problem when you have two data types which don't have a common type but each can be promoted to StringType.
    
    For instance, if you have a table with the schema:
    [c1: date, c2: string, c3: int]
    
    The following succeeds:
    SELECT coalesce(c1, c2, c3) FROM table
    
    While the following produces an exception:
    SELECT coalesce(c1, c3, c2) FROM table
    
    This is only a issue when the seq of dataTypes contains `StringType` and all the types can do string promotion.
    
    close #19033
    
    ## How was this patch tested?
    
    Add test in `TypeCoercionSuite`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark typeCoercion

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21074.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21074
    
----
commit 803a6a443ba9f7d3dc34d68b0d15f53c1b6054fb
Author: Xingbo Jiang <xi...@...>
Date:   2018-04-15T15:12:16Z

    fix type coercion when promting to StringType

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89516/testReport)** for PR 21074 at commit [`c2abce2`](https://github.com/apache/spark/commit/c2abce2dcb4d4cdf822150b80cf83104cd9e41f3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2409/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89508/testReport)** for PR 21074 at commit [`55eefe0`](https://github.com/apache/spark/commit/55eefe0883ad8aec6a79f56a42085ea645e8eecb).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21074#discussion_r182458121
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -175,11 +175,27 @@ object TypeCoercion {
           })
       }
     
    +  /**
    +   * Whether the data type contains StringType.
    +   */
    +  def hasStringType(dt: DataType): Boolean = dt match {
    +    case StringType => true
    +    case ArrayType(et, _) => hasStringType(et)
    +    // Add StructType if we support string promotion for struct fields in the future.
    +    case _ => false
    +  }
    +
       private def findWiderCommonType(types: Seq[DataType]): Option[DataType] = {
    -    types.foldLeft[Option[DataType]](Some(NullType))((r, c) => r match {
    -      case Some(d) => findWiderTypeForTwo(d, c)
    -      case None => None
    -    })
    +    // findWiderTypeForTwo doesn't satisfy the associative law, i.e. (a op b) op c may not equal
    +    // to a op (b op c). This is only a problem for StringType. Excluding StringType,
    --- End diff --
    
    `This is only a problem for StringType or nested StringType in ArrayType. Excluding these types, ...`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21074#discussion_r182100748
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -176,16 +176,16 @@ object TypeCoercion {
       }
     
       private def findWiderCommonType(types: Seq[DataType]): Option[DataType] = {
    -    types.foldLeft[Option[DataType]](Some(NullType))((r, c) => r match {
    -      case Some(d) => findWiderTypeForTwo(d, c)
    -      // Currently we find the wider common type by comparing the two types from left to right,
    -      // this can be a problem when you have two data types which don't have a common type but each
    -      // can be promoted to StringType. For instance, (TimestampType, IntegerType, StringType)
    -      // should have StringType as the wider common type.
    -      case None if types.exists(_ == StringType) &&
    -        types.forall(stringPromotion(_, StringType).nonEmpty) => Some(StringType)
    -      case _ => None
    -    })
    +    // `findWiderTypeForTwo` doesn't satisfy the associative law, i.e. (a op b) op c may not equal
    +    // to a op (b op c). This is only a problem when each of the types is StringType or can be
    --- End diff --
    
    `This is only a problem for StringType`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89463/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89519/testReport)** for PR 21074 at commit [`ff514af`](https://github.com/apache/spark/commit/ff514afd8adf817ee14a7504f0e9e8244c596ab6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89478/testReport)** for PR 21074 at commit [`4ce5081`](https://github.com/apache/spark/commit/4ce5081fb2da8dabb216413fdda4da0f0b061f71).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89459 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89459/testReport)** for PR 21074 at commit [`571912f`](https://github.com/apache/spark/commit/571912f3ed21cd3753fa76225f88d0f6d8298989).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89463/testReport)** for PR 21074 at commit [`d77f213`](https://github.com/apache/spark/commit/d77f2136451008b2be6f9f63c8ffcae9a39a2426).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2434/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89478/testReport)** for PR 21074 at commit [`4ce5081`](https://github.com/apache/spark/commit/4ce5081fb2da8dabb216413fdda4da0f0b061f71).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2395/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2443/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89463/testReport)** for PR 21074 at commit [`d77f213`](https://github.com/apache/spark/commit/d77f2136451008b2be6f9f63c8ffcae9a39a2426).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89516/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21074#discussion_r182284460
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -176,10 +176,16 @@ object TypeCoercion {
       }
     
       private def findWiderCommonType(types: Seq[DataType]): Option[DataType] = {
    -    types.foldLeft[Option[DataType]](Some(NullType))((r, c) => r match {
    -      case Some(d) => findWiderTypeForTwo(d, c)
    -      case None => None
    -    })
    +    // findWiderTypeForTwo doesn't satisfy the associative law, i.e. (a op b) op c may not equal
    +    // to a op (b op c). This is only a problem for StringType. Excluding StringType,
    +    // findWiderTypeForTwo satisfies the associative law. For instance, (TimestampType,
    +    // IntegerType, StringType) should have StringType as the wider common type.
    +    val (stringTypes, nonStringTypes) = types.partition(_ == StringType)
    --- End diff --
    
    Out of curiosity, does this work with array types too (array of string vs array of non string types)?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21074


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21074#discussion_r182484023
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1810,6 +1810,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
      - Since Spark 2.4, writing a dataframe with an empty or nested empty schema using any file formats (parquet, orc, json, text, csv etc.) is not allowed. An exception is thrown when attempting to write dataframes with empty schema. 
      - Since Spark 2.4, Spark compares a DATE type with a TIMESTAMP type after promotes both sides to TIMESTAMP. To set `false` to `spark.sql.hive.compareDateTimestampInTimestamp` restores the previous behavior. This option will be removed in Spark 3.0.
      - Since Spark 2.4, creating a managed table with nonempty location is not allowed. An exception is thrown when attempting to create a managed table with nonempty location. To set `true` to `spark.sql.allowCreatingManagedTableUsingNonemptyLocation` restores the previous behavior. This option will be removed in Spark 3.0.
    + - Since Spark 2.4, finding the widest common type for the arguments of a variadic function(e.g. IN/COALESCE) should always success when each of the types of arguments is either StringType or can be promoted to StringType. Previously this may throw an exception for some specific arguments ordering.
    --- End diff --
    
    > - Since Spark 2.4, the type coercion rules can automatically promote the argument types of the variadic SQL functions (e.g., IN/COALESCE) to the widest common type, no matter how the input arguments order. In prior Spark versions, the promotion could fail in some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception. 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89376/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    cc @HyukjinKwon @gatorsmile @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89508/testReport)** for PR 21074 at commit [`55eefe0`](https://github.com/apache/spark/commit/55eefe0883ad8aec6a79f56a42085ea645e8eecb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21074#discussion_r181619518
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -178,7 +178,13 @@ object TypeCoercion {
       private def findWiderCommonType(types: Seq[DataType]): Option[DataType] = {
         types.foldLeft[Option[DataType]](Some(NullType))((r, c) => r match {
           case Some(d) => findWiderTypeForTwo(d, c)
    -      case None => None
    +      // Currently we find the wider common type by comparing the two types from left to right,
    --- End diff --
    
    The real problem is, `findWiderTypeForTwo` doesn't satisfy the associative law, i.e. `(a op b) op c` may not equal to `a op (b op c)`. I think `StringType` is the only exception here, it's more clear to do
    ```
    val (stringType, nonStringType) = types.partition(_ == StringType)
    (stringType.distinct ++ nonStringType).foldLeft...
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21074#discussion_r181620108
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -178,7 +178,13 @@ object TypeCoercion {
       private def findWiderCommonType(types: Seq[DataType]): Option[DataType] = {
         types.foldLeft[Option[DataType]](Some(NullType))((r, c) => r match {
           case Some(d) => findWiderTypeForTwo(d, c)
    -      case None => None
    +      // Currently we find the wider common type by comparing the two types from left to right,
    --- End diff --
    
    This is a behavior change. We need to make it configurable. Add a conf and update the migration guide.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    @HyukjinKwon thanks! I encountered some network issue and forgot to come back and merge this...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89376/testReport)** for PR 21074 at commit [`803a6a4`](https://github.com/apache/spark/commit/803a6a443ba9f7d3dc34d68b0d15f53c1b6054fb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89459/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2332/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21074#discussion_r182295312
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -176,10 +176,16 @@ object TypeCoercion {
       }
     
       private def findWiderCommonType(types: Seq[DataType]): Option[DataType] = {
    -    types.foldLeft[Option[DataType]](Some(NullType))((r, c) => r match {
    -      case Some(d) => findWiderTypeForTwo(d, c)
    -      case None => None
    -    })
    +    // findWiderTypeForTwo doesn't satisfy the associative law, i.e. (a op b) op c may not equal
    +    // to a op (b op c). This is only a problem for StringType. Excluding StringType,
    +    // findWiderTypeForTwo satisfies the associative law. For instance, (TimestampType,
    +    // IntegerType, StringType) should have StringType as the wider common type.
    +    val (stringTypes, nonStringTypes) = types.partition(_ == StringType)
    --- End diff --
    
    It's expected to , let me also fix it for array types. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Seems missed out. I merged this to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89519/testReport)** for PR 21074 at commit [`ff514af`](https://github.com/apache/spark/commit/ff514afd8adf817ee14a7504f0e9e8244c596ab6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21074#discussion_r182651253
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
    @@ -572,6 +575,16 @@ class TypeCoercionSuite extends AnalysisTest {
           Coalesce(Seq(nullLit, floatNullLit, doubleLit, stringLit)),
           Coalesce(Seq(Cast(nullLit, StringType), Cast(floatNullLit, StringType),
             Cast(doubleLit, StringType), Cast(stringLit, StringType))))
    +
    +    ruleTest(rule,
    +      Coalesce(Seq(timestampLit, intLit, stringLit)),
    +      Coalesce(Seq(Cast(timestampLit, StringType), Cast(intLit, StringType),
    +        Cast(stringLit, StringType))))
    +
    +    ruleTest(rule,
    +      Coalesce(Seq(tsArrayLit, intArrayLit, strArrayLit)),
    +      Coalesce(Seq(Cast(tsArrayLit, ArrayType(StringType)),
    +        Cast(intArrayLit, ArrayType(StringType)), Cast(strArrayLit, ArrayType(StringType)))))
    --- End diff --
    
    We usually don't add end-to-end tests for type coercion changes, as the type coercion logic is pretty isolated, it's very unlikely that we can pass the unit test but not end-to-end test.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89508/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2398/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89376/testReport)** for PR 21074 at commit [`803a6a4`](https://github.com/apache/spark/commit/803a6a443ba9f7d3dc34d68b0d15f53c1b6054fb).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21074#discussion_r182301592
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -176,10 +176,18 @@ object TypeCoercion {
       }
     
       private def findWiderCommonType(types: Seq[DataType]): Option[DataType] = {
    -    types.foldLeft[Option[DataType]](Some(NullType))((r, c) => r match {
    -      case Some(d) => findWiderTypeForTwo(d, c)
    -      case None => None
    -    })
    +    // findWiderTypeForTwo doesn't satisfy the associative law, i.e. (a op b) op c may not equal
    +    // to a op (b op c). This is only a problem for StringType. Excluding StringType,
    +    // findWiderTypeForTwo satisfies the associative law. For instance, (TimestampType,
    +    // IntegerType, StringType) should have StringType as the wider common type.
    +    val (stringTypes, nonStringTypes) = types.partition { t =>
    +      t == StringType || t == ArrayType(StringType)
    --- End diff --
    
    we need something like
    ```
    def hasStringType(dt: DataType): Boolean = dt match {
      case StringType => true
      case ArrayType(et, _) => hasStringType(et)
      case _ => false // Add StructType if we support string promotion for struct fields in the future.
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89478/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89519/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2441/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89459/testReport)** for PR 21074 at commit [`571912f`](https://github.com/apache/spark/commit/571912f3ed21cd3753fa76225f88d0f6d8298989).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    **[Test build #89516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89516/testReport)** for PR 21074 at commit [`c2abce2`](https://github.com/apache/spark/commit/c2abce2dcb4d4cdf822150b80cf83104cd9e41f3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21074
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21074#discussion_r182595455
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
    @@ -572,6 +575,16 @@ class TypeCoercionSuite extends AnalysisTest {
           Coalesce(Seq(nullLit, floatNullLit, doubleLit, stringLit)),
           Coalesce(Seq(Cast(nullLit, StringType), Cast(floatNullLit, StringType),
             Cast(doubleLit, StringType), Cast(stringLit, StringType))))
    +
    +    ruleTest(rule,
    +      Coalesce(Seq(timestampLit, intLit, stringLit)),
    +      Coalesce(Seq(Cast(timestampLit, StringType), Cast(intLit, StringType),
    +        Cast(stringLit, StringType))))
    +
    +    ruleTest(rule,
    +      Coalesce(Seq(tsArrayLit, intArrayLit, strArrayLit)),
    +      Coalesce(Seq(Cast(tsArrayLit, ArrayType(StringType)),
    +        Cast(intArrayLit, ArrayType(StringType)), Cast(strArrayLit, ArrayType(StringType)))))
    --- End diff --
    
    Could you add an end to end test case that can trigger this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org