You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by marmbrus <gi...@git.apache.org> on 2014/08/05 11:20:10 UTC

[GitHub] spark pull request: [SPARK-2860][SQL] Fix coercion of CASE WHEN.

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/1785

    [SPARK-2860][SQL] Fix coercion of CASE WHEN.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark caseNull

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1785.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1785
    
----
commit 2fe357f3425277cf1b348b785725540ab3fcbbb8
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-08-05T08:59:11Z

    Fix coercion of CASE WHEN.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2860][SQL] Fix coercion of CASE WHEN.

Posted by concretevitamin <gi...@git.apache.org>.
Github user concretevitamin commented on the pull request:

    https://github.com/apache/spark/pull/1785#issuecomment-51232808
  
    A few minor comments otherwise LGTM!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2860][SQL] Fix coercion of CASE WHEN.

Posted by concretevitamin <gi...@git.apache.org>.
Github user concretevitamin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1785#discussion_r15828596
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
    @@ -336,28 +338,33 @@ trait HiveTypeCoercion {
       }
     
       /**
    -   * Ensures that NullType gets casted to some other types under certain circumstances.
    +   * Coerces the type of different branches of a CASE WHEN statement to a common type.
        */
    -  object CastNulls extends Rule[LogicalPlan] {
    +  object CaseWhenCoercion extends Rule[LogicalPlan] with TypeWidening {
         def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    -      case cw @ CaseWhen(branches) =>
    +      case cw @ CaseWhen(branches) if !cw.resolved && !branches.exists(!_.resolved)  =>
             val valueTypes = branches.sliding(2, 2).map {
    -          case Seq(_, value) if value.resolved => Some(value.dataType)
    -          case Seq(elseVal) if elseVal.resolved => Some(elseVal.dataType)
    -          case _ => None
    +          case Seq(_, value) => value.dataType
    +          case Seq(elseVal) => elseVal.dataType
             }.toSeq
    -        if (valueTypes.distinct.size == 2 && valueTypes.exists(_ == Some(NullType))) {
    -          val otherType = valueTypes.filterNot(_ == Some(NullType))(0).get
    +
    +        logDebug(s"Input values for null casting ${valueTypes.mkString(",")}")
    +
    +        if (valueTypes.distinct.size > 1) {
    +          val commonType = valueTypes.reduce { (v1, v2) =>
    +            findTightestCommonType(v1, v2)
    +              .getOrElse(sys.error(s"Invalid types in CASE WHEN. $v1, $v2"))
    --- End diff --
    
    Perhaps "Value types in CASE WHEN must be the same, or can be coerced to a tightest upper bound." to be more precise about the semantics, in case users run into this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2860][SQL] Fix coercion of CASE WHEN.

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1785#issuecomment-51242449
  
    Thanks for looking this over.  Merged to master and 1.1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2860][SQL] Fix coercion of CASE WHEN.

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1785


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2860][SQL] Fix coercion of CASE WHEN.

Posted by concretevitamin <gi...@git.apache.org>.
Github user concretevitamin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1785#discussion_r15828435
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
    @@ -336,28 +338,33 @@ trait HiveTypeCoercion {
       }
     
       /**
    -   * Ensures that NullType gets casted to some other types under certain circumstances.
    +   * Coerces the type of different branches of a CASE WHEN statement to a common type.
        */
    -  object CastNulls extends Rule[LogicalPlan] {
    +  object CaseWhenCoercion extends Rule[LogicalPlan] with TypeWidening {
         def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    -      case cw @ CaseWhen(branches) =>
    +      case cw @ CaseWhen(branches) if !cw.resolved && !branches.exists(!_.resolved)  =>
             val valueTypes = branches.sliding(2, 2).map {
    -          case Seq(_, value) if value.resolved => Some(value.dataType)
    -          case Seq(elseVal) if elseVal.resolved => Some(elseVal.dataType)
    -          case _ => None
    +          case Seq(_, value) => value.dataType
    +          case Seq(elseVal) => elseVal.dataType
             }.toSeq
    -        if (valueTypes.distinct.size == 2 && valueTypes.exists(_ == Some(NullType))) {
    -          val otherType = valueTypes.filterNot(_ == Some(NullType))(0).get
    +
    +        logDebug(s"Input values for null casting ${valueTypes.mkString(",")}")
    --- End diff --
    
    Do we want to leave this in?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2860][SQL] Fix coercion of CASE WHEN.

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1785#discussion_r15831030
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
    @@ -336,28 +338,33 @@ trait HiveTypeCoercion {
       }
     
       /**
    -   * Ensures that NullType gets casted to some other types under certain circumstances.
    +   * Coerces the type of different branches of a CASE WHEN statement to a common type.
        */
    -  object CastNulls extends Rule[LogicalPlan] {
    +  object CaseWhenCoercion extends Rule[LogicalPlan] with TypeWidening {
         def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    -      case cw @ CaseWhen(branches) =>
    +      case cw @ CaseWhen(branches) if !cw.resolved && !branches.exists(!_.resolved)  =>
             val valueTypes = branches.sliding(2, 2).map {
    -          case Seq(_, value) if value.resolved => Some(value.dataType)
    -          case Seq(elseVal) if elseVal.resolved => Some(elseVal.dataType)
    -          case _ => None
    +          case Seq(_, value) => value.dataType
    +          case Seq(elseVal) => elseVal.dataType
             }.toSeq
    -        if (valueTypes.distinct.size == 2 && valueTypes.exists(_ == Some(NullType))) {
    -          val otherType = valueTypes.filterNot(_ == Some(NullType))(0).get
    +
    +        logDebug(s"Input values for null casting ${valueTypes.mkString(",")}")
    --- End diff --
    
    Its essentially free and could be useful to turn on if we ever have problems with this rule again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2860][SQL] Fix coercion of CASE WHEN.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1785#issuecomment-51181256
  
    QA results for PR 1785:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds the following public classes (experimental):<br>trait TypeWidening {<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17939/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2860][SQL] Fix coercion of CASE WHEN.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1785#issuecomment-51172893
  
    QA tests have started for PR 1785. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17939/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org