You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by rxin <gi...@git.apache.org> on 2016/01/19 08:05:39 UTC

[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/10827

    [SPARK-12770][SQL] Implement rules for branch elimination for CaseWhen

    The three optimization cases are:
    1. If the first branch's condition is a true literal, remove the CaseWhen and use the value from that branch.
    2. If a branch's condition is a false or null literal, remove that branch.
    3. If only the else branch is left, remove the CaseWhen and use the value from the else branch.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-12770

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10827.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10827
    
----
commit 6c8201c4f884a16660088f3aa695a9b4f773c0d6
Author: Reynold Xin <rx...@databricks.com>
Date:   2016-01-19T07:05:03Z

    [SPARK-12770][SQL] Implement rules for branch elimination for CaseWhen

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10827#issuecomment-172981717
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10827#issuecomment-172760208
  
    cc @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10827#issuecomment-172762012
  
    **[Test build #49668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49668/consoleFull)** for PR 10827 at commit [`8e980b5`](https://github.com/apache/spark/commit/8e980b5da667f5953d354942cd2c7cb03047b033).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10827#discussion_r50148674
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -635,6 +635,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
         case q: LogicalPlan => q transformExpressionsUp {
           case If(TrueLiteral, trueValue, _) => trueValue
           case If(FalseLiteral, _, falseValue) => falseValue
    +
    +      case e @ CaseWhen(branches, elseValue) if branches.exists(_._1 == FalseLiteral) =>
    +        // If there are branches that are always false, remove them.
    +        // If there are no more branches left, just use the else value.
    +        // Note that these two are handled together here in a single case statement because
    +        // otherwise we cannot determine the data type for the elseValue if it is None (i.e. null).
    +        val newBranches = branches.filter(_._1 != FalseLiteral)
    +        if (newBranches.isEmpty) {
    +          elseValue.getOrElse(Literal.create(null, e.dataType))
    +        } else {
    +          e.copy(branches = newBranches)
    +        }
    +
    +      case e @ CaseWhen(branches, _) if branches.headOption.map(_._1) == Some(TrueLiteral) =>
    --- End diff --
    
    Is there any possible the `branches` can be empty? If it is, we need to add a case to remove it with else value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10827#issuecomment-173030199
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10827#discussion_r50157471
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -635,6 +635,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
         case q: LogicalPlan => q transformExpressionsUp {
           case If(TrueLiteral, trueValue, _) => trueValue
           case If(FalseLiteral, _, falseValue) => falseValue
    +
    +      case e @ CaseWhen(branches, elseValue) if branches.exists(_._1 == FalseLiteral) =>
    +        // If there are branches that are always false, remove them.
    +        // If there are no more branches left, just use the else value.
    +        // Note that these two are handled together here in a single case statement because
    +        // otherwise we cannot determine the data type for the elseValue if it is None (i.e. null).
    +        val newBranches = branches.filter(_._1 != FalseLiteral)
    +        if (newBranches.isEmpty) {
    +          elseValue.getOrElse(Literal.create(null, e.dataType))
    +        } else {
    +          e.copy(branches = newBranches)
    +        }
    +
    +      case e @ CaseWhen(branches, _) if branches.headOption.map(_._1) == Some(TrueLiteral) =>
    --- End diff --
    
    yup technicall - i used head before but decided to change it to headOption to be more resilient to errors


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10827#issuecomment-172780242
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49668/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10827#issuecomment-172780240
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10827#issuecomment-172982564
  
    **[Test build #2410 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2410/consoleFull)** for PR 10827 at commit [`57c4088`](https://github.com/apache/spark/commit/57c408869712943fc98c8713067798934c3782ee).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10827#issuecomment-172779881
  
    **[Test build #49668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49668/consoleFull)** for PR 10827 at commit [`8e980b5`](https://github.com/apache/spark/commit/8e980b5da667f5953d354942cd2c7cb03047b033).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10827#issuecomment-173007415
  
    **[Test build #2410 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2410/consoleFull)** for PR 10827 at commit [`57c4088`](https://github.com/apache/spark/commit/57c408869712943fc98c8713067798934c3782ee).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10827#issuecomment-173008375
  
    updated and tests passed - @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10827#discussion_r50155600
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -635,6 +635,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
         case q: LogicalPlan => q transformExpressionsUp {
           case If(TrueLiteral, trueValue, _) => trueValue
           case If(FalseLiteral, _, falseValue) => falseValue
    +
    +      case e @ CaseWhen(branches, elseValue) if branches.exists(_._1 == FalseLiteral) =>
    +        // If there are branches that are always false, remove them.
    +        // If there are no more branches left, just use the else value.
    +        // Note that these two are handled together here in a single case statement because
    +        // otherwise we cannot determine the data type for the elseValue if it is None (i.e. null).
    +        val newBranches = branches.filter(_._1 != FalseLiteral)
    +        if (newBranches.isEmpty) {
    +          elseValue.getOrElse(Literal.create(null, e.dataType))
    +        } else {
    +          e.copy(branches = newBranches)
    +        }
    +
    +      case e @ CaseWhen(branches, _) if branches.headOption.map(_._1) == Some(TrueLiteral) =>
    --- End diff --
    
    so we can just use `head` here instead of `headOption`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10827


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10827#discussion_r50152663
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -635,6 +635,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
         case q: LogicalPlan => q transformExpressionsUp {
           case If(TrueLiteral, trueValue, _) => trueValue
           case If(FalseLiteral, _, falseValue) => falseValue
    +
    +      case e @ CaseWhen(branches, elseValue) if branches.exists(_._1 == FalseLiteral) =>
    +        // If there are branches that are always false, remove them.
    +        // If there are no more branches left, just use the else value.
    +        // Note that these two are handled together here in a single case statement because
    +        // otherwise we cannot determine the data type for the elseValue if it is None (i.e. null).
    +        val newBranches = branches.filter(_._1 != FalseLiteral)
    +        if (newBranches.isEmpty) {
    +          elseValue.getOrElse(Literal.create(null, e.dataType))
    +        } else {
    +          e.copy(branches = newBranches)
    +        }
    +
    +      case e @ CaseWhen(branches, _) if branches.headOption.map(_._1) == Some(TrueLiteral) =>
    --- End diff --
    
    No it's not due to the previous rule. If the elseValue is None, and there is no branches, there is actually no way to figure out the data type for the null elseValue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10827#issuecomment-172981719
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49704/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12770][SQL] Implement rules for branch ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10827#discussion_r50170033
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -635,6 +635,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
         case q: LogicalPlan => q transformExpressionsUp {
           case If(TrueLiteral, trueValue, _) => trueValue
           case If(FalseLiteral, _, falseValue) => falseValue
    +
    +      case e @ CaseWhen(branches, elseValue) if branches.exists(_._1 == FalseLiteral) =>
    +        // If there are branches that are always false, remove them.
    +        // If there are no more branches left, just use the else value.
    +        // Note that these two are handled together here in a single case statement because
    +        // otherwise we cannot determine the data type for the elseValue if it is None (i.e. null).
    +        val newBranches = branches.filter(_._1 != FalseLiteral)
    +        if (newBranches.isEmpty) {
    +          elseValue.getOrElse(Literal.create(null, e.dataType))
    +        } else {
    +          e.copy(branches = newBranches)
    +        }
    +
    +      case e @ CaseWhen(branches, _) if branches.headOption.map(_._1) == Some(TrueLiteral) =>
    --- End diff --
    
    Let me update the code to explain this is just an extra safeguard.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org