You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by tejasapatil <gi...@git.apache.org> on 2017/05/16 03:17:12 UTC

[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

GitHub user tejasapatil opened a pull request:

    https://github.com/apache/spark/pull/17993

    [SPARK-20758][SQL] Add Constant propagation optimization

    ## What changes were proposed in this pull request?
    
    Added a rule based on this logic:
    - look for expression node with an `AND` and none of its children have a `NOT` or `OR` operation.
    - Populate a mapping of attribute => constant value by looking at all the `EqualTo` predicates
    - Using this mapping, replace occurrence of the attributes with the corresponding constant values in both the children of the `AND` node.
    
    ## How was this patch tested?
    
    - Added unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark SPARK-20758_const_propagation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17993.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17993
    
----
commit bb3b349d5b895bd8ca5b7db75845dede08601f40
Author: Tejas Patil <te...@fb.com>
Date:   2017-05-02T15:36:18Z

    Initial version of constant propagation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77432/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r116792029
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    +    case Not(_) => true
    +    case Or(_, _) => true
    +    case _ =>
    +      var result = false
    +      expression.children.foreach {
    +        case Not(_) => result = true
    +        case Or(_, _) => result = true
    +        case other => result = result || containsNonConjunctionPredicates(other)
    +      }
    +      result
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      case and @ (left And right)
    +        if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) =>
    +
    +        val leftEntries = left.collect {
    --- End diff --
    
    Lets put the collect in a function, so we can avoid the repetition.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118736410
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,62 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + *
    + * Approach used:
    + * - Start from AND operator as the root
    + * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they
    + *   don't have a `NOT` or `OR` operator in them
    + * - Populate a mapping of attribute => constant value by looking at all the equals predicates
    + * - Using this mapping, replace occurrence of the attributes with the corresponding constant values
    + *   in the AND node.
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find {
    --- End diff --
    
    `private def`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77133/testReport)** for PR 17993 at commit [`cc026da`](https://github.com/apache/spark/commit/cc026da840714bc2f88076dbb2aafa70aa1fa0b7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r116817636
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    +    case Not(_) => true
    +    case Or(_, _) => true
    +    case _ =>
    +      var result = false
    +      expression.children.foreach {
    +        case Not(_) => result = true
    +        case Or(_, _) => result = true
    +        case other => result = result || containsNonConjunctionPredicates(other)
    +      }
    +      result
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      case and @ (left And right)
    +        if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) =>
    +
    +        val leftEntries = left.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    --- End diff --
    
    How about `EqualNullSafe`? Normally, we use `Equality`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    LGTM - merging to master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118851475
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,62 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + *
    + * Approach used:
    + * - Start from AND operator as the root
    + * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they
    + *   don't have a `NOT` or `OR` operator in them
    + * - Populate a mapping of attribute => constant value by looking at all the equals predicates
    + * - Using this mapping, replace occurrence of the attributes with the corresponding constant values
    + *   in the AND node.
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find {
    +    case _: Not | _: Or => true
    +    case _ => false
    +  }.isDefined
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f: Filter => f transformExpressionsUp {
    --- End diff --
    
    We currently infer `is not null` constraints up and down the plan. This could be easily extended to other constraints. Your PR has some overlap with this. However, lets focus on getting this merged first, and then we might take a stab at extending this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77432/testReport)** for PR 17993 at commit [`731f796`](https://github.com/apache/spark/commit/731f79626b38321845ccca8813174a9cfa2331dd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77160 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77160/testReport)** for PR 17993 at commit [`aaad78c`](https://github.com/apache/spark/commit/aaad78c4ab0985812798ea9b19fff5c9a4808fdc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77134/testReport)** for PR 17993 at commit [`b8c4147`](https://github.com/apache/spark/commit/b8c4147d3b7dd2c1d0e6b3015042271e754a18cf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118849967
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,62 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + *
    + * Approach used:
    + * - Start from AND operator as the root
    + * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they
    + *   don't have a `NOT` or `OR` operator in them
    + * - Populate a mapping of attribute => constant value by looking at all the equals predicates
    + * - Using this mapping, replace occurrence of the attributes with the corresponding constant values
    + *   in the AND node.
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find {
    +    case _: Not | _: Or => true
    +    case _ => false
    +  }.isDefined
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f: Filter => f transformExpressionsUp {
    --- End diff --
    
    Yes the result should be the same. I don't have any theoretical proof if doing this over joins will be safe so want to be cautious here ... any bad rules might lead to correctness bugs which is super bad for end users.
    
    > it would be nice if these constraints are at least generated here
    
    Sorry I am not able to get you here and want to make sure if I am not ignoring your comment. Are you suggesting any changes over the existing version ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77432/testReport)** for PR 17993 at commit [`731f796`](https://github.com/apache/spark/commit/731f79626b38321845ccca8813174a9cfa2331dd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118852332
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantPropagationSuite.scala ---
    @@ -0,0 +1,167 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +
    +/**
    + * Unit tests for constant propagation in expressions.
    + */
    +class ConstantPropagationSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("AnalysisNodes", Once,
    +        EliminateSubqueryAliases) ::
    +        Batch("ConstantPropagation", Once,
    +          ColumnPruning,
    +          ConstantPropagation,
    +          ConstantFolding,
    +          BooleanSimplification) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  private val columnA = 'a.int
    +  private val columnB = 'b.int
    +  private val columnC = 'c.int
    +
    +  test("basic test") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(columnA === Add(columnB, Literal(1)) && columnB === Literal(10))
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(columnA === Literal(11) && columnB === Literal(10)).analyze
    +
    +    comparePlans(Optimize.execute(query.analyze), correctAnswer)
    +  }
    +
    +  test("with combination of AND and OR predicates") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(
    +        columnA === Add(columnB, Literal(1)) &&
    +          columnB === Literal(10) &&
    +          (columnA === Add(columnC, Literal(3)) || columnB === columnC))
    +      .analyze
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(
    +          columnA === Literal(11) &&
    +            columnB === Literal(10) &&
    +            (columnA === Add(columnC, Literal(3)) || Literal(10) === columnC))
    --- End diff --
    
    I had set a higher value for iterations in previous version of this PR but somehow the unit tests kept failing for me over terminal (surprisingly they worked fine over Intellij). This seems unrelated to the change done in the PR. If you have any advice here, let me know
    
    eg. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77193/testReport/org.apache.spark.sql.catalyst.optimizer/ConstantPropagationSuite/basic_test/ 
    
    ```
    sbt.ForkMain$ForkError: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Max iterations (2) reached for batch ConstantPropagation, tree:
    !Filter ((a#82440 = 11) && (b#82441 = 10))
    +- !Project [a#82440]
       +- LocalRelation <empty>, [a#82437, b#82438, c#82439]
    
    	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:105)
    	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
    	at scala.collection.immutable.List.foreach(List.scala:381)
    	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
    	at org.apache.spark.sql.catalyst.optimizer.ConstantPropagationSuite$$anonfun$1.apply$mcV$sp(ConstantPropagationSuite.scala:60)
    	at org.apache.spark.sql.catalyst.optimizer.ConstantPropagationSuite$$anonfun$1.apply(ConstantPropagationSuite.scala:50)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77193 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77193/testReport)** for PR 17993 at commit [`aaad78c`](https://github.com/apache/spark/commit/aaad78c4ab0985812798ea9b19fff5c9a4808fdc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r116792863
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    +    case Not(_) => true
    +    case Or(_, _) => true
    +    case _ =>
    +      var result = false
    +      expression.children.foreach {
    +        case Not(_) => result = true
    +        case Or(_, _) => result = true
    +        case other => result = result || containsNonConjunctionPredicates(other)
    +      }
    +      result
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      case and @ (left And right)
    +        if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) =>
    +
    +        val leftEntries = left.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +        val rightEntries = right.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +        val constantsMap = AttributeMap(leftEntries.map(_._1) ++ rightEntries.map(_._1))
    +        val predicates = (leftEntries.map(_._2) ++ rightEntries.map(_._2)).toSet
    +
    +        def replaceConstants(expression: Expression) = expression transform {
    +          case a: AttributeReference if constantsMap.contains(a) =>
    --- End diff --
    
    I don't think the double lookup is necessary. `constantsMap.get(a).getOrElse(a)` should cover this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77193/testReport)** for PR 17993 at commit [`aaad78c`](https://github.com/apache/spark/commit/aaad78c4ab0985812798ea9b19fff5c9a4808fdc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77493/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77493/testReport)** for PR 17993 at commit [`38543de`](https://github.com/apache/spark/commit/38543de8b45dc4145182d20e84a5796579548f3f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118765075
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,62 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + *
    + * Approach used:
    + * - Start from AND operator as the root
    + * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they
    + *   don't have a `NOT` or `OR` operator in them
    + * - Populate a mapping of attribute => constant value by looking at all the equals predicates
    + * - Using this mapping, replace occurrence of the attributes with the corresponding constant values
    + *   in the AND node.
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find {
    +    case _: Not | _: Or => true
    +    case _ => false
    +  }.isDefined
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f: Filter => f transformExpressionsUp {
    +      case and: And =>
    +        val conjunctivePredicates =
    +          splitConjunctivePredicates(and)
    +            .filter(expr => expr.isInstanceOf[EqualTo] || expr.isInstanceOf[EqualNullSafe])
    +            .filterNot(expr => containsNonConjunctionPredicates(expr))
    +
    +        val equalityPredicates = conjunctivePredicates.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +          case e @ EqualNullSafe(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualNullSafe(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +
    +        val constantsMap = AttributeMap(equalityPredicates.map(_._1))
    +        val predicates = equalityPredicates.map(_._2).toSet
    --- End diff --
    
    Current impl will pick the last one (ie. `a = 2`) and propagate it. Given that its one of the equality predicates user provided, there is nothing wrong in propagating it. When the query is evaluated, it would return empty result given that `a = 1` and `a = 2` cannot be true at the same time.
    
    ```
    scala> hc.sql(" SELECT * FROM table1 a WHERE a.j = 1 AND a.j = 2 AND a.k = (a.j + 3)").explain(true)
    
    == Physical Plan ==
    *Project [i#51, j#52, k#53]
    +- *Filter ((((isnotnull(k#53) && isnotnull(j#52)) && (j#52 = 1)) && (j#52 = 2)) && (cast(k#53 as int) = 5))
       +- *FileScan orc default.table1[i#51,j#52,k#53] Batched: false, Format: ORC, Location: InMemoryFileIndex[file:/Users/tejasp/warehouse/table1], PartitionFilters: [], PushedFilters: [IsNotNull(k), IsNotNull(j), EqualTo(j,1), EqualTo(j,2)], ReadSchema: struct<i:int,j:int,k:string>
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118851559
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,62 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + *
    + * Approach used:
    + * - Start from AND operator as the root
    + * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they
    + *   don't have a `NOT` or `OR` operator in them
    + * - Populate a mapping of attribute => constant value by looking at all the equals predicates
    + * - Using this mapping, replace occurrence of the attributes with the corresponding constant values
    + *   in the AND node.
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +  private def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find {
    +    case _: Not | _: Or => true
    +    case _ => false
    +  }.isDefined
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f: Filter => f transformExpressionsUp {
    +      case and: And =>
    +        val conjunctivePredicates =
    +          splitConjunctivePredicates(and)
    +            .filter(expr => expr.isInstanceOf[EqualTo] || expr.isInstanceOf[EqualNullSafe])
    +            .filterNot(expr => containsNonConjunctionPredicates(expr))
    +
    +        val equalityPredicates = conjunctivePredicates.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +          case e @ EqualNullSafe(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualNullSafe(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +
    +        val constantsMap = AttributeMap(equalityPredicates.map(_._1))
    +        val predicates = equalityPredicates.map(_._2).toSet
    +
    +        def replaceConstants(expression: Expression) = expression transform {
    +          case a: AttributeReference =>
    +            constantsMap.get(a) match {
    +              case Some(literal) => literal
    +              case None => a
    +            }
    +        }
    +
    +        and transform {
    +          case e @ EqualTo(_, _) if !predicates.contains(e) => replaceConstants(e)
    --- End diff --
    
    Should we check for identity instead of equality? I think you are doing the latter. What will happen in the following example: `select * from bla where (a = 1 or b = 2) and a = 1`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17993


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r116794140
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    +    case Not(_) => true
    +    case Or(_, _) => true
    +    case _ =>
    +      var result = false
    +      expression.children.foreach {
    +        case Not(_) => result = true
    +        case Or(_, _) => result = true
    +        case other => result = result || containsNonConjunctionPredicates(other)
    +      }
    +      result
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      case and @ (left And right)
    +        if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) =>
    +
    +        val leftEntries = left.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +        val rightEntries = right.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +        val constantsMap = AttributeMap(leftEntries.map(_._1) ++ rightEntries.map(_._1))
    +        val predicates = (leftEntries.map(_._2) ++ rightEntries.map(_._2)).toSet
    +
    +        def replaceConstants(expression: Expression) = expression transform {
    +          case a: AttributeReference if constantsMap.contains(a) =>
    +            constantsMap.get(a).getOrElse(a)
    +        }
    +
    +        and transform {
    +          case e @ EqualTo(_, _) if !predicates.contains(e) &&
    +            e.references.exists(ref => constantsMap.contains(ref)) =>
    --- End diff --
    
    Building the `references` map is more expensive, shall we just skip this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r117618801
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantPropagationSuite.scala ---
    @@ -0,0 +1,102 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +
    +class ConstantPropagationSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("AnalysisNodes", Once,
    +        EliminateSubqueryAliases) ::
    +        Batch("ConstantPropagation", Once,
    +          ColumnPruning,
    +          ConstantPropagation,
    +          ConstantFolding,
    +          BooleanSimplification) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  private val columnA = 'a.int
    +  private val columnB = 'b.int
    +
    +  /**
    +   * Unit tests for constant propagation in expressions.
    --- End diff --
    
    did this change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118765089
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,62 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + *
    + * Approach used:
    + * - Start from AND operator as the root
    + * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they
    + *   don't have a `NOT` or `OR` operator in them
    + * - Populate a mapping of attribute => constant value by looking at all the equals predicates
    + * - Using this mapping, replace occurrence of the attributes with the corresponding constant values
    + *   in the AND node.
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find {
    --- End diff --
    
    done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r117619380
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,62 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + *
    + * Approach used:
    + * - Start from AND operator as the root
    + * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they
    + *   don't have a `NOT` or `OR` operator in them
    + * - Populate a mapping of attribute => constant value by looking at all the equals predicates
    + * - Using this mapping, replace occurrence of the attributes with the corresponding constant values
    + *   in the AND node.
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find {
    +    case _: Not | _: Or => true
    +    case _ => false
    +  }.isDefined
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f: Filter => f transformExpressionsUp {
    --- End diff --
    
    I was initially doing this for the entire logical plan but now switched to do only for filter operator. 
    Reason: Doing this for the entire logical plan will mess up with JOIN predicates. eg.
    
    ```
    SELECT * FROM a JOIN b ON a.i = 1 AND b.i = a.i
    =>
     SELECT * FROM a JOIN b ON a.i = 1 AND b.i = 1
    ```
    
    .. the result is a cartesian product and Spark fails (asking to set a config). In case of OUTER JOINs, changing the join predicates might cause regression.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r117618788
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    +    case Not(_) => true
    +    case Or(_, _) => true
    +    case _ =>
    +      var result = false
    +      expression.children.foreach {
    +        case Not(_) => result = true
    +        case Or(_, _) => result = true
    +        case other => result = result || containsNonConjunctionPredicates(other)
    +      }
    +      result
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      case and @ (left And right)
    --- End diff --
    
    did this change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118765087
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantPropagationSuite.scala ---
    @@ -0,0 +1,154 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +
    +/**
    + * Unit tests for constant propagation in expressions.
    + */
    +class ConstantPropagationSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("AnalysisNodes", Once,
    +        EliminateSubqueryAliases) ::
    +        Batch("ConstantPropagation", Once,
    +          ColumnPruning,
    +          ConstantPropagation,
    +          ConstantFolding,
    +          BooleanSimplification) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  private val columnA = 'a.int
    +  private val columnB = 'b.int
    +  private val columnC = 'c.int
    +
    +  test("basic test") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(columnA === Add(columnB, Literal(1)) && columnB === Literal(10))
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(columnA === Literal(11) && columnB === Literal(10)).analyze
    +
    +    comparePlans(Optimize.execute(query.analyze), correctAnswer)
    +  }
    --- End diff --
    
    added


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #76964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76964/testReport)** for PR 17993 at commit [`bb3b349`](https://github.com/apache/spark/commit/bb3b349d5b895bd8ca5b7db75845dede08601f40).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r116790653
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    --- End diff --
    
    This might be more straightfoward:
    ```scala
    expression.find {
      case _: Not | _: Or => true
    }.isDefined
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77392/testReport)** for PR 17993 at commit [`399f348`](https://github.com/apache/spark/commit/399f34859e57b7330e39bccdc33b9cd602227bda).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77392/testReport)** for PR 17993 at commit [`399f348`](https://github.com/apache/spark/commit/399f34859e57b7330e39bccdc33b9cd602227bda).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118854380
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantPropagationSuite.scala ---
    @@ -0,0 +1,167 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +
    +/**
    + * Unit tests for constant propagation in expressions.
    + */
    +class ConstantPropagationSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("AnalysisNodes", Once,
    +        EliminateSubqueryAliases) ::
    +        Batch("ConstantPropagation", Once,
    +          ColumnPruning,
    +          ConstantPropagation,
    +          ConstantFolding,
    +          BooleanSimplification) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  private val columnA = 'a.int
    +  private val columnB = 'b.int
    +  private val columnC = 'c.int
    +
    +  test("basic test") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(columnA === Add(columnB, Literal(1)) && columnB === Literal(10))
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(columnA === Literal(11) && columnB === Literal(10)).analyze
    +
    +    comparePlans(Optimize.execute(query.analyze), correctAnswer)
    +  }
    +
    +  test("with combination of AND and OR predicates") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(
    +        columnA === Add(columnB, Literal(1)) &&
    +          columnB === Literal(10) &&
    +          (columnA === Add(columnC, Literal(3)) || columnB === columnC))
    +      .analyze
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(
    +          columnA === Literal(11) &&
    +            columnB === Literal(10) &&
    +            (columnA === Add(columnC, Literal(3)) || Literal(10) === columnC))
    --- End diff --
    
    That did it !! updated the change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r117618796
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    +    case Not(_) => true
    +    case Or(_, _) => true
    +    case _ =>
    +      var result = false
    +      expression.children.foreach {
    +        case Not(_) => result = true
    +        case Or(_, _) => result = true
    +        case other => result = result || containsNonConjunctionPredicates(other)
    +      }
    +      result
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      case and @ (left And right)
    +        if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) =>
    +
    +        val leftEntries = left.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +        val rightEntries = right.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +        val constantsMap = AttributeMap(leftEntries.map(_._1) ++ rightEntries.map(_._1))
    +        val predicates = (leftEntries.map(_._2) ++ rightEntries.map(_._2)).toSet
    +
    +        def replaceConstants(expression: Expression) = expression transform {
    +          case a: AttributeReference if constantsMap.contains(a) =>
    +            constantsMap.get(a).getOrElse(a)
    +        }
    +
    +        and transform {
    +          case e @ EqualTo(_, _) if !predicates.contains(e) &&
    +            e.references.exists(ref => constantsMap.contains(ref)) =>
    --- End diff --
    
    skipped it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77193/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r117618804
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    --- End diff --
    
    did this change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118851484
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,62 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + *
    + * Approach used:
    + * - Start from AND operator as the root
    + * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they
    + *   don't have a `NOT` or `OR` operator in them
    + * - Populate a mapping of attribute => constant value by looking at all the equals predicates
    + * - Using this mapping, replace occurrence of the attributes with the corresponding constant values
    + *   in the AND node.
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find {
    +    case _: Not | _: Or => true
    +    case _ => false
    +  }.isDefined
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f: Filter => f transformExpressionsUp {
    --- End diff --
    
    also cc @sameeragarwal 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r116790844
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    +    case Not(_) => true
    +    case Or(_, _) => true
    +    case _ =>
    +      var result = false
    +      expression.children.foreach {
    +        case Not(_) => result = true
    +        case Or(_, _) => result = true
    +        case other => result = result || containsNonConjunctionPredicates(other)
    +      }
    +      result
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      case and @ (left And right)
    --- End diff --
    
    `case and: And if containsNonConjunctionPredicates(and)`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r117618790
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    +    case Not(_) => true
    +    case Or(_, _) => true
    +    case _ =>
    +      var result = false
    +      expression.children.foreach {
    +        case Not(_) => result = true
    +        case Or(_, _) => result = true
    +        case other => result = result || containsNonConjunctionPredicates(other)
    +      }
    +      result
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      case and @ (left And right)
    +        if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) =>
    +
    +        val leftEntries = left.collect {
    --- End diff --
    
    sure


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77134 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77134/testReport)** for PR 17993 at commit [`b8c4147`](https://github.com/apache/spark/commit/b8c4147d3b7dd2c1d0e6b3015042271e754a18cf).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r117618800
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    +    case Not(_) => true
    +    case Or(_, _) => true
    +    case _ =>
    +      var result = false
    +      expression.children.foreach {
    +        case Not(_) => result = true
    +        case Or(_, _) => result = true
    +        case other => result = result || containsNonConjunctionPredicates(other)
    +      }
    +      result
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      case and @ (left And right)
    +        if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) =>
    +
    +        val leftEntries = left.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +        val rightEntries = right.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +        val constantsMap = AttributeMap(leftEntries.map(_._1) ++ rightEntries.map(_._1))
    +        val predicates = (leftEntries.map(_._2) ++ rightEntries.map(_._2)).toSet
    +
    +        def replaceConstants(expression: Expression) = expression transform {
    +          case a: AttributeReference if constantsMap.contains(a) =>
    --- End diff --
    
    Nice catch !!! I changed the logic to handle that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Jenkins test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118737420
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantPropagationSuite.scala ---
    @@ -0,0 +1,154 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +
    +/**
    + * Unit tests for constant propagation in expressions.
    + */
    +class ConstantPropagationSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("AnalysisNodes", Once,
    +        EliminateSubqueryAliases) ::
    +        Batch("ConstantPropagation", Once,
    +          ColumnPruning,
    +          ConstantPropagation,
    +          ConstantFolding,
    +          BooleanSimplification) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  private val columnA = 'a.int
    +  private val columnB = 'b.int
    +  private val columnC = 'c.int
    +
    +  test("basic test") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(columnA === Add(columnB, Literal(1)) && columnB === Literal(10))
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(columnA === Literal(11) && columnB === Literal(10)).analyze
    +
    +    comparePlans(Optimize.execute(query.analyze), correctAnswer)
    +  }
    --- End diff --
    
    Could you add a negative test case like `SELECT * FROM t WHERE a=1 and a=2 and b=a+3`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77493/testReport)** for PR 17993 at commit [`38543de`](https://github.com/apache/spark/commit/38543de8b45dc4145182d20e84a5796579548f3f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77133/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118852747
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantPropagationSuite.scala ---
    @@ -0,0 +1,167 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +
    +/**
    + * Unit tests for constant propagation in expressions.
    + */
    +class ConstantPropagationSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("AnalysisNodes", Once,
    +        EliminateSubqueryAliases) ::
    +        Batch("ConstantPropagation", Once,
    +          ColumnPruning,
    +          ConstantPropagation,
    +          ConstantFolding,
    +          BooleanSimplification) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  private val columnA = 'a.int
    +  private val columnB = 'b.int
    +  private val columnC = 'c.int
    +
    +  test("basic test") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(columnA === Add(columnB, Literal(1)) && columnB === Literal(10))
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(columnA === Literal(11) && columnB === Literal(10)).analyze
    +
    +    comparePlans(Optimize.execute(query.analyze), correctAnswer)
    +  }
    +
    +  test("with combination of AND and OR predicates") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(
    +        columnA === Add(columnB, Literal(1)) &&
    +          columnB === Literal(10) &&
    +          (columnA === Add(columnC, Literal(3)) || columnB === columnC))
    +      .analyze
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(
    +          columnA === Literal(11) &&
    +            columnB === Literal(10) &&
    +            (columnA === Add(columnC, Literal(3)) || Literal(10) === columnC))
    --- End diff --
    
    Hmmm... that means the optimizer is not converging to a fixed point. Could you try to increase the number of iterations? You can also check if the optimizer reaches 100 iterations during regular execution; it should log a warning. If it does something is wrong with the rule, and it might cause the optimizer to run prohibitively long...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    it is weird that jenkins is not kicking off


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #76964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76964/testReport)** for PR 17993 at commit [`bb3b349`](https://github.com/apache/spark/commit/bb3b349d5b895bd8ca5b7db75845dede08601f40).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Jenkins test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77134/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r116793889
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    +    case Not(_) => true
    +    case Or(_, _) => true
    +    case _ =>
    +      var result = false
    +      expression.children.foreach {
    +        case Not(_) => result = true
    +        case Or(_, _) => result = true
    +        case other => result = result || containsNonConjunctionPredicates(other)
    +      }
    +      result
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      case and @ (left And right)
    +        if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) =>
    +
    +        val leftEntries = left.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +        val rightEntries = right.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +        val constantsMap = AttributeMap(leftEntries.map(_._1) ++ rightEntries.map(_._1))
    +        val predicates = (leftEntries.map(_._2) ++ rightEntries.map(_._2)).toSet
    +
    +        def replaceConstants(expression: Expression) = expression transform {
    +          case a: AttributeReference if constantsMap.contains(a) =>
    --- End diff --
    
    What happens if I do something stupid like `i = 1 and ((j = 1) = (j = i))`? I think `j = 1` might replaced by `1 = 1`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118846885
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,62 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + *
    + * Approach used:
    + * - Start from AND operator as the root
    + * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they
    + *   don't have a `NOT` or `OR` operator in them
    + * - Populate a mapping of attribute => constant value by looking at all the equals predicates
    + * - Using this mapping, replace occurrence of the attributes with the corresponding constant values
    + *   in the AND node.
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find {
    +    case _: Not | _: Or => true
    +    case _ => false
    +  }.isDefined
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f: Filter => f transformExpressionsUp {
    --- End diff --
    
    Maybe I am being myopic here but the result should be the same right? The only way this regresses is when we plan a `CartesianProduct` instead of an `BroadcastNestedLoopJoin`... I am fine with not optimizing this for now, it would be nice if these constraints are at least generated here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    cc @hvanhovell @gatorsmile @dongjoon-hyun


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Jenkins test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76964/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r117547245
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantPropagationSuite.scala ---
    @@ -0,0 +1,102 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +
    +class ConstantPropagationSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("AnalysisNodes", Once,
    +        EliminateSubqueryAliases) ::
    +        Batch("ConstantPropagation", Once,
    +          ColumnPruning,
    +          ConstantPropagation,
    +          ConstantFolding,
    +          BooleanSimplification) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  private val columnA = 'a.int
    +  private val columnB = 'b.int
    +
    +  /**
    +   * Unit tests for constant propagation in expressions.
    --- End diff --
    
    Hi, @tejasapatil . Nit. It looks like test suite comment. Can we move this comment to line 27?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77392/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118737200
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,62 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + *
    + * Approach used:
    + * - Start from AND operator as the root
    + * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they
    + *   don't have a `NOT` or `OR` operator in them
    + * - Populate a mapping of attribute => constant value by looking at all the equals predicates
    + * - Using this mapping, replace occurrence of the attributes with the corresponding constant values
    + *   in the AND node.
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find {
    +    case _: Not | _: Or => true
    +    case _ => false
    +  }.isDefined
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f: Filter => f transformExpressionsUp {
    +      case and: And =>
    +        val conjunctivePredicates =
    +          splitConjunctivePredicates(and)
    +            .filter(expr => expr.isInstanceOf[EqualTo] || expr.isInstanceOf[EqualNullSafe])
    +            .filterNot(expr => containsNonConjunctionPredicates(expr))
    +
    +        val equalityPredicates = conjunctivePredicates.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +          case e @ EqualNullSafe(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualNullSafe(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +
    +        val constantsMap = AttributeMap(equalityPredicates.map(_._1))
    +        val predicates = equalityPredicates.map(_._2).toSet
    --- End diff --
    
    I'm wondering if it's safe when we have both `a = 1` and `a = 2` at the same time?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118852033
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantPropagationSuite.scala ---
    @@ -0,0 +1,167 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +
    +/**
    + * Unit tests for constant propagation in expressions.
    + */
    +class ConstantPropagationSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("AnalysisNodes", Once,
    +        EliminateSubqueryAliases) ::
    +        Batch("ConstantPropagation", Once,
    +          ColumnPruning,
    +          ConstantPropagation,
    +          ConstantFolding,
    +          BooleanSimplification) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  private val columnA = 'a.int
    +  private val columnB = 'b.int
    +  private val columnC = 'c.int
    +
    +  test("basic test") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(columnA === Add(columnB, Literal(1)) && columnB === Literal(10))
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(columnA === Literal(11) && columnB === Literal(10)).analyze
    +
    +    comparePlans(Optimize.execute(query.analyze), correctAnswer)
    +  }
    +
    +  test("with combination of AND and OR predicates") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(
    +        columnA === Add(columnB, Literal(1)) &&
    +          columnB === Literal(10) &&
    +          (columnA === Add(columnC, Literal(3)) || columnB === columnC))
    +      .analyze
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(
    +          columnA === Literal(11) &&
    +            columnB === Literal(10) &&
    +            (columnA === Add(columnC, Literal(3)) || Literal(10) === columnC))
    --- End diff --
    
    Perhaps if you increase the number of iterations on `ConstantPropagation` batch...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118852031
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,62 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + *
    + * Approach used:
    + * - Start from AND operator as the root
    + * - Get all the children conjunctive predicates which are EqualTo / EqualNullSafe such that they
    + *   don't have a `NOT` or `OR` operator in them
    + * - Populate a mapping of attribute => constant value by looking at all the equals predicates
    + * - Using this mapping, replace occurrence of the attributes with the corresponding constant values
    + *   in the AND node.
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +  private def containsNonConjunctionPredicates(expression: Expression): Boolean = expression.find {
    +    case _: Not | _: Or => true
    +    case _ => false
    +  }.isDefined
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f: Filter => f transformExpressionsUp {
    +      case and: And =>
    +        val conjunctivePredicates =
    +          splitConjunctivePredicates(and)
    +            .filter(expr => expr.isInstanceOf[EqualTo] || expr.isInstanceOf[EqualNullSafe])
    +            .filterNot(expr => containsNonConjunctionPredicates(expr))
    +
    +        val equalityPredicates = conjunctivePredicates.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e)
    +          case e @ EqualNullSafe(left: AttributeReference, right: Literal) => ((left, right), e)
    +          case e @ EqualNullSafe(left: Literal, right: AttributeReference) => ((right, left), e)
    +        }
    +
    +        val constantsMap = AttributeMap(equalityPredicates.map(_._1))
    +        val predicates = equalityPredicates.map(_._2).toSet
    +
    +        def replaceConstants(expression: Expression) = expression transform {
    +          case a: AttributeReference =>
    +            constantsMap.get(a) match {
    +              case Some(literal) => literal
    +              case None => a
    +            }
    +        }
    +
    +        and transform {
    +          case e @ EqualTo(_, _) if !predicates.contains(e) => replaceConstants(e)
    --- End diff --
    
    Here is the behavior with the PR. Seems reasonable because `a = 1` has to be true so `(a = 1 or b = 2)` would always be `true` and can be eliminated.
    
    ```
    scala> hc.sql(" select * from bla where (a = 1 or b = 2) and a = 1 ").explain(true)
    
    == Physical Plan ==
    *Project [a#34, b#35]
    +- *Filter (isnotnull(a#34) && (a#34 = 1))
       +- *FileScan ....
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r118852021
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantPropagationSuite.scala ---
    @@ -0,0 +1,167 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.PlanTest
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +
    +/**
    + * Unit tests for constant propagation in expressions.
    + */
    +class ConstantPropagationSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("AnalysisNodes", Once,
    +        EliminateSubqueryAliases) ::
    +        Batch("ConstantPropagation", Once,
    +          ColumnPruning,
    +          ConstantPropagation,
    +          ConstantFolding,
    +          BooleanSimplification) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  private val columnA = 'a.int
    +  private val columnB = 'b.int
    +  private val columnC = 'c.int
    +
    +  test("basic test") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(columnA === Add(columnB, Literal(1)) && columnB === Literal(10))
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(columnA === Literal(11) && columnB === Literal(10)).analyze
    +
    +    comparePlans(Optimize.execute(query.analyze), correctAnswer)
    +  }
    +
    +  test("with combination of AND and OR predicates") {
    +    val query = testRelation
    +      .select(columnA)
    +      .where(
    +        columnA === Add(columnB, Literal(1)) &&
    +          columnB === Literal(10) &&
    +          (columnA === Add(columnC, Literal(3)) || columnB === columnC))
    +      .analyze
    +
    +    val correctAnswer =
    +      testRelation
    +        .select(columnA)
    +        .where(
    +          columnA === Literal(11) &&
    +            columnB === Literal(10) &&
    +            (columnA === Add(columnC, Literal(3)) || Literal(10) === columnC))
    --- End diff --
    
    Should we be able to infer that `columnA == Literal(11)`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17993
  
    **[Test build #77133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77133/testReport)** for PR 17993 at commit [`cc026da`](https://github.com/apache/spark/commit/cc026da840714bc2f88076dbb2aafa70aa1fa0b7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17993#discussion_r117618791
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding
    + * value in conjunctive [[Expression Expressions]]
    + * eg.
    + * {{{
    + *   SELECT * FROM table WHERE i = 5 AND j = i + 3
    + *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
    + * }}}
    + */
    +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
    +
    +  def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match {
    +    case Not(_) => true
    +    case Or(_, _) => true
    +    case _ =>
    +      var result = false
    +      expression.children.foreach {
    +        case Not(_) => result = true
    +        case Or(_, _) => result = true
    +        case other => result = result || containsNonConjunctionPredicates(other)
    +      }
    +      result
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      case and @ (left And right)
    +        if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) =>
    +
    +        val leftEntries = left.collect {
    +          case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e)
    --- End diff --
    
    did this change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org