You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by sameeragarwal <gi...@git.apache.org> on 2016/03/12 00:40:49 UTC

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

GitHub user sameeragarwal opened a pull request:

    https://github.com/apache/spark/pull/11665

    [SPARK-XXXX][SQL] Support for inferring filters from data constraints

    ## What changes were proposed in this pull request?
    
    **[I'll link it to the JIRA once ASF JIRA is back online]**
    
    This PR generalizes the `NullFiltering` optimizer rule in catalyst to `InferFiltersFromConstraints` that can automatically infer all relevant filters based on an operator's constraints while making sure of 2 things:
    
    (a) no redundant filters are generated, and 
    (b) filters that do not contribute to any further optimizations are not generated.
    
    ## How was this patch tested?
    
    Extended all tests in `InferFiltersFromConstraintsSuite` (that were initially based on `NullFilteringSuite` to test filter inference in `Filter` and `Join` operators.
    
    In particular the 2 tests ( `single inner join with pre-existing filters: filter out values on either side` and `multiple inner joins: filter out values on all sides on equi-join keys` attempts to highlight/test the real potential of this rule for join optimization.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark infer-filters

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11665
    
----
commit 308e93c4854d38c99c37919383aec85b5272fdf9
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-03-11T07:51:50Z

    Add InferFiltersFromConstraints rule

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196584838
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196513407
  
    **[Test build #53101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53101/consoleFull)** for PR 11665 at commit [`4b0cf5f`](https://github.com/apache/spark/commit/4b0cf5f581901919a5513554b2982ef327a4d87a).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-197505009
  
    **[Test build #53331 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53331/consoleFull)** for PR 11665 at commit [`92d935f`](https://github.com/apache/spark/commit/92d935fc8c1204b5d8272655dbe0e606270e5854).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196519877
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-197457111
  
    **[Test build #53330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53330/consoleFull)** for PR 11665 at commit [`336a18e`](https://github.com/apache/spark/commit/336a18e3e9f55514545a28f0bb32658dee2ff70b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r55959787
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    --- End diff --
    
    Can you please give an example where non-IsNotNull filter conditions might be useful? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r56239534
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -78,7 +78,7 @@ abstract class Optimizer extends RuleExecutor[LogicalPlan] {
           CombineLimits,
           CombineUnions,
           // Constant folding and strength reduction
    -      NullFiltering,
    +      InferFiltersFromConstraints,
    --- End diff --
    
    This rule doesn't seem related to the comment above it (same for nullfiltering)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-195625089
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52963/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196169183
  
    **[Test build #53051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53051/consoleFull)** for PR 11665 at commit [`d7f8d34`](https://github.com/apache/spark/commit/d7f8d34676e407911139c480c9018cb78756ba39).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196519883
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53104/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196527388
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196513766
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r55917728
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
             (child.constraints ++ splitConjunctivePredicates(condition))
    -      if (newIsNotNullConstraints.nonEmpty) {
    -        Filter(And(newIsNotNullConstraints.reduce(And), condition), child)
    +      if (newFilters.nonEmpty) {
    +        Filter(And(newFilters.reduce(And), condition), child)
           } else {
             filter
           }
     
    -    case join @ Join(left, right, joinType, condition) =>
    -      val leftIsNotNullConstraints = join.constraints
    -        .filter(_.isInstanceOf[IsNotNull])
    -        .filter(_.references.subsetOf(left.outputSet)) -- left.constraints
    -      val rightIsNotNullConstraints =
    -        join.constraints
    -          .filter(_.isInstanceOf[IsNotNull])
    -          .filter(_.references.subsetOf(right.outputSet)) -- right.constraints
    -      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
    -        Filter(leftIsNotNullConstraints.reduce(And), left)
    -      } else {
    -        left
    -      }
    -      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
    -        Filter(rightIsNotNullConstraints.reduce(And), right)
    -      } else {
    -        right
    -      }
    -      if (newLeftChild != left || newRightChild != right) {
    -        Join(newLeftChild, newRightChild, joinType, condition)
    -      } else {
    -        join
    +    case join@Join(left, right, joinType, conditionOpt) =>
    +      val additionalConstraints = join.constraints.filter { c =>
    +        // Only consider constraints that can be pushed down to either the left or the right child
    +        c.references.subsetOf(left.outputSet) || c.references.subsetOf(right.outputSet)} --
    +        (left.constraints ++ right.constraints)
    +      val newConditionOpt = conditionOpt match {
    +        case Some(condition) =>
    +          val newFilters = additionalConstraints -- splitConjunctivePredicates(condition)
    +          if (newFilters.nonEmpty) Option(And(newFilters.reduce(And), condition)) else None
    +        case None =>
    +          if (additionalConstraints.nonEmpty) Option(additionalConstraints.reduce(And)) else None
    --- End diff --
    
    Nit: `Option` -> `Some`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r56240703
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,41 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      val newFilters = filter.constraints --
             (child.constraints ++ splitConjunctivePredicates(condition))
    -      if (newIsNotNullConstraints.nonEmpty) {
    -        Filter(And(newIsNotNullConstraints.reduce(And), condition), child)
    +      if (newFilters.nonEmpty) {
    +        Filter(And(newFilters.reduce(And), condition), child)
           } else {
             filter
           }
     
    -    case join @ Join(left, right, joinType, condition) =>
    -      val leftIsNotNullConstraints = join.constraints
    -        .filter(_.isInstanceOf[IsNotNull])
    -        .filter(_.references.subsetOf(left.outputSet)) -- left.constraints
    -      val rightIsNotNullConstraints =
    -        join.constraints
    -          .filter(_.isInstanceOf[IsNotNull])
    -          .filter(_.references.subsetOf(right.outputSet)) -- right.constraints
    -      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
    -        Filter(leftIsNotNullConstraints.reduce(And), left)
    -      } else {
    -        left
    -      }
    -      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
    -        Filter(rightIsNotNullConstraints.reduce(And), right)
    -      } else {
    -        right
    -      }
    -      if (newLeftChild != left || newRightChild != right) {
    -        Join(newLeftChild, newRightChild, joinType, condition)
    -      } else {
    -        join
    +    case join @ Join(left, right, joinType, conditionOpt) =>
    +      val additionalConstraints = join.constraints.filter { c =>
    +        // Only consider constraints that can be pushed down to either the left or the right child
    +        c.references.subsetOf(left.outputSet) || c.references.subsetOf(right.outputSet)} --
    --- End diff --
    
    I think this line is really hard to read. Split the join.constraints.filter{} line from the set difference line and us another local var


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r55917694
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    --- End diff --
    
    We need to infer the Filter conditions too. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-197457635
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53330/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196513497
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196584839
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53113/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-197458671
  
    **[Test build #53331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53331/consoleFull)** for PR 11665 at commit [`92d935f`](https://github.com/apache/spark/commit/92d935fc8c1204b5d8272655dbe0e606270e5854).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-195620519
  
    cc @nongli @yhuai @gatorsmile 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r55917737
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
             (child.constraints ++ splitConjunctivePredicates(condition))
    -      if (newIsNotNullConstraints.nonEmpty) {
    -        Filter(And(newIsNotNullConstraints.reduce(And), condition), child)
    +      if (newFilters.nonEmpty) {
    +        Filter(And(newFilters.reduce(And), condition), child)
           } else {
             filter
           }
     
    -    case join @ Join(left, right, joinType, condition) =>
    -      val leftIsNotNullConstraints = join.constraints
    -        .filter(_.isInstanceOf[IsNotNull])
    -        .filter(_.references.subsetOf(left.outputSet)) -- left.constraints
    -      val rightIsNotNullConstraints =
    -        join.constraints
    -          .filter(_.isInstanceOf[IsNotNull])
    -          .filter(_.references.subsetOf(right.outputSet)) -- right.constraints
    -      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
    -        Filter(leftIsNotNullConstraints.reduce(And), left)
    -      } else {
    -        left
    -      }
    -      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
    -        Filter(rightIsNotNullConstraints.reduce(And), right)
    -      } else {
    -        right
    -      }
    -      if (newLeftChild != left || newRightChild != right) {
    -        Join(newLeftChild, newRightChild, joinType, condition)
    -      } else {
    -        join
    +    case join@Join(left, right, joinType, conditionOpt) =>
    +      val additionalConstraints = join.constraints.filter { c =>
    +        // Only consider constraints that can be pushed down to either the left or the right child
    +        c.references.subsetOf(left.outputSet) || c.references.subsetOf(right.outputSet)} --
    +        (left.constraints ++ right.constraints)
    +      val newConditionOpt = conditionOpt match {
    +        case Some(condition) =>
    +          val newFilters = additionalConstraints -- splitConjunctivePredicates(condition)
    +          if (newFilters.nonEmpty) Option(And(newFilters.reduce(And), condition)) else None
    --- End diff --
    
    Nit: `Option` -> `Some`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r56068722
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    --- End diff --
    
    I think I see what you mean. Queries of the form: `x.join(y, Inner, Some(x.a === y.a)).where(x.a > 5)` can be optimized to `x.where('a > 5).join(y.where('a > 5), Inner, Some(x.a === y.a)`. Thanks, I'll add it back.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196584622
  
    **[Test build #53113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53113/consoleFull)** for PR 11665 at commit [`4b0cf5f`](https://github.com/apache/spark/commit/4b0cf5f581901919a5513554b2982ef327a4d87a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-197505404
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53331/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-197457623
  
    **[Test build #53330 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53330/consoleFull)** for PR 11665 at commit [`336a18e`](https://github.com/apache/spark/commit/336a18e3e9f55514545a28f0bb32658dee2ff70b).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r55917538
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
             (child.constraints ++ splitConjunctivePredicates(condition))
    -      if (newIsNotNullConstraints.nonEmpty) {
    -        Filter(And(newIsNotNullConstraints.reduce(And), condition), child)
    +      if (newFilters.nonEmpty) {
    +        Filter(And(newFilters.reduce(And), condition), child)
           } else {
             filter
           }
     
    -    case join @ Join(left, right, joinType, condition) =>
    -      val leftIsNotNullConstraints = join.constraints
    -        .filter(_.isInstanceOf[IsNotNull])
    -        .filter(_.references.subsetOf(left.outputSet)) -- left.constraints
    -      val rightIsNotNullConstraints =
    -        join.constraints
    -          .filter(_.isInstanceOf[IsNotNull])
    -          .filter(_.references.subsetOf(right.outputSet)) -- right.constraints
    -      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
    -        Filter(leftIsNotNullConstraints.reduce(And), left)
    -      } else {
    -        left
    -      }
    -      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
    -        Filter(rightIsNotNullConstraints.reduce(And), right)
    -      } else {
    -        right
    -      }
    -      if (newLeftChild != left || newRightChild != right) {
    -        Join(newLeftChild, newRightChild, joinType, condition)
    -      } else {
    -        join
    +    case join@Join(left, right, joinType, conditionOpt) =>
    --- End diff --
    
    Nit: `join @ Join`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196202604
  
    **[Test build #53051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53051/consoleFull)** for PR 11665 at commit [`d7f8d34`](https://github.com/apache/spark/commit/d7f8d34676e407911139c480c9018cb78756ba39).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r55959595
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
             (child.constraints ++ splitConjunctivePredicates(condition))
    -      if (newIsNotNullConstraints.nonEmpty) {
    -        Filter(And(newIsNotNullConstraints.reduce(And), condition), child)
    +      if (newFilters.nonEmpty) {
    +        Filter(And(newFilters.reduce(And), condition), child)
           } else {
             filter
           }
     
    -    case join @ Join(left, right, joinType, condition) =>
    -      val leftIsNotNullConstraints = join.constraints
    -        .filter(_.isInstanceOf[IsNotNull])
    -        .filter(_.references.subsetOf(left.outputSet)) -- left.constraints
    -      val rightIsNotNullConstraints =
    -        join.constraints
    -          .filter(_.isInstanceOf[IsNotNull])
    -          .filter(_.references.subsetOf(right.outputSet)) -- right.constraints
    -      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
    -        Filter(leftIsNotNullConstraints.reduce(And), left)
    -      } else {
    -        left
    -      }
    -      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
    -        Filter(rightIsNotNullConstraints.reduce(And), right)
    -      } else {
    -        right
    -      }
    -      if (newLeftChild != left || newRightChild != right) {
    -        Join(newLeftChild, newRightChild, joinType, condition)
    -      } else {
    -        join
    +    case join@Join(left, right, joinType, conditionOpt) =>
    +      val additionalConstraints = join.constraints.filter { c =>
    +        // Only consider constraints that can be pushed down to either the left or the right child
    +        c.references.subsetOf(left.outputSet) || c.references.subsetOf(right.outputSet)} --
    +        (left.constraints ++ right.constraints)
    +      val newConditionOpt = conditionOpt match {
    +        case Some(condition) =>
    +          val newFilters = additionalConstraints -- splitConjunctivePredicates(condition)
    +          if (newFilters.nonEmpty) Option(And(newFilters.reduce(And), condition)) else None
    +        case None =>
    +          if (additionalConstraints.nonEmpty) Option(additionalConstraints.reduce(And)) else None
    --- End diff --
    
    uh, I see. In the code base, most of codes are using `Some`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r56239773
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,41 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    --- End diff --
    
    This comment seems like a clearer version of the first sentence in the object comment. I'd remove this here and replace the first sentence of the above comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-197457633
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-197548312
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-195625087
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r56084805
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    --- End diff --
    
    Great! Will expect to see your PR. Thanks! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196513504
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53101/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r56084341
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    --- End diff --
    
    I'll have the physical plan changes in a separate PR (Need to look into TPCDS q56 and q59)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196515147
  
    **[Test build #53104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53104/consoleFull)** for PR 11665 at commit [`4b0cf5f`](https://github.com/apache/spark/commit/4b0cf5f581901919a5513554b2982ef327a4d87a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-197456454
  
    thanks, all comments addressed!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-195604670
  
    **[Test build #52963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52963/consoleFull)** for PR 11665 at commit [`308e93c`](https://github.com/apache/spark/commit/308e93c4854d38c99c37919383aec85b5272fdf9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r55959538
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
             (child.constraints ++ splitConjunctivePredicates(condition))
    -      if (newIsNotNullConstraints.nonEmpty) {
    -        Filter(And(newIsNotNullConstraints.reduce(And), condition), child)
    +      if (newFilters.nonEmpty) {
    +        Filter(And(newFilters.reduce(And), condition), child)
           } else {
             filter
           }
     
    -    case join @ Join(left, right, joinType, condition) =>
    -      val leftIsNotNullConstraints = join.constraints
    -        .filter(_.isInstanceOf[IsNotNull])
    -        .filter(_.references.subsetOf(left.outputSet)) -- left.constraints
    -      val rightIsNotNullConstraints =
    -        join.constraints
    -          .filter(_.isInstanceOf[IsNotNull])
    -          .filter(_.references.subsetOf(right.outputSet)) -- right.constraints
    -      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
    -        Filter(leftIsNotNullConstraints.reduce(And), left)
    -      } else {
    -        left
    -      }
    -      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
    -        Filter(rightIsNotNullConstraints.reduce(And), right)
    -      } else {
    -        right
    -      }
    -      if (newLeftChild != left || newRightChild != right) {
    -        Join(newLeftChild, newRightChild, joinType, condition)
    -      } else {
    -        join
    +    case join@Join(left, right, joinType, conditionOpt) =>
    +      val additionalConstraints = join.constraints.filter { c =>
    +        // Only consider constraints that can be pushed down to either the left or the right child
    +        c.references.subsetOf(left.outputSet) || c.references.subsetOf(right.outputSet)} --
    +        (left.constraints ++ right.constraints)
    +      val newConditionOpt = conditionOpt match {
    +        case Some(condition) =>
    +          val newFilters = additionalConstraints -- splitConjunctivePredicates(condition)
    +          if (newFilters.nonEmpty) Option(And(newFilters.reduce(And), condition)) else None
    --- End diff --
    
    Same as above, although here let me use `reduceOption` instead :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r55959492
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
             (child.constraints ++ splitConjunctivePredicates(condition))
    -      if (newIsNotNullConstraints.nonEmpty) {
    -        Filter(And(newIsNotNullConstraints.reduce(And), condition), child)
    +      if (newFilters.nonEmpty) {
    +        Filter(And(newFilters.reduce(And), condition), child)
           } else {
             filter
           }
     
    -    case join @ Join(left, right, joinType, condition) =>
    -      val leftIsNotNullConstraints = join.constraints
    -        .filter(_.isInstanceOf[IsNotNull])
    -        .filter(_.references.subsetOf(left.outputSet)) -- left.constraints
    -      val rightIsNotNullConstraints =
    -        join.constraints
    -          .filter(_.isInstanceOf[IsNotNull])
    -          .filter(_.references.subsetOf(right.outputSet)) -- right.constraints
    -      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
    -        Filter(leftIsNotNullConstraints.reduce(And), left)
    -      } else {
    -        left
    -      }
    -      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
    -        Filter(rightIsNotNullConstraints.reduce(And), right)
    -      } else {
    -        right
    -      }
    -      if (newLeftChild != left || newRightChild != right) {
    -        Join(newLeftChild, newRightChild, joinType, condition)
    -      } else {
    -        join
    +    case join@Join(left, right, joinType, conditionOpt) =>
    +      val additionalConstraints = join.constraints.filter { c =>
    +        // Only consider constraints that can be pushed down to either the left or the right child
    +        c.references.subsetOf(left.outputSet) || c.references.subsetOf(right.outputSet)} --
    +        (left.constraints ++ right.constraints)
    +      val newConditionOpt = conditionOpt match {
    +        case Some(condition) =>
    +          val newFilters = additionalConstraints -- splitConjunctivePredicates(condition)
    +          if (newFilters.nonEmpty) Option(And(newFilters.reduce(And), condition)) else None
    +        case None =>
    +          if (additionalConstraints.nonEmpty) Option(additionalConstraints.reduce(And)) else None
    --- End diff --
    
    We prefer `Option(x)` over `Some(x)` as the former handles null values better (please see https://github.com/databricks/scala-style-guide#options). Here is how scala handles Option(x):
    
    ```scala
      /** An Option factory which creates Some(x) if the argument is not null,
       *  and None if it is null.
       *
       *  @param  x the value
       *  @return   Some(value) if value != null, None if value == null
       */
      def apply[A](x: A): Option[A] = if (x == null) None else Some(x)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r55959514
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    --- End diff --
    
    The rules `PushPredicateThroughJoin` and `OuterJoinElimination` could utilize the inferred conditions in `Filter`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196508028
  
    **[Test build #53101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53101/consoleFull)** for PR 11665 at commit [`4b0cf5f`](https://github.com/apache/spark/commit/4b0cf5f581901919a5513554b2982ef327a4d87a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11665


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196519802
  
    **[Test build #53104 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53104/consoleFull)** for PR 11665 at commit [`4b0cf5f`](https://github.com/apache/spark/commit/4b0cf5f581901919a5513554b2982ef327a4d87a).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-195624904
  
    **[Test build #52963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52963/consoleFull)** for PR 11665 at commit [`308e93c`](https://github.com/apache/spark/commit/308e93c4854d38c99c37919383aec85b5272fdf9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-197505399
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r56084104
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    --- End diff --
    
    yes, please see the last commit 4b0cf5f581901919a5513554b2982ef327a4d87a


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r56083639
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    --- End diff --
    
    Thank you for letting me know it. Will you do it? Or you want me to do it? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196203085
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53051/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196203080
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r55960289
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    --- End diff --
    
    For example, given ```case f @ Filter(filterCondition, Join(left, right, joinType, joinCondition))``` When the join type is left/right outer joins, the common filter conditions are unable to push down into Join condition. 
    
    In these cases, we can infer some left/right filter condition by using the common filter conditions, we can push down more conditions into `Join`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11665#issuecomment-196527930
  
    **[Test build #53113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53113/consoleFull)** for PR 11665 at commit [`4b0cf5f`](https://github.com/apache/spark/commit/4b0cf5f581901919a5513554b2982ef327a4d87a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13871][SQL] Support for inferring filte...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r56069506
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    --- End diff --
    
    By the way, slightly unrelated to this discussion but Yin, Davies and I had an offline discussion earlier today and we are considering generating the `isNotNull` filters in the physical plan to prevent some TPCDS regressions due to filter pushdown.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-XXXX][SQL] Support for inferring filter...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11665#discussion_r55959308
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -598,50 +598,44 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    - * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    - * by inserting isNotNull filters in the query plan. These filters are currently inserted beneath
    - * existing Filters and Join operators and are inferred based on their data constraints.
    + * Eliminate reading unnecessary values if they are not required for correctness (and can help in
    + * optimizing the query) by inserting relevant filters in the query plan based on an operator's
    + * data constraints. These filters are currently inserted to the existing conditions in the Filter
    + * operators and on either side of Join operators.
      *
      * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
      * LeftSemi joins.
      */
    -object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
    +  // We generate a list of additional filters from the operator's existing constraint but remove
    +  // those that are either already part of the operator's condition or are part of the operator's
    +  // child constraints.
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case filter @ Filter(condition, child) =>
    -      // We generate a list of additional isNotNull filters from the operator's existing constraints
    -      // but remove those that are either already part of the filter condition or are part of the
    -      // operator's child constraints.
    -      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +      // For the Filter operator, we try to generate additional filters by only inferring the
    +      // IsNotNull constraints. These IsNotNull filters are then used while generating the
    +      // physical plan to quickly short circuit the null checks in the generated code.
    +      val newFilters = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    --- End diff --
    
    This was actually intentional as I couldn't think of an optimization that'd benefit from inferring other filter conditions. Do you have something on your mind?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org