You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sameeragarwal <gi...@git.apache.org> on 2016/02/25 19:47:21 UTC

[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

GitHub user sameeragarwal opened a pull request:

    https://github.com/apache/spark/pull/11372

    [WIP][SPARK-13495][SQL] Add Null Filters in the query plan for Filters/Joins based on their data constraints

    ## What changes were proposed in this pull request?
    
    This PR adds an optimizer rule to eliminate reading (unnecessary) NULL values if they are not required for correctness by inserting `isNotNull` filters is the query plan. These filters are currently inserted beneath existing `Filter` and `Join` operators and are inferred based on their data constraints.
     
    Note: While this optimization is applicable to all types of join, it primarily benefits `Inner` and `LeftSemi` joins.
    
    ## How was this patch tested?
    
    WIP


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark gen-isnotnull

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11372.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11372
    
----
commit 15eac821b5328ce34ab2a279fad2e48f471ccbdc
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-02-25T07:49:17Z

    optimizer rules

commit 06d74da3ad1fd2748c395a143cfdd9f99e16009c
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-02-25T18:10:50Z

    Null filtering

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192043321
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52416/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191678876
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52383/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192524065
  
    Thanks @nongli, all comments addressed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191432575
  
    **[Test build #52338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52338/consoleFull)** for PR 11372 at commit [`28050b3`](https://github.com/apache/spark/commit/28050b3a38607b27846cc1aa879eca82d1f21644).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191468985
  
    **[Test build #52338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52338/consoleFull)** for PR 11372 at commit [`28050b3`](https://github.com/apache/spark/commit/28050b3a38607b27846cc1aa879eca82d1f21644).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class PlanTest extends SparkFunSuite with PredicateHelper `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11372#discussion_r54460830
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -586,6 +588,62 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    + * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    + * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
    + * existing Filters and Join operators and are inferred based on their data constraints.
    + *
    + * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
    + * LeftSemi joins.
    + */
    +object NullFiltering extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case filter @ Filter(condition, child: LogicalPlan) =>
    +      val isNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull])
    +      val newCondition = if (isNotNullConstraints.nonEmpty) {
    +        And(isNotNullConstraints.reduce(And), condition)
    +      } else {
    +        condition
    +      }
    +      Filter(newCondition, child)
    +
    +    case join @ Join(left: LogicalPlan, right: LogicalPlan, joinType: JoinType,
    +      condition: Option[Expression]) =>
    +      val leftIsNotNullConstraints = join.constraints
    +        .filter(_.isInstanceOf[IsNotNull])
    +        .filter(_.references.subsetOf(left.outputSet))
    +      val rightIsNotNullConstraints =
    +        join.constraints
    +          .filter(_.isInstanceOf[IsNotNull])
    +          .filter(_.references.subsetOf(right.outputSet))
    +      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
    +        Filter(leftIsNotNullConstraints.reduce(And), left)
    +      } else {
    +        left
    +      }
    +      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
    +        Filter(rightIsNotNullConstraints.reduce(And), right)
    +      } else {
    +        right
    +      }
    +      Join(newLeftChild, newRightChild, joinType, condition)
    +  }
    +}
    +
    +/**
    + * Attempts to re-order the individual conjunctive predicates in a filter condition to short circuit
    + * executing relatively cheaper checks (e.g., nullability) before others.
    + */
    +object ReorderFilterPredicates extends Rule[LogicalPlan] with PredicateHelper {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case filter @ Filter(condition, child: LogicalPlan) =>
    +      val reorderedCondition = splitConjunctivePredicates(condition)
    +        .sortWith((x, _) => x.isInstanceOf[IsNotNull])
    +        .reduce(And)
    +      Filter(reorderedCondition, child)
    +  }
    +}
    --- End diff --
    
    yes, that sounds like a good idea! Could there be any other downside of not doing it in the optimizer? /cc @nongli 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-188927082
  
    **[Test build #51980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51980/consoleFull)** for PR 11372 at commit [`06d74da`](https://github.com/apache/spark/commit/06d74da3ad1fd2748c395a143cfdd9f99e16009c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192043044
  
    **[Test build #52416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52416/consoleFull)** for PR 11372 at commit [`013f97a`](https://github.com/apache/spark/commit/013f97a3af010974dcd14198ee5ce073ab73c9ce).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-189079236
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51994/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11372#discussion_r54990555
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NullFilteringSuite.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules._
    +
    +class NullFilteringSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("NullFiltering", Once, NullFiltering) ::
    +      Batch("CombineFilters", Once, CombineFilters) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  test("filter: filter out nulls in condition") {
    +    val originalQuery = testRelation.where('a === 1)
    +    val correctAnswer = testRelation.where(IsNotNull('a) && 'a === 1).analyze
    +    val optimized = Optimize.execute(originalQuery.analyze)
    +    comparePlans(optimized, correctAnswer)
    +  }
    +
    +  test("join: filter out nulls on either side") {
    +    val x = testRelation.subquery('x)
    +    val y = testRelation.subquery('y)
    +    val originalQuery = x.join(y,
    +      condition = Some("x.a".attr === "y.a".attr && "x.b".attr === 1 && "y.c".attr > 5))
    +    val left = x.where(IsNotNull('a) && IsNotNull('b))
    +    val right = y.where(IsNotNull('a) && IsNotNull('c))
    +    val correctAnswer = left.join(right,
    +      condition = Some("x.a".attr === "y.a".attr && "x.b".attr === 1 && "y.c".attr > 5)).analyze
    +    val optimized = Optimize.execute(originalQuery.analyze)
    +    comparePlans(optimized, correctAnswer)
    +  }
    +
    +  test("join with pre-existing filters: filter out nulls on either side") {
    +    val x = testRelation.subquery('x)
    +    val y = testRelation.subquery('y)
    +    val originalQuery = x.where('b > 5).join(y.where('c === 10),
    +      condition = Some("x.a".attr === "y.a".attr))
    +    val left = x.where(IsNotNull('a) && IsNotNull('b) && 'b > 5)
    +    val right = y.where(IsNotNull('a) && IsNotNull('c) && 'c === 10)
    +    val correctAnswer = left.join(right,
    +      condition = Some("x.a".attr === "y.a".attr)).analyze
    +    val optimized = Optimize.execute(originalQuery.analyze)
    +    comparePlans(optimized, correctAnswer)
    +  }
    --- End diff --
    
    I had a few more test cases when i tried to this. Can you see if any of them should be added?
    
    https://github.com/nongli/spark/commit/ea0edd46e080cd0a1c6a1d41374563c149a030f7
    
    We should also have outer join tests to make sure they don't add the is not null filter.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-189079234
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11372#discussion_r54990183
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    + * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    + * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
    + * existing Filters and Join operators and are inferred based on their data constraints.
    + *
    + * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
    + * LeftSemi joins.
    + */
    +object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case filter @ Filter(condition, child: LogicalPlan) =>
    +      // We generate a list of additional isNotNull filters from the operator's existing constraints
    +      // but remove those that are either already part of the filter condition or are part of the
    +      // operator's child constraints.
    +      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +        (child.constraints ++ splitConjunctivePredicates(condition))
    +      val newCondition = if (newIsNotNullConstraints.nonEmpty) {
    +        And(newIsNotNullConstraints.reduce(And), condition)
    +      } else {
    +        condition
    +      }
    +      Filter(newCondition, child)
    +
    +    case join @ Join(left: LogicalPlan, right: LogicalPlan, joinType: JoinType,
    +      condition: Option[Expression]) =>
    +    val leftIsNotNullConstraints = join.constraints
    +        .filter(_.isInstanceOf[IsNotNull])
    +        .filter(_.references.subsetOf(left.outputSet)) -- left.constraints
    +      val rightIsNotNullConstraints =
    +        join.constraints
    +          .filter(_.isInstanceOf[IsNotNull])
    +          .filter(_.references.subsetOf(right.outputSet)) -- right.constraints
    +      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
    +        Filter(leftIsNotNullConstraints.reduce(And), left)
    +      } else {
    +        left
    +      }
    +      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
    +        Filter(rightIsNotNullConstraints.reduce(And), right)
    +      } else {
    +        right
    +      }
    +      Join(newLeftChild, newRightChild, joinType, condition)
    --- End diff --
    
    same here, would be nice to reuse `join` if it is not changed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192545548
  
    **[Test build #52494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52494/consoleFull)** for PR 11372 at commit [`31b1700`](https://github.com/apache/spark/commit/31b17007d8c6b2883d8f8b2f3bd4c0576fe0ed00).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192546871
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192082317
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191643447
  
    **[Test build #52383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52383/consoleFull)** for PR 11372 at commit [`2a469e8`](https://github.com/apache/spark/commit/2a469e84a618ffe69580eaab6ca77e2a3ab1b7f7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192082318
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52431/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-189066489
  
    **[Test build #51994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51994/consoleFull)** for PR 11372 at commit [`2345075`](https://github.com/apache/spark/commit/234507562be69aac9b81aa2b43ea1a1770965ee2).
     * This patch **fails from timeout after a configured wait of \`250m\`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191678874
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11372


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11372#discussion_r54990462
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NullFilteringSuite.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules._
    +
    +class NullFilteringSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("NullFiltering", Once, NullFiltering) ::
    +      Batch("CombineFilters", Once, CombineFilters) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  test("filter: filter out nulls in condition") {
    +    val originalQuery = testRelation.where('a === 1)
    +    val correctAnswer = testRelation.where(IsNotNull('a) && 'a === 1).analyze
    --- End diff --
    
    We can do that in a follow up pr 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191974554
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191469704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52338/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-193422565
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-188935441
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51980/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191974032
  
    **[Test build #52406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52406/consoleFull)** for PR 11372 at commit [`80dab7e`](https://github.com/apache/spark/commit/80dab7e8cda525f7a3a375b3b77de0c297008ffb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11372#discussion_r54989882
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    + * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    + * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
    --- End diff --
    
    "in the query plan"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192018317
  
    **[Test build #52416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52416/consoleFull)** for PR 11372 at commit [`013f97a`](https://github.com/apache/spark/commit/013f97a3af010974dcd14198ee5ce073ab73c9ce).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192082147
  
    **[Test build #52431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52431/consoleFull)** for PR 11372 at commit [`013f97a`](https://github.com/apache/spark/commit/013f97a3af010974dcd14198ee5ce073ab73c9ce).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192050408
  
    **[Test build #52431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52431/consoleFull)** for PR 11372 at commit [`013f97a`](https://github.com/apache/spark/commit/013f97a3af010974dcd14198ee5ce073ab73c9ce).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191932192
  
    **[Test build #52406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52406/consoleFull)** for PR 11372 at commit [`80dab7e`](https://github.com/apache/spark/commit/80dab7e8cda525f7a3a375b3b77de0c297008ffb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-188935438
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192049839
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11372#discussion_r54989945
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    + * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    + * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
    + * existing Filters and Join operators and are inferred based on their data constraints.
    + *
    + * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
    + * LeftSemi joins.
    + */
    +object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case filter @ Filter(condition, child: LogicalPlan) =>
    +      // We generate a list of additional isNotNull filters from the operator's existing constraints
    +      // but remove those that are either already part of the filter condition or are part of the
    +      // operator's child constraints.
    +      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +        (child.constraints ++ splitConjunctivePredicates(condition))
    +      val newCondition = if (newIsNotNullConstraints.nonEmpty) {
    --- End diff --
    
    remove newConditino and just return filter if this doesn't do anything so we can reuse that filter subplan


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191974556
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52406/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192546880
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52494/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11372#discussion_r54990168
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -586,6 +587,52 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    + * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    + * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
    + * existing Filters and Join operators and are inferred based on their data constraints.
    + *
    + * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
    + * LeftSemi joins.
    + */
    +object NullFiltering extends Rule[LogicalPlan] with PredicateHelper {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case filter @ Filter(condition, child: LogicalPlan) =>
    +      // We generate a list of additional isNotNull filters from the operator's existing constraints
    +      // but remove those that are either already part of the filter condition or are part of the
    +      // operator's child constraints.
    +      val newIsNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull]) --
    +        (child.constraints ++ splitConjunctivePredicates(condition))
    +      val newCondition = if (newIsNotNullConstraints.nonEmpty) {
    +        And(newIsNotNullConstraints.reduce(And), condition)
    +      } else {
    +        condition
    +      }
    +      Filter(newCondition, child)
    +
    +    case join @ Join(left: LogicalPlan, right: LogicalPlan, joinType: JoinType,
    +      condition: Option[Expression]) =>
    +    val leftIsNotNullConstraints = join.constraints
    --- End diff --
    
    indenting


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191469697
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-188989470
  
    **[Test build #51994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51994/consoleFull)** for PR 11372 at commit [`2345075`](https://github.com/apache/spark/commit/234507562be69aac9b81aa2b43ea1a1770965ee2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11372#discussion_r55104948
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NullFilteringSuite.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules._
    +
    +class NullFilteringSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("NullFiltering", Once, NullFiltering) ::
    +      Batch("CombineFilters", Once, CombineFilters) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  test("filter: filter out nulls in condition") {
    +    val originalQuery = testRelation.where('a === 1)
    +    val correctAnswer = testRelation.where(IsNotNull('a) && 'a === 1).analyze
    +    val optimized = Optimize.execute(originalQuery.analyze)
    +    comparePlans(optimized, correctAnswer)
    +  }
    +
    +  test("join: filter out nulls on either side") {
    +    val x = testRelation.subquery('x)
    +    val y = testRelation.subquery('y)
    +    val originalQuery = x.join(y,
    +      condition = Some("x.a".attr === "y.a".attr && "x.b".attr === 1 && "y.c".attr > 5))
    +    val left = x.where(IsNotNull('a) && IsNotNull('b))
    +    val right = y.where(IsNotNull('a) && IsNotNull('c))
    +    val correctAnswer = left.join(right,
    +      condition = Some("x.a".attr === "y.a".attr && "x.b".attr === 1 && "y.c".attr > 5)).analyze
    +    val optimized = Optimize.execute(originalQuery.analyze)
    +    comparePlans(optimized, correctAnswer)
    +  }
    +
    +  test("join with pre-existing filters: filter out nulls on either side") {
    +    val x = testRelation.subquery('x)
    +    val y = testRelation.subquery('y)
    +    val originalQuery = x.where('b > 5).join(y.where('c === 10),
    +      condition = Some("x.a".attr === "y.a".attr))
    +    val left = x.where(IsNotNull('a) && IsNotNull('b) && 'b > 5)
    +    val right = y.where(IsNotNull('a) && IsNotNull('c) && 'c === 10)
    +    val correctAnswer = left.join(right,
    +      condition = Some("x.a".attr === "y.a".attr)).analyze
    +    val optimized = Optimize.execute(originalQuery.analyze)
    +    comparePlans(optimized, correctAnswer)
    +  }
    --- End diff --
    
    Added a modified version of multi-join test based on your patch and a new test for outer join.
    
    As discussed offline, there is an additional test for non-equal keys (`t1.a !== t2.b`) that requires generating isNotNull constraints for these expressions; I'll make that change in a small followup PR  and add new tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11372#discussion_r54453127
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -586,6 +588,62 @@ object NullPropagation extends Rule[LogicalPlan] {
     }
     
     /**
    + * Attempts to eliminate reading (unnecessary) NULL values if they are not required for correctness
    + * by inserting isNotNull filters is the query plan. These filters are currently inserted beneath
    + * existing Filters and Join operators and are inferred based on their data constraints.
    + *
    + * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and
    + * LeftSemi joins.
    + */
    +object NullFiltering extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case filter @ Filter(condition, child: LogicalPlan) =>
    +      val isNotNullConstraints = filter.constraints.filter(_.isInstanceOf[IsNotNull])
    +      val newCondition = if (isNotNullConstraints.nonEmpty) {
    +        And(isNotNullConstraints.reduce(And), condition)
    +      } else {
    +        condition
    +      }
    +      Filter(newCondition, child)
    +
    +    case join @ Join(left: LogicalPlan, right: LogicalPlan, joinType: JoinType,
    +      condition: Option[Expression]) =>
    +      val leftIsNotNullConstraints = join.constraints
    +        .filter(_.isInstanceOf[IsNotNull])
    +        .filter(_.references.subsetOf(left.outputSet))
    +      val rightIsNotNullConstraints =
    +        join.constraints
    +          .filter(_.isInstanceOf[IsNotNull])
    +          .filter(_.references.subsetOf(right.outputSet))
    +      val newLeftChild = if (leftIsNotNullConstraints.nonEmpty) {
    +        Filter(leftIsNotNullConstraints.reduce(And), left)
    +      } else {
    +        left
    +      }
    +      val newRightChild = if (rightIsNotNullConstraints.nonEmpty) {
    +        Filter(rightIsNotNullConstraints.reduce(And), right)
    +      } else {
    +        right
    +      }
    +      Join(newLeftChild, newRightChild, joinType, condition)
    +  }
    +}
    +
    +/**
    + * Attempts to re-order the individual conjunctive predicates in a filter condition to short circuit
    + * executing relatively cheaper checks (e.g., nullability) before others.
    + */
    +object ReorderFilterPredicates extends Rule[LogicalPlan] with PredicateHelper {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case filter @ Filter(condition, child: LogicalPlan) =>
    +      val reorderedCondition = splitConjunctivePredicates(condition)
    +        .sortWith((x, _) => x.isInstanceOf[IsNotNull])
    +        .reduce(And)
    +      Filter(reorderedCondition, child)
    +  }
    +}
    --- End diff --
    
    I am wondering if the optimizer is the right place for this rule. My main concern is that if we can preserve this ordering through the rest of query compilation. Will it be better to do it inside the physical Filter operator (just before we start to generate the code)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192523072
  
    **[Test build #52494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52494/consoleFull)** for PR 11372 at commit [`31b1700`](https://github.com/apache/spark/commit/31b17007d8c6b2883d8f8b2f3bd4c0576fe0ed00).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11372#discussion_r54990320
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NullFilteringSuite.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans._
    +import org.apache.spark.sql.catalyst.plans.logical._
    +import org.apache.spark.sql.catalyst.rules._
    +
    +class NullFilteringSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches = Batch("NullFiltering", Once, NullFiltering) ::
    +      Batch("CombineFilters", Once, CombineFilters) :: Nil
    +  }
    +
    +  val testRelation = LocalRelation('a.int, 'b.int, 'c.int)
    +
    +  test("filter: filter out nulls in condition") {
    +    val originalQuery = testRelation.where('a === 1)
    +    val correctAnswer = testRelation.where(IsNotNull('a) && 'a === 1).analyze
    --- End diff --
    
    you haven't done anything with a === 1 right? There's still no logic that a === 1 has a not nullable


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-192043317
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13495][SQL] Add Null Filters in the que...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-191678584
  
    **[Test build #52383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52383/consoleFull)** for PR 11372 at commit [`2a469e8`](https://github.com/apache/spark/commit/2a469e84a618ffe69580eaab6ca77e2a3ab1b7f7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-13495][SQL] Add Null Filters in th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11372#issuecomment-188935355
  
    **[Test build #51980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51980/consoleFull)** for PR 11372 at commit [`06d74da`](https://github.com/apache/spark/commit/06d74da3ad1fd2748c395a143cfdd9f99e16009c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org