You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by aokolnychyi <gi...@git.apache.org> on 2018/10/27 11:13:29 UTC

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

GitHub user aokolnychyi opened a pull request:

    https://github.com/apache/spark/pull/22857

    [SPARK-25860][SQL] Replace Literal(null, _) with FalseLiteral whenever possible

    ## What changes were proposed in this pull request?
    
    This PR proposes a new optimization rule that replaces `Literal(null, _)` with `FalseLiteral` in conditions in `Join` and `Filter`, predicates in `If`, conditions in `CaseWhen`.
    
    The idea is that some expressions evaluate to `false` if the underlying expression is `null` (as an example see `GeneratePredicate$create` or `doGenCode` and `eval` methods in `If` and `CaseWhen`). Therefore, we can replace `Literal(null, _)` with `FalseLiteral`, which can lead to more optimizations later on.
    
    Let’s consider a few examples.
    
    ```
    val df = spark.range(1, 100).select($"id".as("l"), ($"id" > 50).as("b"))
    df.createOrReplaceTempView("t")
    df.createOrReplaceTempView("p")
    ```
    
    **Case 1**
    ```
    spark.sql("SELECT * FROM t WHERE if(l > 10, false, NULL)").explain(true)
    
    // without the new rule
    …
    == Optimized Logical Plan ==
    Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
    +- Filter if ((id#0L > 10)) false else null
       +- Range (1, 100, step=1, splits=Some(12))
    
    == Physical Plan ==
    *(1) Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
    +- *(1) Filter if ((id#0L > 10)) false else null
       +- *(1) Range (1, 100, step=1, splits=12)
    
    // with the new rule
    …
    == Optimized Logical Plan ==
    LocalRelation <empty>, [l#2L, s#3]
    
    == Physical Plan ==
    LocalTableScan <empty>, [l#2L, s#3]
    ```
    
    **Case 2**
    ```
    spark.sql("SELECT * FROM t WHERE CASE WHEN l < 10 THEN null WHEN l > 40 THEN false ELSE null END”).explain(true)
    
    // without the new rule
    ...
    == Optimized Logical Plan ==
    Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
    +- Filter CASE WHEN (id#0L < 10) THEN null WHEN (id#0L > 40) THEN false ELSE null END
       +- Range (1, 100, step=1, splits=Some(12))
    
    == Physical Plan ==
    *(1) Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
    +- *(1) Filter CASE WHEN (id#0L < 10) THEN null WHEN (id#0L > 40) THEN false ELSE null END
       +- *(1) Range (1, 100, step=1, splits=12)
    
    // with the new rule
    ...
    == Optimized Logical Plan ==
    LocalRelation <empty>, [l#2L, s#3]
    
    == Physical Plan ==
    LocalTableScan <empty>, [l#2L, s#3]
    ```
    
    **Case 3**
    ```
    spark.sql("SELECT * FROM t JOIN p ON IF(t.l > p.l, null, false)").explain(true)
    
    // without the new rule
    ...
    == Optimized Logical Plan ==
    Join Inner, if ((l#2L > l#37L)) null else false
    :- Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
    :  +- Range (1, 100, step=1, splits=Some(12))
    +- Project [id#0L AS l#37L, cast(id#0L as string) AS s#38]
       +- Range (1, 100, step=1, splits=Some(12))
    
    == Physical Plan ==
    BroadcastNestedLoopJoin BuildRight, Inner, if ((l#2L > l#37L)) null else false
    :- *(1) Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
    :  +- *(1) Range (1, 100, step=1, splits=12)
    +- BroadcastExchange IdentityBroadcastMode
       +- *(2) Project [id#0L AS l#37L, cast(id#0L as string) AS s#38]
          +- *(2) Range (1, 100, step=1, splits=12)
    
    
    // with the new rule
    ...
    == Optimized Logical Plan ==
    LocalRelation <empty>, [l#2L, s#3, l#37L, s#38]
    ```
    
    ## How was this patch tested?
    
    This PR comes with a set of dedicated tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aokolnychyi/spark spark-25860

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22857.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22857
    
----
commit 1d8fefd9b227c6aba50b7e012726ec292c75b5a1
Author: Anton Okolnychyi <ao...@...>
Date:   2018-10-23T09:09:23Z

    [SPARK-25860][SQL] Replace Literal(null, _) with FalseLiteral whenever possible

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228741884
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        CaseWhen(newBranches, elseValue)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if getValues(cw).forall(isNullOrBoolean) =>
    +      val newBranches = cw.branches.map { case (cond, value) =>
    +        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
    +      }
    +      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
    +      CaseWhen(newBranches, newElseValue)
    +    case If(pred, trueVal, falseVal) if Seq(trueVal, falseVal).forall(isNullOrBoolean) =>
    --- End diff --
    
    Yep, I shortened this to stay in one line below. I can either rename `pred` to `p`or split line 783 into multiple.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228739894
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        CaseWhen(newBranches, elseValue)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    --- End diff --
    
    IsNull(Literal(null, _)) => IsNull(FalseLiteral)
    
    Will this be a problem for this change?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r236098841
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case cw @ CaseWhen(branches, _) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        cw.copy(branches = newBranches)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if cw.dataType == BooleanType =>
    +      val newBranches = cw.branches.map { case (cond, value) =>
    +        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
    +      }
    +      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
    +      CaseWhen(newBranches, newElseValue)
    +    case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType =>
    +      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
    +    case And(left, right) =>
    +      And(replaceNullWithFalse(left), replaceNullWithFalse(right))
    +    case Or(left, right) =>
    +      Or(replaceNullWithFalse(left), replaceNullWithFalse(right))
    +    case Literal(null, _) => FalseLiteral
    --- End diff --
    
    Here, for safety, we should check the data types. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229528767
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case cw @ CaseWhen(branches, _) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        cw.copy(branches = newBranches)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if cw.dataType == BooleanType =>
    +      val newBranches = cw.branches.map { case (cond, value) =>
    +        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
    +      }
    +      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
    +      CaseWhen(newBranches, newElseValue)
    +    case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType =>
    +      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
    --- End diff --
    
    The general rule for `LogicalPlan` at `apply` looks at `predicate` of `If`, no matter its `dataType` is `BooleanType` or not.
    
    But in `replaceNullWithFalse`, the rule for `If` only works if its `dataType` is `BooleanType`. `"replace null in predicates of If inside another If"` is a such case. The `If` inside another `If` is of `BooleanType`. If the inside `If` is not of `BooleanType`, this rule doesn't work. And I think it should be ok to replace the null with false when it is not boolean type.
    
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228741341
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        CaseWhen(newBranches, elseValue)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    --- End diff --
    
    We only do the replacements when 1) within `Join` or `Filter` such as `Filter(If(cond, FalseLiteral, Literal(null, _)))`, or 2) `If(Literal(null, _), trueValue, falseValue)`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    **[Test build #98240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98240/testReport)** for PR 22857 at commit [`4c35955`](https://github.com/apache/spark/commit/4c35955582d3e38722702049780c3c9946a695da).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229165497
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case cw @ CaseWhen(branches, _) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        cw.copy(branches = newBranches)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if cw.dataType == BooleanType =>
    +      val newBranches = cw.branches.map { case (cond, value) =>
    +        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
    +      }
    +      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
    +      CaseWhen(newBranches, newElseValue)
    +    case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType =>
    +      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
    --- End diff --
    
    When `i.dataType != BooleanType`, we still can do `replaceNullWithFalse(pred)`, don't we?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229133550
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        CaseWhen(newBranches, elseValue)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if getValues(cw).forall(isNullOrBoolean) =>
    +      val newBranches = cw.branches.map { case (cond, value) =>
    +        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
    +      }
    +      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
    +      CaseWhen(newBranches, newElseValue)
    +    case If(pred, trueVal, falseVal) if Seq(trueVal, falseVal).forall(isNullOrBoolean) =>
    +      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
    +    case And(left, right) =>
    --- End diff --
    
    Could you elaborate a bit more on `null && false`?
    
    I had in mind `AND(true, null)` and `OR(false, null)`, which are tricky. For example, if we use `AND(true, null)` in SELECT, we will get `null`. However, if we use it inside `Filter` or predicate of `If`, it will be semantically equivalent to `false` (e.g., `If$eval`). Therefore, the proposed rule has a limited scope. I explored the source code & comments in `And/Or` to come up with an edge case that wouldn’t work. I could not find such a case. To me, it seems safe because the rule is applied only to expressions that evaluate to `false` if the underlying expression is `null` (i.e., conditions in `Filter`/`Join`, predicates in `If`, conditions in `CaseWhen`). 
    
    Please, let me know if you have a particular case to test.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    **[Test build #98240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98240/testReport)** for PR 22857 at commit [`4c35955`](https://github.com/apache/spark/commit/4c35955582d3e38722702049780c3c9946a695da).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r236098905
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    --- End diff --
    
    Let us move it to a new file. The file is growing too big. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228779010
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        CaseWhen(newBranches, elseValue)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if getValues(cw).forall(isNullOrBoolean) =>
    --- End diff --
    
    how about `cw.dataType == BooleanType || cw.dataType == NullType`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228779125
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        CaseWhen(newBranches, elseValue)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if getValues(cw).forall(isNullOrBoolean) =>
    --- End diff --
    
    this applies to `If` as well.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4615/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    **[Test build #98127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98127/testReport)** for PR 22857 at commit [`1d8fefd`](https://github.com/apache/spark/commit/1d8fefd9b227c6aba50b7e012726ec292c75b5a1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98240/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228739018
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseSuite.scala ---
    @@ -0,0 +1,324 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.{And, CaseWhen, Expression, GreaterThan, If, Literal, Or}
    +import org.apache.spark.sql.catalyst.expressions.Literal.{FalseLiteral, TrueLiteral}
    +import org.apache.spark.sql.catalyst.plans.{Inner, PlanTest}
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types.{BooleanType, IntegerType}
    +
    +class ReplaceNullWithFalseSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Replace null literals", FixedPoint(10),
    +        NullPropagation,
    +        ConstantFolding,
    +        BooleanSimplification,
    +        SimplifyConditionals,
    +        ReplaceNullWithFalse) :: Nil
    +  }
    +
    +  private val testRelation = LocalRelation('i.int, 'b.boolean)
    +  private val anotherTestRelation = LocalRelation('d.int)
    +
    +  test("successful replacement of null literals in filter and join conditions (1)") {
    +    testFilter(originalCond = Literal(null), expectedCond = FalseLiteral)
    +    testJoin(originalCond = Literal(null), expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (2)") {
    +    val originalCond = If(
    +      UnresolvedAttribute("i") > Literal(10),
    +      FalseLiteral,
    +      Literal(null, BooleanType))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (3)") {
    +    val originalCond = If(
    +      UnresolvedAttribute("i") > Literal(10),
    +      TrueLiteral && Literal(null, BooleanType),
    +      UnresolvedAttribute("b") && Literal(null, BooleanType))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (4)") {
    +    val branches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) -> TrueLiteral,
    +      (UnresolvedAttribute("i") > Literal(40)) -> FalseLiteral)
    +    val originalCond = CaseWhen(branches, Literal(null, BooleanType))
    +    val expectedCond = CaseWhen(branches, FalseLiteral)
    +    testFilter(originalCond, expectedCond)
    +    testJoin(originalCond, expectedCond)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (5)") {
    +    val branches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) -> Literal(null, BooleanType),
    +      (UnresolvedAttribute("i") > Literal(40)) -> FalseLiteral)
    +    val originalCond = CaseWhen(branches, Literal(null))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (6)") {
    +    val originalBranches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) ->
    +        If(UnresolvedAttribute("i") < Literal(20), Literal(null, BooleanType), FalseLiteral),
    +      (UnresolvedAttribute("i") > Literal(40)) -> TrueLiteral)
    +    val originalCond = CaseWhen(originalBranches)
    +
    +    val expectedBranches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) -> FalseLiteral,
    +      (UnresolvedAttribute("i") > Literal(40)) -> TrueLiteral)
    +    val expectedCond = CaseWhen(expectedBranches)
    +
    +    testFilter(originalCond, expectedCond)
    +    testJoin(originalCond, expectedCond)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (7)") {
    +    val originalBranches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) -> TrueLiteral,
    +      (Literal(6) <= Literal(1)) -> FalseLiteral,
    +      (Literal(4) === Literal(5)) -> FalseLiteral,
    +      (UnresolvedAttribute("i") > Literal(10)) -> Literal(null, BooleanType),
    +      (Literal(4) === Literal(4)) -> TrueLiteral)
    +    val originalCond = CaseWhen(originalBranches)
    +
    +    val expectedBranches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) -> TrueLiteral,
    +      (UnresolvedAttribute("i") > Literal(10)) -> FalseLiteral,
    +      TrueLiteral -> TrueLiteral)
    +    val expectedCond = CaseWhen(expectedBranches)
    +
    +    testFilter(originalCond, expectedCond)
    +    testJoin(originalCond, expectedCond)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (8)") {
    +    val originalCond = Or(UnresolvedAttribute("b"), Literal(null))
    +    val expectedCond = UnresolvedAttribute("b")
    +    testFilter(originalCond, expectedCond)
    +    testJoin(originalCond, expectedCond)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (9)") {
    +    val originalCond = And(UnresolvedAttribute("b"), Literal(null))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (10)") {
    +    val originalCond = And(
    +      And(UnresolvedAttribute("b"), Literal(null)),
    +      Or(Literal(null), And(Literal(null), And(UnresolvedAttribute("b"), Literal(null)))))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (11)") {
    +    val originalCond = If(
    +      UnresolvedAttribute("i") > Literal(10),
    +      FalseLiteral,
    +      And(UnresolvedAttribute("b"), Literal(null, BooleanType)))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (12)") {
    +    val originalCond = And(
    +      UnresolvedAttribute("b"),
    +      If(
    +        UnresolvedAttribute("i") > Literal(10),
    +        Literal(null),
    +        And(FalseLiteral, UnresolvedAttribute("b"))))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (13)") {
    +    val originalCond = If(
    +      If(UnresolvedAttribute("b"), Literal(null), FalseLiteral),
    +      TrueLiteral,
    +      Literal(null))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (14)") {
    +    val nestedCaseWhen = CaseWhen(Seq(UnresolvedAttribute("b") -> FalseLiteral), Literal(null))
    +    val originalCond = CaseWhen(Seq(nestedCaseWhen -> TrueLiteral), Literal(null))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("inability to replace null literals in filter and join conditions (1)") {
    +    val condition = If(
    +      UnresolvedAttribute("i") > Literal(10),
    +      Literal(5) > If(
    +        UnresolvedAttribute("i") === Literal(15),
    +        Literal(null, IntegerType),
    +        Literal(3)),
    +      FalseLiteral)
    +    testFilter(originalCond = condition, expectedCond = condition)
    +    testJoin(originalCond = condition, expectedCond = condition)
    +  }
    +
    +  test("inability to replace null literals in filter and join conditions (2)") {
    +    val nestedCaseWhen = CaseWhen(
    +      Seq((UnresolvedAttribute("i") > Literal(20)) -> Literal(2)),
    +      Literal(null, IntegerType))
    +    val branchValue = If(
    +      Literal(2) === nestedCaseWhen,
    +      TrueLiteral,
    +      FalseLiteral)
    +    val branches = Seq((UnresolvedAttribute("i") > Literal(10)) -> branchValue)
    +    val condition = CaseWhen(branches)
    +    testFilter(originalCond = condition, expectedCond = condition)
    +    testJoin(originalCond = condition, expectedCond = condition)
    +  }
    +
    +  test("inability to replace null literals in filter and join conditions (3)") {
    +    val condition = If(
    +      Literal(5) > If(
    +        UnresolvedAttribute("i") === Literal(15),
    +        Literal(null, IntegerType),
    +        Literal(3)),
    +      TrueLiteral,
    +      FalseLiteral)
    +    testFilter(originalCond = condition, expectedCond = condition)
    +    testJoin(originalCond = condition, expectedCond = condition)
    +  }
    +
    +  test("successful replacement of null literals in join conditions (1)") {
    +    // this test is only for joins as the condition involves columns from different relations
    +    val originalCond = If(
    +      UnresolvedAttribute("d") > UnresolvedAttribute("i"),
    +      Literal(null),
    +      FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in join conditions (2)") {
    +    // this test is only for joins as the condition involves columns from different relations
    +    val originalBranches = Seq(
    +      (UnresolvedAttribute("d") > UnresolvedAttribute("i")) -> Literal(null),
    +      (UnresolvedAttribute("d") === UnresolvedAttribute("i")) -> TrueLiteral)
    +
    +    val expectedBranches = Seq(
    +      (UnresolvedAttribute("d") > UnresolvedAttribute("i")) -> FalseLiteral,
    +      (UnresolvedAttribute("d") === UnresolvedAttribute("i")) -> TrueLiteral)
    +
    +    testJoin(
    +      originalCond = CaseWhen(originalBranches, FalseLiteral),
    +      expectedCond = CaseWhen(expectedBranches, FalseLiteral))
    +  }
    +
    +  test("inability to replace null literals in join conditions (1)") {
    +    // this test is only for joins as the condition involves columns from different relations
    +    val branches = Seq(
    +      (UnresolvedAttribute("d") > UnresolvedAttribute("i")) -> Literal(null, BooleanType),
    +      (UnresolvedAttribute("d") === UnresolvedAttribute("i")) -> TrueLiteral)
    +    val condition = UnresolvedAttribute("b") === CaseWhen(branches, FalseLiteral)
    +    testJoin(originalCond = condition, expectedCond = condition)
    +  }
    +
    +  test("successful replacement of null literals in if predicates (1)") {
    +    val predicate = And(GreaterThan(UnresolvedAttribute("i"), Literal(0.5)), Literal(null))
    +    testProjection(
    +      originalExpr = If(predicate, Literal(5), Literal(1)).as("out"),
    +      expectedExpr = Literal(1).as("out"))
    +  }
    +
    +  test("successful replacement of null literals in if predicates (2)") {
    +    val predicate = If(
    +      And(GreaterThan(UnresolvedAttribute("i"), Literal(0.5)), Literal(null)),
    +      TrueLiteral,
    +      FalseLiteral)
    +    testProjection(
    +      originalExpr = If(predicate, Literal(5), Literal(1)).as("out"),
    +      expectedExpr = Literal(1).as("out"))
    +  }
    +
    +  test("inability to replace null literals in if predicates") {
    +    val predicate = GreaterThan(
    +      UnresolvedAttribute("i"),
    +      If(UnresolvedAttribute("b"), Literal(null, IntegerType), Literal(4)))
    +    val column = If(predicate, Literal(5), Literal(1)).as("out")
    +    testProjection(originalExpr = column, expectedExpr = column)
    +  }
    +
    +  test("successful replacement of null literals in branches of case when (1)") {
    +    val branches = Seq(
    +      And(GreaterThan(UnresolvedAttribute("i"), Literal(0.5)), Literal(null)) -> Literal(5))
    +    testProjection(
    +      originalExpr = CaseWhen(branches, Literal(2)).as("out"),
    +      expectedExpr = Literal(2).as("out"))
    +  }
    +
    +  test("successful replacement of null literals in branches of case when (2)") {
    +    val nestedCaseWhen = CaseWhen(
    +      Seq(And(UnresolvedAttribute("b"), Literal(null)) -> Literal(5)),
    +      Literal(2))
    +    val branches = Seq(GreaterThan(Literal(3), nestedCaseWhen) -> Literal(1))
    +    testProjection(
    +      originalExpr = CaseWhen(branches).as("out"),
    +      expectedExpr = Literal(1).as("out"))
    +  }
    +
    +  test("inability to replace null literals in branches of case when") {
    +    val condition = GreaterThan(
    +      UnresolvedAttribute("i"),
    +      If(UnresolvedAttribute("b"), Literal(null, IntegerType), Literal(4)))
    +    val column = CaseWhen(Seq(condition -> Literal(5)), Literal(2)).as("out")
    +    testProjection(originalExpr = column, expectedExpr = column)
    +  }
    +
    +  private def testFilter(originalCond: Expression, expectedCond: Expression): Unit = {
    +    test((rel, exp) => rel.where(exp), originalCond, expectedCond)
    +  }
    +
    +  private def testJoin(originalCond: Expression, expectedCond: Expression): Unit = {
    +    test((rel, exp) => rel.join(anotherTestRelation, Inner, Some(exp)), originalCond, expectedCond)
    +  }
    +
    +  private def testProjection(originalExpr: Expression, expectedExpr: Expression): Unit = {
    +    test((rel, exp) => rel.select(exp), originalExpr, expectedExpr)
    +  }
    +
    +  private def test(
    +      func: (LogicalPlan, Expression) => LogicalPlan,
    +      originalExpr: Expression,
    +      expectedExpr: Expression): Unit = {
    +
    +    val originalPlan = func(testRelation, originalExpr).analyze
    +    val optimizedPlan = Optimize.execute(originalPlan)
    +    val expectedPlan = func(testRelation, expectedExpr).analyze
    +    comparePlans(optimizedPlan, expectedPlan)
    +  }
    +
    --- End diff --
    
    remove extra line.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228779097
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        CaseWhen(newBranches, elseValue)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if getValues(cw).forall(isNullOrBoolean) =>
    --- End diff --
    
    actually just `cw.dataType == BooleanType`. If an expression is `NullType`, it should be replaced by null literal already.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228739082
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        CaseWhen(newBranches, elseValue)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if getValues(cw).forall(isNullOrBoolean) =>
    +      val newBranches = cw.branches.map { case (cond, value) =>
    +        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
    +      }
    +      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
    +      CaseWhen(newBranches, newElseValue)
    +    case If(pred, trueVal, falseVal) if Seq(trueVal, falseVal).forall(isNullOrBoolean) =>
    --- End diff --
    
    Nit, in other place, we use `trueValue` and `falseValue`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    **[Test build #98239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98239/testReport)** for PR 22857 at commit [`0eac890`](https://github.com/apache/spark/commit/0eac890be26cb0960e245b88fd771233d21850ee).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4555/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229449496
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case cw @ CaseWhen(branches, _) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        cw.copy(branches = newBranches)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if cw.dataType == BooleanType =>
    +      val newBranches = cw.branches.map { case (cond, value) =>
    +        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
    +      }
    +      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
    +      CaseWhen(newBranches, newElseValue)
    +    case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType =>
    +      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
    --- End diff --
    
    Let me know if I got you correctly here


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    **[Test build #98317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98317/testReport)** for PR 22857 at commit [`5499651`](https://github.com/apache/spark/commit/5499651485c70beff67c7a9f83ca49da626ee49b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class ReplaceNullWithFalseEndToEndSuite extends QueryTest with SharedSQLContext `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228779276
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        CaseWhen(newBranches, elseValue)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if getValues(cw).forall(isNullOrBoolean) =>
    +      val newBranches = cw.branches.map { case (cond, value) =>
    +        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
    +      }
    +      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
    +      CaseWhen(newBranches, newElseValue)
    +    case If(pred, trueVal, falseVal) if Seq(trueVal, falseVal).forall(isNullOrBoolean) =>
    +      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
    +    case And(left, right) =>
    --- End diff --
    
    we need to be careful here. null && fales is false, null || true is true. Please take a look at https://github.com/apache/spark/pull/22702


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229150101
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2578,4 +2578,45 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
             Row ("abc", 1))
         }
       }
    +
    +  test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever possible") {
    +
    +    def checkPlanIsEmptyLocalScan(df: DataFrame): Unit = df.queryExecution.executedPlan match {
    --- End diff --
    
    yea we have. Take a look at `TestHive`, and we did something similar before
    ```
    // Disable ConvertToLocalRelation for better test coverage. Test cases built on
    // LocalRelation will exercise the optimization rules better by disabling it as
    // this rule may potentially block testing of other optimization rules such as
    // ConstantPropagation etc.
    .set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName)))
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228741800
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        CaseWhen(newBranches, elseValue)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    --- End diff --
    
    Also, that's the reason why we don't use `transformExpressionsDown`. We will stop the replacement as soon as we hit an expression that is not `CaseWhen`, `If`, `And`, `Or` or `Literal(null, _)`.  Therefore, `If(IsNull(Literal(null, _)))` won't be transformed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22857


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229165719
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case cw @ CaseWhen(branches, _) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        cw.copy(branches = newBranches)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if cw.dataType == BooleanType =>
    --- End diff --
    
    When `cw.dataType != BooleanType`, we can still do `replaceNullWithFalse(cond)`, don't we?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    @dbtsai @gatorsmile @cloud-fan could you guys, please, take a look?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228760023
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseSuite.scala ---
    @@ -0,0 +1,324 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.optimizer
    +
    +import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.{And, CaseWhen, Expression, GreaterThan, If, Literal, Or}
    +import org.apache.spark.sql.catalyst.expressions.Literal.{FalseLiteral, TrueLiteral}
    +import org.apache.spark.sql.catalyst.plans.{Inner, PlanTest}
    +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
    +import org.apache.spark.sql.catalyst.rules.RuleExecutor
    +import org.apache.spark.sql.types.{BooleanType, IntegerType}
    +
    +class ReplaceNullWithFalseSuite extends PlanTest {
    +
    +  object Optimize extends RuleExecutor[LogicalPlan] {
    +    val batches =
    +      Batch("Replace null literals", FixedPoint(10),
    +        NullPropagation,
    +        ConstantFolding,
    +        BooleanSimplification,
    +        SimplifyConditionals,
    +        ReplaceNullWithFalse) :: Nil
    +  }
    +
    +  private val testRelation = LocalRelation('i.int, 'b.boolean)
    +  private val anotherTestRelation = LocalRelation('d.int)
    +
    +  test("successful replacement of null literals in filter and join conditions (1)") {
    +    testFilter(originalCond = Literal(null), expectedCond = FalseLiteral)
    +    testJoin(originalCond = Literal(null), expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (2)") {
    +    val originalCond = If(
    +      UnresolvedAttribute("i") > Literal(10),
    +      FalseLiteral,
    +      Literal(null, BooleanType))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (3)") {
    +    val originalCond = If(
    +      UnresolvedAttribute("i") > Literal(10),
    +      TrueLiteral && Literal(null, BooleanType),
    +      UnresolvedAttribute("b") && Literal(null, BooleanType))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (4)") {
    +    val branches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) -> TrueLiteral,
    +      (UnresolvedAttribute("i") > Literal(40)) -> FalseLiteral)
    +    val originalCond = CaseWhen(branches, Literal(null, BooleanType))
    +    val expectedCond = CaseWhen(branches, FalseLiteral)
    +    testFilter(originalCond, expectedCond)
    +    testJoin(originalCond, expectedCond)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (5)") {
    +    val branches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) -> Literal(null, BooleanType),
    +      (UnresolvedAttribute("i") > Literal(40)) -> FalseLiteral)
    +    val originalCond = CaseWhen(branches, Literal(null))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (6)") {
    +    val originalBranches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) ->
    +        If(UnresolvedAttribute("i") < Literal(20), Literal(null, BooleanType), FalseLiteral),
    +      (UnresolvedAttribute("i") > Literal(40)) -> TrueLiteral)
    +    val originalCond = CaseWhen(originalBranches)
    +
    +    val expectedBranches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) -> FalseLiteral,
    +      (UnresolvedAttribute("i") > Literal(40)) -> TrueLiteral)
    +    val expectedCond = CaseWhen(expectedBranches)
    +
    +    testFilter(originalCond, expectedCond)
    +    testJoin(originalCond, expectedCond)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (7)") {
    +    val originalBranches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) -> TrueLiteral,
    +      (Literal(6) <= Literal(1)) -> FalseLiteral,
    +      (Literal(4) === Literal(5)) -> FalseLiteral,
    +      (UnresolvedAttribute("i") > Literal(10)) -> Literal(null, BooleanType),
    +      (Literal(4) === Literal(4)) -> TrueLiteral)
    +    val originalCond = CaseWhen(originalBranches)
    +
    +    val expectedBranches = Seq(
    +      (UnresolvedAttribute("i") < Literal(10)) -> TrueLiteral,
    +      (UnresolvedAttribute("i") > Literal(10)) -> FalseLiteral,
    +      TrueLiteral -> TrueLiteral)
    +    val expectedCond = CaseWhen(expectedBranches)
    +
    +    testFilter(originalCond, expectedCond)
    +    testJoin(originalCond, expectedCond)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (8)") {
    +    val originalCond = Or(UnresolvedAttribute("b"), Literal(null))
    +    val expectedCond = UnresolvedAttribute("b")
    +    testFilter(originalCond, expectedCond)
    +    testJoin(originalCond, expectedCond)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (9)") {
    +    val originalCond = And(UnresolvedAttribute("b"), Literal(null))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (10)") {
    +    val originalCond = And(
    +      And(UnresolvedAttribute("b"), Literal(null)),
    +      Or(Literal(null), And(Literal(null), And(UnresolvedAttribute("b"), Literal(null)))))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (11)") {
    +    val originalCond = If(
    +      UnresolvedAttribute("i") > Literal(10),
    +      FalseLiteral,
    +      And(UnresolvedAttribute("b"), Literal(null, BooleanType)))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (12)") {
    +    val originalCond = And(
    +      UnresolvedAttribute("b"),
    +      If(
    +        UnresolvedAttribute("i") > Literal(10),
    +        Literal(null),
    +        And(FalseLiteral, UnresolvedAttribute("b"))))
    +    testFilter(originalCond, expectedCond = FalseLiteral)
    +    testJoin(originalCond, expectedCond = FalseLiteral)
    +  }
    +
    +  test("successful replacement of null literals in filter and join conditions (13)") {
    --- End diff --
    
    Thank you for pinging me, @dbtsai . And, thank you for contribution, @aokolnychyi .
    I also clearly feel the benefit of this optimizer. It's worth to be review throughly. 
    
    BTW, the test case names are very unclear to me. It only looks like `positive case (1)~(13)` and `negative case (1) ~ (3)`. Can we have more elaborated and specific names? It will help readability of these test cases and shorten review process.
    ```
    - successful replacement of null literals in filter and join conditions (1)
    ...
    - successful replacement of null literals in filter and join conditions (13)
    - inability to replace null literals in filter and join conditions (1)
    ...
    - inability to replace null literals in filter and join conditions (3)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228760200
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    --- End diff --
    
    The examples are good, but we have to be more clear the scope of this optimizer.
    For now, this PR touches not only predicates in WHERE, but also some expressions in SELECT.
    Also, it's unclear with aggregation like HAVING. Could you clearly enumerate the targets in this documentation, @aokolnychyi ?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98317/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229151278
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case cw @ CaseWhen(branches, _) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        cw.copy(branches = newBranches)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    --- End diff --
    
    Can we make it more general? I think the expected expression is:
    1. It's `NullIntolerant`. If any child is null, it will be null.
    2. it has a null child.
    
    so I would write something like
    ```
    case f @ Filter(cond, _) if alwaysNull(cond) => f.copy(condition = false)
    ...
    
    def alwaysNull(e: Expression): Boolean = e match {
      case Literal(null, _) => true
      case n: NullIntolerant => n.children.exists(alwaysNull)
      case _ => false
    }
    
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    LGTM except the end-to-end test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229449194
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2578,4 +2578,45 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
             Row ("abc", 1))
         }
       }
    +
    +  test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever possible") {
    +
    +    def checkPlanIsEmptyLocalScan(df: DataFrame): Unit = df.queryExecution.executedPlan match {
    --- End diff --
    
    I see, thanks.
    
    So, you mean to use `withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> "") {...}` to ensure that `ConvertToLocalRelation` is not excluded?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228779505
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2578,4 +2578,45 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
             Row ("abc", 1))
         }
       }
    +
    +  test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever possible") {
    +
    +    def checkPlanIsEmptyLocalScan(df: DataFrame): Unit = df.queryExecution.executedPlan match {
    --- End diff --
    
    this assumes we run `ConvertToLocalRelation`, let's use `withSQLConf` to make sure this rule is on.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Thanks all for reviewing! The latest change looks good to me too. Merged into master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229150341
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        CaseWhen(newBranches, elseValue)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if getValues(cw).forall(isNullOrBoolean) =>
    +      val newBranches = cw.branches.map { case (cond, value) =>
    +        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
    +      }
    +      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
    +      CaseWhen(newBranches, newElseValue)
    +    case If(pred, trueVal, falseVal) if Seq(trueVal, falseVal).forall(isNullOrBoolean) =>
    +      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
    +    case And(left, right) =>
    --- End diff --
    
    I don't have a particular case, this is just to double check that these corner cases are considered. I think we are fine now :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229133793
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2578,4 +2578,45 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
             Row ("abc", 1))
         }
       }
    +
    +  test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever possible") {
    +
    +    def checkPlanIsEmptyLocalScan(df: DataFrame): Unit = df.queryExecution.executedPlan match {
    --- End diff --
    
    Do we actually have a way to enable/disable `ConvertToLocalRelation`? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229445313
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case cw @ CaseWhen(branches, _) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        cw.copy(branches = newBranches)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if cw.dataType == BooleanType =>
    +      val newBranches = cw.branches.map { case (cond, value) =>
    +        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
    +      }
    +      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
    +      CaseWhen(newBranches, newElseValue)
    +    case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType =>
    +      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
    --- End diff --
    
    This case is handled in `apply` and tested in `"replace null in predicates of If"`, `"replace null in predicates of If inside another If"`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r238450750
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -31,14 +31,14 @@ import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd}
     import org.apache.spark.sql.catalyst.TableIdentifier
     import org.apache.spark.sql.catalyst.expressions.Uuid
     import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
    -import org.apache.spark.sql.catalyst.plans.logical.{Filter, OneRowRelation, Union}
    +import org.apache.spark.sql.catalyst.plans.logical.{OneRowRelation, Union}
    --- End diff --
    
    BTW, please do not remove these in a huge feature PR. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    **[Test build #98239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98239/testReport)** for PR 22857 at commit [`0eac890`](https://github.com/apache/spark/commit/0eac890be26cb0960e245b88fd771233d21850ee).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228760320
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -83,6 +83,7 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
             BooleanSimplification,
             SimplifyConditionals,
             RemoveDispensableExpressions,
    +        ReplaceNullWithFalse,
    --- End diff --
    
    Logically, `ReplaceNullWithFalse` can use the result of `SimplifyBinaryComparison`. How about postponing this after `SimplifyBinaryComparison`? In other words, switch `ReplaceNullWithFalse` and `SimplifyBinaryComparison`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229705741
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2585,4 +2585,45 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     
         checkAnswer(swappedDf.filter($"key"($"map") > "a"), Row(2, Map(2 -> "b")))
       }
    +
    +  test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever possible") {
    +
    +    def checkPlanIsEmptyLocalScan(df: DataFrame): Unit = df.queryExecution.executedPlan match {
    +      case s: LocalTableScanExec => assert(s.rows.isEmpty)
    +      case p => fail(s"$p is not LocalTableScanExec")
    +    }
    +
    +    val df1 = Seq((1, true), (2, false)).toDF("l", "b")
    +    val df2 = Seq(2, 3).toDF("l")
    +
    +    val q1 = df1.where("IF(l > 10, false, b AND null)")
    +    checkAnswer(q1, Seq.empty)
    +    checkPlanIsEmptyLocalScan(q1)
    +
    +    val q2 = df1.where("CASE WHEN l < 10 THEN null WHEN l > 40 THEN false ELSE null END")
    +    checkAnswer(q2, Seq.empty)
    +    checkPlanIsEmptyLocalScan(q2)
    +
    +    val q3 = df1.join(df2, when(df1("l") > df2("l"), lit(null)).otherwise(df1("b") && lit(null)))
    +    checkAnswer(q3, Seq.empty)
    +    checkPlanIsEmptyLocalScan(q3)
    +
    +    val q4 = df1.where("IF(IF(b, null, false), true, null)")
    +    checkAnswer(q4, Seq.empty)
    +    checkPlanIsEmptyLocalScan(q4)
    +
    +    val q5 = df1.selectExpr("IF(l > 1 AND null, 5, 1) AS out")
    +    checkAnswer(q5, Row(1) :: Row(1) :: Nil)
    +    q5.queryExecution.executedPlan.foreach { p =>
    +      assert(p.expressions.forall(e => e.find(_.isInstanceOf[If]).isEmpty))
    --- End diff --
    
    You are right, this can pass if `ConvertToLocalRelation` is enabled. When I tested this check, I did not take into account that `SharedSparkSession` disables `ConvertToLocalRelation`. So, the check worked correctly but only because `ConvertToLocalRelation` was disabled in `SharedSparkSession`. Let’s switch to tables. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Please be really careful in null handling. It could easily introduce the correctness bugs like what we recently fixed. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229442843
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case cw @ CaseWhen(branches, _) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        cw.copy(branches = newBranches)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    --- End diff --
    
    I like your snippet because it is clean. We also considered a similar approach. 
    
    1. Unfortunately, it does not handle nested `If`/`CaseWhen` expressions as they are not `NullIntolerant`. For example, cases like `If(If(a > 1, FalseLiteral, Literal(null, _)), 1, 2)` will not be optimized if we remove branches for `If`/`CaseWhen`.
    2. If we just add one more brach to handle all `NullIntolerant` expressions, I am not sure it will give a lot of benefits as those expressions are transformed into `Literal(null, _)` by `NullPropagation` and we operate in the same batch.
    3. As @gatorsmile said, we should be really careful with generalization. For example, `Not` is `NullIntolerant`. `Not(null)` is transformed into `null` by `NullPropagation`. We need to ensure that we do not replace `null` inside `Not` and do not convert `Not(null)` into `Not(FalseLiteral)`.
    
    Therefore, the intention was to keep things simple to be safe.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4665/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    **[Test build #98127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98127/testReport)** for PR 22857 at commit [`1d8fefd`](https://github.com/apache/spark/commit/1d8fefd9b227c6aba50b7e012726ec292c75b5a1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229537395
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2585,4 +2585,45 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     
         checkAnswer(swappedDf.filter($"key"($"map") > "a"), Row(2, Map(2 -> "b")))
       }
    +
    +  test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever possible") {
    +
    +    def checkPlanIsEmptyLocalScan(df: DataFrame): Unit = df.queryExecution.executedPlan match {
    +      case s: LocalTableScanExec => assert(s.rows.isEmpty)
    +      case p => fail(s"$p is not LocalTableScanExec")
    +    }
    +
    +    val df1 = Seq((1, true), (2, false)).toDF("l", "b")
    +    val df2 = Seq(2, 3).toDF("l")
    +
    +    val q1 = df1.where("IF(l > 10, false, b AND null)")
    +    checkAnswer(q1, Seq.empty)
    +    checkPlanIsEmptyLocalScan(q1)
    +
    +    val q2 = df1.where("CASE WHEN l < 10 THEN null WHEN l > 40 THEN false ELSE null END")
    +    checkAnswer(q2, Seq.empty)
    +    checkPlanIsEmptyLocalScan(q2)
    +
    +    val q3 = df1.join(df2, when(df1("l") > df2("l"), lit(null)).otherwise(df1("b") && lit(null)))
    +    checkAnswer(q3, Seq.empty)
    +    checkPlanIsEmptyLocalScan(q3)
    +
    +    val q4 = df1.where("IF(IF(b, null, false), true, null)")
    +    checkAnswer(q4, Seq.empty)
    +    checkPlanIsEmptyLocalScan(q4)
    +
    +    val q5 = df1.selectExpr("IF(l > 1 AND null, 5, 1) AS out")
    +    checkAnswer(q5, Row(1) :: Row(1) :: Nil)
    +    q5.queryExecution.executedPlan.foreach { p =>
    +      assert(p.expressions.forall(e => e.find(_.isInstanceOf[If]).isEmpty))
    --- End diff --
    
    This test can pass without the optimization. The `ConvertToLocalRelation` rule will eliminate the `Project`.
    
    Can we use a table as input data? e.g.
    ```
    withTable("t1", "t2") {
      Seq((1, true), (2, false)).toDF("l", "b").write.saveAsTable("t1")
      Seq(2, 3).toDF("l").write.saveAsTable("t2")
      val df1 = spark.table("t1")
      val df2 = spark.table("t2")
      ...
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r228738623
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,65 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + *
    + * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
    + * conditions in [[CaseWhen]].
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case CaseWhen(branches, elseValue) =>
    --- End diff --
    
    Nit,
    
    ```scala
    case cw @ CaseWhen(branches, _) =>
      ..
      ..
      cw.copy(branches = newBranches)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r238489445
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -31,14 +31,14 @@ import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd}
     import org.apache.spark.sql.catalyst.TableIdentifier
     import org.apache.spark.sql.catalyst.expressions.Uuid
     import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
    -import org.apache.spark.sql.catalyst.plans.logical.{Filter, OneRowRelation, Union}
    +import org.apache.spark.sql.catalyst.plans.logical.{OneRowRelation, Union}
    --- End diff --
    
    Yea, also it's unrelated import cleanup. It should be discouraged because it might make backporting / reverting potentially difficult, and sometimes those changes make readers confused.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4614/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98127/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229445682
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case cw @ CaseWhen(branches, _) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        cw.copy(branches = newBranches)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if cw.dataType == BooleanType =>
    --- End diff --
    
    This case is also covered and tested in `"replace null in conditions of CaseWhen"`, `"replace null in conditions of CaseWhen inside another CaseWhen"`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    LGTM. 
    
    @cloud-fan and @gatorsmile, this is the PR I mentioned to you earlier this year in the SF Spark summit which can simplify some of our queries. 
    
    Also add @dongjoon-hyun and @viirya 
    
    Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229529772
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -736,3 +736,60 @@ object CombineConcats extends Rule[LogicalPlan] {
           flattenConcats(concat)
       }
     }
    +
    +/**
    + * A rule that replaces `Literal(null, _)` with `FalseLiteral` for further optimizations.
    + *
    + * This rule applies to conditions in [[Filter]] and [[Join]]. Moreover, it transforms predicates
    + * in all [[If]] expressions as well as branch conditions in all [[CaseWhen]] expressions.
    + *
    + * For example, `Filter(Literal(null, _))` is equal to `Filter(FalseLiteral)`.
    + *
    + * Another example containing branches is `Filter(If(cond, FalseLiteral, Literal(null, _)))`;
    + * this can be optimized to `Filter(If(cond, FalseLiteral, FalseLiteral))`, and eventually
    + * `Filter(FalseLiteral)`.
    + *
    + * As this rule is not limited to conditions in [[Filter]] and [[Join]], arbitrary plans can
    + * benefit from it. For example, `Project(If(And(cond, Literal(null)), Literal(1), Literal(2)))`
    + * can be simplified into `Project(Literal(2))`.
    + *
    + * As a result, many unnecessary computations can be removed in the query optimization phase.
    + */
    +object ReplaceNullWithFalse extends Rule[LogicalPlan] {
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case f @ Filter(cond, _) => f.copy(condition = replaceNullWithFalse(cond))
    +    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
    +    case p: LogicalPlan => p transformExpressions {
    +      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
    +      case cw @ CaseWhen(branches, _) =>
    +        val newBranches = branches.map { case (cond, value) =>
    +          replaceNullWithFalse(cond) -> value
    +        }
    +        cw.copy(branches = newBranches)
    +    }
    +  }
    +
    +  /**
    +   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
    +   *
    +   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
    +   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
    +   */
    +  private def replaceNullWithFalse(e: Expression): Expression = e match {
    +    case cw: CaseWhen if cw.dataType == BooleanType =>
    +      val newBranches = cw.branches.map { case (cond, value) =>
    +        replaceNullWithFalse(cond) -> replaceNullWithFalse(value)
    +      }
    +      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
    +      CaseWhen(newBranches, newElseValue)
    +    case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType =>
    +      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
    --- End diff --
    
    ah, I see. `replaceNullWithFalse` should only work in boolean type cases. Then I think we are fine with it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22857
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98239/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22857#discussion_r229537117
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2585,4 +2585,45 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     
         checkAnswer(swappedDf.filter($"key"($"map") > "a"), Row(2, Map(2 -> "b")))
       }
    +
    +  test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever possible") {
    --- End diff --
    
    it's weird to put optimizer end-to-end test in `DataFrameSuite`. Can we create a `ReplaceNullWithFalseEndToEndSuite`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org