Posted to reviews@spark.apache.org by rednaxelafx <gi...@git.apache.org> on 2018/11/18 09:16:13 UTC

[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...

GitHub user rednaxelafx opened a pull request:

    https://github.com/apache/spark/pull/23079

    [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicate to support higher-order functions: ArrayExists, ArrayFilter, MapFilter

    ## What changes were proposed in this pull request?
    
    Extend the `ReplaceNullWithFalse` optimizer rule introduced in SPARK-25860 (https://github.com/apache/spark/pull/22857) to also optimize predicates in the higher-order functions `ArrayExists`, `ArrayFilter`, and `MapFilter`.
    
    Also rename the rule to `ReplaceNullWithFalseInPredicate` to better reflect its intent.
    
    ## How was this patch tested?
    
    Added new unit test cases to the `ReplaceNullWithFalseInPredicateSuite` (renamed from `ReplaceNullWithFalseSuite`).
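
To illustrate why replacing `null` with `false` is safe inside a filter-style lambda, here is a minimal sketch in plain Scala. It does not use Spark's classes: `SqlBool`, `sqlFilter`, `nullToFalse`, and `greaterThanZero` are hypothetical names, and `Option[Boolean]` is only an assumed stand-in for a nullable boolean. The point is that a filter keeps an element only when the predicate is exactly true, so a NULL result and a FALSE result are indistinguishable to it.

```scala
object NullAsFalseInFilter {
  // SQL-style three-valued boolean: None models NULL.
  type SqlBool = Option[Boolean]

  // Toy stand-in for a filter: keep an element only when the predicate is exactly TRUE.
  def sqlFilter[A](xs: Seq[A])(pred: A => SqlBool): Seq[A] =
    xs.filter(x => pred(x).contains(true))

  // Rewrite a NULL predicate result to FALSE.
  def nullToFalse[A](pred: A => SqlBool): A => SqlBool =
    x => Some(pred(x).getOrElse(false))

  // Predicate "e > 0" over nullable ints: a NULL input yields a NULL result.
  val greaterThanZero: Option[Int] => SqlBool = _.map(_ > 0)
}
```

Under these assumptions, filtering `Seq(Some(1), None, Some(-2), Some(3))` with `greaterThanZero` and with `nullToFalse(greaterThanZero)` gives the same elements, which is the property the rule relies on.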

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rednaxelafx/apache-spark catalyst-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23079.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23079
    
----
commit 710c8862b3138f6146fe2309d6379707f8d4ac14
Author: Kris Mok <kr...@...>
Date:   2018-11-18T09:09:53Z

    Extend ReplaceNullWithFalseInPredicate to support higher-order functions: ArrayExists, ArrayFilter, MapFilter

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23079#discussion_r234508866
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala ---
    @@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
         testProjection(originalExpr = column, expectedExpr = column)
       }
     
    +  test("replace nulls in lambda function of ArrayFilter") {
    +    val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
    --- End diff --
    
    Actually, I intentionally made all three lambdas the same (the `MapFilter` one differs only in the lambda parameter). I can encapsulate this lambda function in a test utility function. Let me update the PR and see what you think.


---



[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23079#discussion_r234534798
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala ---
    @@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
         testProjection(originalExpr = column, expectedExpr = column)
       }
     
    +  test("replace nulls in lambda function of ArrayFilter") {
    +    val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
    --- End diff --
    
    Updated.


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5118/
    Test PASSed.


---



[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23079#discussion_r234474562
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
               replaceNullWithFalse(cond) -> value
             }
             cw.copy(branches = newBranches)
    +      case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
    --- End diff --
    
    Shall we add a `withNewFunctions` method to `HigherOrderFunction`? Then we can simplify this rule to
    ```
    case f: HigherOrderFunction => f.withNewFunctions(f.functions.map(replaceNullWithFalse))
    ```


---



[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23079#discussion_r234639734
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
               replaceNullWithFalse(cond) -> value
             }
             cw.copy(branches = newBranches)
    +      case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
    --- End diff --
    
    Ah, I see. Sorry, I missed it. Then it's safer to use a whitelist here.


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5136/
    Test PASSed.


---



[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23079#discussion_r234508561
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
               replaceNullWithFalse(cond) -> value
             }
             cw.copy(branches = newBranches)
    +      case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
    --- End diff --
    
    I'm not sure if that's useful or not. First of all, the `replaceNullWithFalse` handling doesn't apply to all higher-order functions. In fact it only applies to a very narrow set, ones where a lambda function returns `BooleanType` and is immediately used as a predicate. So having a generic utility can certainly help make this PR slightly simpler, but I don't know how useful it is for other cases.
    I'd prefer waiting for more such transformation cases to introduce a new utility for the pattern. WDYT?
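
The whitelist argument above can be sketched with a toy expression tree in plain Scala. This is not Catalyst code: every type and function below (`Expr`, `Lambda`, the three toy higher-order functions, `replaceNullWithFalse`, `optimize`) is a hypothetical stand-in that only borrows names from the discussion. It assumes that a filter-like or exists-like function consumes its lambda result as a predicate, while a transform-like function returns the lambda result as data, so rewriting `null` there would change the output.

```scala
object WhitelistSketch {
  sealed trait Expr
  case object NullLit extends Expr                       // a NULL boolean literal
  case object FalseLit extends Expr                      // a FALSE literal
  case class Gt(name: String, bound: Int) extends Expr   // e.g. e > 0
  case class IfExpr(cond: Expr, thenE: Expr, elseE: Expr) extends Expr
  case class Lambda(body: Expr) extends Expr

  // Toy higher-order functions: only the first two use the lambda as a predicate.
  case class ArrayFilter(arr: String, f: Lambda) extends Expr
  case class ArrayExists(arr: String, f: Lambda) extends Expr
  case class ArrayTransform(arr: String, f: Lambda) extends Expr

  // Replace NULL with FALSE in positions that feed the predicate result.
  def replaceNullWithFalse(e: Expr): Expr = e match {
    case NullLit => FalseLit
    case IfExpr(c, t, f) =>
      IfExpr(replaceNullWithFalse(c), replaceNullWithFalse(t), replaceNullWithFalse(f))
    case other => other
  }

  // Whitelist: rewrite only functions known to treat the lambda as a predicate.
  def optimize(e: Expr): Expr = e match {
    case ArrayFilter(a, Lambda(b)) => ArrayFilter(a, Lambda(replaceNullWithFalse(b)))
    case ArrayExists(a, Lambda(b)) => ArrayExists(a, Lambda(replaceNullWithFalse(b)))
    case other => other // e.g. ArrayTransform is left untouched
  }
}
```

With a lambda body of `IfExpr(Gt("e", 0), NullLit, FalseLit)`, `optimize` rewrites the body under `ArrayFilter` but returns an `ArrayTransform` unchanged. A generic rewrite over all higher-order functions would not make that distinction.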


---



[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23079#discussion_r234467085
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala ---
    @@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
         testProjection(originalExpr = column, expectedExpr = column)
       }
     
    +  test("replace nulls in lambda function of ArrayFilter") {
    +    val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
    --- End diff --
    
    Test cases for `ArrayFilter` and `ArrayExists` seem to be identical. As we have those tests anyway, would it make sense to cover different lambda functions?


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    @rednaxelafx I am glad the rule gets more adoption. Renaming also makes sense to me.
    
    Shall we extend `ReplaceNullWithFalseEndToEndSuite` as well?


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    **[Test build #98996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98996/testReport)** for PR 23079 at commit [`6646a96`](https://github.com/apache/spark/commit/6646a96c8b9e905e3cad0b29e7f4063551b23c4c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class ReplaceNullWithFalseInPredicateEndToEndSuite extends QueryTest with SharedSQLContext `


---



[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23079


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    **[Test build #98975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98975/testReport)** for PR 23079 at commit [`710c886`](https://github.com/apache/spark/commit/710c8862b3138f6146fe2309d6379707f8d4ac14).


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98996/
    Test PASSed.


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    **[Test build #98975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98975/testReport)** for PR 23079 at commit [`710c886`](https://github.com/apache/spark/commit/710c8862b3138f6146fe2309d6379707f8d4ac14).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    thanks, merging to master!


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    cc @aokolnychyi : I'd like to propose renaming the rule you introduced to add an `-InPredicate` suffix, because we can't replace arbitrary `null`s with `false`, only the ones that are going to be used directly in a boolean predicate context (e.g. `If(cond, _, _)`, `Filter`, etc., which you've already nicely identified). Does that make sense to you?
    
    BTW thank you very much for introducing that rule. It's really neat!
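
A small sketch of the "predicate context" distinction, again in plain Scala rather than Spark's classes. `SqlBool`, `sqlIf`, and `sqlNot` are hypothetical helpers, and `None` is an assumed model of NULL. A NULL that is consumed directly as a condition behaves like FALSE, but a NULL nested under another operator such as NOT does not, which is why only predicate positions qualify.

```scala
object PredicateContext {
  // None models NULL under three-valued logic.
  type SqlBool = Option[Boolean]

  // IF takes the "then" branch only when the condition is TRUE;
  // NULL and FALSE both fall through to the "else" branch.
  def sqlIf[A](cond: SqlBool, thenV: A, elseV: A): A =
    if (cond.contains(true)) thenV else elseV

  // NOT under three-valued logic: NOT NULL is NULL, NOT FALSE is TRUE.
  def sqlNot(b: SqlBool): SqlBool = b.map(!_)
}
```

Here `sqlIf(None, "t", "e")` and `sqlIf(Some(false), "t", "e")` pick the same branch, whereas `sqlIf(sqlNot(None), "t", "e")` and `sqlIf(sqlNot(Some(false)), "t", "e")` do not, so a `null` below `sqlNot` could not be rewritten to `false` without changing the result.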


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98975/
    Test PASSed.


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    **[Test build #98996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98996/testReport)** for PR 23079 at commit [`6646a96`](https://github.com/apache/spark/commit/6646a96c8b9e905e3cad0b29e7f4063551b23c4c).


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    LGTM as well. 


---



[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23079
  
    Merged build finished. Test PASSed.


---
