Posted to reviews@spark.apache.org by rednaxelafx <gi...@git.apache.org> on 2018/11/18 09:16:13 UTC
[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...
GitHub user rednaxelafx opened a pull request:
https://github.com/apache/spark/pull/23079
[SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicate to support higher-order functions: ArrayExists, ArrayFilter, MapFilter
## What changes were proposed in this pull request?
Extend the `ReplaceNullWithFalse` optimizer rule introduced in SPARK-25860 (https://github.com/apache/spark/pull/22857) to also support optimizing predicates in higher-order functions of `ArrayExists`, `ArrayFilter`, `MapFilter`.
Also rename the rule to `ReplaceNullWithFalseInPredicate` to better reflect its intent.
## How was this patch tested?
Added new unit test cases to the `ReplaceNullWithFalseInPredicateSuite` (renamed from `ReplaceNullWithFalseSuite`).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rednaxelafx/apache-spark catalyst-master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23079.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #23079
----
commit 710c8862b3138f6146fe2309d6379707f8d4ac14
Author: Kris Mok <kr...@...>
Date: 2018-11-18T09:09:53Z
Extend ReplaceNullWithFalseInPredicate to support higher-order functions: ArrayExists, ArrayFilter, MapFilter
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...
Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:
https://github.com/apache/spark/pull/23079#discussion_r234508866
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala ---
@@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
testProjection(originalExpr = column, expectedExpr = column)
}
+ test("replace nulls in lambda function of ArrayFilter") {
+ val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
--- End diff --
Actually, I intentionally made all three lambdas the same (the `MapFilter` one differs only in the lambda parameter). I can encapsulate this lambda function in a test utility function. Let me update the PR and see what you think.
---
[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...
Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:
https://github.com/apache/spark/pull/23079#discussion_r234534798
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala ---
@@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
testProjection(originalExpr = column, expectedExpr = column)
}
+ test("replace nulls in lambda function of ArrayFilter") {
+ val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
--- End diff --
Updated.
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23079
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5118/
---
[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/23079#discussion_r234474562
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
replaceNullWithFalse(cond) -> value
}
cw.copy(branches = newBranches)
+ case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
--- End diff --
Shall we add a `withNewFunctions` method to `HigherOrderFunction`? Then we can simplify this rule to
```
case f: HigherOrderFunction => f.withNewFunctions(f.functions.map(replaceNullWithFalse))
```
---
[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/23079#discussion_r234639734
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
replaceNullWithFalse(cond) -> value
}
cw.copy(branches = newBranches)
+ case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
--- End diff --
Ah, I see. Sorry, I missed it. Then it's safer to use a whitelist here.
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23079
Merged build finished. Test PASSed.
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23079
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5136/
---
[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...
Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:
https://github.com/apache/spark/pull/23079#discussion_r234508561
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
replaceNullWithFalse(cond) -> value
}
cw.copy(branches = newBranches)
+ case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
--- End diff --
I'm not sure if that's useful or not. First of all, the `replaceNullWithFalse` handling doesn't apply to all higher-order functions. In fact it only applies to a very narrow set, ones where a lambda function returns `BooleanType` and is immediately used as a predicate. So having a generic utility can certainly help make this PR slightly simpler, but I don't know how useful it is for other cases.
I'd prefer waiting for more such transformation cases to introduce a new utility for the pattern. WDYT?
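The point about the narrow set of applicable functions can be illustrated with a minimal sketch on a toy expression tree. This is not Spark's Catalyst code: the case classes below are hypothetical stand-ins that only borrow the names from this thread, `ArrayTransform` is assumed here as an example of a higher-order function whose lambda result is not used as a predicate, and the handling of `If` is simplified.

```scala
object WhitelistSketch {
  // Toy expression tree; these are illustrative stand-ins, not Catalyst classes.
  sealed trait Expr
  case object NullLit extends Expr
  case object FalseLit extends Expr
  case class Var(name: String) extends Expr
  case class If(cond: Expr, thenE: Expr, elseE: Expr) extends Expr
  case class Lambda(body: Expr) extends Expr
  // The lambda result of the first two is consumed as a predicate.
  case class ArrayFilter(arg: Expr, f: Lambda) extends Expr
  case class MapFilter(arg: Expr, f: Lambda) extends Expr
  // The lambda result of this one becomes an output element.
  case class ArrayTransform(arg: Expr, f: Lambda) extends Expr

  // Replace NULL with false in a position known to be a predicate (simplified).
  def replaceNullWithFalse(e: Expr): Expr = e match {
    case NullLit => FalseLit
    case If(c, t, f) =>
      If(replaceNullWithFalse(c), replaceNullWithFalse(t), replaceNullWithFalse(f))
    case other => other
  }

  // Whitelist: only functions known to use the lambda as a predicate are rewritten.
  def optimize(e: Expr): Expr = e match {
    case ArrayFilter(a, Lambda(b)) => ArrayFilter(a, Lambda(replaceNullWithFalse(b)))
    case MapFilter(a, Lambda(b)) => MapFilter(a, Lambda(replaceNullWithFalse(b)))
    case other => other // e.g. ArrayTransform is left untouched
  }

  def main(args: Array[String]): Unit = {
    val lambda = Lambda(If(Var("c"), NullLit, Var("p")))
    println(optimize(ArrayFilter(Var("arr"), lambda)))
    println(optimize(ArrayTransform(Var("arr"), lambda)))
  }
}
```

In this toy model, a generic rewrite over every higher-order function would also turn the `NullLit` inside `ArrayTransform` into `FalseLit`, which would change the elements of the resulting array rather than just the outcome of a filter.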
---
[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...
Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/23079#discussion_r234467085
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala ---
@@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
testProjection(originalExpr = column, expectedExpr = column)
}
+ test("replace nulls in lambda function of ArrayFilter") {
+ val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
--- End diff --
Test cases for `ArrayFilter` and `ArrayExists` seem to be identical. As we have those tests anyway, would it make sense to cover different lambda functions?
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/23079
@rednaxelafx I am glad the rule gets more adoption. Renaming also makes sense to me.
Shall we extend `ReplaceNullWithFalseEndToEndSuite` as well?
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23079
**[Test build #98996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98996/testReport)** for PR 23079 at commit [`6646a96`](https://github.com/apache/spark/commit/6646a96c8b9e905e3cad0b29e7f4063551b23c4c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ReplaceNullWithFalseInPredicateEndToEndSuite extends QueryTest with SharedSQLContext`
---
[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/23079
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23079
Merged build finished. Test PASSed.
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23079
Merged build finished. Test PASSed.
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23079
**[Test build #98975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98975/testReport)** for PR 23079 at commit [`710c886`](https://github.com/apache/spark/commit/710c8862b3138f6146fe2309d6379707f8d4ac14).
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23079
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98996/
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23079
**[Test build #98975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98975/testReport)** for PR 23079 at commit [`710c886`](https://github.com/apache/spark/commit/710c8862b3138f6146fe2309d6379707f8d4ac14).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/23079
Thanks, merging to master!
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on the issue:
https://github.com/apache/spark/pull/23079
cc @aokolnychyi : I'd like to propose renaming the rule you introduced to add an `-InPredicate` suffix, because we can't replace arbitrary `null`s with `false`, only the ones that are going to be used directly in a boolean predicate context (e.g. `If(cond, _, _)`, `Filter`, etc., which you've already nicely identified). Does that make sense to you?
BTW thank you very much for introducing that rule. It's really neat!
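The distinction between a predicate context and an arbitrary context can be sketched in plain Scala, without any Spark dependency. The model below is an assumption made for illustration: `Option[Boolean]` stands in for a nullable SQL boolean (`None` for `NULL`), and `sqlFilter`, `sqlProject`, `nullToFalse`, and `cond` are hypothetical helper names, not Spark APIs.

```scala
object PredicateContextSketch {
  // Toy model of a nullable SQL boolean: None stands for NULL.
  type SqlBool = Option[Boolean]

  // Predicate context: a row is kept only when the condition is definitely true.
  def sqlFilter[A](rows: Seq[A])(cond: A => SqlBool): Seq[A] =
    rows.filter(r => cond(r).contains(true))

  // Projection context: the value itself, NULL included, reaches the output.
  def sqlProject[A](rows: Seq[A])(expr: A => SqlBool): Seq[SqlBool] =
    rows.map(expr)

  // The rewrite under discussion, applied to a whole function: NULL becomes false.
  def nullToFalse[A](f: A => SqlBool): A => SqlBool =
    a => Some(f(a).getOrElse(false))

  // Hypothetical condition that is NULL for 0, roughly `if(e = 0, null, e > 0)`.
  val cond: Int => SqlBool = e => if (e == 0) None else Some(e > 0)

  def main(args: Array[String]): Unit = {
    val rows = Seq(-1, 0, 1, 2)
    // The same rows survive the filter with or without the rewrite.
    assert(sqlFilter(rows)(cond) == sqlFilter(rows)(nullToFalse(cond)))
    // The projected values differ, so the rewrite is not valid in that context.
    assert(sqlProject(rows)(cond) != sqlProject(rows)(nullToFalse(cond)))
  }
}
```

Under this model a filter treats `NULL` and `false` identically, since both drop the row, while a projection exposes the difference to the caller.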
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23079
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98975/
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23079
**[Test build #98996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98996/testReport)** for PR 23079 at commit [`6646a96`](https://github.com/apache/spark/commit/6646a96c8b9e905e3cad0b29e7f4063551b23c4c).
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on the issue:
https://github.com/apache/spark/pull/23079
LGTM as well.
---
[GitHub] spark issue #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicat...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23079
Merged build finished. Test PASSed.
---