Posted to reviews@spark.apache.org by KaiXinXiaoLei <gi...@git.apache.org> on 2018/02/25 07:21:03 UTC
[GitHub] spark pull request #20670: add constranits
GitHub user KaiXinXiaoLei opened a pull request:
https://github.com/apache/spark/pull/20670
add constranits
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
I ran this SQL query: `select ls.cs_order_number from ls left semi join catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table is small, with a single row. The `catalog_sales` table is large, with 10 billion rows. The task hangs. I found that `cs_order_number` in the `catalog_sales` table contains many null values. I think the null values should be filtered out in the logical plan.
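Why dropping the null keys is safe can be illustrated outside Spark with a minimal plain-Scala sketch. It assumes SQL equality semantics, where a null key never compares equal to anything (including another null), and models a nullable key as `Option[Long]`. The names `SemiJoinSketch`, `sqlEquals`, `leftSemi`, and `dropNullKeys` are hypothetical and are not Spark APIs.

```scala
object SemiJoinSketch {
  // A nullable join key: None stands in for SQL NULL.
  type Key = Option[Long]

  // SQL-style equality: the comparison is true only when both sides are
  // non-null and equal. NULL = NULL is not true.
  def sqlEquals(a: Key, b: Key): Boolean =
    a.isDefined && b.isDefined && a == b

  // Naive left semi join: keep each left key that matches at least one right key.
  def leftSemi(left: Seq[Key], right: Seq[Key]): Seq[Key] =
    left.filter(l => right.exists(r => sqlEquals(l, r)))

  // The extra filter discussed in this PR: keep only rows with a non-null key.
  def dropNullKeys(rows: Seq[Key]): Seq[Key] = rows.filter(_.isDefined)
}
```

In this model, `leftSemi(dropNullKeys(left), dropNullKeys(right))` returns the same rows as `leftSemi(left, right)`, so the null rows on the large side only add work.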
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/KaiXinXiaoLei/spark Spark-23405
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20670.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20670
----
commit 705ed462bb307871e65199ce02576f12d60d2176
Author: KaiXinXiaoLei <58...@...>
Date: 2018-02-25T06:06:39Z
add constranits
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87727/testReport)** for PR 20670 at commit [`f7d764e`](https://github.com/apache/spark/commit/f7d764efa435327ba34e829da53c16a6ec16f403).
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87727/testReport)** for PR 20670 at commit [`f7d764e`](https://github.com/apache/spark/commit/f7d764efa435327ba34e829da53c16a6ec16f403).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87766/testReport)** for PR 20670 at commit [`b3f2ade`](https://github.com/apache/spark/commit/b3f2ade5f1dc2ad3349f4dc21fe353590e8bbbfd).
---
[GitHub] spark issue #20670: add constranits
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1035/
Test PASSed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1140/
Test PASSed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87817/
Test FAILed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1144/
Test PASSed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1109/
Test PASSed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87691/
Test PASSed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20670
Also, a better title for this PR would be:
```
Generate additional constraints for Join's children
```
---
[GitHub] spark issue #20670: add constranits
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87648/testReport)** for PR 20670 at commit [`705ed46`](https://github.com/apache/spark/commit/705ed462bb307871e65199ce02576f12d60d2176).
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20670
LGTM
---
[GitHub] spark pull request #20670: [SPARK-23405] Add constranits
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r170529019
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -29,12 +29,26 @@ trait QueryPlanConstraints { self: LogicalPlan =>
*/
lazy val constraints: ExpressionSet = {
if (conf.constraintPropagationEnabled) {
+ var relevantOutPutSet: AttributeSet = outputSet
+ constraints.foreach {
+ case eq @ EqualTo(l: Attribute, r: Attribute) =>
+ if (l.references.subsetOf(relevantOutPutSet)
--- End diff --
You can avoid computing each `subsetOf` twice here.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1190/
Test PASSed.
---
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87648/
Test FAILed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20670
thanks, merging to master!
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87726/testReport)** for PR 20670 at commit [`1e0f78a`](https://github.com/apache/spark/commit/1e0f78a50bd70a3f94382887a74cc70f7fefe3c6).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87772/testReport)** for PR 20670 at commit [`ed5c170`](https://github.com/apache/spark/commit/ed5c170c35d8786df241921ac19d95520ace3836).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #20670: [SPARK-23405] Add constranits
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r170529062
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -29,12 +29,26 @@ trait QueryPlanConstraints { self: LogicalPlan =>
*/
lazy val constraints: ExpressionSet = {
if (conf.constraintPropagationEnabled) {
+ var relevantOutPutSet: AttributeSet = outputSet
+ constraints.foreach {
+ case eq @ EqualTo(l: Attribute, r: Attribute) =>
+ if (l.references.subsetOf(relevantOutPutSet)
+ && !r.references.subsetOf(relevantOutPutSet)) {
+ relevantOutPutSet = relevantOutPutSet.++(r.references)
--- End diff --
Use ` ++ ` syntax rather than writing it as a method invocation.
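As a small illustration of the two spellings, here is a sketch that uses plain `Set[String]` values as a stand-in for `AttributeSet` (an assumption made so the snippet runs without Spark); `InfixSketch` and its members are hypothetical names.

```scala
object InfixSketch {
  val output: Set[String] = Set("a", "b")
  val extra: Set[String] = Set("c")

  // Method-invocation form, as written in the diff.
  val viaMethodCall: Set[String] = output.++(extra)

  // Infix form, the usual Scala style for symbolic operators.
  val viaInfix: Set[String] = output ++ extra
}
```

Both values hold the same set; only the readability differs.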
---
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r171182870
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala ---
@@ -192,4 +192,17 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
comparePlans(Optimize.execute(original.analyze), correct.analyze)
}
+
+ test("SPARK-23405:single left-semi join, filter out nulls on either side on equi-join keys") {
+ val x = testRelation.subquery('x)
+ val y = testRelation.subquery('y)
+ val originalQuery = x.join(y, LeftSemi,
+ condition = Some("x.a".attr === "y.a".attr)).analyze
--- End diff --
nit: we can create a `val condition = Some("x.a".attr === "y.a".attr)` to reduce duplicated code
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1177/
Test PASSed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1168/
Test PASSed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87691/testReport)** for PR 20670 at commit [`f44a92a`](https://github.com/apache/spark/commit/f44a92ad20895a94577cf2b4de54fc320b0f934b).
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r171201415
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -22,21 +22,30 @@ import org.apache.spark.sql.catalyst.expressions._
trait QueryPlanConstraints { self: LogicalPlan =>
+ /**
+ * An [[ExpressionSet]] that contains an additional set of constraints about equality
--- End diff --
The comment is not accurate; we may have various kinds of constraints.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87772 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87772/testReport)** for PR 20670 at commit [`ed5c170`](https://github.com/apache/spark/commit/ed5c170c35d8786df241921ac19d95520ace3836).
---
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20670
Good catch! This is a real problem, but the fix looks hacky.
By definition, I think `plan.constraints` should only include constraints that refer to `plan.output`, as that's the promise a plan can make to its parent. However, join is special: `Join.condition` can refer to both of the join sides, and we add the constraints to `Join.condition`, so in effect we are making a promise to the Join's children, not its parent. My proposal:
```
lazy val constraints: ExpressionSet = {
  if (conf.constraintPropagationEnabled) {
    allConstraints.filter { c =>
      c.references.nonEmpty && c.references.subsetOf(outputSet) && c.deterministic
    }
  } else {
    ExpressionSet(Set.empty)
  }
}

lazy val allConstraints = ExpressionSet(validConstraints
  .union(inferAdditionalConstraints(validConstraints))
  .union(constructIsNotNullConstraints(validConstraints)))
```
Then we can call `plan.allConstraints` when inferring constraints for join.
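The difference between the two sets can be sketched in plain Scala without Spark. This is only a simplified model under stated assumptions: a constraint is reduced to its text plus the attribute names it references, the determinism check is omitted, and `ConstraintSketch`, `Constraint`, `Plan`, and `semiJoin` are hypothetical names rather than Catalyst classes.

```scala
object ConstraintSketch {
  // A constraint is modelled only by its text and the attributes it references.
  final case class Constraint(text: String, references: Set[String])

  // Hypothetical stand-in for a plan node: the attributes it outputs plus
  // every constraint known to hold at that node.
  final case class Plan(outputSet: Set[String], allConstraints: Set[Constraint]) {
    // Only constraints that refer purely to this node's output are promised
    // to the parent.
    def constraints: Set[Constraint] =
      allConstraints.filter(c => c.references.nonEmpty && c.references.subsetOf(outputSet))
  }

  // A left semi join of x and y on x.a = y.a outputs only x's columns.
  val semiJoin: Plan = Plan(
    outputSet = Set("x.a"),
    allConstraints = Set(
      Constraint("x.a = y.a", Set("x.a", "y.a")),
      Constraint("isnotnull(x.a)", Set("x.a")),
      Constraint("isnotnull(y.a)", Set("y.a"))
    )
  )
}
```

In this model `semiJoin.constraints` keeps only `isnotnull(x.a)`, while `semiJoin.allConstraints` still carries `isnotnull(y.a)`, which is the fact needed to filter the right side of the join.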
---
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test FAILed.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87804/testReport)** for PR 20670 at commit [`023f2f7`](https://github.com/apache/spark/commit/023f2f709db484d82cde22b00db0bad33ac72279).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by KaiXinXiaoLei <gi...@git.apache.org>.
Github user KaiXinXiaoLei commented on the issue:
https://github.com/apache/spark/pull/20670
@gatorsmile thanks.
---
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r171102033
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -23,20 +23,23 @@ import org.apache.spark.sql.catalyst.expressions._
trait QueryPlanConstraints { self: LogicalPlan =>
/**
- * An [[ExpressionSet]] that contains invariants about the rows output by this operator. For
- * example, if this set contains the expression `a = 2` then that expression is guaranteed to
- * evaluate to `true` for all rows produced.
- */
+ * An [[ExpressionSet]] that contains an additional set of constraints about equality
+ * constraints and `isNotNull` constraints.
+ */
+ lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints
--- End diff --
This should also be guarded by `constraintPropagationEnabled`
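A minimal sketch of such a guard, in plain Scala rather than Catalyst: the flag and the inferred set are passed in as constructor arguments here, which is an assumption made for illustration (in Spark the flag comes from `conf`). `GuardSketch` and `Plan` are hypothetical names.

```scala
object GuardSketch {
  // Hypothetical stand-in for a plan whose constraint inference can be
  // switched off by configuration.
  final class Plan(propagationEnabled: Boolean, inferred: => Set[String]) {
    // `inferred` is passed by name, so it is evaluated only when the flag is on.
    lazy val allConstraints: Set[String] =
      if (propagationEnabled) inferred else Set.empty
  }
}
```

With the flag off, the set is empty and the potentially expensive inference is never run.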
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by KaiXinXiaoLei <gi...@git.apache.org>.
Github user KaiXinXiaoLei commented on the issue:
https://github.com/apache/spark/pull/20670
@cloud-fan @srowen @jiangxb1987 I have changed the code and the title; please help me review. Thanks.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87836/testReport)** for PR 20670 at commit [`709ed39`](https://github.com/apache/spark/commit/709ed39052a032d0dc2258b2c637ab107d4b4df7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #20670: [SPARK-23405] Add constranits
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r170528989
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -29,12 +29,26 @@ trait QueryPlanConstraints { self: LogicalPlan =>
*/
lazy val constraints: ExpressionSet = {
if (conf.constraintPropagationEnabled) {
+ var relevantOutPutSet: AttributeSet = outputSet
+ constraints.foreach {
+ case eq @ EqualTo(l: Attribute, r: Attribute) =>
--- End diff --
`eq` isn't used
---
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87648/testReport)** for PR 20670 at commit [`705ed46`](https://github.com/apache/spark/commit/705ed462bb307871e65199ce02576f12d60d2176).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r171463022
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -22,21 +22,24 @@ import org.apache.spark.sql.catalyst.expressions._
trait QueryPlanConstraints { self: LogicalPlan =>
+ /**
+ * An [[ExpressionSet]] that contains an additional set of constraints, such as equality
+ * constraints and `isNotNull` constraints, etc.
+ */
+ lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints
--- End diff --
We still need `if (conf.constraintPropagationEnabled)`
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1110/
Test PASSed.
---
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r171276798
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala ---
@@ -192,4 +192,17 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
comparePlans(Optimize.execute(original.analyze), correct.analyze)
}
+
+ test("SPARK-23405: left-semi equal-join should filter out null join keys on both sides") {
+ val x = testRelation.subquery('x)
+ val y = testRelation.subquery('y)
+ val condition = Some("x.a".attr === "y.a".attr)
+ val originalQuery = x.join(y, LeftSemi, condition).analyze
+ val left = x.where(IsNotNull('a))
+ val right = y.where(IsNotNull('a))
+ val correctAnswer = left.join(right, LeftSemi, condition)
+ .analyze
--- End diff --
This doesn't need to be on a new line.
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87804/testReport)** for PR 20670 at commit [`023f2f7`](https://github.com/apache/spark/commit/023f2f709db484d82cde22b00db0bad33ac72279).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20670
LGTM only nits
---
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87772/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20670
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r170840898
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -27,16 +27,15 @@ trait QueryPlanConstraints { self: LogicalPlan =>
* example, if this set contains the expression `a = 2` then that expression is guaranteed to
* evaluate to `true` for all rows produced.
--- End diff --
The comment belongs to `constraints` not `allConstraints`
---
---------------------------------------------------------------------
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r171182102
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -22,21 +22,30 @@ import org.apache.spark.sql.catalyst.expressions._
trait QueryPlanConstraints { self: LogicalPlan =>
+ /**
+ * An [[ExpressionSet]] that contains an additional set of constraints about equality
+ * constraints and `isNotNull` constraints.
+ */
+ lazy val allConstraints: ExpressionSet = {
+ if (conf.constraintPropagationEnabled) {
+ ExpressionSet(validConstraints
+ .union(inferAdditionalConstraints(validConstraints))
+ .union(constructIsNotNullConstraints(validConstraints)))
+ } else {
+ ExpressionSet(Set.empty)
+ }
+ }
+
/**
* An [[ExpressionSet]] that contains invariants about the rows output by this operator. For
* example, if this set contains the expression `a = 2` then that expression is guaranteed to
* evaluate to `true` for all rows produced.
*/
lazy val constraints: ExpressionSet = {
if (conf.constraintPropagationEnabled) {
--- End diff --
Now we don't need this `if`.
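
The reasoning behind this comment can be sketched with a toy model. This is not the Spark implementation: `Constraint`, `Plan`, `enabled`, and `output` are hypothetical stand-ins, and the sketch assumes that `constraints` is derived from `allConstraints`, which the truncated diff above does not show. Under that assumption, the flag check in `allConstraints` already yields an empty set when propagation is disabled, so a second check in `constraints` would be redundant.

```scala
// Toy model with hypothetical names; plain strings stand in for Catalyst expressions.
object ConstraintGating {
  // A constraint together with the attribute names it refers to.
  final case class Constraint(text: String, references: Set[String])

  final case class Plan(enabled: Boolean, output: Set[String], valid: Set[Constraint]) {
    // The feature flag is checked once, here.
    lazy val allConstraints: Set[Constraint] =
      if (enabled) valid else Set.empty[Constraint]

    // Assumed to be derived from allConstraints: it is already empty when the
    // flag is off, so no second flag check is needed.
    lazy val constraints: Set[Constraint] =
      allConstraints.filter(_.references.subsetOf(output))
  }

  def main(args: Array[String]): Unit = {
    val cs = Set(Constraint("a = 2", Set("a")), Constraint("a = b", Set("a", "b")))
    println(Plan(enabled = true, output = Set("a"), valid = cs).constraints)
    println(Plan(enabled = false, output = Set("a"), valid = cs).constraints)
  }
}
```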
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: add constranits
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/20670
This is still lacking detail about 'why'. It's not my area either. I think you should not have reopened this.
---
---------------------------------------------------------------------
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r171462811
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -22,21 +22,24 @@ import org.apache.spark.sql.catalyst.expressions._
trait QueryPlanConstraints { self: LogicalPlan =>
+ /**
+ * An [[ExpressionSet]] that contains an additional set of constraints, such as equality
+ * constraints and `isNotNull` constraints, etc.
+ */
+ lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints
+ .union(inferAdditionalConstraints(validConstraints))
+ .union(constructIsNotNullConstraints(validConstraints)))
--- End diff --
Nit: indents
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87727/
Test FAILed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87766/testReport)** for PR 20670 at commit [`b3f2ade`](https://github.com/apache/spark/commit/b3f2ade5f1dc2ad3349f4dc21fe353590e8bbbfd).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20670
LGTM except several minor comments
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87836/testReport)** for PR 20670 at commit [`709ed39`](https://github.com/apache/spark/commit/709ed39052a032d0dc2258b2c637ab107d4b4df7).
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87726/testReport)** for PR 20670 at commit [`1e0f78a`](https://github.com/apache/spark/commit/1e0f78a50bd70a3f94382887a74cc70f7fefe3c6).
---
---------------------------------------------------------------------
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20670#discussion_r171182439
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala ---
@@ -192,4 +192,17 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
comparePlans(Optimize.execute(original.analyze), correct.analyze)
}
+
+ test("SPARK-23405:single left-semi join, filter out nulls on either side on equi-join keys") {
--- End diff --
nit: `SPARK-23405: left-semi equi-join should filter out null join keys on both sides`
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87651/testReport)** for PR 20670 at commit [`705ed46`](https://github.com/apache/spark/commit/705ed462bb307871e65199ce02576f12d60d2176).
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20670
I agree with what @cloud-fan proposed: having constraints for a plan and its children. However, that requires a relatively wider change as well as a fine set of test cases. Please don't hesitate to ask for help if you run into any issues working on this.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87766/
Test FAILed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/20670
retest this please
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87817/testReport)** for PR 20670 at commit [`709ed39`](https://github.com/apache/spark/commit/709ed39052a032d0dc2258b2c637ab107d4b4df7).
* This patch **fails from timeout after a configured wait of `300m`**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87726/
Test FAILed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87691/testReport)** for PR 20670 at commit [`f44a92a`](https://github.com/apache/spark/commit/f44a92ad20895a94577cf2b4de54fc320b0f934b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87651/
Test FAILed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87651/testReport)** for PR 20670 at commit [`705ed46`](https://github.com/apache/spark/commit/705ed462bb307871e65199ce02576f12d60d2176).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20670
LGTM except we should add a test
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by KaiXinXiaoLei <gi...@git.apache.org>.
Github user KaiXinXiaoLei commented on the issue:
https://github.com/apache/spark/pull/20670
@srowen I have redescribed the problem. I have a small table `ls` with one row, and a big table `catalog_sales` with one hundred billion rows. In the big table, the `cs_order_number` field has one million non-null values.
Then I join these tables with the query: `select ls.cs_order_number from ls left semi join catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. While my job is running, a data skew occurs. I then found that the null values cause it.
The join condition is `ls.cs_order_number = cs.cs_order_number`. In the Optimized Logical Plan, the left table has a "Filter isnotnull(cs_order_number#1)" action, so I think the right table should have a "Filter isnotnull" action too. The right table would then filter out null values first and join with the left table second, so the data skew would not be caused by null values.
With this change, my SQL runs successfully.
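
The argument above can be illustrated with a small, self-contained sketch. It is a toy model, not Spark code: rows are reduced to a nullable join key (`Option[Long]`), and `leftSemi`, `leftSemiWithNullFilter`, and `partitionSizes` are hypothetical helpers. It assumes SQL equality semantics, where a null key never matches, and a simplified hash partitioning in which every null key lands in the same partition. Under those assumptions, dropping null keys on both sides leaves the join result unchanged while removing the oversized partition.

```scala
// Toy model (not Spark code): each row is reduced to its nullable join key.
object SemiJoinNullKeys {
  type Key = Option[Long]

  // Left-semi equi-join: keep a left key if some right key equals it.
  // Under SQL semantics NULL = x is never true, so None never matches.
  def leftSemi(left: Seq[Key], right: Seq[Key]): Seq[Key] = {
    val rightKeys: Set[Long] = right.flatten.toSet
    left.filter(k => k.exists(rightKeys.contains))
  }

  // The proposed rewrite: drop null keys on both sides before joining.
  def leftSemiWithNullFilter(left: Seq[Key], right: Seq[Key]): Seq[Key] =
    leftSemi(left.filter(_.isDefined), right.filter(_.isDefined))

  // Simplified shuffle: partition by key; all null keys share partition 0.
  def partitionSizes(keys: Seq[Key], numPartitions: Int): Map[Int, Int] =
    keys
      .groupBy(k => k.fold(0)(v => Math.floorMod(v, numPartitions.toLong).toInt))
      .map { case (p, ks) => p -> ks.size }

  def main(args: Array[String]): Unit = {
    val ls: Seq[Key] = Seq(Some(1L))
    val catalogSales: Seq[Key] = Seq.fill(1000)(None) ++ (1L to 8L).map(Some(_))

    // Same result with and without the null filter.
    println(leftSemi(ls, catalogSales) == leftSemiWithNullFilter(ls, catalogSales))
    // Partition sizes before and after removing null keys on the right side.
    println(partitionSizes(catalogSales, 4))
    println(partitionSizes(catalogSales.filter(_.isDefined), 4))
  }
}
```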
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20670
**[Test build #87817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87817/testReport)** for PR 20670 at commit [`709ed39`](https://github.com/apache/spark/commit/709ed39052a032d0dc2258b2c637ab107d4b4df7).
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1038/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20670
You should also add test cases.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87836/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1079/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20670
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87804/
Test PASSed.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by KaiXinXiaoLei <gi...@git.apache.org>.
Github user KaiXinXiaoLei commented on the issue:
https://github.com/apache/spark/pull/20670
@SparkQA I think this error is not caused by my patch. Please run the tests again.
---
---------------------------------------------------------------------
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Posted by KaiXinXiaoLei <gi...@git.apache.org>.
Github user KaiXinXiaoLei commented on the issue:
https://github.com/apache/spark/pull/20670
@srowen @wangyum please help review this, thanks.
---
---------------------------------------------------------------------