Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/08/28 06:44:15 UTC
[GitHub] spark pull request #22250: [SPARK-25259][SQL] left/right join support push d...
GitHub user wangyum opened a pull request:
https://github.com/apache/spark/pull/22250
[SPARK-25259][SQL] left/right join support push down during-join predicates
## What changes were proposed in this pull request?
Prepare data:
```sql
create temporary view EMPLOYEE as select * from values
("000010", "HAAS", "A00"),
("000010", "THOMPSON", "B01"),
("000030", "KWAN", "C01"),
("000110", "LUCCHESSI", "A00"),
("000120", "O'CONNELL", "A00"),
("000130", "QUINTANA", "C01")
as EMPLOYEE(EMPNO, LASTNAME, WORKDEPT);
create temporary view DEPARTMENT as select * from values
("A00", "SPIFFY COMPUTER SERVICE DIV.", "000010"),
("B01", "PLANNING", "000020"),
("C01", "INFORMATION CENTER", "000030"),
("D01", "DEVELOPMENT CENTER", null)
as DEPARTMENT(DEPTNO, DEPTNAME, MGRNO);
create temporary view PROJECT as select * from values
("AD3100", "ADMIN SERVICES", "D01"),
("IF1000", "QUERY SERVICES", "C01"),
("IF2000", "USER EDUCATION", "E01"),
("MA2100", "WELD LINE AUTOMATION", "D01"),
("PL2100", "WELD LINE PLANNING", "01")
as PROJECT(PROJNO, PROJNAME, DEPTNO);
```
For the SQL below, the join condition implies `D.DEPTNO = 'E01'`, so we can push `DEPTNO = 'E01'` down to the right side of the join to reduce the amount of data read:
```sql
SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
FROM PROJECT P LEFT OUTER JOIN DEPARTMENT D
ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01';
```
The optimized SQL is equivalent to:
```sql
SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
FROM PROJECT P LEFT OUTER JOIN (SELECT * FROM DEPARTMENT WHERE DEPTNO='E01') D
ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01';
```
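The equivalence can be checked outside Spark. The following is a minimal sketch, assuming SQLite gives `LEFT OUTER JOIN` the same standard semantics as Spark SQL; it uses a subset of the rows above plus one hypothetical `E01` department row (not in the original data) so that the join has a real match. It also shows why the predicate may only move to the right side: filtering the left side of a left outer join drops rows.

```python
import sqlite3

# In-memory SQLite database standing in for Spark SQL (assumption: both
# engines give LEFT OUTER JOIN the standard SQL semantics used here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT (PROJNO TEXT, PROJNAME TEXT, DEPTNO TEXT);
CREATE TABLE DEPARTMENT (DEPTNO TEXT, DEPTNAME TEXT, MGRNO TEXT);
INSERT INTO PROJECT VALUES
  ('AD3100', 'ADMIN SERVICES', 'D01'),
  ('IF1000', 'QUERY SERVICES', 'C01'),
  ('IF2000', 'USER EDUCATION', 'E01');
-- The E01 row below is hypothetical; the data in the PR has no such department.
INSERT INTO DEPARTMENT VALUES
  ('C01', 'INFORMATION CENTER', '000030'),
  ('D01', 'DEVELOPMENT CENTER', NULL),
  ('E01', 'SUPPORT SERVICES', '000050');
""")

SELECT_LIST = "SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME "


def run(from_clause):
    """Run the query with a fixed select list and a deterministic row order."""
    return conn.execute(SELECT_LIST + from_clause + " ORDER BY PROJNO").fetchall()


# Query as written in the PR description.
original = run(
    "FROM PROJECT P LEFT OUTER JOIN DEPARTMENT D "
    "ON P.DEPTNO = D.DEPTNO AND P.DEPTNO = 'E01'"
)

# Same query with the inferred predicate applied to the right side first.
pushed_right = run(
    "FROM PROJECT P LEFT OUTER JOIN "
    "(SELECT * FROM DEPARTMENT WHERE DEPTNO = 'E01') D "
    "ON P.DEPTNO = D.DEPTNO AND P.DEPTNO = 'E01'"
)

# Incorrect rewrite: filtering the preserved (left) side removes rows.
filtered_left = run(
    "FROM (SELECT * FROM PROJECT WHERE DEPTNO = 'E01') P "
    "LEFT OUTER JOIN DEPARTMENT D ON P.DEPTNO = D.DEPTNO"
)

print(original == pushed_right)   # both queries return the same rows
print(len(original), len(filtered_left))
```

Every `PROJECT` row survives in both `original` and `pushed_right`, with `DEPTNAME` filled in only for the `E01` project, whereas `filtered_left` keeps a single row.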
This PR enhances `PushPredicateThroughJoin` to support this feature.
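The inference step behind this can be modelled in a few lines. The sketch below is a toy model in Python, not Spark's code: the `Col`, `Lit` and `Eq` classes and the `infer_additional` function are hypothetical names introduced for illustration. It assumes the join condition has already been split into conjuncts, and derives `right.DEPTNO = 'E01'` from `left.DEPTNO = right.DEPTNO` and `left.DEPTNO = 'E01'`.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Col:
    side: str  # "left" or "right" child of the join
    name: str


@dataclass(frozen=True)
class Lit:
    value: str


@dataclass(frozen=True)
class Eq:
    lhs: Union[Col, Lit]
    rhs: Union[Col, Lit]


def infer_additional(conjuncts: Set[Eq]) -> Set[Eq]:
    """For every column equality a = b, copy each `a = literal` over to b
    (and each `b = literal` over to a). Returns only the new predicates."""
    inferred = set()
    for eq in conjuncts:
        if not (isinstance(eq.lhs, Col) and isinstance(eq.rhs, Col)):
            continue
        for other in conjuncts:
            if not (isinstance(other.lhs, Col) and isinstance(other.rhs, Lit)):
                continue
            if other.lhs == eq.lhs:
                inferred.add(Eq(eq.rhs, other.rhs))
            elif other.lhs == eq.rhs:
                inferred.add(Eq(eq.lhs, other.rhs))
    return inferred - conjuncts


# ON P.DEPTNO = D.DEPTNO AND P.DEPTNO = 'E01'
condition = {
    Eq(Col("left", "DEPTNO"), Col("right", "DEPTNO")),
    Eq(Col("left", "DEPTNO"), Lit("E01")),
}

extra = infer_additional(condition)

# For a left outer join, only predicates on the right child are safe to push.
pushable_right = {p for p in extra if isinstance(p.lhs, Col) and p.lhs.side == "right"}
print(pushable_right)
```

Here the only inferred predicate is the one on the right-side `DEPTNO` column, which is the filter that appears in the subquery of the optimized SQL.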
## How was this patch tested?
Unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wangyum/spark SPARK-25259
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22250.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22250
----
commit f9b32d5d044a899529959ad5042f8cf95c789ea8
Author: Yuming Wang <yu...@...>
Date: 2018-08-28T06:18:05Z
left/right join support push down during-join predicates
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22250
cc @cloud-fan
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22250
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95328/
Test FAILed.
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22250
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22250
Fixed by [SPARK-21479](https://issues.apache.org/jira/browse/SPARK-21479).
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22250
**[Test build #95328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95328/testReport)** for PR 22250 at commit [`f9b32d5`](https://github.com/apache/spark/commit/f9b32d5d044a899529959ad5042f8cf95c789ea8).
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22250
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2606/
Test PASSed.
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22250
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2601/
Test PASSed.
---
[GitHub] spark pull request #22250: [SPARK-25259][SQL] left/right join support push d...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22250#discussion_r213557635
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1190,11 +1191,13 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
// push down the join filter into sub query scanning if applicable
case j @ Join(left, right, joinType, joinCondition) =>
- val (leftJoinConditions, rightJoinConditions, commonJoinCondition) =
- split(joinCondition.map(splitConjunctivePredicates).getOrElse(Nil), left, right)
+ val condition = joinCondition.map(splitConjunctivePredicates).getOrElse(Nil)
+ val additionalCondition = inferAdditionalConstraints(condition.toSet)
--- End diff --
IIRC we can only do this in `InferFiltersFromConstraints`
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22250
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22250: [SPARK-25259][SQL] left/right join support push d...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum closed the pull request at:
https://github.com/apache/spark/pull/22250
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22250
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22250
**[Test build #95335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95335/testReport)** for PR 22250 at commit [`f9b32d5`](https://github.com/apache/spark/commit/f9b32d5d044a899529959ad5042f8cf95c789ea8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22250
**[Test build #95328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95328/testReport)** for PR 22250 at commit [`f9b32d5`](https://github.com/apache/spark/commit/f9b32d5d044a899529959ad5042f8cf95c789ea8).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22250
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22250
**[Test build #95335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95335/testReport)** for PR 22250 at commit [`f9b32d5`](https://github.com/apache/spark/commit/f9b32d5d044a899529959ad5042f8cf95c789ea8).
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22250
retest this please
---
[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22250
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95335/
Test PASSed.
---