Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/08/28 06:44:15 UTC

[GitHub] spark pull request #22250: [SPARK-25259][SQL] left/right join support push d...

GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22250

    [SPARK-25259][SQL] left/right join support push down during-join predicates

    ## What changes were proposed in this pull request?
    Prepare data:
    ```sql
    create temporary view EMPLOYEE as select * from values
      ("000010", "HAAS", "A00"),
      ("000020", "THOMPSON", "B01"),
      ("000030", "KWAN", "C01"),
      ("000110", "LUCCHESSI", "A00"),
      ("000120", "O'CONNELL", "A00"),
      ("000130", "QUINTANA", "C01")
      as EMPLOYEE(EMPNO, LASTNAME, WORKDEPT);
    
    create temporary view DEPARTMENT as select * from values
      ("A00", "SPIFFY COMPUTER SERVICE DIV.", "000010"),
      ("B01", "PLANNING", "000020"),
      ("C01", "INFORMATION CENTER", "000030"),
      ("D01", "DEVELOPMENT CENTER", null)
      as DEPARTMENT(DEPTNO, DEPTNAME, MGRNO);
    
    create temporary view PROJECT as select * from values
      ("AD3100", "ADMIN SERVICES", "D01"),
      ("IF1000", "QUERY SERVICES", "C01"),
      ("IF2000", "USER EDUCATION", "E01"),
      ("MA2100", "WELD LINE AUTOMATION", "D01"),
      ("PL2100", "WELD LINE PLANNING", "B01")
      as PROJECT(PROJNO, PROJNAME, DEPTNO);
    ```
    For the SQL below, the join condition implies `D.DEPTNO='E01'` for every matching row, so we can push that predicate to the right side to reduce the data read:
    ```sql
    SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
    FROM PROJECT P LEFT OUTER JOIN DEPARTMENT D
    ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01';
    ```
    The optimized SQL is equivalent to:
    ```sql
    SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
    FROM PROJECT P LEFT OUTER JOIN (SELECT * FROM DEPARTMENT WHERE DEPTNO='E01') D
    ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01';
    ```
    
    This PR enhances `PushPredicateThroughJoin` to support this feature.
    
    ## How was this patch tested?
    
    unit tests
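
The equivalence claimed in the pull request description above can be checked outside Spark. The following is a minimal sketch, not part of the patch: it uses Python's built-in sqlite3 module instead of Spark SQL, loads a subset of the sample rows, and adds one hypothetical E01 department row (not in the original data) so that at least one project finds a match. It only compares query results; it says nothing about how Spark's optimizer actually rewrites the plan.

```python
import sqlite3

# A subset of the sample rows from the PR description.
PROJECTS = [
    ("AD3100", "ADMIN SERVICES", "D01"),
    ("IF1000", "QUERY SERVICES", "C01"),
    ("IF2000", "USER EDUCATION", "E01"),
]
DEPARTMENTS = [
    ("A00", "SPIFFY COMPUTER SERVICE DIV.", "000010"),
    ("B01", "PLANNING", "000020"),
    ("C01", "INFORMATION CENTER", "000030"),
    ("D01", "DEVELOPMENT CENTER", None),
    # ASSUMPTION: hypothetical row, not in the PR, so that one project matches.
    ("E01", "HYPOTHETICAL DEPT", None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PROJECT (PROJNO TEXT, PROJNAME TEXT, DEPTNO TEXT)")
conn.execute("CREATE TABLE DEPARTMENT (DEPTNO TEXT, DEPTNAME TEXT, MGRNO TEXT)")
conn.executemany("INSERT INTO PROJECT VALUES (?, ?, ?)", PROJECTS)
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?, ?)", DEPARTMENTS)

# Original query: the literal predicate sits only in the join condition.
ORIGINAL = """
SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
FROM PROJECT P LEFT OUTER JOIN DEPARTMENT D
ON P.DEPTNO = D.DEPTNO AND P.DEPTNO = 'E01'
ORDER BY PROJNO
"""

# Rewritten query: the inferred predicate is applied to the right side first.
PUSHED = """
SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME
FROM PROJECT P LEFT OUTER JOIN
     (SELECT * FROM DEPARTMENT WHERE DEPTNO = 'E01') D
ON P.DEPTNO = D.DEPTNO AND P.DEPTNO = 'E01'
ORDER BY PROJNO
"""

original_rows = conn.execute(ORIGINAL).fetchall()
pushed_rows = conn.execute(PUSHED).fetchall()

# Number of right-side rows that survive the pushed-down filter.
right_rows_after_filter = conn.execute(
    "SELECT COUNT(*) FROM DEPARTMENT WHERE DEPTNO = 'E01'"
).fetchone()[0]

for row in original_rows:
    print(row)
print("same result:", original_rows == pushed_rows)
print("right-side rows after filter:", right_rows_after_filter)
```

Because this is a left outer join, every PROJECT row is preserved in both queries; the pushed-down filter only shrinks the right side before the join, which is where the saving in data read would come from.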


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25259

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22250
    
----
commit f9b32d5d044a899529959ad5042f8cf95c789ea8
Author: Yuming Wang <yu...@...>
Date:   2018-08-28T06:18:05Z

    left/right join support push down during-join predicates

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    cc @cloud-fan 


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95328/
    Test FAILed.


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    Fixed by [SPARK-21479](https://issues.apache.org/jira/browse/SPARK-21479).


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    **[Test build #95328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95328/testReport)** for PR 22250 at commit [`f9b32d5`](https://github.com/apache/spark/commit/f9b32d5d044a899529959ad5042f8cf95c789ea8).


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2606/
    Test PASSed.


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2601/
    Test PASSed.


---



[GitHub] spark pull request #22250: [SPARK-25259][SQL] left/right join support push d...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22250#discussion_r213557635
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1190,11 +1191,13 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
     
         // push down the join filter into sub query scanning if applicable
         case j @ Join(left, right, joinType, joinCondition) =>
    -      val (leftJoinConditions, rightJoinConditions, commonJoinCondition) =
    -        split(joinCondition.map(splitConjunctivePredicates).getOrElse(Nil), left, right)
    +      val condition = joinCondition.map(splitConjunctivePredicates).getOrElse(Nil)
    +      val additionalCondition = inferAdditionalConstraints(condition.toSet)
    --- End diff --
    
    IIRC we can only do this in `InferFiltersFromConstraints`


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22250: [SPARK-25259][SQL] left/right join support push d...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum closed the pull request at:

    https://github.com/apache/spark/pull/22250


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    **[Test build #95335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95335/testReport)** for PR 22250 at commit [`f9b32d5`](https://github.com/apache/spark/commit/f9b32d5d044a899529959ad5042f8cf95c789ea8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    **[Test build #95328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95328/testReport)** for PR 22250 at commit [`f9b32d5`](https://github.com/apache/spark/commit/f9b32d5d044a899529959ad5042f8cf95c789ea8).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    **[Test build #95335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95335/testReport)** for PR 22250 at commit [`f9b32d5`](https://github.com/apache/spark/commit/f9b32d5d044a899529959ad5042f8cf95c789ea8).


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    retest this please


---



[GitHub] spark issue #22250: [SPARK-25259][SQL] left/right join support push down dur...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22250
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95335/
    Test PASSed.


---
