Posted to reviews@spark.apache.org by maryannxue <gi...@git.apache.org> on 2018/02/14 22:20:14 UTC

[GitHub] spark pull request #20613: SPARK-23368 Avoid unnecessary Exchange or Sort af...

GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/20613

    SPARK-23368 Avoid unnecessary Exchange or Sort after projection

    ## What changes were proposed in this pull request?
    
    1. Add "project" methods for both Partitioning and Ordering (Seq[SortOrder]). Each returns the partitioning or ordering that should hold after the projection, i.e., it substitutes the projected expressions with the aliases specified in the projection.
    
    2. In ProjectExec and XXXAggregateExec, make outputPartitioning and outputOrdering return "child.outputPartitioning.project(projectList)" and "SortOrder.projectOrderings(child.outputOrdering, projectList)" respectively, instead of "child.outputPartitioning" and "child.outputOrdering".
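    
    A minimal sketch of the alias substitution described in item 1, written against toy expression types rather than Spark's Catalyst classes. The names Expr, Attr, Add, Alias, ToySortOrder, ProjectSketch, substitute, projectPartitioning and projectOrdering are assumptions made for illustration only, and the sketch ignores the case where the projection drops a key entirely.

    ```scala
    // Toy stand-ins for Catalyst expressions; these are NOT Spark's real classes.
    sealed trait Expr
    case class Attr(name: String) extends Expr
    case class Add(left: Expr, right: Expr) extends Expr

    // One entry of a projection list: `child AS name`.
    case class Alias(child: Expr, name: String)

    // Toy ordering entry: an expression plus a sort direction.
    case class ToySortOrder(child: Expr, ascending: Boolean)

    object ProjectSketch {
      // Replace every (sub-)expression that the projection aliases with the
      // attribute the alias exposes; leave everything else untouched.
      def substitute(e: Expr, projectList: Seq[Alias]): Expr =
        projectList.find(_.child == e) match {
          case Some(alias) => Attr(alias.name)
          case None =>
            e match {
              case Add(l, r) => Add(substitute(l, projectList), substitute(r, projectList))
              case other     => other
            }
        }

      // Hypothetical analogue of projecting the keys of a hash partitioning.
      def projectPartitioning(keys: Seq[Expr], projectList: Seq[Alias]): Seq[Expr] =
        keys.map(substitute(_, projectList))

      // Hypothetical analogue of projecting an ordering; direction is preserved.
      def projectOrdering(ordering: Seq[ToySortOrder], projectList: Seq[Alias]): Seq[ToySortOrder] =
        ordering.map(o => o.copy(child = substitute(o.child, projectList)))
    }
    ```

    For example, with projectList = Seq(Alias(Attr("a"), "x")), calling ProjectSketch.projectPartitioning(Seq(Attr("a"), Attr("b")), projectList) gives Seq(Attr("x"), Attr("b")): the aliased key is renamed and the other key is passed through unchanged.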
    
    ## How was this patch tested?
    
    1. Add 2 tests in SQLQuerySuite to verify that the unnecessary Exchange and/or Sort have been eliminated from the execution plan.
    2. Add 1 unit test in DistributionSuite.
    
    ## Note
    Note that there could be some variation in the implementation of "Partitioning.project(projectList)", depending on whether we choose to retain the original partitioning together with the projected partitioning, and how far we go to include all possible valid partitionings.
    It is usually impossible to refer to the original expression once it is projected with an alias, unless it also appears elsewhere in the projection list (without an alias or with a different alias). Covering that case would make the implementation more complex and would generate a cartesian product of all substituted/unsubstituted expressions if 1) a partitioning contains more than one expression; 2) a partitioning contains expressions that consist of more than one projected expression.
    That said, I consider "an expression being projected twice in one projection" to be a rare case which does not make much sense, so the current implementation should be good enough.
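    
    To illustrate the cartesian product mentioned in the note, here is a small sketch that is not taken from the patch. It assumes each partitioning key is represented simply by the list of names under which it is still visible after the projection; VariantSketch and allPartitionings are hypothetical names.

    ```scala
    object VariantSketch {
      // variants(i) lists every name by which key i can be referenced after
      // the projection (for example the original name and an alias).
      // The result holds one candidate partitioning per combination of names.
      def allPartitionings(variants: Seq[Seq[String]]): Seq[Seq[String]] =
        variants.foldRight(Seq(Seq.empty[String])) { (choices, acc) =>
          for (c <- choices; rest <- acc) yield c +: rest
        }
    }
    ```

    With two keys that are each visible under two names, such as Seq(Seq("a", "x"), Seq("b", "y")), this yields four candidate partitionings, and the count multiplies with every additional key. That growth is the extra complexity the note argues is not worth handling for such a rare case.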


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-23368

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20613
    
----
commit 91b3cd0ab4aa16781866fb635b571c805ae4359b
Author: maryannxue <ma...@...>
Date:   2018-02-14T21:56:33Z

    SPARK-23368 Avoid unnecessary Exchange or Sort after projection

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Did you get a chance to look at it, @dongjoon-hyun?


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/972/
    Test PASSed.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/899/
    Test PASSed.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    retest this please


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/977/
    Test PASSed.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    @gatorsmile, @kiszk, any update on this one?


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    cc @dongjoon-hyun Would you like to help review this PR?


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87574/
    Test PASSed.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87454/
    Test PASSed.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or ...

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue closed the pull request at:

    https://github.com/apache/spark/pull/20613


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    **[Test build #87565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87565/testReport)** for PR 20613 at commit [`27a1af2`](https://github.com/apache/spark/commit/27a1af22e94f49b7801c4f49443ebadb1ff35571).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    **[Test build #87454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87454/testReport)** for PR 20613 at commit [`91b3cd0`](https://github.com/apache/spark/commit/91b3cd0ab4aa16781866fb635b571c805ae4359b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    **[Test build #87454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87454/testReport)** for PR 20613 at commit [`91b3cd0`](https://github.com/apache/spark/commit/91b3cd0ab4aa16781866fb635b571c805ae4359b).


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    **[Test build #87574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87574/testReport)** for PR 20613 at commit [`27a1af2`](https://github.com/apache/spark/commit/27a1af22e94f49b7801c4f49443ebadb1ff35571).


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    **[Test build #87565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87565/testReport)** for PR 20613 at commit [`27a1af2`](https://github.com/apache/spark/commit/27a1af22e94f49b7801c4f49443ebadb1ff35571).


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    **[Test build #87574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87574/testReport)** for PR 20613 at commit [`27a1af2`](https://github.com/apache/spark/commit/27a1af22e94f49b7801c4f49443ebadb1ff35571).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87565/
    Test FAILed.


---



[GitHub] spark issue #20613: [SPARK-23368][SQL] Avoid unnecessary Exchange or Sort af...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20613
  
    @dongjoon-hyun Could you try to review this PR?


---
