Posted to reviews@spark.apache.org by tejasapatil <gi...@git.apache.org> on 2017/05/08 21:24:15 UTC

[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...

Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16985#discussion_r115356786
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -41,6 +41,42 @@ case class SortMergeJoinExec(
         left: SparkPlan,
         right: SparkPlan) extends BinaryExecNode with CodegenSupport {
     
    +  lazy val (reorderedLeftKeys, reorderedRightKeys) = {
    --- End diff --
    
    I looked at #17339, and it is doing something orthogonal to what is done here.
    
    #17339 ensures that the join output's sort ordering has attributes from both relations.
    This PR ensures that the order of the join keys (in both the required distribution and the sort order) is not blindly taken from the order in which they appear in the query string.
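
To illustrate the second point, here is a minimal sketch of the idea. It is not the code from this PR: it uses plain strings instead of Catalyst expressions, and the names `ReorderJoinKeysSketch`, `reorder` and `childPartitioning` are hypothetical. It assumes the left child is already partitioned on the same set of keys as the join, only in a different order, and reorders both key lists to match so that the existing partitioning could be reused.

```scala
object ReorderJoinKeysSketch {
  // Hypothetical helper: if the left join keys are the same set as the keys the
  // child is already partitioned on, return both key lists in the child's order.
  // Otherwise keep the order in which the keys appeared in the query.
  def reorder(
      leftKeys: Seq[String],
      rightKeys: Seq[String],
      childPartitioning: Seq[String]): (Seq[String], Seq[String]) = {
    require(leftKeys.length == rightKeys.length, "join keys must come in pairs")
    val sameKeys =
      leftKeys.distinct.length == leftKeys.length &&
        leftKeys.length == childPartitioning.length &&
        leftKeys.toSet == childPartitioning.toSet
    if (!sameKeys) {
      (leftKeys, rightKeys)
    } else {
      // Keep each right key paired with its left key while reordering.
      val rightByLeft = leftKeys.zip(rightKeys).toMap
      (childPartitioning, childPartitioning.map(rightByLeft))
    }
  }

  def main(args: Array[String]): Unit = {
    // Query order: a.j = b.j AND a.k = b.k; left side assumed partitioned on (k, j).
    val (left, right) =
      reorder(Seq("a.j", "a.k"), Seq("b.j", "b.k"), Seq("a.k", "a.j"))
    println(left.mkString(", "))  // a.k, a.j
    println(right.mkString(", ")) // b.k, b.j
  }
}
```

With the keys kept in query order, (j, k) would not line up with a child partitioned on (k, j), which is the kind of mismatch that leads to the extra shuffle and sort named in the PR title.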

