Posted to reviews@spark.apache.org by tejasapatil <gi...@git.apache.org> on 2017/02/19 00:26:35 UTC

[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...

Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16985#discussion_r101905768
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
    @@ -33,8 +33,8 @@ import org.apache.spark.util.collection.BitSet
      * Performs a sort merge join of two child relations.
      */
     case class SortMergeJoinExec(
    -    leftKeys: Seq[Expression],
    -    rightKeys: Seq[Expression],
    +    var leftKeys: Seq[Expression],
    --- End diff --
    
    This seems ugly, but I can't think of a better way. The problem is that I want to mutate this key ordering at some point during query planning. I cannot do that when the `SortMergeJoinExec` object is created, because not enough information is available at that time.
    
    I tried adding separate class attributes that would be altered instead of mutating this one. With that approach, the tasks on the executors did not see the updated values of those class attributes.
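
A minimal Scala sketch of one possible cause of the behavior described above. This is an assumption, not something confirmed in this thread. A case class's `copy` runs the constructor again, so any state held in body-level fields is reset, while constructor parameters carry over. If a plan node is rebuilt from its constructor arguments at some point, updates to extra class attributes would be lost in the same way. `JoinNode` and `reorderedLeftKeys` are hypothetical names used only for illustration; they are not Spark classes or fields.

```scala
// Hypothetical stand-in for a plan node; this is NOT Spark's SortMergeJoinExec.
case class JoinNode(leftKeys: Seq[String], rightKeys: Seq[String]) {
  // Body-level mutable attribute: it is not a constructor parameter,
  // so it is initialized afresh every time the constructor runs.
  var reorderedLeftKeys: Seq[String] = leftKeys
}

object CopyDemo {
  def main(args: Array[String]): Unit = {
    val original = JoinNode(Seq("a", "b"), Seq("x", "y"))
    original.reorderedLeftKeys = Seq("b", "a")

    // copy() invokes the constructor, so the body-level field is reset
    // to its initial value (leftKeys) in the new instance.
    val copied = original.copy()
    assert(copied.reorderedLeftKeys == Seq("a", "b"))

    // State passed through a constructor parameter carries over to the new instance.
    val rebuilt = original.copy(leftKeys = original.reorderedLeftKeys)
    assert(rebuilt.leftKeys == Seq("b", "a"))

    println(s"copied.reorderedLeftKeys = ${copied.reorderedLeftKeys}")
    println(s"rebuilt.leftKeys = ${rebuilt.leftKeys}")
  }
}
```

Under this assumption, keeping the keys as constructor parameters (as the diff does, via `var`) is what lets the reordered values carry over to rebuilt nodes. Producing a new node with `copy(leftKeys = ...)` would be an alternative that avoids mutation.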

