Posted to reviews@spark.apache.org by hvanhovell <gi...@git.apache.org> on 2018/06/21 12:40:03 UTC

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

GitHub user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16677#discussion_r197116936
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ---
    @@ -247,6 +253,10 @@ object ShuffleExchangeExec {
             val projection = UnsafeProjection.create(h.partitionIdExpression :: Nil, outputAttributes)
             row => projection(row).getInt(0)
           case RangePartitioning(_, _) | SinglePartition => identity
    +      case LocalPartitioning(_, _) =>
    +        (row: InternalRow) => {
    +          TaskContext.get().partitionId()
    --- End diff --
    
    Can we look up the partition ID once per partition instead of calling `TaskContext.get().partitionId()` for each row?
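
To make the suggestion concrete, here is a minimal sketch of the difference, written so that it runs without Spark. `FakeTaskContext`, the `Row` alias, `perRowExtractor`, and `perPartitionExtractor` are hypothetical names introduced only for illustration; they are not part of the pull request. The sketch also assumes that the key extractor is constructed once per partition, so a value read while building it can be reused for every row in that partition.

```scala
// Hypothetical stand-in for Spark's TaskContext so the sketch runs without Spark.
// It counts how often the partition id is looked up.
object FakeTaskContext {
  var lookups = 0
  def partitionId(): Int = { lookups += 1; 3 }
}

object PartitionIdSketch {
  // Assumption: a row is modeled as a plain sequence, not Spark's InternalRow.
  type Row = Seq[Any]

  // Per-row variant: the context lookup runs again for every row.
  def perRowExtractor(): Row => Int =
    (_: Row) => FakeTaskContext.partitionId()

  // Per-partition variant: read the id once while building the extractor,
  // then return the cached value for every row.
  def perPartitionExtractor(): Row => Int = {
    val partitionId = FakeTaskContext.partitionId()
    (_: Row) => partitionId
  }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(Seq(1), Seq(2), Seq(3))

    FakeTaskContext.lookups = 0
    val perRow = perRowExtractor()
    rows.foreach(perRow)
    println(s"per-row lookups: ${FakeTaskContext.lookups}")

    FakeTaskContext.lookups = 0
    val perPartition = perPartitionExtractor()
    rows.foreach(perPartition)
    println(s"per-partition lookups: ${FakeTaskContext.lookups}")
  }
}
```

Both extractors return the same id for every row; the second one simply closes over a value computed up front, so the number of lookups no longer grows with the number of rows.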

