
Posted to reviews@spark.apache.org by "ulysses-you (via GitHub)" <gi...@apache.org> on 2023/03/03 07:41:05 UTC

[GitHub] [spark] ulysses-you commented on pull request #40262: [SPARK-42651][SQL] Optimize global sort to driver sort

ulysses-you commented on PR #40262:
URL: https://github.com/apache/spark/pull/40262#issuecomment-1453105982

   @zhengruifeng thank you for your thoughts.
    
   The original idea of driver sort is to avoid one shuffle. Requiring `SinglePartition` does not seem to help, since it still requires a shuffle.
   
   Besides, the result ends up at the driver anyway, e.g. `df.sort.collect` (that is the reason I match `ReturnAnswer`), so it should be fine to sort at the driver. Also, driver sort would not be the first place where sorting happens at the driver; e.g., the merge function of `rdd.takeOrdered` does it too. It should be safe as long as the size of the plan is small enough.
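   
   To make the argument concrete, here is a minimal sketch in plain Scala with no Spark dependency. It is only an illustration, not the code in this PR: partitions are modelled as nested `Seq`s, and `DriverSortSketch`, `localTopK`, `merge`, `takeOrderedSketch` and `driverSort` are all hypothetical names. It shows the two patterns mentioned above: a `takeOrdered`-style merge of per-partition results, and sorting the gathered rows once at the driver.

   ```scala
   // Plain-Scala sketch under stated assumptions: no Spark, no real RDDs.
   // "Partitions" are nested Seqs and every name here is hypothetical.
   object DriverSortSketch {
     type Partitions[T] = Seq[Seq[T]]

     // "Executor" side: each partition keeps only its k smallest elements.
     def localTopK[T](partition: Seq[T], k: Int)(implicit ord: Ordering[T]): Seq[T] =
       partition.sorted.take(k)

     // "Driver" side: merge two partial results and keep the k smallest.
     def merge[T](a: Seq[T], b: Seq[T], k: Int)(implicit ord: Ordering[T]): Seq[T] =
       (a ++ b).sorted.take(k)

     // takeOrdered-style flow: local top-k per partition, then merge at the driver.
     def takeOrderedSketch[T](parts: Partitions[T], k: Int)(implicit ord: Ordering[T]): Seq[T] =
       parts.map(p => localTopK(p, k)).foldLeft(Seq.empty[T])((acc, next) => merge(acc, next, k))

     // Driver sort: gather every row and sort once locally, with no shuffle step.
     def driverSort[T](parts: Partitions[T])(implicit ord: Ordering[T]): Seq[T] =
       parts.flatten.sorted
   }
   ```

   In both cases the final ordering is produced on a single node from data that was going to be returned to it anyway, which is why the approach only makes sense when the result is small.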
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org