Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/18 01:35:27 UTC

[GitHub] [spark] mridulm edited a comment on pull request #30392: [SPARK-33465][CORE] RDD.takeOrdered should get rid of usage of reduce or use treeReduce instead

mridulm edited a comment on pull request #30392:
URL: https://github.com/apache/spark/pull/30392#issuecomment-729318977


   > I think so. In some cases, unnecessary executor-side reduce might invoke an additional map task although it just returns the single element. So this is just a minor concern for me.
   
   There will not be an additional map task: the executor-side reduce is pipelined with the `mapPartitions` in the same task, and the `iter.reduceLeft` in `reduce` works on a single element per partition. Essentially, I am not sure what this change is buying us.
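
To illustrate the point above, here is a minimal sketch in plain Python (not Spark code) that models the work done for one partition. The names `top_k_partition`, `merge`, and `reduce_partition` are hypothetical stand-ins for the `mapPartitions` step, the queue merge, and the per-partition `iter.reduceLeft`. The assumption is that the map step emits exactly one bounded collection per partition.

```python
import heapq
from functools import reduce


def top_k_partition(rows, k):
    """Model of the mapPartitions step: emit ONE item, the k smallest rows."""
    yield heapq.nsmallest(k, rows)


def merge(a, b, k):
    """Merge two per-partition results, keeping only the k smallest."""
    return heapq.nsmallest(k, a + b)


def reduce_partition(items, k, calls):
    """Model of the per-partition reduceLeft; records each merge call."""
    def counted(a, b):
        calls.append(1)
        return merge(a, b, k)
    # With no initial value, reduce returns a lone element unchanged.
    return reduce(counted, items)


calls = []
partition = [9, 4, 7, 1, 8]
# The generator is consumed lazily in the same pass, which is the
# plain-Python analogue of the two steps being pipelined in one task.
result = reduce_partition(top_k_partition(partition, 2), 2, calls)
print(result, len(calls))
```

Because the iterator holds a single element, the merge function is never invoked in this model, which is why the per-partition reduce is nearly free.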
   
   If the concern were that the driver handles all of the priority queues, I could see that being an issue (though that is a general critique of `reduce` itself).
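
The driver-side concern can be sketched the same way. The following is a plain-Python model, not Spark's implementation: `driver_side_reduce` and `tree_reduce` are hypothetical helpers, and the per-partition lists are made-up sample data. It only counts merges. It does not reproduce how Spark schedules `reduce` or `treeReduce`.

```python
import heapq


def merge(a, b, k):
    """Merge two per-partition results, keeping only the k smallest."""
    return heapq.nsmallest(k, a + b)


def driver_side_reduce(queues, k):
    """Fold every per-partition result on a single node (the 'driver')."""
    merges = 0
    acc = queues[0]
    for q in queues[1:]:
        acc = merge(acc, q, k)
        merges += 1
    return acc, merges


def tree_reduce(queues, k):
    """Merge pairwise in rounds; returns the result and the round count."""
    rounds = 0
    level = list(queues)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(merge(level[i], level[i + 1], k))
            else:
                nxt.append(level[i])  # odd one out moves up unchanged
        level = nxt
        rounds += 1
    return level[0], rounds


# Eight hypothetical per-partition results, each already bounded to k = 2.
queues = [[1, 50], [3, 4], [2, 60], [0, 70], [5, 6], [7, 8], [9, 10], [11, 12]]
flat, merges = driver_side_reduce(queues, 2)
tree, rounds = tree_reduce(queues, 2)
print(flat, merges, tree, rounds)
```

Both strategies return the same answer. In this model the single node performs one merge per additional partition, while the pairwise version needs only a logarithmic number of rounds, which is the general trade-off behind the critique of `reduce`.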

