Posted to user@spark.apache.org by aymkhalil <ay...@gmail.com> on 2016/05/05 05:54:50 UTC

Locality-aware tree reduction

Hello,

Is there a way to instruct treeReduce() to reduce RDD partitions on the same
node locally?

In my case, I'm using treeReduce() to reduce map results in parallel. My
reduce function simply adds the map values arithmetically (i.e. there is no
notion of aggregation by key). As far as I understand, a shuffle happens at
each treeReduce() stage, using a hash partitioner with the RDD partition
index as its input. I would like to force RDD partitions that reside on the
same node to be reduced locally (i.e. with no shuffling), and to shuffle only
once each node is down to a single RDD partition, right before the results
are sent to the driver.
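
For reference, here is a simplified, self-contained sketch of what I'm doing
today. The input data, partition count, and map step are just placeholders
for my actual job, but the reduce really is plain addition:

import org.apache.spark.{SparkConf, SparkContext}

object LocalityAwareTreeReduce {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("tree-reduce-locality"))

    // Placeholder input: in my real job there are far more partitions than nodes.
    val input = sc.parallelize(1L to 1000000L, numSlices = 512)

    // Placeholder map step: the real one is much more expensive per element.
    val mapped = input.map(x => x * x)

    // Current approach: treeReduce shuffles between stages without regard
    // to which node each partition lives on.
    val total = mapped.treeReduce(_ + _, depth = 2)

    // What I'd like instead (conceptually): reduce all partitions that are
    // co-located on a node with no shuffle, then shuffle only the single
    // remaining value per node before the final reduce at the driver.

    println(total)
    sc.stop()
  }
}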

I have a few nodes and lots of partitions, so I think this will give better
performance.

Thank you,
Ayman



