Posted to user@spark.apache.org by nwali <no...@utbm.fr> on 2016/01/19 13:55:56 UTC

Is there a way to co-locate partitions from two partitioned RDDs?

Hi,

I am working with Spark in Java on top of an HDFS cluster. In my code, two
RDDs are partitioned with the same partitioner (a HashPartitioner with the
same number of partitions), so they are co-partitioned.
Thus identical keys end up at the same partition index in both RDDs, but that
does not mean the two RDDs are co-located, that is, that matching partitions
reside on the same nodes.
For example, partition #1 of RDD #1 may not be on the same node as
partition #1 of RDD #2. I would like to co-locate the partitioned RDDs to
reduce data transfer between nodes when joining them.
Is there a way to do that?
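To make the distinction between co-partitioned and co-located concrete, here
is a small, self-contained Java sketch (plain JDK, no Spark dependency).
partitionFor reproduces the rule that, as far as I understand it, Spark's
HashPartitioner uses: null keys go to partition 0, otherwise a non-negative
key.hashCode() modulo the number of partitions. That is why two RDDs
partitioned with equal HashPartitioners put a given key at the same partition
index. The placement maps, the host names "node-a"/"node-b" and the coLocated
helper are hypothetical illustrations only: the partitioner itself says
nothing about which node holds a partition.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CoPartitionSketch {

    // Sketch of the HashPartitioner rule: null -> 0, otherwise a
    // non-negative (key.hashCode() mod numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        }
        int rawMod = key.hashCode() % numPartitions;
        return rawMod + (rawMod < 0 ? numPartitions : 0);
    }

    // Hypothetical helper: given "partition index -> host" maps for two RDDs,
    // report whether the same partition index lives on the same host.
    // In a real cluster this placement is decided at runtime by Spark,
    // not by the partitioner.
    static boolean coLocated(Map<Integer, String> placementA,
                             Map<Integer, String> placementB,
                             int partition) {
        String hostA = placementA.get(partition);
        return hostA != null && hostA.equals(placementB.get(partition));
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        List<String> keys = Arrays.asList("alice", "bob", "carol", "dave");

        // Co-partitioned: each key maps to the same partition index
        // no matter which of the two RDDs it belongs to.
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }

        // Hypothetical placement: partition 1 of each RDD is on a different node,
        // so the RDDs are co-partitioned but not co-located.
        Map<Integer, String> rdd1Placement = new HashMap<>();
        Map<Integer, String> rdd2Placement = new HashMap<>();
        rdd1Placement.put(1, "node-a");
        rdd2Placement.put(1, "node-b");
        System.out.println("partition 1 co-located? "
                + coLocated(rdd1Placement, rdd2Placement, 1));
    }
}
```

In the Spark Java API the co-partitioning step itself corresponds to calling
partitionBy(new HashPartitioner(n)) on both JavaPairRDDs with the same n; the
sketch only models why that step fixes partition indices but not nodes.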

Thank you



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-there-a-way-to-co-locate-partitions-from-two-partitioned-RDDs-tp26008.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.