Posted to user@spark.apache.org by dsiegel <de...@gmail.com> on 2014/12/03 22:01:51 UTC

Re: Insert new data into specific partition of an RDD

I'm not sure about .union(), but at least in the case of .join(), as long as
you have hash-partitioned the original RDDs and persisted them, calls to
.join() take advantage of already knowing which partition each key is on,
and will not repartition rdd1 (rdd2 is shuffled to match only if it isn't
already partitioned the same way).

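import org.apache.spark.HashPartitioner
val rdd1 = log.partitionBy(new HashPartitioner(10)).persist()  // log: a pair RDD
val rdd3 = rdd1.join(rdd2)  // reuses rdd1's partitioner, so rdd1 is not reshuffled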
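A self-contained sketch of the snippet above, assuming Spark 1.3 or later (so the pair-RDD implicits need no extra import), spark-core on the classpath, and a local master. The contents of log and rdd2, and the names JoinPartitionerDemo and run, are hypothetical and only there for illustration. It checks that the joined RDD reports the same partitioner as rdd1:

```scala
import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}

object JoinPartitionerDemo {
  // Returns rdd1's partitioner, rdd3's partitioner, and the joined rows.
  def run(sc: SparkContext): (Option[Partitioner], Option[Partitioner], Array[(Int, (String, Int))]) = {
    // Hypothetical sample data standing in for log and rdd2.
    val log  = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c"), (11, "d")))
    val rdd2 = sc.parallelize(Seq((1, 100), (3, 300), (11, 1100)))

    // Shuffle log once into 10 hash partitions and cache the result.
    val rdd1 = log.partitionBy(new HashPartitioner(10)).persist()

    // join() adopts rdd1's partitioner, so rdd1 is read in place and only
    // rdd2 is shuffled into the matching partitions.
    val rdd3 = rdd1.join(rdd2)

    (rdd1.partitioner, rdd3.partitioner, rdd3.collect().sortBy(_._1))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("join-partitioner-demo").setMaster("local[2]"))
    try {
      val (p1, p3, rows) = run(sc)
      println(s"rdd1 partitioner: $p1")
      println(s"rdd3 uses the same partitioner: ${p3 == p1}")
      rows.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because rdd1 is persisted after partitionBy, the one shuffle it needs is paid once and later joins against it can reuse the cached, already-partitioned data.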

I suspect you want to use one of the key-aware operations anyway, rather
than .union().
I know other operations are also partitioner-aware like this, though I don't
know exactly which ones. Perhaps check the .partitioner property of the
result to test whether your operation keeps the partitioning?
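One way to do that check, as a sketch under the same assumptions (Spark 1.3 or later, local mode; PartitionerCheck and the sample data are hypothetical): hash-partition a small pair RDD, run a few operations on it, and compare each result's .partitioner with the input's. Operations that cannot change keys (mapValues, filter) and key-aware ones such as reduceByKey should keep it, while a plain map drops it because Spark cannot know the keys are unchanged. Pointing the same check at .union() would show what your Spark version does there.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionerCheck {
  // Applies a few operations to a hash-partitioned pair RDD and reports,
  // per operation, whether the result still carries the input's partitioner.
  def check(sc: SparkContext): Seq[(String, Boolean)] = {
    // Hypothetical sample data: 100 values spread over keys 0..6.
    val base: RDD[(Int, Int)] = sc
      .parallelize(1 to 100)
      .map(i => (i % 7, i))
      .partitionBy(new HashPartitioner(4))
      .persist()
    val p = base.partitioner

    val results: Seq[(String, RDD[_])] = Seq(
      "mapValues"   -> base.mapValues(_ * 2),                  // keys untouched
      "filter"      -> base.filter { case (_, v) => v > 10 },  // keys untouched
      "reduceByKey" -> base.reduceByKey(_ + _),                // reuses base's partitioner
      "map"         -> base.map { case (k, v) => (k, v * 2) }  // Spark can't tell keys are unchanged
    )
    results.map { case (name, rdd) => name -> (rdd.partitioner == p) }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partitioner-check").setMaster("local[2]"))
    try {
      check(sc).foreach { case (name, kept) =>
        println(s"$name keeps partitioner: $kept")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Since map discards the partitioner even when the keys are left alone, mapValues is the better choice when only the values change.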

cheers, 
ds



