Posted to user@spark.apache.org by Long Cheng <pa...@gmail.com> on 2015/03/20 14:13:40 UTC

about Partition Index

Dear all,

I have a question about the partition indices of an RDD: can the mapping
between partition index and physical machine be preserved across a hash
partitioning step? For example, take a cluster of three physical
machines A, B and C (all workers) and an RDD with six partitions.
Assume that the partitions with index 0 and 3 are on A, those with
index 1 and 4 are on B, and those with index 2 and 5 are on C.
If I then hash partition the RDD using "partitionBy(new
HashPartitioner(6))", will the newly created RDD have the same
partition indices on each machine? Or is it possible that the partitions
with index 0 and 3 now end up on B instead of A? If so, is there any
method to make both RDDs keep the same numbering on each physical
machine?
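
To make the question concrete, here is a minimal sketch of how I understand a key to be mapped to a partition index under hash partitioning: a non-negative modulo of the key's hashCode. It is plain Scala with no Spark dependency, and `PartitionIndexSketch` and `partitionIndex` are hypothetical names of my own, not Spark APIs. The point is that the index appears to depend only on the key and the number of partitions, so it says nothing about which machine ends up holding that index.

```scala
// Sketch only: mirrors the rule HashPartitioner is understood to apply
// (non-negative modulo of key.hashCode). Not Spark's actual source code.
object PartitionIndexSketch {

  // Hypothetical helper: returns the partition index in [0, numPartitions).
  def partitionIndex(key: Any, numPartitions: Int): Int = {
    if (key == null) {
      0 // assumption: null keys are sent to partition 0
    } else {
      val raw = key.hashCode % numPartitions
      // The % operator can return a negative value on the JVM; shift it back.
      if (raw < 0) raw + numPartitions else raw
    }
  }

  def main(args: Array[String]): Unit = {
    val keys = Seq(0, 1, 2, 3, 4, 5, 6, -1)
    keys.foreach { k =>
      println(s"key=$k -> partition ${partitionIndex(k, 6)}")
    }
  }
}
```

Nothing in this computation involves a host name, which is why I am unsure about the placement. On a real cluster, I assume one could compare the two RDDs by using mapPartitionsWithIndex to report the host seen by each partition index, but I do not know whether the scheduler gives any guarantee here.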

Thanks in advance.

Long
