Posted to user@spark.apache.org by Christopher Piggott <cp...@gmail.com> on 2018/01/08 20:51:55 UTC
Spark MakeRDD preferred workers
Hi,
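I have a question about this overload of SparkContext.makeRDD:

def makeRDD[T](seq: Seq[(T, Seq[String])])(implicit arg0: ClassTag[T]): RDD[T]

The documentation describes the seq parameter as a "list of tuples of data and location preferences (hostnames of Spark nodes)".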
Is each list of hostnames an unordered set of acceptable choices, from which
Spark picks one? Or is it an ordered list, with the first entry preferred?
I'm trying to work out how evenly the work will be distributed when many
partitions share the same preferred nodes.
In my particular case, the input is a Seq of (filePath, hosts) tuples, where
hosts lists the nodes on which the file's blocks are local.
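
For concreteness, here is a minimal sketch of the kind of call I mean. The file paths and hostnames are made up, and the object and helper names (MakeRddPreferences, preferencesByPartition) are only for illustration. It builds the RDD and prints the location preferences Spark has recorded for each partition. It does not show which host the scheduler ends up choosing, which is the part I am unsure about.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MakeRddPreferences {

  // Builds the RDD with makeRDD and returns, for each partition,
  // its index and the hostnames recorded as its location preferences.
  def preferencesByPartition(
      sc: SparkContext,
      items: Seq[(String, Seq[String])]): Seq[(Int, Seq[String])] = {
    // The explicit RDD[String] type selects the overload that takes
    // (data, location preferences) tuples.
    val rdd: RDD[String] = sc.makeRDD(items)
    rdd.partitions.toSeq.map(p => (p.index, rdd.preferredLocations(p)))
  }

  def main(args: Array[String]): Unit = {
    // Local master only so the sketch runs on one machine (assumption);
    // on a real cluster the master would come from spark-submit.
    val conf = new SparkConf()
      .setAppName("makeRDD-preferences")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical file paths and hostnames with overlapping preferences.
      val filesWithHosts: Seq[(String, Seq[String])] = Seq(
        ("/data/file-a", Seq("node1", "node2")),
        ("/data/file-b", Seq("node2", "node3")),
        ("/data/file-c", Seq("node1", "node3"))
      )
      preferencesByPartition(sc, filesWithHosts).foreach {
        case (index, hosts) =>
          println(s"partition $index prefers: ${hosts.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

As far as I can tell, this overload creates one partition per element of the Seq, so each file gets its own list of preferred hosts.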
--C