Posted to user@spark.apache.org by "pankaj.arora" <pa...@gmail.com> on 2014/09/12 19:27:50 UTC

Re: split an RDD by percentage

You can use mapPartitions to achieve this.

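import scala.collection.mutable.ArrayBuffer

// Split each partition into 10 roughly equal parts, with each part keyed by
// its index (0 to 9) as its id. Here self is the RDD[T] being split.
val splittedRDD = self.mapPartitions { itr =>
  // Iterate over the partition and deal its elements round-robin into 10 buffers.
  val buffers = Array.fill(10)(ArrayBuffer.empty[T])
  var i = 0
  for (tuple <- itr) {
    buffers(i % 10) += tuple
    i += 1
  }
  // Emit one (id, buffer) pair per part.
  buffers.iterator.zipWithIndex.map { case (buf, id) => (id, buf) }
}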

// Filter the RDD once per part id and flatMap the buffers to get an array of 10 RDDs.
// Note: flatMap needs an implicit ClassTag[T] in scope when T is generic.
val rddArray = (0 until 10).toArray.map(i =>
  splittedRDD.filter(_._1 == i).flatMap(_._2))

This code was not written in an IDE, but it should work with minor modifications.
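
To make the per-partition step easier to try without a cluster, here is a minimal sketch of the same round-robin bucketing on a plain Scala iterator. The object RoundRobinSplit and the helper splitRoundRobin are hypothetical names used only for illustration, not part of Spark, and the range 1 to 25 simply stands in for the contents of one partition.

```scala
import scala.collection.mutable.ArrayBuffer

object RoundRobinSplit {
  // Hypothetical helper (not a Spark API): deals the elements of one iterator
  // round-robin into `parts` buffers and keys each buffer by its index.
  def splitRoundRobin[T](itr: Iterator[T], parts: Int): Iterator[(Int, ArrayBuffer[T])] = {
    val buffers = Array.fill(parts)(ArrayBuffer.empty[T])
    var i = 0
    for (elem <- itr) {
      buffers(i % parts) += elem
      i += 1
    }
    buffers.iterator.zipWithIndex.map { case (buf, id) => (id, buf) }
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the elements of a single partition.
    val partition = (1 to 25).iterator
    splitRoundRobin(partition, 10).foreach { case (id, buf) =>
      println(s"part $id: ${buf.mkString(", ")}")
    }
  }
}
```

Because the counter restarts in every partition, part sizes are only approximately equal: within one partition they differ by at most one element, and across many small partitions the low-numbered parts can end up slightly larger.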



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/split-a-RDD-by-pencetage-tp333p14106.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.