Posted to user@spark.apache.org by Samarth Bhargav <ma...@gmail.com> on 2015/03/11 07:15:39 UTC

Split in Java

Hey folks,

I am trying to split a data set into two parts. Since I am using Spark
1.0.0, I cannot use the randomSplit method. I found this SO question:
http://stackoverflow.com/questions/24864828/spark-scala-shuffle-rdd-split-rdd-into-two-random-parts-randomly

which contains this Scala implementation for Spark 1.0.0:

    def randomSplit(weights: Array[Double], seed: Long = Utils.random.nextLong): Array[RDD[T]] = {
      val sum = weights.sum
      val normalizedCumWeights = weights.map(_ / sum).scanLeft(0.0d)(_ + _)
      normalizedCumWeights.sliding(2).map { x =>
        new PartitionwiseSampledRDD[T, T](this, new BernoulliSampler[T](x(0), x(1)), seed)
      }.toArray
    }

I am using Java, and I tried porting the above code, but I cannot figure
out how to express it with the Java API.
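
Here is a rough sketch of the direction I am considering (the
RandomSplitter and keepRange names are just placeholders of mine; it
assumes mapPartitionsWithIndex is available on JavaRDD in 1.0.0, and that
each partition iterates its elements in the same order on every pass):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Random;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function2;

    public class RandomSplitter {

        // Keeps the elements whose uniform draw lands in [lower, upper).
        // Re-seeding with (seed + partition index) reproduces the same
        // draw for every element on each pass, mimicking BernoulliSampler.
        static <T> JavaRDD<T> keepRange(JavaRDD<T> rdd, final double lower,
                                        final double upper, final long seed) {
            return rdd.mapPartitionsWithIndex(
                new Function2<Integer, Iterator<T>, Iterator<T>>() {
                    public Iterator<T> call(Integer partitionIndex, Iterator<T> it) {
                        Random rng = new Random(seed + partitionIndex);
                        // Buffering the partition keeps the sketch simple;
                        // a lazy iterator would avoid holding it in memory.
                        List<T> kept = new ArrayList<T>();
                        while (it.hasNext()) {
                            T element = it.next();
                            // One draw per element, kept or not, so both
                            // passes see the same sequence of draws.
                            double draw = rng.nextDouble();
                            if (draw >= lower && draw < upper) {
                                kept.add(element);
                            }
                        }
                        return kept.iterator();
                    }
                }, true);
        }

        // Two-way split with approximate fractions p and 1 - p.
        static <T> List<JavaRDD<T>> split(JavaRDD<T> rdd, double p, long seed) {
            List<JavaRDD<T>> parts = new ArrayList<JavaRDD<T>>();
            parts.add(keepRange(rdd, 0.0, p, seed)); // draws in [0, p)
            parts.add(keepRange(rdd, p, 1.0, seed)); // draws in [p, 1)
            return parts;
        }
    }

The intent is the same as the BernoulliSampler in the Scala snippet:
because the Random is re-seeded per partition, every element gets the same
uniform draw on both passes, so the two outputs should be disjoint and
together cover the whole input. I would cache the parent RDD so the two
passes do not recompute it from the source. Does this look like a
reasonable substitute?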

Any ideas?

Using: Spark 1.0.0 and Java 1.7

Thanks,
Samarth