Posted to user@spark.apache.org by ihainan <ih...@gmail.com> on 2015/08/17 18:13:27 UTC

What's the logic in the RangePartitioner.rangeBounds method of Apache Spark?

*First of all, I'm sorry for my poor English.*

I was reading the source code of Apache Spark 1.4.1 and got stuck on the
logic of the RangePartitioner.rangeBounds method. The code is shown below
(quoted from org.apache.spark.Partitioner.scala).
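
  // An array of upper bounds for the first (partitions - 1) partitions
  private var rangeBounds: Array[K] = {
    if (partitions <= 1) {
      Array.empty
    } else {
      // This is the sample size we need to have roughly balanced output partitions, capped at 1M.
      val sampleSize = math.min(20.0 * partitions, 1e6)
      // Assume the input partitions are roughly balanced and over-sample a little bit.
      val sampleSizePerPartition = math.ceil(3.0 * sampleSize / rdd.partitions.size).toInt
      val (numItems, sketched) = RangePartitioner.sketch(rdd.map(_._1), sampleSizePerPartition)
      if (numItems == 0L) {
        Array.empty
      } else {
        // If a partition contains much more than the average number of items, we re-sample from it
        // to ensure that enough items are collected from that partition.
        val fraction = math.min(sampleSize / math.max(numItems, 1L), 1.0)
        val candidates = ArrayBuffer.empty[(K, Float)]
        val imbalancedPartitions = mutable.Set.empty[Int]
        sketched.foreach { case (idx, n, sample) =>
          if (fraction * n > sampleSizePerPartition) {
            imbalancedPartitions += idx
          } else {
            // The weight is 1 over the sampling probability.
            val weight = (n.toDouble / sample.size).toFloat
            for (key <- sample) {
              candidates += ((key, weight))
            }
          }
        }
        if (imbalancedPartitions.nonEmpty) {
          // Re-sample imbalanced partitions with the desired sampling probability.
          val imbalanced = new PartitionPruningRDD(rdd.map(_._1), imbalancedPartitions.contains)
          val seed = byteswap32(-rdd.id - 1)
          val reSampled = imbalanced.sample(withReplacement = false, fraction, seed).collect()
          val weight = (1.0 / fraction).toFloat
          candidates ++= reSampled.map(x => (x, weight))
        }
        RangePartitioner.determineBounds(candidates, partitions)
      }
    }
  }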



So can anyone please explain to me:

1. What is the "3.0 *" for in the line "val sampleSizePerPartition =
math.ceil(3.0 * sampleSize / rdd.partitions.size).toInt"? Why choose 3.0
rather than some other value?

2. Why does "fraction * n > sampleSizePerPartition" mean that a partition
contains much more than the average number of items? Can you give an example
where we would need to re-sample a partition? (I sketch my own attempt at a
scenario below.)
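
To make question 2 concrete, here is how I currently read the condition, as a
small runnable sketch. All the numbers are invented, and partitionSizes just
stands in for the per-partition counts (the "n" values) that
RangePartitioner.sketch would return:

object RangeBoundsExample extends App {
  val partitions = 4                                  // desired number of output partitions
  val numInputPartitions = 4                          // rdd.partitions.size
  val sampleSize = math.min(20.0 * partitions, 1e6)   // 80.0
  val sampleSizePerPartition =
    math.ceil(3.0 * sampleSize / numInputPartitions).toInt  // ceil(240 / 4) = 60

  // Three small input partitions and one heavily skewed one: 10000 items total,
  // so the average partition holds 2500 items.
  val partitionSizes = Seq(100L, 100L, 100L, 9700L)
  val numItems = partitionSizes.sum
  val fraction = math.min(sampleSize / math.max(numItems, 1L), 1.0)  // 80 / 10000 = 0.008

  // Ignoring the ceil, "fraction * n > sampleSizePerPartition" is the same as
  // "n > 3 * numItems / numInputPartitions", i.e. n is more than 3x the average.
  partitionSizes.zipWithIndex.foreach { case (n, idx) =>
    val resample = fraction * n > sampleSizePerPartition
    println(f"partition $idx: n = $n, fraction * n = ${fraction * n}%.1f, re-sample = $resample")
  }
  // Only partition 3 gets flagged: 0.008 * 9700 = 77.6 > 60, while each small
  // partition gives 0.8, far below the threshold.
}

If I run this, only the skewed partition is marked for re-sampling. Is that
the intended behavior, and is "more than 3x the average partition size" the
effective threshold?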

Thanks a lot!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-s-the-logic-in-RangePartitioner-rangeBounds-method-of-Apache-Spark-tp24296.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org