Posted to issues@spark.apache.org by "zhengruifeng (JIRA)" <ji...@apache.org> on 2019/06/11 09:48:00 UTC
[jira] [Commented] (SPARK-25360) Parallelized RDDs of Ranges could have known partitioner
[ https://issues.apache.org/jira/browse/SPARK-25360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860756#comment-16860756 ]
zhengruifeng commented on SPARK-25360:
--------------------------------------
[~holdenk] I am afraid it is not feasible to add a partitioner to the \{RDD[Long]} generated by \{sc.range}, given the definition of a partitioner:
{code:java}
/**
* An object that defines how the elements in a key-value pair RDD are partitioned by key.
* Maps each key to a partition ID, from 0 to `numPartitions - 1`.
*
* Note that, partitioner must be deterministic, i.e. it must return the same partition id given
* the same partition key.
*/{code}
Since the returned RDD[Long] is not a \{PairRDD}, the following ops (like join, sort) that can utilize an upstream partitioner cannot make use of one here.
An alternative is to add some method like `sc.tabulate[T](start, end, step, numSlices)(f: Long => T)`, so that the partitioner can be used in future ops.
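A rough sketch of what such a partitioner and a `tabulate` could look like, written in plain Scala without Spark so it can be run on its own. Everything here is hypothetical: `RangeSlicePartitioner` and `TabulateSketch` are names made up for illustration, the sketch only handles a positive step, and it assumes slice p holds the element indices from p * length / numSlices (inclusive) to (p + 1) * length / numSlices (exclusive). In Spark the class would extend `org.apache.spark.Partitioner` and `tabulate` would return an RDD of pairs rather than local vectors.

```scala
// Hypothetical sketch, not part of Spark. It mirrors the two members of
// org.apache.spark.Partitioner (numPartitions, getPartition) without importing it.
final class RangeSlicePartitioner(start: Long, end: Long, step: Long, numSlices: Int) {
  require(step > 0, "this sketch only handles a positive step")
  require(numSlices > 0, "numSlices must be positive")

  // Number of elements in `start until end by step`.
  private val length: Long =
    if (end <= start) 0L else (end - start + step - 1) / step

  def numPartitions: Int = numSlices

  // Deterministic: the same key always maps to the same partition id.
  def getPartition(key: Any): Int = key match {
    case k: Long if k >= start && k < end && (k - start) % step == 0 =>
      val idx = (k - start) / step
      // Assumed slicing: slice p holds indices in
      // [p * length / numSlices, (p + 1) * length / numSlices).
      // Inverting it gives the largest p with p * length < (idx + 1) * numSlices.
      // Overflow for very large ranges is ignored in this sketch.
      (((idx + 1) * numSlices - 1) / length).toInt
    case other =>
      throw new IllegalArgumentException(s"key $other is not an element of the range")
  }
}

object TabulateSketch {
  // Local stand-in for the proposed sc.tabulate: builds (key, f(key)) pairs
  // and groups them into slices using the partitioner above.
  def tabulate[T](start: Long, end: Long, step: Long, numSlices: Int)(f: Long => T)
      : (Vector[Vector[(Long, T)]], RangeSlicePartitioner) = {
    val partitioner = new RangeSlicePartitioner(start, end, step, numSlices)
    val pairs = (start until end by step).map(k => (k, f(k))).toVector
    val slices = Vector.tabulate(numSlices) { p =>
      pairs.filter { case (k, _) => partitioner.getPartition(k) == p }
    }
    (slices, partitioner)
  }
}
```

Because the keys are kept next to the values, the result is a key-value collection, which is what the partitioner definition above requires. For example, `TabulateSketch.tabulate(0L, 10L, 1L, 3)(k => k * k)` puts keys 0 to 2 in slice 0, keys 3 to 5 in slice 1 and keys 6 to 9 in slice 2.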
> Parallelized RDDs of Ranges could have known partitioner
> --------------------------------------------------------
>
> Key: SPARK-25360
> URL: https://issues.apache.org/jira/browse/SPARK-25360
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: holdenk
> Priority: Trivial
>
> We already have the logic to split up the generator, we could expose the same logic as a partitioner. This would be useful when joining a small parallelized collection with a larger collection and other cases.