Posted to issues@spark.apache.org by "Renat Bekbolatov (JIRA)" <ji...@apache.org> on 2015/05/04 21:05:06 UTC

[jira] [Created] (SPARK-7342) Partitioner implementation that uses Int keys directly

Renat Bekbolatov created SPARK-7342:
---------------------------------------

             Summary: Partitioner implementation that uses Int keys directly
                 Key: SPARK-7342
                 URL: https://issues.apache.org/jira/browse/SPARK-7342
             Project: Spark
          Issue Type: Question
          Components: Spark Core
            Reporter: Renat Bekbolatov
            Priority: Trivial


I wanted to find out whether it would be useful to have a partitioner implementation that directly uses integer keys.

E.g., for an element (i, t) in RDD[(Int, T)], the partition id would be (i % numPartitions).

This can be useful when we want better control over partitioning, simply by using the key portion of a pair RDD to communicate the partition id.

Currently, HashPartitioner can be used for this, but having such a "direct" partitioner would allow us to skip the key object's hash computation and also avoid partition collisions (HashPartitioner uses key.hashCode % numPartitions), if that is desirable to the user.
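For discussion, here is a minimal sketch of what such a partitioner could look like. This is illustrative only; the class name and the handling of negative keys are my own choices and not necessarily what the linked PR does:

import org.apache.spark.Partitioner

// Illustrative sketch only: a partitioner that treats the Int key itself as
// the target partition, i.e. partition id = key % numPartitions.
class IntKeyPartitioner(partitions: Int) extends Partitioner {
  require(partitions > 0, "Number of partitions must be positive")

  override def numPartitions: Int = partitions

  override def getPartition(key: Any): Int = key match {
    case i: Int =>
      // i % partitions can be negative for negative keys, so normalize
      // the result into [0, numPartitions).
      val mod = i % partitions
      if (mod < 0) mod + partitions else mod
    case other =>
      throw new IllegalArgumentException(s"IntKeyPartitioner expects Int keys, got: $other")
  }

  override def equals(other: Any): Boolean = other match {
    case p: IntKeyPartitioner => p.numPartitions == numPartitions
    case _ => false
  }

  override def hashCode: Int = numPartitions
}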

One use case is in RDD.treeAggregate, where we already compute a partition id and put it into the key before the reduceByKey operation; a sketch of that pattern follows below.
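As an illustration of that use case (the method name below is hypothetical, and the real treeAggregate code may differ), the reduction step could pass such a partitioner to reduceByKey so the precomputed Int key is used as the partition id without hashing:

import org.apache.spark.rdd.RDD

// Illustrative only: combine per-partition partial results by a target
// partition id, using the IntKeyPartitioner sketched above so the Int key
// is taken as the partition id directly (no hashing, no collisions).
def combineByTargetPartition(partials: RDD[Double], scale: Int): RDD[(Int, Double)] = {
  partials
    .mapPartitionsWithIndex { (idx, iter) =>
      // The key we emit is already the partition the record should land in.
      iter.map(value => (idx % scale, value))
    }
    .reduceByKey(new IntKeyPartitioner(scale), _ + _)
}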

Another possibility is that explicitly having such a "direct" partitioner might encourage developers to introduce more sophisticated communication patterns between executors.

Here is a pull request with a sketch of this: https://github.com/apache/spark/pull/5884

This is a minor change. If we want to keep the core Spark Partitioner implementations lean, we can simply skip it; I'm just throwing the idea out for discussion.


