Posted to issues@spark.apache.org by "Kevin Mader (JIRA)" <ji...@apache.org> on 2014/11/27 16:34:12 UTC

[jira] [Created] (SPARK-4640) FixedRangePartitioner for partitioning items with a known range

Kevin Mader created SPARK-4640:
----------------------------------

             Summary: FixedRangePartitioner for partitioning items with a known range
                 Key: SPARK-4640
                 URL: https://issues.apache.org/jira/browse/SPARK-4640
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Kevin Mader


For the large datasets I work with, it is common to have lightweight keys and very heavy values (integer keys and large double arrays as values, for example). The key values are, however, known and unchanging. It would be nice if Spark had a built-in partitioner that could take advantage of this. A FixedRangePartitioner[T](keys: Seq[T], partitions: Int) would be ideal. Furthermore, this partitioner type could be extended to a PartitionerWithKnownKeys with a getAllKeys function, allowing the list of keys to be obtained without scanning the entire RDD.
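
To make the request concrete, here is a minimal sketch of what such a partitioner might look like. It is an illustration under stated assumptions, not an existing Spark class: PartitionerLike is a hypothetical stand-in that mirrors the two members of org.apache.spark.Partitioner (numPartitions and getPartition) so that the sketch compiles without Spark on the classpath, and the rule for mapping keys to partitions (contiguous blocks, in the order the keys were supplied) is only one possible choice.

```scala
// Hypothetical stand-in for org.apache.spark.Partitioner, which declares
// these same two members. In a real job the class below would extend
// Partitioner instead.
trait PartitionerLike {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Sketch of the proposed FixedRangePartitioner. Because the key set is known
// up front, each key is assigned to a partition purely from its position in
// `keys`; no sampling of the data is needed.
class FixedRangePartitioner[T](keys: Seq[T], partitions: Int)
    extends PartitionerLike {
  require(partitions > 0, "partitions must be positive")
  require(keys.nonEmpty, "keys must not be empty")

  // Duplicates are dropped; the first-seen order defines the "range".
  private val distinctKeys: Vector[T] = keys.distinct.toVector

  // Position of every known key, used for constant-time lookup.
  private val indexOf: Map[Any, Int] =
    distinctKeys.zipWithIndex.map { case (k, i) => (k: Any) -> i }.toMap

  val numPartitions: Int = partitions

  // Keys are split into `partitions` contiguous, near-equal blocks.
  // Assumption: an unknown key is an error rather than being hashed.
  def getPartition(key: Any): Int = {
    val i = indexOf.getOrElse(
      key,
      throw new NoSuchElementException(s"Unknown key: $key"))
    (i.toLong * numPartitions / distinctKeys.size).toInt
  }

  // The PartitionerWithKnownKeys idea: the keys are available without
  // touching the data at all.
  def getAllKeys: Seq[T] = distinctKeys
}
```

With keys 0 through 5 and 3 partitions, this sketch places keys 0 and 1 in partition 0, keys 2 and 3 in partition 1, and keys 4 and 5 in partition 2. A version intended for real use would presumably also override equals and hashCode, since Spark compares partitioners to decide whether two RDDs are already co-partitioned.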


