Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/06/06 00:41:01 UTC

[jira] [Resolved] (SPARK-8133) sticky partitions

     [ https://issues.apache.org/jira/browse/SPARK-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-8133.
------------------------------
    Resolution: Invalid

I am not sure this makes sense in the context of Spark Streaming. There is no persistent partition to speak of; there is a stream of RDDs, each of which has partitions. In general there is no reason to expect events to fall into one partition or another, but you can certainly repartition each interval's RDD with the partitioning you want. In some special cases the RDD partitioning will follow the upstream source's partitioning, as with a Kafka direct stream. So, for practical purposes, I suppose this is already entirely supported.
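
As a minimal sketch of that approach (illustrative only, not from the original thread: the socket source, port, record format, and partition count are assumptions), keying each interval's RDD by customer ID and applying a HashPartitioner via transform():

    import org.apache.spark.{HashPartitioner, SparkConf}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StickyPartitionSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("sticky-partition-sketch").setMaster("local[4]")
        val ssc  = new StreamingContext(conf, Seconds(5))

        // Hypothetical source: well-formed lines of "customerId,payload" text.
        val lines = ssc.socketTextStream("localhost", 9999)

        // 4 partitions, mirroring the 4 bolts in the Storm setup described below.
        val partitioner = new HashPartitioner(4)

        val byCustomer = lines
          .map { line =>
            val Array(customerId, payload) = line.split(",", 2)
            (customerId, payload)
          }
          // transform() exposes each interval's RDD, so RDD-level
          // partitioning can be applied batch by batch.
          .transform(rdd => rdd.partitionBy(partitioner))

        byCustomer.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            // Hash partitioning is deterministic, so a given customer ID maps to
            // the same partition index in every batch; whether that partition is
            // scheduled on the same executor across batches is up to the scheduler.
            records.foreach { case (customerId, payload) => () /* process */ }
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }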

In any event, this is a question for user@, not a JIRA issue.

> sticky partitions
> -----------------
>
>                 Key: SPARK-8133
>                 URL: https://issues.apache.org/jira/browse/SPARK-8133
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>    Affects Versions: 1.3.1
>            Reporter: sid
>
> We are trying to replace Apache Storm with Apache Spark Streaming.
> In Storm, we partitioned the stream based on "Customer ID" so that messages within a given range of customer IDs are routed to the same bolt (worker).
> We do this because each worker caches customer details (from the DB).
> So we split the stream into 4 partitions, and each bolt (worker) handles 1/4 of the entire range.
> I am hoping there is a solution to this in Spark Streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org