Posted to issues@spark.apache.org by "sandeep pournami (JIRA)" <ji...@apache.org> on 2016/11/15 00:55:58 UTC

[jira] [Commented] (SPARK-8133) sticky partitions

    [ https://issues.apache.org/jira/browse/SPARK-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665590#comment-15665590 ] 

sandeep pournami commented on SPARK-8133:
-----------------------------------------

+1
When using Spark Streaming, the underlying storage could be anything, and depending on the storage we may need to avoid re-reading the same key in every batch. Letting each worker cache its keys' lookups can improve performance severalfold.
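The caching idea described above can be sketched as a per-worker lazy cache: a key already seen in an earlier batch never triggers another storage read. This is a minimal, self-contained illustration, not Spark API; the `loadFromDb` lookup and the string "details" payload are hypothetical stand-ins for the real storage layer.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-worker cache keyed by customer ID. Only the first
// occurrence of each key hits the backing store; later batches reuse
// the cached value. loadFromDb is a hypothetical stand-in for the
// actual storage read.
public class PartitionCache {
    private final Map<String, String> cache = new HashMap<>();
    private int dbReads = 0;

    // Stand-in for a real database lookup; counts calls so the
    // saving is observable.
    private String loadFromDb(String customerId) {
        dbReads++;
        return "details-for-" + customerId;
    }

    public String get(String customerId) {
        return cache.computeIfAbsent(customerId, this::loadFromDb);
    }

    public int dbReadCount() {
        return dbReads;
    }

    public static void main(String[] args) {
        PartitionCache cache = new PartitionCache();
        // Two batches containing overlapping keys: three distinct
        // keys appear, so only three storage reads happen.
        for (String batch : new String[] {"c1,c2,c1", "c2,c3"}) {
            for (String id : batch.split(",")) {
                cache.get(id);
            }
        }
        System.out.println(cache.dbReadCount()); // prints 3
    }
}
```

The saving only materializes if messages for a given key keep arriving at the same worker across batches, which is exactly what the sticky-partitioning request below is about.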

> sticky partitions
> -----------------
>
>                 Key: SPARK-8133
>                 URL: https://issues.apache.org/jira/browse/SPARK-8133
>             Project: Spark
>          Issue Type: New Feature
>          Components: DStreams
>    Affects Versions: 1.3.1
>            Reporter: sid
>
> We are trying to replace Apache Storm with Apache Spark Streaming.
> In Storm, we partition the stream by "Customer ID" so that messages within a given range of customer IDs are routed to the same bolt (worker).
> We do this because each worker caches customer details (loaded from the DB).
> So we split the stream into 4 partitions, and each bolt (worker) handles 1/4 of the entire ID range.
> I am hoping there is a solution to this in Spark Streaming.
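The routing the issue asks for boils down to a deterministic key-to-partition function. The sketch below mirrors the behavior of Spark's `HashPartitioner` (a non-negative modulo of the key's `hashCode`), so every message for a given customer ID lands on the same partition and that partition's worker can safely cache the customer's details. This is a standalone illustration of the mechanism, not Spark code; the `partitionFor` helper is hypothetical.

```java
// Minimal sketch of deterministic key-to-partition routing, mirroring
// Spark's HashPartitioner: partition = nonNegativeMod(key.hashCode(), n).
public class StickyRouting {
    // Modulo that always returns a value in [0, mod), even for
    // negative hash codes (Java's % can return negative values).
    static int nonNegativeMod(int x, int mod) {
        int raw = x % mod;
        return raw < 0 ? raw + mod : raw;
    }

    // Hypothetical helper: maps a customer ID to one of n partitions.
    static int partitionFor(String customerId, int numPartitions) {
        return nonNegativeMod(customerId.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // The same key maps to the same partition in every batch,
        // which is what makes per-partition caching viable.
        int first = partitionFor("customer-42", numPartitions);
        int second = partitionFor("customer-42", numPartitions);
        System.out.println(first == second); // prints true
    }
}
```

Note that hash routing assigns keys to partition *indices*; keeping each index on the same physical worker across batches (the "sticky" part) is the scheduling guarantee the issue is requesting.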



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org