
Posted to issues@spark.apache.org by "Sudarshan Kadambi (JIRA)" <ji...@apache.org> on 2015/09/01 18:57:46 UTC

[jira] [Commented] (SPARK-10320) Kafka Support new topic subscriptions without requiring restart of the streaming context

    [ https://issues.apache.org/jira/browse/SPARK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725687#comment-14725687 ] 

Sudarshan Kadambi commented on SPARK-10320:
-------------------------------------------

"That's not the way checkpoints work. ..."
Sure, that makes sense.

"Can you say a little more about what you're actually doing here? "
Consider a long-running streaming application that exposes a service endpoint through which new topic subscriptions arrive as HTTP requests. The streaming context is started when the application is initialized. New topic subscriptions are processed by an event handler dedicated to subscription messages, and this handler holds a reference to the streaming context.
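
To make the shape of that handler concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Spark code: StubStreamingContext, SubscriptionHandler, on_subscribe and pending are hypothetical names, and the stub only models the fact that the topic set is fixed once the context has started.

```python
import threading


class StubStreamingContext:
    """Hypothetical stand-in for a started streaming context (not a Spark API)."""

    def __init__(self, topics):
        # Assumption: the subscribed topic set cannot change after start.
        self.topics = frozenset(topics)


class SubscriptionHandler:
    """Hypothetical handler for subscription requests arriving over HTTP."""

    def __init__(self, context):
        self.context = context        # the reference the handler holds
        self._lock = threading.Lock()  # requests may arrive concurrently
        self._pending = set()          # requested but not yet consumed

    def on_subscribe(self, topic):
        """Record a requested topic; return True only if it is new."""
        with self._lock:
            if topic in self.context.topics or topic in self._pending:
                return False
            self._pending.add(topic)
            return True

    def pending(self):
        """Return the topics still waiting to be picked up, sorted."""
        with self._lock:
            return sorted(self._pending)
```

In this sketch the pending set is exactly what the handler cannot hand to the running context today, which is the gap this issue asks to close.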

> Kafka Support new topic subscriptions without requiring restart of the streaming context
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-10320
>                 URL: https://issues.apache.org/jira/browse/SPARK-10320
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Sudarshan Kadambi
>
> Spark Streaming lacks the ability to subscribe to new topics or unsubscribe from current ones once the streaming context has been started. Restarting the streaming context increases the latency of update handling.
> Consider a streaming application subscribed to n topics. Let's say 1 of the topics is no longer needed in streaming analytics and hence should be dropped. We could do this by stopping the streaming context, removing that topic from the topic list and restarting the streaming context. Since with some DStreams such as DirectKafkaStream, the per-partition offsets are maintained by Spark, we should be able to resume uninterrupted (I think?) from where we left off with a minor delay. However, in cases where expensive state initialization (from an external datastore) may be needed for datasets published to all topics before streaming updates can be applied to them, it is more convenient to subscribe or unsubscribe only to the incremental changes to the topic list. Without such a feature, updates go unprocessed for longer than necessary, thus affecting QoS.
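
The "incremental changes to the topic list" mentioned above can be sketched as a plain set difference. This is only an illustration in Python; diff_topics is a hypothetical helper, not part of Spark or Kafka, and it says nothing about how the changes would be applied to a running context.

```python
def diff_topics(current, desired):
    """Return (to_subscribe, to_unsubscribe) as sorted lists.

    Hypothetical helper: compares the topics currently consumed with the
    topics now wanted, so only the changed topics need any work.
    """
    current, desired = set(current), set(desired)
    to_subscribe = sorted(desired - current)    # newly requested topics
    to_unsubscribe = sorted(current - desired)  # topics no longer needed
    return to_subscribe, to_unsubscribe
```

With a full restart, every topic in the desired list pays the state initialization cost again; with an incremental update, only the topics returned in to_subscribe would.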


