Posted to user@spark.apache.org by "Ramanan, Buvana (Nokia - US/Murray Hill)" <bu...@nokia-bell-labs.com> on 2017/10/28 02:46:06 UTC

Kafka Direct Stream - dynamic topic subscription

Hello,

Using Spark 2.2.0. Interested in seeing dynamic topic subscription in action.

Tried this example: streaming.DirectKafkaWordCount (which uses org.apache.spark.streaming.kafka010)
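
For reference, the core of my test is essentially the direct-stream setup from that example, roughly along these lines (the broker address, group id and topic name below are just placeholders for my local setup):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val ssc = new StreamingContext(
      new SparkConf().setAppName("DirectKafkaWordCount"), Seconds(2))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "wordcount-test",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean))

    // Fixed topic list; the partitions to read are resolved from this subscription
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("mytopic"), kafkaParams))

    stream.map(_.value).flatMap(_.split(" ")).map((_, 1L)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()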

I started with 8 Kafka partitions in my topic and found that Spark Streaming executed 8 tasks (one per partition), as expected. While the example was still running, I increased the number of Kafka partitions to 16 and started producing data to the new partitions as well.

I expected the Kafka consumer that Spark uses to detect this change and spawn new tasks for the new partitions. Instead, it keeps reading only from the old partitions and never reads from the new ones. When I restart the job, it reads from all 16 partitions.

Is this expected?

What is meant by dynamic topic subscription?

Does it apply only to topics whose names match a regular expression, and not to a dynamically growing number of partitions?

Thanks,
Buvana


Re: FW: Kafka Direct Stream - dynamic topic subscription

Posted by Cody Koeninger <co...@koeninger.org>.
You can use SubscribePattern, as explained in SPARK-10320 and in the docs at
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#consumerstrategies
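
A rough sketch of what that looks like with the 0-10 direct stream (the pattern, ssc and kafkaParams below are placeholders, not taken from your job):

    import java.util.regex.Pattern
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.SubscribePattern

    // Subscribe by regex instead of a fixed topic list; topics and partitions
    // that match are picked up as the consumer's metadata gets refreshed
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      SubscribePattern[String, String](Pattern.compile("mytopic.*"), kafkaParams))

How quickly new matches show up generally depends on the consumer's metadata refresh interval (the Kafka consumer setting metadata.max.age.ms).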

On Sun, Oct 29, 2017 at 3:56 PM, Ramanan, Buvana (Nokia - US/Murray
Hill) <bu...@nokia-bell-labs.com> wrote:
> Hello Cody,
>
>
>
> As the stakeholder of the JIRA issue SPARK-10320, can you please explain the
> purpose of dynamic topic subscription? Does it mean adapting the consumer to
> read from new partitions that might get created after the Spark Streaming
> job begins? Is there a succinct write-up on the dynamic topic subscription
> feature that you can share?
>
>
>
> Also, is there a way I can subscribe to topics whose names match a regular
> expression (some Kafka consumers, such as the kafka-python library, support
> that)?
>
>
>
> I am forwarding the email I sent to the Spark users group; it contains a
> little more background on my question.
>
>
>
> Thank you,
>
> Regards,
>
> Buvana
>
>
>
> From: Ramanan, Buvana (Nokia - US/Murray Hill)
> [mailto:buvana.ramanan@nokia-bell-labs.com]
> Sent: Friday, October 27, 2017 10:46 PM
> To: user@spark.apache.org
> Subject: Kafka Direct Stream - dynamic topic subscription
>
>
>
> Hello,
>
>
>
> Using Spark 2.2.0. Interested in seeing dynamic topic subscription in
> action.
>
>
>
> Tried this example: streaming.DirectKafkaWordCount (which uses
> org.apache.spark.streaming.kafka010)
>
>
>
> I started with 8 Kafka partitions in my topic and found that Spark Streaming
> executed 8 tasks (one per partition), as expected. While the example was
> still running, I increased the number of Kafka partitions to 16 and started
> producing data to the new partitions as well.
>
>
>
> I expected the Kafka consumer that Spark uses to detect this change and
> spawn new tasks for the new partitions. Instead, it keeps reading only from
> the old partitions and never reads from the new ones. When I restart the
> job, it reads from all 16 partitions.
>
>
>
> Is this expected?
>
>
>
> What is meant by dynamic topic subscription?
>
>
>
> Does it apply only to topics whose names match a regular expression, and
> not to a dynamically growing number of partitions?
>
>
>
> Thanks,
>
> Buvana
>
>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org