Posted to user@spark.apache.org by Nisrina Luthfiyati <ni...@gmail.com> on 2015/08/13 12:38:19 UTC

Spark Streaming: Change Kafka topics on runtime

Hi all,

I want to write a Spark Streaming program that listens to Kafka for a list
of topics.
The list of topics that I want to consume is stored in a DB and might
change dynamically. I plan to periodically refresh this list of topics in
the Spark Streaming app.

My question is: is it possible to add or remove a Kafka topic that is consumed
by a stream, or perhaps create a new stream, at runtime?
Would I need to stop/start the program, or is there another way to do this?

Thanks!
Nisrina

Re: Spark Streaming: Change Kafka topics on runtime

Posted by Cody Koeninger <co...@koeninger.org>.
There's a long recent thread on this list about stopping apps; the subject was
"stopping spark stream app".

At a 1-second batch duration I wouldn't run repeated RDDs, no.

I'd take a look at subclassing, personally (you'll have to rebuild the
streaming kafka project, since a lot of it is private), but if topic changes don't
happen that often, restarting the app when they do should be fine.
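A rough sketch of the restart approach, assuming Spark 1.x's direct Kafka API; `fetchTopicsFromDb()` is a hypothetical stand-in for the DB query, and the broker address is a placeholder:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object TopicRefreshingApp {
  // Hypothetical stand-in for the DB query returning the current topic list.
  def fetchTopicsFromDb(): Set[String] = Set("topicA", "topicB")

  def main(args: Array[String]): Unit = {
    // Reuse one SparkContext across restarts; only the StreamingContext is
    // rebuilt, since a stopped StreamingContext cannot be started again.
    val sc = new SparkContext(new SparkConf().setAppName("topic-refreshing-stream"))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder

    var topics = fetchTopicsFromDb()
    while (true) {
      val ssc = new StreamingContext(sc, Seconds(5))
      val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, topics)
      stream.foreachRDD { rdd =>
        // process each micro-batch here
      }
      ssc.start()

      // Poll the DB until the topic set changes, then restart the context.
      var latest = topics
      while (latest == topics) {
        Thread.sleep(60000L)
        latest = fetchTopicsFromDb()
      }
      ssc.stop(stopSparkContext = false, stopGracefully = true)
      topics = latest
    }
  }
}
```

The trade-off is a pause between contexts while in-flight batches drain; if topic changes are rare, that pause should be acceptable.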

On Fri, Aug 14, 2015 at 6:34 AM, Nisrina Luthfiyati <
nisrina.luthfiyati@gmail.com> wrote:

> Hi Cody,
>
> by start/stopping, do you mean the streaming context or the app entirely?
> From what I understand once a streaming context has been stopped it cannot
> be restarted, but I also haven't found a way to stop the app
> programmatically.
>
> The batch duration will probably be around 1-10 seconds. I think this is
> too small to run it as a series of batch jobs?
>
> Thanks again
>
> On Thu, Aug 13, 2015 at 10:15 PM, Cody Koeninger <co...@koeninger.org>
> wrote:
>
>> The current Kafka stream implementation assumes the set of topics doesn't
>> change during operation.
>>
>> You could either take a crack at writing a subclass that does what you
>> need; stop/start; or, if your batch duration isn't too small, you could run
>> it as a series of RDDs (using the existing KafkaUtils.createRDD) where the
>> set of topics is determined before each RDD.
>>
>> On Thu, Aug 13, 2015 at 4:38 AM, Nisrina Luthfiyati <
>> nisrina.luthfiyati@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I want to write a Spark Streaming program that listens to Kafka for a
>>> list of topics.
>>> The list of topics that I want to consume is stored in a DB and might
>>> change dynamically. I plan to periodically refresh this list of topics in
>>> the Spark Streaming app.
>>>
>>> My question is: is it possible to add or remove a Kafka topic that is
>>> consumed by a stream, or perhaps create a new stream, at runtime?
>>> Would I need to stop/start the program, or is there another way to do
>>> this?
>>>
>>> Thanks!
>>> Nisrina
>>>
>>
>>
>
>
> --
> Nisrina Luthfiyati - Ilmu Komputer Fasilkom UI 2010
> http://www.facebook.com/nisrina.luthfiyati
> http://id.linkedin.com/in/nisrina
>
>

Re: Spark Streaming: Change Kafka topics on runtime

Posted by Nisrina Luthfiyati <ni...@gmail.com>.
Hi Cody,

by start/stopping, do you mean the streaming context or the app entirely?
From what I understand, once a streaming context has been stopped it cannot
be restarted, but I also haven't found a way to stop the app
programmatically.
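For reference, a streaming context can be stopped from inside the driver; a minimal sketch, assuming `ssc` is the running StreamingContext:

```scala
// Stop gracefully: let in-flight batches finish, tear down the SparkContext
// as well, then end the driver JVM. A stopped StreamingContext cannot be
// restarted, so a new one must be created if processing should resume.
ssc.stop(stopSparkContext = true, stopGracefully = true)
sys.exit(0)
```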

The batch duration will probably be around 1-10 seconds. I think this is
too small to run it as a series of batch jobs?

Thanks again

On Thu, Aug 13, 2015 at 10:15 PM, Cody Koeninger <co...@koeninger.org> wrote:

> The current Kafka stream implementation assumes the set of topics doesn't
> change during operation.
>
> You could either take a crack at writing a subclass that does what you
> need; stop/start; or, if your batch duration isn't too small, you could run
> it as a series of RDDs (using the existing KafkaUtils.createRDD) where the
> set of topics is determined before each RDD.
>
> On Thu, Aug 13, 2015 at 4:38 AM, Nisrina Luthfiyati <
> nisrina.luthfiyati@gmail.com> wrote:
>
>> Hi all,
>>
>> I want to write a Spark Streaming program that listens to Kafka for a
>> list of topics.
>> The list of topics that I want to consume is stored in a DB and might
>> change dynamically. I plan to periodically refresh this list of topics in
>> the Spark Streaming app.
>>
>> My question is: is it possible to add or remove a Kafka topic that is
>> consumed by a stream, or perhaps create a new stream, at runtime?
>> Would I need to stop/start the program, or is there another way to do
>> this?
>>
>> Thanks!
>> Nisrina
>>
>
>


-- 
Nisrina Luthfiyati - Ilmu Komputer Fasilkom UI 2010
http://www.facebook.com/nisrina.luthfiyati
http://id.linkedin.com/in/nisrina

Re: Spark Streaming: Change Kafka topics on runtime

Posted by Cody Koeninger <co...@koeninger.org>.
The current Kafka stream implementation assumes the set of topics doesn't
change during operation.

You could either take a crack at writing a subclass that does what you
need; stop/start; or, if your batch duration isn't too small, you could run
it as a series of RDDs (using the existing KafkaUtils.createRDD) where the
set of topics is determined before each RDD.

On Thu, Aug 13, 2015 at 4:38 AM, Nisrina Luthfiyati <
nisrina.luthfiyati@gmail.com> wrote:

> Hi all,
>
> I want to write a Spark Streaming program that listens to Kafka for a list
> of topics.
> The list of topics that I want to consume is stored in a DB and might
> change dynamically. I plan to periodically refresh this list of topics in
> the Spark Streaming app.
>
> My question is: is it possible to add or remove a Kafka topic that is consumed
> by a stream, or perhaps create a new stream, at runtime?
> Would I need to stop/start the program, or is there another way to do
> this?
>
> Thanks!
> Nisrina
>
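Sketching the "series of RDDs" option from the message above: refresh the topic list before each cycle, then build a batch RDD over explicit offset ranges. `fetchTopicsFromDb()` and `latestOffsetsFor()` are hypothetical helpers, and real offset bookkeeping (e.g. against Kafka's offset APIs or a checkpoint store) is elided:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

object TopicSeriesOfRdds {
  // Hypothetical helpers: the DB query for topics, and a lookup of each
  // (topic, partition)'s latest offset in Kafka (stubbed out here).
  def fetchTopicsFromDb(): Set[String] = Set("topicA", "topicB")
  def latestOffsetsFor(topics: Set[String]): Map[(String, Int), Long] =
    topics.map(t => (t, 0) -> 0L).toMap // stub

  // One cycle: read everything that arrived since the last cycle for the
  // current topic set; return the offsets to resume from next cycle.
  def runOneBatch(sc: SparkContext, kafkaParams: Map[String, String],
                  consumed: Map[(String, Int), Long]): Map[(String, Int), Long] = {
    val topics = fetchTopicsFromDb()            // may differ from the last cycle
    val latest = latestOffsetsFor(topics)
    val ranges = latest.map { case ((topic, part), until) =>
      val from = consumed.getOrElse((topic, part), 0L)
      OffsetRange(topic, part, from, until)
    }.toArray
    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, ranges)
    rdd.foreach { case (_, value) =>
      // process each record here
    }
    latest                                      // becomes `consumed` next cycle
  }
}
```

Because each cycle is an ordinary batch job over fixed offset ranges, added or removed topics simply show up (or not) in the next cycle's ranges; the cost is scheduling latency between cycles, which is why this fits less well at 1-second durations.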