Posted to user@spark.apache.org by "sagarcasual ." <sa...@gmail.com> on 2016/09/02 17:28:04 UTC

Pausing spark kafka streaming (direct) or exclude/include some partitions on the fly per batch

Hello, this is about pausing Spark Kafka streaming (direct) and
excluding/including some partitions on the fly per batch.
=========================================================
I have the following code that creates a direct stream using the Kafka
connector for Spark.

final JavaInputDStream<KafkaMessage> msgRecords = KafkaUtils.createDirectStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        KafkaMessage.class, kafkaParams, topicsPartitions,
        message -> {
            return KafkaMessage.builder()
                    // ... (builder calls elided)
                    .build();
        }
);

However, I want to handle a situation where I can decide that this
stream needs to pause for a while on a conditional basis. Is there any
way to achieve this? Say my Kafka cluster is undergoing maintenance
between 10 AM and 12 PM: how do I stop processing at 10 AM and then
pick up again at 12 PM from the last offset?

Also, suppose that all of a sudden we want to take one or more partitions
out of a pull and add them back after some pulls; how do I achieve that?

-Regards
Sagar

Re: Pausing spark kafka streaming (direct) or exclude/include some partitions on the fly per batch

Posted by Cody Koeninger <co...@koeninger.org>.
Not built in, you're going to have to do some work.
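
As a rough illustration of the kind of work involved: in the 1.6 direct stream, the fromOffsets map is fixed when the stream is created, so excluding partitions per pull means rebuilding that map before (re)creating the stream. The helper below is a hypothetical, Spark-free sketch of just the filtering step; the string keys and all names here are assumptions for illustration, not Spark or Kafka API (the real map would be keyed by kafka.common.TopicAndPartition).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class OffsetFilter {
    // Filters a fromOffsets-style map, dropping any topic-partition
    // that is currently excluded. Keys are "topic:partition" strings
    // purely for illustration.
    public static Map<String, Long> exclude(Map<String, Long> fromOffsets,
                                            Set<String> excluded) {
        Map<String, Long> filtered = new HashMap<>();
        for (Map.Entry<String, Long> e : fromOffsets.entrySet()) {
            if (!excluded.contains(e.getKey())) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }
}
```

The driver would apply something like this to the stored offsets each time it decides to (re)start the stream with a different partition set.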

On Sep 2, 2016 16:33, "sagarcasual ." <sa...@gmail.com> wrote:

> [quoted text trimmed]

Re: Pausing spark kafka streaming (direct) or exclude/include some partitions on the fly per batch

Posted by "sagarcasual ." <sa...@gmail.com>.
Hi Cody, thanks for the reply.
I am using Spark 1.6.1 with Kafka 0.9.
When I want to stop streaming, stopping the context sounds OK, but for
temporarily excluding partitions, is there any way I can supply
topic-partition info on the fly at the beginning of every pull?
Will StreamingListener be of any help?

On Fri, Sep 2, 2016 at 10:37 AM, Cody Koeninger <co...@koeninger.org> wrote:

> [quoted text trimmed]

Re: Pausing spark kafka streaming (direct) or exclude/include some partitions on the fly per batch

Posted by Cody Koeninger <co...@koeninger.org>.
If you just want to pause the whole stream, just stop the app and then
restart it when you're ready.
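
A minimal sketch of the bookkeeping that makes "restart when you're ready" resume from the last offset: persist the highest offset processed per partition, and feed that map back as fromOffsets on the next start. Everything below (the file format, class and key names) is an assumption for illustration, not Spark or Kafka API.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class OffsetStore {
    // Persists offsets ("topic:partition" -> next offset to read) so a
    // restarted app can resume where the stopped one left off.
    public static void save(Map<String, Long> offsets, File file) throws IOException {
        Properties props = new Properties();
        offsets.forEach((tp, off) -> props.setProperty(tp, Long.toString(off)));
        try (OutputStream out = new FileOutputStream(file)) {
            props.store(out, "last processed offsets");
        }
    }

    public static Map<String, Long> load(File file) throws IOException {
        Map<String, Long> offsets = new HashMap<>();
        if (!file.exists()) return offsets; // first run: start with an empty map
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(file)) {
            props.load(in);
        }
        for (String tp : props.stringPropertyNames()) {
            offsets.put(tp, Long.parseLong(props.getProperty(tp)));
        }
        return offsets;
    }
}
```

In practice people store these in ZooKeeper, Kafka itself, or a database rather than a local file; the roundtrip is the same idea.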

If you want to do some type of per-partition manipulation, you're
going to need to write some code. The 0.10 integration makes the
underlying Kafka consumer pluggable, so you may be able to wrap a
consumer to do what you need.
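
The conditional part of the pause (e.g. the 10 AM to 12 PM maintenance window from the original question) is plain Java, independent of whichever pause mechanism is used. A hypothetical sketch, with the window hard-coded for illustration; the driver would consult this before deciding whether to stop or (re)start the context:

```java
import java.time.LocalTime;

public class MaintenanceWindow {
    private final LocalTime start;
    private final LocalTime end;

    public MaintenanceWindow(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    // True while processing should be paused, i.e. now is inside
    // [start, end). Assumes the window does not cross midnight.
    public boolean shouldPause(LocalTime now) {
        return !now.isBefore(start) && now.isBefore(end);
    }
}
```

For example, new MaintenanceWindow(LocalTime.of(10, 0), LocalTime.of(12, 0)) pauses from 10:00 up to, but not including, 12:00.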

On Fri, Sep 2, 2016 at 12:28 PM, sagarcasual . <sa...@gmail.com> wrote:

> [quoted text trimmed]

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org