Posted to user@spark.apache.org by Yogesh Vyas <in...@gmail.com> on 2016/06/15 11:30:41 UTC

Handle empty kafka in Spark Streaming

Hi,

Does anyone know how to handle an empty Kafka topic while a Spark Streaming
job is running?

Regards,
Yogesh

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


RE: Handle empty kafka in Spark Streaming

Posted by David Newberger <da...@wandcorp.com>.
Hi Yogesh,

I'm not sure if this is possible or not; I'd be interested in knowing. My gut says it would be an anti-pattern even if it were possible, which is why I handle it in either foreachRDD or foreachPartition. The way I look at Spark Streaming is as an application that is always running and doing something like windowed batching or micro-batching, or whatever I'm trying to accomplish. If an RDD I get from Kafka is empty, I don't run the rest of the job. If the RDD I get from Kafka has some number of events, then I process the RDD further.
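
Since you're on the Java API, here's a rough, untested sketch of the kind of gate I mean (assuming Spark 1.6+ with Java 8 lambdas and the "message" stream from your createStream call; it also needs import scala.Tuple2;):

message.foreachRDD(rdd -> {
    if (rdd.isEmpty()) {
        return; // empty batch window: skip the rest of the job
    }
    // The batch has events; process it, e.g. partition by partition.
    rdd.foreachPartition(records -> {
        while (records.hasNext()) {
            Tuple2<String, String> record = records.next();
            // ... further operations on each key/value pair ...
        }
    });
});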

David Newberger

-----Original Message-----
From: Yogesh Vyas [mailto:informyogi@gmail.com] 
Sent: Wednesday, June 15, 2016 8:30 AM
To: David Newberger
Subject: Re: Handle empty kafka in Spark Streaming

I am looking for something which checks the JavaPairReceiverInputDStream
before going ahead with any operations. For example, if I get a
JavaPairReceiverInputDStream in the following manner:

JavaPairReceiverInputDStream<String, String> message = KafkaUtils.createStream(ssc,
    zkQuorum, group, topics, StorageLevel.MEMORY_AND_DISK_SER());

Then I would like to check whether the message stream is empty or not. If it is not empty, then go for further operations; else wait for some data in Kafka.
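
To make it concrete, I am imagining a check roughly like this (hypothetical; as far as I can tell the DStream API does not actually expose such a method):

if (!message.isEmpty()) { // hypothetical method, not in the DStream API
    // go for further operations
} else {
    // wait for some data in Kafka
}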

On Wed, Jun 15, 2016 at 6:31 PM, David Newberger <da...@wandcorp.com> wrote:
> If you're asking how to handle no messages in a batch window, then I would add an isEmpty check like:
>
> dStream.foreachRDD(rdd => {
>   if (!rdd.isEmpty()) {
>     // ... process the RDD ...
>   }
> })
>
> Or something like that.
>
>
> David Newberger
>
> -----Original Message-----
> From: Yogesh Vyas [mailto:informyogi@gmail.com]
> Sent: Wednesday, June 15, 2016 6:31 AM
> To: user
> Subject: Handle empty kafka in Spark Streaming
>
> Hi,
>
> Does anyone know how to handle an empty Kafka topic while a Spark Streaming job is running?
>
> Regards,
> Yogesh
>

RE: Handle empty kafka in Spark Streaming

Posted by David Newberger <da...@wandcorp.com>.
If you're asking how to handle no messages in a batch window, then I would add an isEmpty check like:

dStream.foreachRDD(rdd => {
  if (!rdd.isEmpty()) {
    // ... process the RDD ...
  }
})

Or something like that. 


David Newberger

-----Original Message-----
From: Yogesh Vyas [mailto:informyogi@gmail.com] 
Sent: Wednesday, June 15, 2016 6:31 AM
To: user
Subject: Handle empty kafka in Spark Streaming

Hi,

Does anyone know how to handle an empty Kafka topic while a Spark Streaming job is running?

Regards,
Yogesh
