Posted to user@spark.apache.org by aviemzur <av...@gmail.com> on 2015/07/26 15:29:23 UTC

Long running streaming application - worker death

Hi all,

I have a question about long-running streaming applications and workers that
act as consumers.

Specifically, my program runs on a Spark standalone cluster with a small
number of workers that act as Kafka consumers using Spark Streaming.

What I noticed is that in a long-running application, if one of the workers
dies for some reason and a new worker registers to replace it, we have
effectively lost that worker as a consumer.

When the driver first runs, I create a configured number of KafkaInputDStream
instances (in my case, the same as the number of workers in the cluster), and
Spark distributes these among the workers, so each of my workers consumes from
Kafka.

I then unify the streams into a single stream using StreamingContext's union.
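
For reference, a rough sketch of what this setup looks like, assuming the
receiver-based KafkaUtils.createStream API; the ZooKeeper quorum, group id,
topic and receiver count below are placeholders:

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val conf = new SparkConf().setAppName("long-running-kafka-consumer")
  val ssc = new StreamingContext(conf, Seconds(10))

  // Placeholder connection settings.
  val zkQuorum = "zk1:2181,zk2:2181"
  val groupId = "my-consumer-group"
  val numReceivers = 4 // configured to match the number of workers

  // One receiver-based input stream per worker (receivers get scheduled onto the executors).
  val kafkaStreams = (1 to numReceivers).map { _ =>
    KafkaUtils.createStream(ssc, zkQuorum, groupId, Map("my-topic" -> 1))
  }

  // Unify the per-receiver streams into a single DStream for downstream processing.
  val unified = ssc.union(kafkaStreams)
  unified.count().print()

  ssc.start()
  ssc.awaitTermination()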

This code never runs again, though, and there is no code that monitors that
we have X consumers at all times.

So when a worker dies, we effectively lose a consumer and never create a new
one, and the lag in Kafka starts growing.

Does anybody have a solution / ideas regarding this issue?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Long-running-streaming-application-worker-death-tp23997.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Long running streaming application - worker death

Posted by Tathagata Das <td...@databricks.com>.
How are you figuring out that the receiver isn't running? What does the
streaming UI say regarding the location of the receivers? Also, what version
of Spark are you using?
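
For reference, besides the streaming UI, receiver status can also be observed
programmatically through the StreamingListener hook. A minimal sketch, with
purely illustrative logging:

  import org.apache.spark.streaming.scheduler.{
    StreamingListener, StreamingListenerReceiverError, StreamingListenerReceiverStopped}

  // Logs receiver failures so a lost receiver is at least visible in the driver logs.
  class ReceiverMonitor extends StreamingListener {
    override def onReceiverError(event: StreamingListenerReceiverError): Unit =
      println(s"Receiver error: ${event.receiverInfo}")
    override def onReceiverStopped(event: StreamingListenerReceiverStopped): Unit =
      println(s"Receiver stopped: ${event.receiverInfo}")
  }

  // Register on the StreamingContext before starting it:
  // ssc.addStreamingListener(new ReceiverMonitor)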

On Sun, Jul 26, 2015 at 8:47 AM, Ashwin Giridharan <as...@gmail.com>
wrote:

> [quoted message trimmed]

Re: Long running streaming application - worker death

Posted by Ashwin Giridharan <as...@gmail.com>.
Hi Aviemzur,

As of now, the Spark workers just run indefinitely in a loop irrespective of
whether the data source (Kafka) is active or has lost its connection, because
the receiver just reads ZooKeeper for the offset of the data to be consumed.
So when your DStream receiver is lost, it's LOST!

As mentioned here
(http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/):

"One crude workaround is to restart your streaming application whenever it
runs into an upstream data source failure or a receiver failure. This
workaround may not help you though if your use case requires you to set the
Kafka configuration option auto.offset.reset to “smallest” – because of a
known bug in Spark Streaming the resulting behavior of your streaming
application may not be what you want."

The JIRA corresponding to this bug,
https://spark-project.atlassian.net/browse/SPARK-1340, is yet to be resolved.
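
For completeness, auto.offset.reset is normally passed to the receiver through
the kafkaParams overload of KafkaUtils.createStream. A rough sketch, where the
connection settings and topic are placeholders and ssc is assumed to be an
existing StreamingContext:

  import kafka.serializer.StringDecoder
  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.streaming.kafka.KafkaUtils

  // Placeholder consumer configuration; "smallest" is the option the quoted paragraph refers to.
  val kafkaParams = Map(
    "zookeeper.connect" -> "zk1:2181",
    "group.id" -> "my-consumer-group",
    "auto.offset.reset" -> "smallest")

  val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Map("my-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER_2)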

Also have a look at
http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-and-the-spark-shell-td3347.html

Thanks,
Ashwin


On Sun, Jul 26, 2015 at 9:29 AM, aviemzur <av...@gmail.com> wrote:

> [quoted message trimmed]


-- 
Thanks & Regards,
Ashwin Giridharan