Posted to users@kafka.apache.org by Cassa L <lc...@gmail.com> on 2015/12/01 07:13:37 UTC

Spark streaming job hangs

Hi,
 I am reading data from Kafka into Spark. It runs fine for some time but
then hangs forever with the following output. I don't see any errors in the
logs. How do I debug this?

2015-12-01 06:04:30,697 [dag-scheduler-event-loop] INFO  (Logging.scala:59)
- Adding task set 19.0 with 4 tasks
2015-12-01 06:04:30,872 [pool-13-thread-1] INFO  (Logging.scala:59) -
Disconnected from Cassandra cluster: APG DEV Cluster
2015-12-01 06:04:35,060 [JobGenerator] INFO  (Logging.scala:59) - Added
jobs for time 1448949875000 ms
2015-12-01 06:04:40,054 [JobGenerator] INFO  (Logging.scala:59) - Added
jobs for time 1448949880000 ms
2015-12-01 06:04:45,034 [JobGenerator] INFO  (Logging.scala:59) - Added
jobs for time 1448949885000 ms
2015-12-01 06:04:50,100 [JobGenerator] INFO  (Logging.scala:59) - Added
jobs for time 1448949890000 ms
2015-12-01 06:04:55,064 [JobGenerator] INFO  (Logging.scala:59) - Added
jobs for time 1448949895000 ms
2015-12-01 06:05:00,125 [JobGenerator] INFO  (Logging.scala:59) - Added
jobs for time 1448949900000 ms


Thanks
LCassa

Re: Spark streaming job hangs

Posted by Archit Thakur <ar...@gmail.com>.
Which version of Spark are you running? Have you created a Kafka direct
stream? I am asking because you might or might not be using receivers.
Also, when you say it hangs, do you mean there is no further log output and
the process is still up?
Or do you mean it kept on adding the jobs but did nothing else? (I am
optimistic :) )
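
For context, a minimal sketch of the two ways a Spark 1.x application
typically reads from Kafka in Scala; the broker address, ZooKeeper quorum,
group id, and topic name below are placeholders, not details from the
original post:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("kafka-read")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receiver-based stream: a long-running receiver occupies one
    // executor thread for the lifetime of the application.
    val receiverStream = KafkaUtils.createStream(
      ssc, "zkhost:2181", "my-group", Map("my-topic" -> 1))

    // Direct stream: no receiver; each batch reads its offset range
    // directly from the Kafka brokers.
    val directStream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker:9092"), Set("my-topic"))

The distinction matters because only the receiver-based variant permanently
ties up a thread, which is what can starve a local run of processing cores.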

On Tue, Dec 1, 2015 at 4:12 PM, Paul Leclercq <pa...@tabmo.io>
wrote:

> You might not have enough cores to process data from Kafka
>
>
>> When running a Spark Streaming program locally, do not use “local” or
>> “local[1]” as the master URL. Either of these means that only one thread
>> will be used for running tasks locally. If you are using an input DStream
>> based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single
>> thread will be used to run the receiver, leaving no thread for processing
>> the received data. Hence, when running locally, always use “local[n]”
>> as the master URL, where n > number of receivers to run (see Spark
>> Properties for information on how to set the master).
>
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers
>
> 2015-12-01 7:13 GMT+01:00 Cassa L <lc...@gmail.com>:
>
>> Hi,
>>  I am reading data from Kafka into Spark. It runs fine for some time but
>> then hangs forever with the following output. I don't see any errors in
>> the logs. How do I debug this?
>>
>> 2015-12-01 06:04:30,697 [dag-scheduler-event-loop] INFO
>> (Logging.scala:59) - Adding task set 19.0 with 4 tasks
>> 2015-12-01 06:04:30,872 [pool-13-thread-1] INFO  (Logging.scala:59) -
>> Disconnected from Cassandra cluster: APG DEV Cluster
>> 2015-12-01 06:04:35,060 [JobGenerator] INFO  (Logging.scala:59) - Added
>> jobs for time 1448949875000 ms
>> 2015-12-01 06:04:40,054 [JobGenerator] INFO  (Logging.scala:59) - Added
>> jobs for time 1448949880000 ms
>> 2015-12-01 06:04:45,034 [JobGenerator] INFO  (Logging.scala:59) - Added
>> jobs for time 1448949885000 ms
>> 2015-12-01 06:04:50,100 [JobGenerator] INFO  (Logging.scala:59) - Added
>> jobs for time 1448949890000 ms
>> 2015-12-01 06:04:55,064 [JobGenerator] INFO  (Logging.scala:59) - Added
>> jobs for time 1448949895000 ms
>> 2015-12-01 06:05:00,125 [JobGenerator] INFO  (Logging.scala:59) - Added
>> jobs for time 1448949900000 ms
>>
>>
>> Thanks
>> LCassa
>>
>
>
>
> --
>
> Paul Leclercq | Data engineer
>
>
>  paul.leclercq@tabmo.io  |  http://www.tabmo.fr/
>


Re: Spark streaming job hangs

Posted by Paul Leclercq <pa...@tabmo.io>.
You might not have enough cores to process data from Kafka


> When running a Spark Streaming program locally, do not use “local” or
> “local[1]” as the master URL. Either of these means that only one thread
> will be used for running tasks locally. If you are using an input DStream
> based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single
> thread will be used to run the receiver, leaving no thread for processing
> the received data. Hence, when running locally, always use “local[n]” as
> the master URL, where n > number of receivers to run (see Spark
> Properties for information on how to set the master).

https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers
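
If it helps, a minimal sketch of setting the master for local runs (the app
name and batch interval are illustrative placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // local[2] leaves one thread for the receiver and at least one for
    // processing; with N receivers, choose local[n] with n > N.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("kafka-streaming-app")
    val ssc = new StreamingContext(conf, Seconds(5))

The same can be done from the command line with spark-submit --master local[2].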

2015-12-01 7:13 GMT+01:00 Cassa L <lc...@gmail.com>:

> Hi,
>  I am reading data from Kafka into Spark. It runs fine for some time but
> then hangs forever with the following output. I don't see any errors in
> the logs. How do I debug this?
>
> 2015-12-01 06:04:30,697 [dag-scheduler-event-loop] INFO
> (Logging.scala:59) - Adding task set 19.0 with 4 tasks
> 2015-12-01 06:04:30,872 [pool-13-thread-1] INFO  (Logging.scala:59) -
> Disconnected from Cassandra cluster: APG DEV Cluster
> 2015-12-01 06:04:35,060 [JobGenerator] INFO  (Logging.scala:59) - Added
> jobs for time 1448949875000 ms
> 2015-12-01 06:04:40,054 [JobGenerator] INFO  (Logging.scala:59) - Added
> jobs for time 1448949880000 ms
> 2015-12-01 06:04:45,034 [JobGenerator] INFO  (Logging.scala:59) - Added
> jobs for time 1448949885000 ms
> 2015-12-01 06:04:50,100 [JobGenerator] INFO  (Logging.scala:59) - Added
> jobs for time 1448949890000 ms
> 2015-12-01 06:04:55,064 [JobGenerator] INFO  (Logging.scala:59) - Added
> jobs for time 1448949895000 ms
> 2015-12-01 06:05:00,125 [JobGenerator] INFO  (Logging.scala:59) - Added
> jobs for time 1448949900000 ms
>
>
> Thanks
> LCassa
>



-- 

Paul Leclercq | Data engineer


 paul.leclercq@tabmo.io  |  http://www.tabmo.fr/
