Posted to issues@spark.apache.org by "Issac Buenrostro (JIRA)" <ji...@apache.org> on 2014/05/30 19:55:02 UTC

[jira] [Commented] (SPARK-1975) Spark streaming with kafka source stuck at runJob at ReceiverTracker.scala:275

    [ https://issues.apache.org/jira/browse/SPARK-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014011#comment-14014011 ] 

Issac Buenrostro commented on SPARK-1975:
-----------------------------------------

In commit https://github.com/apache/spark/commit/04c37b6f749dc2418cc28c89964cdc687dfcbd51, several lines in streaming/scheduler/ReceiverTracker.scala were changed. In particular, lines 254-257 in startReceivers() seem to wait for the receivers to terminate (which should not happen until the program exits) before returning their resources to the scheduler. To me, this means that if there are fewer workers than Kafka partitions, the program will never make progress?
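
To illustrate the concern, here is a toy sketch in plain Python, not Spark code. It assumes each receiver permanently occupies one worker slot; a thread pool stands in for the cluster's slots, and the names processing_starts, receiver and process_batch are made up for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def processing_starts(num_slots, num_receivers, wait_seconds=0.2):
    """Return True if a processing task starts while receivers hold slots.

    Hypothetical model: the thread pool plays the role of the worker slots,
    and each receiver is a task that does not finish until shutdown.
    """
    stop = threading.Event()      # signals the receivers to shut down
    started = threading.Event()   # set once the processing task runs

    def receiver():
        # Long-running task: holds its slot until the program exits.
        stop.wait()

    def process_batch():
        # Stand-in for the user's processing stage.
        started.set()

    pool = ThreadPoolExecutor(max_workers=num_slots)
    try:
        for _ in range(num_receivers):
            pool.submit(receiver)
        pool.submit(process_batch)
        # If every slot is held by a receiver, this wait times out.
        return started.wait(timeout=wait_seconds)
    finally:
        # Release the receivers so the pool can shut down cleanly.
        stop.set()
        pool.shutdown(wait=True)


if __name__ == "__main__":
    print(processing_starts(num_slots=31, num_receivers=30))
    print(processing_starts(num_slots=30, num_receivers=30))
```

In this model the processing task only starts when there is at least one more slot than there are receivers. Whether Spark's scheduler behaves the same way for the case reported here is exactly the open question.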

> Spark streaming with kafka source stuck at runJob at ReceiverTracker.scala:275
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-1975
>                 URL: https://issues.apache.org/jira/browse/SPARK-1975
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.0.0
>            Reporter: Issac Buenrostro
>
> We are running a Spark Streaming application on YARN. We have a Kafka topic with 30 partitions, and we create 30 Kafka streams, each consuming from a single partition.
> Looking at the Spark stages, we see the following:
> collect at ReceiverTracker.scala:270 finished in 0.3s
> reduceByKey at ReceiverTracker.scala:270 finished in 3s
> runJob at ReceiverTracker.scala:275 has been running for 12+ minutes, no progress
> map at core.scala:224 (our processing class), has not started
> It seems to me that the ReceiverTracker is intended to run permanently in the background, but the scheduler is waiting for it to finish before scheduling other tasks?



--
This message was sent by Atlassian JIRA
(v6.2#6252)