Posted to issues@spark.apache.org by "Riccardo Vincelli (JIRA)" <ji...@apache.org> on 2018/02/01 17:51:00 UTC

[jira] [Commented] (SPARK-19275) Spark Streaming, Kafka receiver, "Failed to get records for ... after polling for 512"

    [ https://issues.apache.org/jira/browse/SPARK-19275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348996#comment-16348996 ] 

Riccardo Vincelli commented on SPARK-19275:
-------------------------------------------

Hi, I would like to point out that a timeout like this can also be a symptom of a long-running thread not getting enough CPU time. Have a look at the thread dump and be suspicious of threads sitting in the RUNNABLE state that are assigned to tasks which are not complex at all and usually take little time. Thanks,
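
One quick way to check this (a minimal sketch, not from the ticket): take a thread dump on the executor, either via the Spark UI (Executors -> Thread Dump) or programmatically through the standard ThreadMXBean API, and look at what the RUNNABLE threads are actually doing. The object name below is illustrative only.

    import java.lang.management.ManagementFactory

    object RunnableThreadReport {
      def main(args: Array[String]): Unit = {
        val threadBean = ManagementFactory.getThreadMXBean
        // Dump all threads (with monitor/synchronizer info) and keep only
        // the ones in RUNNABLE state - the suspects mentioned above.
        val runnable = threadBean
          .dumpAllThreads(true, true)
          .filter(_.getThreadState == Thread.State.RUNNABLE)
        runnable.foreach { info =>
          println(s"RUNNABLE: ${info.getThreadName}")
          // Print the top few frames to see what each thread is stuck on.
          info.getStackTrace.take(5).foreach(frame => println(s"    at $frame"))
        }
      }
    }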

> Spark Streaming, Kafka receiver, "Failed to get records for ... after polling for 512"
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-19275
>                 URL: https://issues.apache.org/jira/browse/SPARK-19275
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0
>         Environment: Apache Spark 2.0.0, Kafka 0.10 for Scala 2.11
>            Reporter: Dmitry Ochnev
>            Priority: Major
>
> We have a Spark Streaming application reading records from Kafka 0.10.
> Some tasks are failed because of the following error:
> "java.lang.AssertionError: assertion failed: Failed to get records for (...) after polling for 512"
> The first attempt fails and the second attempt (the retry) completes successfully; this is the pattern we see for many tasks in our logs. These failures and retries consume resources.
> A similar case, with a stack trace, is described here:
> https://www.mail-archive.com/user@spark.apache.org/msg56564.html
> https://gist.github.com/SrikanthTati/c2e95c4ac689cd49aab817e24ec42767
> Here is the line from the stack trace where the error is raised:
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
> We tried several values for "spark.streaming.kafka.consumer.poll.ms" - 2, 5, 10, 30 and 60 seconds - but the error appeared in all cases except the last one. Moreover, increasing the threshold increased the total Spark stage duration.
> In other words, increasing "spark.streaming.kafka.consumer.poll.ms" led to fewer task failures, but at the cost of longer stage durations. So it is bad for performance when processing data streams.
> We have a suspicion that there is a bug in CachedKafkaConsumer (and/or other related classes) which inhibits the reading process.
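
For reference, the configuration discussed above is set on the SparkConf before the streaming context is created. Below is a minimal sketch of a Kafka 0.10 direct stream with spark.streaming.kafka.consumer.poll.ms raised to 60 seconds, the only value for which the reporter saw no failures. The broker, topic, group and object names are placeholders, not taken from the ticket.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object PollTimeoutExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("poll-timeout-example")
          // Raise the poll timeout (512 ms in the reporter's setup, per the
          // error message); 60 s was the only value with no task failures.
          .set("spark.streaming.kafka.consumer.poll.ms", "60000")

        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",           // placeholder
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "example-group",                    // placeholder
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          PreferConsistent,
          Subscribe[String, String](Seq("example-topic"), kafkaParams)
        )

        // The poll that can time out happens inside the executor tasks
        // (CachedKafkaConsumer.get), i.e. while each batch RDD is computed.
        stream.foreachRDD(rdd => println(s"records in batch: ${rdd.count()}"))

        ssc.start()
        ssc.awaitTermination()
      }
    }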



