Posted to issues@spark.apache.org by "Riccardo Vincelli (JIRA)" <ji...@apache.org> on 2018/02/01 17:51:00 UTC
[jira] [Commented] (SPARK-19275) Spark Streaming, Kafka receiver,
"Failed to get records for ... after polling for 512"
[ https://issues.apache.org/jira/browse/SPARK-19275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348996#comment-16348996 ]
Riccardo Vincelli commented on SPARK-19275:
-------------------------------------------
Hi, I would like to point out that a timeout can also be a symptom of a long-running thread not getting enough CPU time. Have a look at the thread dump and be suspicious of threads sitting in RUNNABLE state that are assigned to tasks which are not complex at all and usually complete quickly. Thanks,
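As a minimal sketch of the kind of inspection suggested above: a thread dump can be taken with jstack, or programmatically from inside the JVM via the standard Thread.getAllStackTraces API, filtering for RUNNABLE threads. The object and method names below are illustrative, not part of Spark; this only demonstrates the general technique.

```scala
// Hypothetical helper: print RUNNABLE threads and their top stack frames,
// to spot threads that sit "runnable" for long periods on simple tasks.
// Uses only the Java standard library (works on Scala 2.11).
import scala.collection.JavaConverters._

object RunnableThreadDump {
  def main(args: Array[String]): Unit = {
    val dump = Thread.getAllStackTraces.asScala
    for ((thread, frames) <- dump if thread.getState == Thread.State.RUNNABLE) {
      println(s"${thread.getName} (${thread.getState})")
      // Show only the top few frames; a full dump is noisy.
      frames.take(5).foreach(frame => println(s"    at $frame"))
    }
  }
}
```

Comparing several such dumps taken a few seconds apart makes it easier to see whether the same thread is stuck in the same frames, which would point at CPU starvation rather than a genuine Kafka polling problem.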
> Spark Streaming, Kafka receiver, "Failed to get records for ... after polling for 512"
> --------------------------------------------------------------------------------------
>
> Key: SPARK-19275
> URL: https://issues.apache.org/jira/browse/SPARK-19275
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.0
> Environment: Apache Spark 2.0.0, Kafka 0.10 for Scala 2.11
> Reporter: Dmitry Ochnev
> Priority: Major
>
> We have a Spark Streaming application reading records from Kafka 0.10.
> Some tasks are failed because of the following error:
> "java.lang.AssertionError: assertion failed: Failed to get records for (...) after polling for 512"
> The first attempt fails and the second attempt (retry) completes successfully - this is the pattern we see for many tasks in our logs. These failures and retries consume resources.
> A similar case with a stack trace is described here:
> https://www.mail-archive.com/user@spark.apache.org/msg56564.html
> https://gist.github.com/SrikanthTati/c2e95c4ac689cd49aab817e24ec42767
> Here is the line from the stack trace where the error is raised:
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
> We tried several values for "spark.streaming.kafka.consumer.poll.ms" - 2, 5, 10, 30 and 60 seconds - but the error appeared in all cases except the last one. Moreover, increasing the threshold also increased the total Spark stage duration.
> In other words, increasing "spark.streaming.kafka.consumer.poll.ms" led to fewer task failures, but at the cost of longer stage duration. So it is bad for performance when processing data streams.
> We have a suspicion that there is a bug in CachedKafkaConsumer (and/or other related classes) which inhibits the reading process.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org