Posted to issues@spark.apache.org by "Dibyendu Bhattacharya (JIRA)" <ji...@apache.org> on 2015/06/19 06:16:00 UTC
[jira] [Created] (SPARK-8474) [STREAMING] Kafka DirectStream API
stops receiving messages if collective size of the messages specified in
spark.streaming.kafka.maxRatePerPartition exceeds the default fetch size
(fetch.message.max.bytes) of SimpleConsumer
Dibyendu Bhattacharya created SPARK-8474:
--------------------------------------------
Summary: [STREAMING] Kafka DirectStream API stops receiving messages if collective size of the messages specified in spark.streaming.kafka.maxRatePerPartition exceeds the default fetch size (fetch.message.max.bytes) of SimpleConsumer
Key: SPARK-8474
URL: https://issues.apache.org/jira/browse/SPARK-8474
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.4.0
Reporter: Dibyendu Bhattacharya
Priority: Critical
The issue is that if Kafka holds variable-size messages, ranging from a few KB to a few hundred KB, rate limiting by number of messages can cause a problem.
Suppose the message sizes in Kafka are such that, under the default fetch.message.max.bytes limit, only 1000 messages can be pulled, whereas I set spark.streaming.kafka.maxRatePerPartition to, say, 2000. With these settings, when KafkaRDD pulls messages for its offset range, it pulls only 1000 messages and can never reach the desired untilOffset, so KafkaRDD fails on this assert call:
assert(requestOffset == part.untilOffset, errRanOutBeforeEnd(part))
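To make the arithmetic concrete, here is a minimal, self-contained sketch of the scenario as described above. It is not Spark's KafkaRDD code: the object FetchLimitSketch and its helpers are hypothetical names, it assumes a single byte-capped fetch per offset range, it ignores per-message protocol overhead, and it assumes the 1 MiB default of fetch.message.max.bytes in the Kafka 0.8 consumer. The 1048-byte message size is chosen only so that exactly 1000 messages fit.

```scala
// Hypothetical model of the reported scenario; not taken from Spark or Kafka sources.
object FetchLimitSketch {
  // Assumed default of fetch.message.max.bytes (1 MiB) in the Kafka 0.8 consumer.
  val DefaultFetchMaxBytes: Int = 1024 * 1024

  /** Count the whole messages, starting at fromOffset, that fit within maxBytes. */
  def messagesInOneFetch(sizes: IndexedSeq[Int], fromOffset: Int, maxBytes: Int): Int = {
    var used = 0L
    var i = fromOffset
    while (i < sizes.length && used + sizes(i) <= maxBytes) {
      used += sizes(i)
      i += 1
    }
    i - fromOffset
  }

  /** Offset reached after one byte-capped fetch, never going past untilOffset. */
  def offsetAfterOneFetch(sizes: IndexedSeq[Int], fromOffset: Int,
                          untilOffset: Int, maxBytes: Int): Int =
    math.min(untilOffset, fromOffset + messagesInOneFetch(sizes, fromOffset, maxBytes))

  def main(args: Array[String]): Unit = {
    // 2000 messages requested by the rate limit, each 1048 bytes (illustrative size).
    val sizes = IndexedSeq.fill(2000)(1048)
    val untilOffset = 2000
    val requestOffset = offsetAfterOneFetch(sizes, 0, untilOffset, DefaultFetchMaxBytes)
    // Under these assumptions requestOffset stops at 1000, short of untilOffset,
    // which is the condition the assert above checks.
    println(s"requestOffset=$requestOffset untilOffset=$untilOffset")
  }
}
```

Under these assumptions, the fetch stops at offset 1000 while untilOffset is 2000, so a check of the form requestOffset == untilOffset would not hold. With a smaller offset range (for example untilOffset = 500) the same fetch reaches the end and the check passes.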