Posted to issues@spark.apache.org by "Richard Moorhead (JIRA)" <ji...@apache.org> on 2017/11/27 03:14:01 UTC
[jira] [Created] (SPARK-22611) Spark Kinesis ProvisionedThroughputExceededException leads to dropped records
Richard Moorhead created SPARK-22611:
----------------------------------------
Summary: Spark Kinesis ProvisionedThroughputExceededException leads to dropped records
Key: SPARK-22611
URL: https://issues.apache.org/jira/browse/SPARK-22611
Project: Spark
Issue Type: Bug
Components: DStreams
Affects Versions: 2.2.0
Reporter: Richard Moorhead
I've loaded a Kinesis stream consisting of a single shard with ~20M records and created a simple Spark Streaming application that writes those records to S3. When the streaming interval is set wide enough that the 2 MB/s per-shard read limit is exceeded, the receiver's KCL processes throw ProvisionedThroughputExceededExceptions. While these exceptions are expected, the output record counts in S3 do not match the record counts in the Spark Streaming UI, and worse, the missing records never appear to be fetched in later batches. The problem can be mitigated by narrowing the streaming interval so that each batch stays small enough to remain under the throughput limit, but that cannot be guaranteed in a production system.
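For context, Kinesis enforces a per-shard read limit of 2 MB/s. A back-of-envelope check (the record size and batch figures below are assumed illustrative values, not measurements from the stream above) sketches why a wide batch interval can push a single receiver past that limit while it drains a backlog:

```python
# Rough check: will draining one batch exceed the Kinesis per-shard
# read limit of 2 MB/s? Figures are illustrative assumptions only.

SHARD_READ_LIMIT_BPS = 2 * 1024 * 1024  # Kinesis per-shard read limit

def required_read_rate(records_per_batch, avg_record_bytes, interval_s):
    """Bytes/s one receiver must sustain to drain a batch within the interval."""
    return records_per_batch * avg_record_bytes / interval_s

# Assumed: 200k records of ~1 KB each arriving per 60 s batch interval.
rate = required_read_rate(200_000, 1024, interval_s=60)
print(rate > SHARD_READ_LIMIT_BPS)  # exceeds the per-shard limit
```

With a narrower interval (e.g. the same traffic spread over many small batches), the required rate per batch drops below the limit, which matches the mitigation described above.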
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)