Posted to issues@spark.apache.org by "Richard Moorhead (JIRA)" <ji...@apache.org> on 2017/11/27 03:14:01 UTC

[jira] [Created] (SPARK-22611) Spark Kinesis ProvisionedThroughputExceededException leads to dropped records

Richard Moorhead created SPARK-22611:
----------------------------------------

             Summary: Spark Kinesis ProvisionedThroughputExceededException leads to dropped records
                 Key: SPARK-22611
                 URL: https://issues.apache.org/jira/browse/SPARK-22611
             Project: Spark
          Issue Type: Bug
          Components: DStreams
    Affects Versions: 2.2.0
            Reporter: Richard Moorhead


I've loaded a Kinesis stream that has a single shard with ~20M records and created a simple Spark Streaming application that writes those records to S3. When the streaming interval is set wide enough that the shard's 2 MB/s read limit is exceeded, the receiver's KCL processes throw ProvisionedThroughputExceededExceptions. While these exceptions are expected, the output record counts in S3 do not match the record counts in the Spark Streaming UI, and worse, the missing records never appear to be fetched in later batches. The problem can be mitigated by narrowing the streaming interval so that batches stay small enough not to exceed the throughput limit, but that cannot be guaranteed in a production system.
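For context, a minimal sketch of the kind of application described above (not the reporter's actual code): a receiver-based Kinesis DStream in Spark 2.2.0 that writes each batch to S3. The application name, stream name, endpoint, region, bucket, and the deliberately wide 60-second batch interval are placeholder assumptions.

    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.kinesis.KinesisUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object KinesisToS3 {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KinesisToS3")
        // A wide batch interval (e.g. 60s) makes each batch pull far more than the
        // shard's 2 MB/s read allowance, which is what triggers
        // ProvisionedThroughputExceededException inside the KCL record processors.
        val ssc = new StreamingContext(conf, Seconds(60))

        // Receiver-based Kinesis DStream (Spark 2.2.0 API); names are placeholders.
        val records = KinesisUtils.createStream(
          ssc,
          "kinesis-to-s3-app",                       // KCL application (checkpoint table) name
          "my-stream",                               // Kinesis stream name
          "https://kinesis.us-east-1.amazonaws.com", // endpoint URL
          "us-east-1",                               // region
          InitialPositionInStream.TRIM_HORIZON,
          Seconds(60),                               // KCL checkpoint interval
          StorageLevel.MEMORY_AND_DISK_2)

        // Write each batch to S3; comparing these output counts with the record
        // counts in the Spark Streaming UI exposes the dropped records described above.
        records.map(bytes => new String(bytes, "UTF-8"))
          .foreachRDD { (rdd, time) =>
            rdd.saveAsTextFile(s"s3a://my-bucket/output/batch-${time.milliseconds}")
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }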


