Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 03:59:25 UTC

[jira] [Updated] (SPARK-22611) Spark Kinesis ProvisionedThroughputExceededException leads to dropped records

     [ https://issues.apache.org/jira/browse/SPARK-22611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-22611:
---------------------------------
    Labels: bulk-closed  (was: )

> Spark Kinesis ProvisionedThroughputExceededException leads to dropped records
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-22611
>                 URL: https://issues.apache.org/jira/browse/SPARK-22611
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.2.0
>            Reporter: Richard Moorhead
>            Priority: Major
>              Labels: bulk-closed
>
> I've loaded a Kinesis stream that has a single shard with ~20M records, and I have created a simple Spark Streaming application that writes those records to S3. When the streaming interval is set wide enough that the 2 MB/s per-shard read limit is violated, the receiver's KCL processes throw ProvisionedThroughputExceededExceptions. While these exceptions are expected, the output record counts in S3 do not match the record counts in the Spark Streaming UI and, worse, the missing records never appear to be fetched in later batches. The problem can be mitigated by narrowing the streaming interval so that batches stay small enough not to exceed the throughput limit, but that isn't guaranteed in a production system.
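
The per-shard arithmetic behind the mitigation described above can be sketched as follows. This is an illustrative helper, not code from the report; the object and method names are hypothetical. It assumes the documented Kinesis limit of roughly 2 MB/s of reads per shard, so a batch interval is sustainable only if the data accumulated per batch can be read within that interval across the available shards:

```scala
// Hypothetical helper illustrating why a wide batch interval on a
// single-shard stream trips the Kinesis read limit, and why narrowing
// the interval (smaller batches) avoids it.
object KinesisBatchSizing {
  // Kinesis read limit: ~2 MB/s per shard (documented AWS quota).
  val PerShardReadBytesPerSec: Long = 2L * 1024 * 1024

  /** Average read rate (bytes/s) needed to drain one batch within one interval. */
  def requiredReadRate(bytesPerBatch: Long, batchIntervalSec: Long): Double =
    bytesPerBatch.toDouble / batchIntervalSec

  /** True when the interval keeps the required rate under the shards' combined limit. */
  def withinLimit(bytesPerBatch: Long, batchIntervalSec: Long, shards: Int): Boolean =
    requiredReadRate(bytesPerBatch, batchIntervalSec) <=
      PerShardReadBytesPerSec.toDouble * shards
}
```

For example, a batch that accumulates 100 MB against a single shard cannot be drained in a 10-second interval (it would need 10 MB/s against a 2 MB/s limit), whereas the same ingest spread over many small batches stays under the cap, matching the mitigation the reporter describes.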



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org