Posted to issues@spark.apache.org by "Cody Koeninger (JIRA)" <ji...@apache.org> on 2016/10/15 13:48:20 UTC

[jira] [Commented] (SPARK-17938) Backpressure rate not adjusting

    [ https://issues.apache.org/jira/browse/SPARK-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578133#comment-15578133 ] 

Cody Koeninger commented on SPARK-17938:
----------------------------------------

There was a pretty extensive discussion of this on the mailing list; it should be linked to or summarized here.

Couple of things here:

 100 is the default minimum rate for the PIDRateEstimator. If you're willing to write code, add more logging to determine why that rate isn't being picked up from configuration, or hardcode it to a different number. I have successfully adjusted that rate using Spark configuration.
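
As a minimal sketch of doing that through configuration: spark.streaming.backpressure.pid.minRate is, as far as I know, the property the PID estimator reads for its floor (default 100), but verify that against the Spark version you are running.

import org.apache.spark.SparkConf

// Assumption: spark.streaming.backpressure.pid.minRate is the property the
// PIDRateEstimator reads as its minimum rate (default 100).
val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.backpressure.pid.minRate", "10") // lower floor than the default 100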

The other thing is that if your system takes much longer than 1 second to process 100k records, 100k obviously isn't a reasonable max. Many large batches will be scheduled while that first batch is still running, before backpressure is involved at all. Try a lower max.
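
To make the sizing concrete, a rough back-of-the-envelope sketch (the partition count below is an assumed example, not a number from this report): maxRatePerPartition is records per partition per second, so the cap on a single batch is roughly maxRatePerPartition * partitions * batchSeconds.

// Assumed example values for illustration only.
val maxRatePerPartition = 100000L // spark.streaming.kafka.maxRatePerPartition
val partitions          = 10L     // assumed topic partition count
val batchSeconds        = 1L      // batchDuration = Seconds(1)

// Upper bound on records pulled into one batch before backpressure has any
// completed-batch feedback to work with:
val maxRecordsPerBatch = maxRatePerPartition * partitions * batchSeconds // 1,000,000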

> Backpressure rate not adjusting
> -------------------------------
>
>                 Key: SPARK-17938
>                 URL: https://issues.apache.org/jira/browse/SPARK-17938
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Samy Dindane
>
> spark-streaming and spark-streaming-kafka-0-10 are both at version 2.0.1. The behavior is the same with 2.0.0 though.
> spark.streaming.kafka.consumer.poll.ms is set to 30000
> spark.streaming.kafka.maxRatePerPartition is set to 100000
> spark.streaming.backpressure.enabled is set to true
> `batchDuration` of the streaming context is set to 1 second.
> I consume a Kafka topic using KafkaUtils.createDirectStream().
> My system can handle batches of 100k records, but it takes more than 1 second to process them all. I'd thus expect backpressure to reduce the number of records fetched in the next batch, to keep the processing delay under 1 second.
> But this does not happen, and the backpressure rate stays the same: stuck at `100.0`, no matter how the other variables change (processing time, error, etc.).
> Here's a log showing how all these variables change but the chosen rate stays the same: https://gist.github.com/Dinduks/d9fa67fc8a036d3cad8e859c508acdba (I would have attached a file but I don't see how).
> Is this the expected behavior and am I missing something, or is this a bug?
> I'll gladly help by providing more information or writing code if necessary.
> Thank you.
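
For readers following along, a minimal sketch of the setup described above, using the spark-streaming-kafka-0-10 direct stream API. The topic name, broker address, and group id are placeholders, not details from the report.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Settings as described in the report; everything else is a placeholder.
val conf = new SparkConf()
  .setAppName("backpressure-repro")
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.kafka.maxRatePerPartition", "100000")
  .set("spark.streaming.kafka.consumer.poll.ms", "30000")

val ssc = new StreamingContext(conf, Seconds(1)) // 1 second batchDuration

// "my-topic", the broker address, and the group id are hypothetical.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "backpressure-repro",
  "auto.offset.reset" -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))

// Log the size of each batch to watch how the computed rate behaves.
stream.foreachRDD(rdd => println(s"batch size: ${rdd.count()}"))

ssc.start()
ssc.awaitTermination()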



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org