Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/07 02:58:36 UTC
[GitHub] [spark] hehuiyuan opened a new pull request #23997: Branch 2.4
URL: https://github.com/apache/spark/pull/23997
## What changes were proposed in this pull request?
The relevant passage in streaming-programming-guide.md currently reads as follows:
*Setting the max receiving rate* - If the cluster resources is not large enough for the streaming
application to process data as fast as it is being received, the receivers can be rate limited
by setting a maximum rate limit in terms of records / sec.
See the [configuration parameters](configuration.html#spark-streaming)
`spark.streaming.receiver.maxRate` for receivers and `spark.streaming.kafka.maxRatePerPartition`
for Direct Kafka approach. In Spark 1.5, we have introduced a feature called *backpressure* that
eliminate the need to set this rate limit, as Spark Streaming automatically figures out the
rate limits and dynamically adjusts them if the processing conditions change. This backpressure
can be enabled by setting the [configuration parameter](configuration.html#spark-streaming)
`spark.streaming.backpressure.enabled` to `true`.
I think this should be stated more rigorously. With the Direct Kafka approach, when the first batch of data is very large, the first batch may keep processing indefinitely and the application cannot run normally.
Proposed additional explanation:
In Spark 1.5, we have introduced a feature called *backpressure* that
eliminate the need to set this rate limit, as Spark Streaming automatically figures out the
rate limits and dynamically adjusts them if the processing conditions change. However, the rate limit should still be set when the first batch of data is very large, because otherwise the task may not work properly.
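
To illustrate why an explicit limit still matters for the first batch, here is a minimal sketch of the arithmetic involved. It assumes that, with the Direct Kafka approach, a batch reads at most `spark.streaming.kafka.maxRatePerPartition` records per second from each partition, and that backpressure has no rate estimate until at least one batch has completed. The helper name `max_records_per_batch`, the partition count, the batch interval, and the rate value are all hypothetical and chosen only for illustration; only the two configuration keys come from the guide.

```python
def max_records_per_batch(max_rate_per_partition, num_partitions, batch_interval_sec):
    """Upper bound on records a single Direct Kafka batch can read,
    assuming the per-partition rate limit is applied for the whole interval."""
    return max_rate_per_partition * num_partitions * batch_interval_sec


# Keep backpressure enabled and also cap the per-partition rate, so the
# first batch is bounded even before any rate estimate exists.
# The value "1000" is an illustrative assumption, not a recommendation.
streaming_conf = {
    "spark.streaming.backpressure.enabled": "true",
    "spark.streaming.kafka.maxRatePerPartition": "1000",
}

cap = max_records_per_batch(
    int(streaming_conf["spark.streaming.kafka.maxRatePerPartition"]),
    num_partitions=8,       # hypothetical topic with 8 partitions
    batch_interval_sec=10,  # hypothetical 10-second batch interval
)
print(cap)  # 80000
```

Without the `maxRatePerPartition` entry there is no such bound on the first batch, which is the situation described above.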