Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/07 02:58:36 UTC

[GitHub] [spark] hehuiyuan opened a new pull request #23997: Branch 2.4

URL: https://github.com/apache/spark/pull/23997
 
 
   ## What changes were proposed in this pull request?
   
   In streaming-programming-guide.md, as follows:
   
   *Setting the max receiving rate* - If the cluster resources is not large enough for the streaming
     application to process data as fast as it is being received, the receivers can be rate limited
     by setting a maximum rate limit in terms of records / sec.
     See the [configuration parameters](configuration.html#spark-streaming)
     `spark.streaming.receiver.maxRate` for receivers and `spark.streaming.kafka.maxRatePerPartition`
     for Direct Kafka approach. In Spark 1.5, we have introduced a feature called *backpressure* that
     eliminate the need to set this rate limit, as Spark Streaming automatically figures out the
     rate limits and dynamically adjusts them if the processing conditions change. This backpressure
     can be enabled by setting the [configuration parameter](configuration.html#spark-streaming)
     `spark.streaming.backpressure.enabled` to `true`.
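
For illustration, a minimal sketch of how the three properties named in the quoted paragraph could be passed to `spark-submit` as `--conf key=value` arguments. The numeric rate values and the helper name `to_submit_flags` are hypothetical and not taken from the guide; only the property names come from the text above.

```python
# Hypothetical values for illustration; only the property names are from the guide.
streaming_conf = {
    "spark.streaming.backpressure.enabled": "true",
    "spark.streaming.receiver.maxRate": "1000",           # records/sec per receiver
    "spark.streaming.kafka.maxRatePerPartition": "500",   # records/sec per Kafka partition
}


def to_submit_flags(conf):
    """Render a dict of Spark properties as a list of spark-submit --conf arguments."""
    flags = []
    for key, value in sorted(conf.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    # Prints the flags on one line, ready to paste after `spark-submit`.
    print(" ".join(to_submit_flags(streaming_conf)))
```

The same keys can equally be set in `spark-defaults.conf` or on a `SparkConf` object; the command-line form is used here only because it needs no Spark installation to demonstrate.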
   
   I think the guide should be more precise here. With the Direct Kafka approach, when the first batch of data is very large, that batch may keep processing for a very long time and the application cannot run normally.
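
To make the concern concrete, here is a rough arithmetic sketch. It assumes that `spark.streaming.kafka.maxRatePerPartition` is a per-partition limit in records per second, so one batch is bounded by rate × partitions × batch interval, and that without such a limit the first batch reads the entire pending backlog. The function names and all numbers are hypothetical.

```python
def max_records_per_batch(max_rate_per_partition, num_partitions, batch_interval_sec):
    """Upper bound on records in one batch when a per-partition rate limit is set."""
    return max_rate_per_partition * num_partitions * batch_interval_sec


def first_batch_size(backlog_records, cap=None):
    """Size of the first batch: the whole backlog if uncapped, otherwise at most the cap."""
    return backlog_records if cap is None else min(backlog_records, cap)


if __name__ == "__main__":
    backlog = 50_000_000  # hypothetical number of records already waiting in Kafka
    cap = max_records_per_batch(
        max_rate_per_partition=1000,  # hypothetical rate limit
        num_partitions=10,            # hypothetical topic layout
        batch_interval_sec=5,         # hypothetical batch interval
    )
    print("first batch without limit:", first_batch_size(backlog))
    print("first batch with limit:   ", first_batch_size(backlog, cap))
```

With these assumed numbers the limit keeps the first batch several orders of magnitude smaller than the backlog, which is the situation the proposed wording is meant to cover.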
   
   Add additional explanation:
   
   In Spark 1.5, we have introduced a feature called *backpressure* that
     eliminate the need to set this rate limit, as Spark Streaming automatically figures out the
     rate limits and dynamically adjusts them if the processing conditions change. Setting this rate limit is still needed when the first batch of data is very large, because such a batch can cause the task not to work properly.
   
   
   
