Posted to reviews@bahir.apache.org by "datasherlock (via GitHub)" <gi...@apache.org> on 2023/02/04 14:33:22 UTC

[GitHub] [bahir] datasherlock commented on pull request #101: [BAHIR-295] Added backpressure & ratelimit support

datasherlock commented on PR #101:
URL: https://github.com/apache/bahir/pull/101#issuecomment-1416768410

   The backpressure implementation seems buggy. My understanding is that the backpressure mechanism should adjust the input rate dynamically but never exceed `spark.streaming.receiver.maxRate`. That ceiling doesn't seem to be honoured: we're noticing that the receiver input rate periodically breaches `spark.streaming.receiver.maxRate`, which puts a lot of pressure on the pipeline.
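   
   For reference, here's a minimal sketch of the configuration I'd expect to enforce that contract (the two property names are the standard Spark Streaming ones; the app name is just a placeholder):
   
   ```scala
   import org.apache.spark.SparkConf
   import org.apache.spark.streaming.{Seconds, StreamingContext}
   
   val conf = new SparkConf()
     .setAppName("backpressure-test") // placeholder name
     // Let the backpressure controller adjust the ingestion rate dynamically.
     .set("spark.streaming.backpressure.enabled", "true")
     // Hard per-receiver ceiling, in records per second, that the
     // controller should never exceed.
     .set("spark.streaming.receiver.maxRate", "1500")
   
   val ssc = new StreamingContext(conf, Seconds(60)) // 60s batch interval
   ```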
   
   Context: I created a Spark Scala app with 900 receivers, `spark.streaming.receiver.maxRate=1500`, and `batchInterval=60s`. With those settings, the total number of records per batch should never exceed `900 * 1500 * 60 = 81,000,000` records. Yet some batches are pulling in as many as 776,732,455 records, with processing times far exceeding the batch interval.
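   
   To make the arithmetic explicit (a worked sketch; the variable names are just for illustration):
   
   ```scala
   // Expected hard ceiling on records per batch:
   //   receivers * maxRate (records/sec per receiver) * batch interval (sec)
   val receivers          = 900L
   val maxRatePerReceiver = 1500L // spark.streaming.receiver.maxRate
   val batchIntervalSec   = 60L
   
   val expectedCap = receivers * maxRatePerReceiver * batchIntervalSec
   // expectedCap == 81000000
   
   val observed = 776732455L // records in one of the oversized batches
   println(f"observed is ${observed.toDouble / expectedCap}%.1fx the cap")
   // prints: observed is 9.6x the cap
   ```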


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@bahir.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org