Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2019/10/01 23:35:28 UTC

[GitHub] [incubator-pinot] mcvsubbu opened a new issue #4663: Add jitter for segment completion threshold

URL: https://github.com/apache/incubator-pinot/issues/4663
 
 
   Currently, all partitions of a stream topic complete their segments at roughly the same time on the consuming servers. This can cause GC issues on the server, since a large amount of old-gen memory may be released within a short period, not to mention the additional memory used while generating the segments.
   
   It would be nice to add some jitter to the completion thresholds, but we need to make sure that the jitter is the same across all replicas of a partition.
   
   For time jitter, we can compute a segment end time as (for example):
      endTime = configuredEndTime - someRandomValue
   
   The random value can be at most (say) 10% of the configured end time, and is computed from the partition number so that every replica of a partition arrives at the same value.
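
A minimal sketch of the time-jitter computation, not actual Pinot code. The class and method names are hypothetical, the 10% cap is the example figure from above, and the threshold is assumed to be a duration in milliseconds. The partition number is passed through a fixed 64-bit mixing function instead of being used as a `java.util.Random` seed, because small consecutive seeds give nearly identical first values from `java.util.Random`.

```java
// Hypothetical helper, not part of Pinot: deterministic time jitter per partition.
final class SegmentTimeJitter {
    private SegmentTimeJitter() {}

    // SplitMix64-style finalizer: spreads small consecutive inputs over all 64 bits.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /**
     * Returns the configured time threshold reduced by a jitter of at most 10%.
     * The jitter depends only on the partition id, so every replica of a
     * partition computes the same value without any coordination.
     */
    static long jitteredThresholdMs(long configuredThresholdMs, int partitionId) {
        long maxJitterMs = configuredThresholdMs / 10; // assumed 10% cap
        if (maxJitterMs <= 0) {
            return configuredThresholdMs; // threshold too small to jitter
        }
        // Offset by one and scale so that partition 0 does not map to zero.
        long mixed = mix64((partitionId + 1L) * 0x9E3779B97F4A7C15L);
        long jitterMs = Math.floorMod(mixed, maxJitterMs + 1); // in [0, maxJitterMs]
        return configuredThresholdMs - jitterMs;
    }
}
```

With a one-hour threshold, `SegmentTimeJitter.jitteredThresholdMs(3_600_000L, partitionId)` returns a value between 54 and 60 minutes that is stable for a given partition.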
   
   The time jitter may not help in cases where auto-tuning is used, because we aim to hit the row-limit (computed by the auto-tuning algorithm) rather than the time limit in the optimal case.
   
   For row-count jitter, it gets a bit more complex. The auto-tuning algorithms can introduce a small variation in the number of rows across partitions (with the randomness computed from the partition number). But in a stream that ingests data at a very high rate, a small difference in the number of rows may not change anything (e.g. if the server takes only a few more seconds to consume the extra rows). We need to factor the ingestion rate into this equation as well.
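
One way to bring the ingestion rate into the row-count jitter is to pick the spread we want in time and convert it to rows using the rate. The sketch below is only an illustration of that idea: the class, method, and parameter names are hypothetical, and the 10% cap on the reduction is an assumed safety guard, not something the auto-tuning algorithm does today.

```java
// Hypothetical helper, not part of Pinot: row-count jitter scaled by ingestion rate.
final class SegmentRowJitter {
    private SegmentRowJitter() {}

    // SplitMix64-style finalizer: spreads small consecutive inputs over all 64 bits.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Deterministic fraction in [0, 1) derived from the partition id only.
    static double partitionFraction(int partitionId) {
        long mixed = mix64((partitionId + 1L) * 0x9E3779B97F4A7C15L);
        return (mixed >>> 11) * 0x1.0p-53; // top 53 bits scaled into [0, 1)
    }

    /**
     * Reduces the auto-tuned row threshold by up to (rowsPerSecond * maxSpreadSeconds)
     * rows, so that partitions finish up to maxSpreadSeconds apart in time.
     * The reduction is capped at 10% of the tuned threshold (assumed guard).
     */
    static long jitteredRowThreshold(long tunedRowThreshold, double rowsPerSecond,
                                     double maxSpreadSeconds, int partitionId) {
        double spreadRows = rowsPerSecond * maxSpreadSeconds;
        long jitterRows = (long) (spreadRows * partitionFraction(partitionId));
        long cap = tunedRowThreshold / 10;
        return tunedRowThreshold - Math.min(jitterRows, cap);
    }
}
```

At 1,000 rows per second and a desired spread of 60 seconds, a tuned threshold of 1,000,000 rows is lowered by at most 60,000 rows. At much higher rates the 10% cap takes over, which shrinks the achievable spread in time.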
   
   It should be possible for the server to report its computed ingestion rate to the controller (via the segment completion protocol), so this can be done (in theory).
   
   Adding jitter in the time component is definitely a good start.
