Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/14 09:06:40 UTC

[GitHub] [druid] yuanlihan edited a comment on issue #11256: Reduce method invocation of reservoir sampling

yuanlihan edited a comment on issue #11256:
URL: https://github.com/apache/druid/issues/11256#issuecomment-841112760


   > > Adding a new Reservoir Sampling method to sample K elements each time instead of only one element each time.
   > 
   > I'm not sure how this can improve the performance. Does the new sampling method need to loop to sample K segments anyway? I guess I'm probably missing something. Would you please add more details on the proposed changes?
   
   Thanks for having a look at this. The current implementation samples only one segment per pass over all segments. Let's assume that:
   
   - the server holders together contain 1 million segments
   - 1000 segments need to be picked from these server holders
   
   Then the current implementation needs to call the sampling method 1000 times, and each call iterates over all 1 million segments.
   I found that Reservoir Sampling can actually sample K elements in a single pass over the items. So in this case, the new method can sample all 1000 segments in one method invocation, which means it iterates over the 1 million segments only once.
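   The single-pass idea above can be sketched with the classic reservoir sampling algorithm (Algorithm R). This is a minimal, generic sketch under stated assumptions: `ReservoirSketch` and `sampleK` are hypothetical names for illustration, not Druid's actual classes or methods, and it samples plain elements rather than Druid's server holders or segments.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Random;

   /** Hypothetical helper illustrating Algorithm R; not part of Druid's API. */
   final class ReservoirSketch {

     private ReservoirSketch() {}

     /**
      * Samples up to k elements uniformly at random, without replacement,
      * in a single pass over items. If items has fewer than k elements,
      * all of them are returned in iteration order.
      */
     static <T> List<T> sampleK(Iterable<T> items, int k, Random random) {
       if (k < 0) {
         throw new IllegalArgumentException("k must be non-negative");
       }
       List<T> reservoir = new ArrayList<>(k);
       int seen = 0; // number of elements visited so far
       for (T item : items) {
         if (seen < k) {
           // Fill the reservoir with the first k elements.
           reservoir.add(item);
         } else {
           // Pick j uniformly from [0, seen]; the current element replaces
           // slot j only if j < k, i.e. with probability k / (seen + 1).
           int j = random.nextInt(seen + 1);
           if (j < k) {
             reservoir.set(j, item);
           }
         }
         seen++;
       }
       return reservoir;
     }
   }
   ```

   With the numbers above, one call to such a method visits each of the 1 million segments once, whereas calling a one-element sampler 1000 times visits 1000 × 1,000,000 = 1,000,000,000 elements in total. One design difference worth noting: Algorithm R returns K distinct elements (sampling without replacement), while K independent one-element calls can pick the same element more than once.
   
   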
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


