Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/02/02 14:14:51 UTC

[GitHub] [incubator-datasketches-java] richardstartin edited a comment on issue #298: Alternative to reservoir sketches

richardstartin edited a comment on issue #298: Alternative to reservoir sketches
URL: https://github.com/apache/incubator-datasketches-java/issues/298#issuecomment-581139532
 
 
   @jmalkin exactly as you mention, the idea behind the algorithms is to generate the number of elements to skip, `s`, explicitly, rather than arriving at it implicitly through `s` failed trials. I had not considered mergeability at all. Knowing that a previous discussion settled on algorithm R for the sake of mergeability gives me second thoughts about preparing a contribution.
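
To make the distinction concrete, here is a minimal sketch of the two approaches. It is not taken from the linked repository or from the DataSketches code. The class and method names (`ReservoirSkipSketch`, `sampleR`, `sampleX`, `skip`) are hypothetical, and it assumes an `int[]` stream at least as long as the reservoir. `sampleR` runs one random trial per element, so a skip of `s` elements costs `s` failed trials. `sampleX` draws a single uniform value and walks the skip distribution described in Vitter's paper to find `s` directly.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative only: hypothetical names, assumes stream.length >= n.
final class ReservoirSkipSketch {

    // Algorithm R: one random draw per element once the reservoir is full.
    static int[] sampleR(int[] stream, int n, Random rng) {
        int[] reservoir = new int[n];
        for (int t = 0; t < stream.length; t++) {
            if (t < n) {
                reservoir[t] = stream[t];
            } else {
                // Element t is the (t+1)-th seen; it is kept with probability n/(t+1).
                int j = rng.nextInt(t + 1);
                if (j < n) {
                    reservoir[j] = stream[t];
                }
            }
        }
        return reservoir;
    }

    // Number of elements to skip after t elements have been processed.
    // Finds the smallest s with P(S > s) <= v for a uniform v, capped at
    // the number of elements remaining in the stream.
    static int skip(int t, int n, int remaining, Random rng) {
        double v = rng.nextDouble();
        int s = 0;
        double pSkipMore = ((double) t + 1 - n) / ((double) t + 1); // P(S > 0)
        while (pSkipMore > v && s < remaining) {
            s++;
            pSkipMore *= ((double) t + s + 1 - n) / ((double) t + s + 1);
        }
        return s;
    }

    // Algorithm X style: generate the skip explicitly, then replace a
    // uniformly chosen reservoir slot with the element that follows it.
    static int[] sampleX(int[] stream, int n, Random rng) {
        int[] reservoir = new int[n];
        int t = Math.min(n, stream.length);
        System.arraycopy(stream, 0, reservoir, 0, t);
        while (t < stream.length) {
            t += skip(t, n, stream.length - t, rng);
            if (t < stream.length) {
                reservoir[rng.nextInt(n)] = stream[t];
                t++;
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        int[] stream = new int[1000];
        for (int i = 0; i < stream.length; i++) {
            stream[i] = i;
        }
        Random rng = new Random();
        System.out.println("R: " + Arrays.toString(sampleR(stream, 10, rng)));
        System.out.println("X: " + Arrays.toString(sampleX(stream, 10, rng)));
    }
}
```

Both methods should give every element the same inclusion probability `n / N`. The difference is only in how many random numbers are drawn: roughly one per element for R, versus two per accepted element for X. Note that this `skip` still does work proportional to `s`, just cheaper work than a random draw per element; as I understand it, avoiding that linear search is what Algorithm Z adds.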
   
   @leerho I read Vitter's paper on [reservoir sampling](http://www.cs.umd.edu/~samir/498/vitter.pdf). I wrote up my notes on the paper [here](https://richardstartin.github.io/posts/reservoir-sampling#implementations-and-evaluation-of-algorithms-r-x-and-z), and have implementations [here](https://github.com/richardstartin/reservoir-sampling), which show that the algorithms produce similar samples, and that the overhead of R is significant relative to X and Z when the work to produce the sampled value is cheap. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
