You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/01/31 19:50:27 UTC

[GitHub] [incubator-datasketches-java] jmalkin commented on issue #298: Alternative to reservoir sketches

jmalkin commented on issue #298: Alternative to reservoir sketches
URL: https://github.com/apache/incubator-datasketches-java/issues/298#issuecomment-580884961
 
 
   We're always interested in improved algorithms. The key question is whether the alternatives are properly mergeable, especially across different reservoir sizes.
   
   I know there are some stand-alone improvements that could be made, for instance (from memory discussing this several year sago) you can pre-compute how many items you'll skip before the next acceptance and avoid calling the random number generator each time. But validly merging the sampling state in that case is ... hard.
   
   We know that merging only preserves first-order probabilities (there's a uniform likelihood of inclusion for each item, but there can be correlations between item pairs). Since there isn't a (known?) way around that, that's ok. But mergeability is considered a key property for the library. Error bounds, too, but if the rest of the algo is valid that shouldn't matter here.
   
   So before writing new code, can we check that the alternatives can be merged? If yes, we can continue the discussion on how best to validate that and get it included. Replace vs complement comes down to things like whether there's a compatible way to migrate previously serialized binary images.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org