Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2021/07/29 19:50:19 UTC

[GitHub] [datasketches-java] jmalkin commented on issue #357: Performance comparison between DoubleSketch and QuantileSketch

jmalkin commented on issue #357:
URL: https://github.com/apache/datasketches-java/issues/357#issuecomment-889413730


   Having briefly skimmed the DDSketch paper, I agree with @AlexanderSaydakov that it's definitely not based on the Mergeable Summaries paper. But our sketch that is based on that paper is also not the rank-error quantiles sketch we recommend at this point: KLL provides superior performance at a smaller sketch size.
   
   Based on the error guarantees, it seems like our REQ (Relative Error Quantiles) sketch is a more appropriate comparison. But there are enough differences to make that comparison not entirely straightforward.
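   
   For reference, the three kinds of guarantee being compared can be written side by side. This is my own summary of how the respective papers state them, not a quotation from any of them, and the symbols (n, R, ε, α) are notation chosen here for illustration. The rank bounds are probabilistic; the value bound is the one DDSketch states for values it can represent.

```latex
% n = stream length, R(x) = true rank of x, \hat{R}(x) = estimated rank,
% x_q = true q-quantile, \hat{x}_q = estimated q-quantile.
\begin{align*}
\text{additive rank error (KLL-style):} \quad
  & |\hat{R}(x) - R(x)| \le \varepsilon\, n \\
\text{relative rank error (REQ-style, low-rank accuracy):} \quad
  & |\hat{R}(x) - R(x)| \le \varepsilon\, R(x) \\
\text{relative value error (DDSketch-style):} \quad
  & |\hat{x}_q - x_q| \le \alpha\, x_q
\end{align*}
```

   For REQ configured for high-rank accuracy, R(x) on the right-hand side is replaced by n - R(x). The first two bound error in rank space and the third bounds error in value space, which is one reason the comparison is not straightforward.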
   
   DDSketch is defined only for strictly positive values; REQ is defined over an arbitrary domain. You can use two DDSketches to handle negatives, but that still seems to exclude 0, which can be a non-trivial portion of the probability mass for some queries.
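   
   To make the positivity restriction concrete, here is a minimal sketch of logarithmic bucketing, loosely following the scheme described in the DDSketch paper as I understand it. The class name `LogBucketSketch` and all of its methods are hypothetical and are not the API of any DDSketch or DataSketches library. The bucket index is computed from the logarithm of the value, so zero and negative inputs have no bucket to land in.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative log-bucketed quantile summary (hypothetical, not a library API). */
public class LogBucketSketch {
    private final double gamma;     // bucket growth factor, (1 + a) / (1 - a)
    private final double logGamma;  // cached ln(gamma)
    private final TreeMap<Integer, Long> counts = new TreeMap<>();
    private long n = 0;

    public LogBucketSketch(double relativeAccuracy) {
        if (!(relativeAccuracy > 0 && relativeAccuracy < 1)) {
            throw new IllegalArgumentException("relativeAccuracy must be in (0, 1)");
        }
        this.gamma = (1 + relativeAccuracy) / (1 - relativeAccuracy);
        this.logGamma = Math.log(gamma);
    }

    /** Bucket i covers the interval (gamma^(i-1), gamma^i]. */
    int bucketIndex(double value) {
        // The log of zero or a negative number is not a usable index,
        // which is why only strictly positive values are accepted.
        if (!(value > 0)) {
            throw new IllegalArgumentException("value must be strictly positive: " + value);
        }
        return (int) Math.ceil(Math.log(value) / logGamma);
    }

    public void update(double value) {
        counts.merge(bucketIndex(value), 1L, Long::sum);
        n++;
    }

    /** Returns a representative value of the bucket holding the q-quantile. */
    public double quantile(double q) {
        if (n == 0) throw new IllegalStateException("sketch is empty");
        if (!(q >= 0 && q <= 1)) throw new IllegalArgumentException("q must be in [0, 1]");
        long target = (long) Math.floor(q * (n - 1)); // zero-based rank
        long seen = 0;
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen > target) {
                // This point is within a factor (1 +/- a) of every value in the bucket.
                return 2 * Math.pow(gamma, e.getKey()) / (gamma + 1);
            }
        }
        throw new IllegalStateException("unreachable: counts do not sum to n");
    }
}
```

   Handling negatives with a second instance would mean feeding it the absolute values, and zero would still need a separate counter kept outside both instances.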
   
   REQ has high accuracy at either the high or the low end, and its performance is fully symmetric if you switch between them. Looking at the binning approach of DDSketch, I think (and I could be wrong here) that its accuracy guarantee is only what we'd count as high-end accuracy. (Log transform + a second sketch for negative values, etc., perhaps?)
   
   DDSketch's bucketing approach allows for deletions, which REQ does not.
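   
   A rough illustration of why fixed buckets make deletion simple: the bucket a value falls into depends only on the value itself, so removing an item is just a decrement of that bucket's counter. The class `DeletableBuckets` below is a hypothetical sketch under that assumption, not DDSketch's actual implementation.

```java
import java.util.TreeMap;

/** Illustrative bucket counters with deletion (hypothetical, not a library API). */
public class DeletableBuckets {
    private final double logGamma;
    private final TreeMap<Integer, Long> counts = new TreeMap<>();

    public DeletableBuckets(double gamma) {
        if (!(gamma > 1)) throw new IllegalArgumentException("gamma must be > 1");
        this.logGamma = Math.log(gamma);
    }

    // The index is a pure function of the value, independent of other data seen.
    private int bucketIndex(double value) {
        if (!(value > 0)) throw new IllegalArgumentException("value must be strictly positive");
        return (int) Math.ceil(Math.log(value) / logGamma);
    }

    public void add(double value) {
        counts.merge(bucketIndex(value), 1L, Long::sum);
    }

    /** Decrements the bucket for value; returns false if that bucket is empty. */
    public boolean remove(double value) {
        int i = bucketIndex(value);
        Long c = counts.get(i);
        if (c == null) return false;
        if (c == 1L) {
            counts.remove(i);      // drop empty buckets to keep the map small
        } else {
            counts.put(i, c - 1);
        }
        return true;
    }

    public long total() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    public int bucketCount() {
        return counts.size();
    }
}
```

   Note that deletion here works at bucket granularity: removing 10.0 succeeds if any value in the same bucket was previously added. A sketch that retains a sampled subset of the input items has no comparable per-item counter to decrement.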
   
   And I believe REQ provides a significantly wider range of possible queries.
   
   Ultimately, there are enough differences that finding a comparison which doesn't favor one sketch because of a specific problem structure is itself something of a challenge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: commits-help@datasketches.apache.org