Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/07/01 19:14:42 UTC

[GitHub] [incubator-pinot] tdunning commented on pull request #7076: Upgrade t-digest to 3.3

tdunning commented on pull request #7076:
URL: https://github.com/apache/incubator-pinot/pull/7076#issuecomment-872488605


   Yes. There is an update.
   
   I think that there is a fundamental problem with the way that the test is
   phrased.
   
   Any digest that has limited memory will ultimately have to lose some
   information and will estimate quantiles. That will lead to errors in sample
   space roughly equal to the separation between adjacent samples. Different
   digests may keep more samples or fewer and thus adjust when this happens
   and they may prioritize samples in different parts of the distribution, but
   there will always be an approximation error when samples are combined.
   
   This means that if adjacent samples are more than 0.02 x max apart, you
   risk an error unless the digest keeps every sample, in which case it is
   not a digest at all. The max separation for N uniform random samples has
   a 99th percentile value of roughly 4.6 / N, as a fraction of the range.
   That means that you will need 230 samples (4.6 / 0.02) before you have a
   99% chance of having no gap bigger than 0.02 x max. In turn, that means
   that your digest will
   need to keep at least 230 samples before coalescing to be reasonably
   assured of passing the test. Some digests might do a bit better than this,
   but not massively. For instance, the t-digest will focus coalescence into a
   fraction of the interval and thus have a probability of failure of 1/2 to
   1/10 of this value.
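
   The figures above can be checked by simulation. The constant 4.6 is
   close to -ln(0.01), the 99th percentile of an exponential gap with mean
   1 / N. The sketch below is only an illustration under stated
   assumptions: samples are uniform on [0, 1), and the function name and
   parameters are hypothetical, not part of t-digest or Pinot. It reports
   both how often a single gap exceeds the threshold and how often the
   largest gap in a run does.

```python
import random

def gap_statistics(n, trials, threshold, seed=0):
    """Simulate n uniform samples per trial and measure gaps between neighbours.

    Returns (fraction of individual gaps above threshold,
             fraction of trials with at least one gap above threshold).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    big_gaps = 0
    total_gaps = 0
    trials_with_big_gap = 0
    for _ in range(trials):
        xs = sorted(rng.random() for _ in range(n))
        # Gaps between adjacent sorted samples (n - 1 of them per trial).
        gaps = [b - a for a, b in zip(xs, xs[1:])]
        exceed = sum(1 for g in gaps if g > threshold)
        big_gaps += exceed
        total_gaps += len(gaps)
        if exceed > 0:
            trials_with_big_gap += 1
    return big_gaps / total_gaps, trials_with_big_gap / trials

# 230 samples and a 0.02 threshold, matching the numbers in the comment.
per_gap, any_gap = gap_statistics(n=230, trials=2000, threshold=0.02)
print(f"fraction of single gaps above 0.02: {per_gap:.4f}")
print(f"fraction of runs with some gap above 0.02: {any_gap:.4f}")
```

   Varying n shows how many retained samples it takes before large gaps
   become rare for whichever of the two measures the test depends on.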
   
   On the other hand, if you recast the error back into quantile space by
   calculating the empirical CDF of the estimated quantile, you immediately
   have a well-behaved system. Your unavoidable errors will be bounded by 1/N
   and a t-digest that keeps 50 samples or more and limits errors to one
   sample size should have the guarantees that you want.
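
   As a rough sketch of the recasting described above, a test could map
   the estimated quantile back through the empirical CDF of the raw
   samples and compare the result with the requested quantile. The helper
   names below are hypothetical and the "estimate" values stand in for
   whatever a digest returns; this is not the Pinot test or the t-digest
   API.

```python
import bisect

def empirical_cdf(sorted_samples, x):
    """Fraction of samples that are <= x (samples must be sorted)."""
    return bisect.bisect_right(sorted_samples, x) / len(sorted_samples)

def quantile_space_error(sorted_samples, q, estimate):
    """Distance between the requested quantile q and the empirical CDF
    evaluated at the estimated quantile value."""
    return abs(empirical_cdf(sorted_samples, estimate) - q)

samples = sorted([0.0, 1.0, 2.0, 3.0, 100.0])

# A hypothetical estimate that lands in the wide gap between 3 and 100.
estimate = 50.0
nearest = min(abs(estimate - s) for s in samples)       # sample-space distance
q_error = quantile_space_error(samples, 0.8, estimate)  # quantile-space error

# A hypothetical estimate that is off by one sample position.
one_off = quantile_space_error(samples, 0.8, 2.5)

print(f"distance to nearest sample: {nearest}")
print(f"quantile-space error: {q_error}")
print(f"quantile-space error when off by one sample: {one_off:.2f}")
```

   In this toy case the first estimate is 47 away from any sample, yet its
   quantile-space error is zero, and being off by one sample costs 1/N =
   0.2. An assertion written in quantile space would therefore not depend
   on how far apart the samples happen to fall.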
   
   
   
   On Thu, Jul 1, 2021 at 10:42 AM Xiaotian (Jackie) Jiang <
   ***@***.***> wrote:
   
   > @tdunning <https://github.com/tdunning> Any update on this?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


