Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2021/11/04 07:20:52 UTC

[GitHub] [datasketches-cpp] FluorineDog opened a new issue #251: Is it possible to estimate the fraction of a single frequent value?

FluorineDog opened a new issue #251:
URL: https://github.com/apache/datasketches-cpp/issues/251


   After building a kll_sketch from a stream of values, we call `get_quantiles(101)` to get 100 buckets with lower and upper bounds, and estimate each bucket's fraction by calling get_ratio on each bound to get the closest estimate.
   
   We found that if a frequent value X makes up 30% of the stream, around 30 buckets have lower_bound = upper_bound = X. That is understandable, and we merge these buckets into one.
   
   However, is it possible to get the estimated count of such a value directly from kll_sketch? That would make our lives much easier.
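
The merging step described above can be sketched in a few lines. This is only an illustration of the workaround, not part of the library: `merge_point_buckets` is a hypothetical helper, and it assumes the input is the sorted list of n + 1 split points (for example, the 101 values mentioned above) that delimit n equal-weight buckets.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not a datasketches function).
// `quantiles` holds n + 1 sorted split points delimiting n equal-weight buckets.
// Returns (value, estimated fraction) for every value that fills at least one
// whole bucket, i.e. a bucket whose lower bound equals its upper bound.
std::vector<std::pair<double, double>> merge_point_buckets(
    const std::vector<double>& quantiles) {
  assert(quantiles.size() >= 2);
  const double bucket_weight = 1.0 / static_cast<double>(quantiles.size() - 1);
  std::vector<std::pair<double, double>> merged;
  for (std::size_t i = 0; i + 1 < quantiles.size(); ++i) {
    if (quantiles[i] != quantiles[i + 1]) continue;  // bucket spans a range
    if (!merged.empty() && merged.back().first == quantiles[i]) {
      merged.back().second += bucket_weight;  // extend the current run
    } else {
      merged.emplace_back(quantiles[i], bucket_weight);  // start a new run
    }
  }
  return merged;
}
```

The resolution of this estimate is one bucket: mass that falls in the partially filled buckets on either side of the run is not counted, and the split points themselves are approximate.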


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [datasketches-cpp] FluorineDog closed issue #251: Is it possible to estimate the fraction of a single frequent value?

Posted by GitBox <gi...@apache.org>.
FluorineDog closed issue #251:
URL: https://github.com/apache/datasketches-cpp/issues/251


   




[GitHub] [datasketches-cpp] AlexanderSaydakov commented on issue #251: Is it possible to estimate the fraction of a single frequent value?

Posted by GitBox <gi...@apache.org>.
AlexanderSaydakov commented on issue #251:
URL: https://github.com/apache/datasketches-cpp/issues/251#issuecomment-962243468


   Currently there is no easy way to get the probability mass of a single item, and you would need to know the item first. If you also knew the next item in rank order, you could compute rank(next) - rank(item).
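
The rank difference can be illustrated on exact data. The sketch below does not call the library: `rank_of` is a hypothetical stand-in for a rank query, computed exactly over a sorted vector and defined as the fraction of values strictly less than the argument. With a real quantiles sketch both ranks would be approximate, so the difference would carry the error of two queries.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for a sketch rank query (assumption: exclusive rank, i.e. the
// fraction of values strictly less than x). Exact here, approximate in a sketch.
double rank_of(const std::vector<int>& sorted, int x) {
  auto it = std::lower_bound(sorted.begin(), sorted.end(), x);
  return static_cast<double>(it - sorted.begin()) /
         static_cast<double>(sorted.size());
}

// Mass of `item` as rank(next) - rank(item), where `next` is the smallest
// value greater than `item`. If no such value exists, rank(next) is taken as 1.
double mass_of(const std::vector<int>& sorted, int item) {
  assert(!sorted.empty());
  assert(std::is_sorted(sorted.begin(), sorted.end()));
  auto next = std::upper_bound(sorted.begin(), sorted.end(), item);
  const double rank_next =
      (next == sorted.end()) ? 1.0 : rank_of(sorted, *next);
  return rank_next - rank_of(sorted, item);
}
```

Finding `next` is the hard part in practice: here it comes from the full data, which a sketch does not retain.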
   
   This looks like an attempt to solve the problem with the wrong tool. The library has a Frequent Items sketch for this purpose. If you are looking for very heavy items, the sketch can be rather small. For instance, a size of 32 is enough to capture items heavier than about 11% of the input.
   
   https://datasketches.apache.org/docs/Frequency/FrequentItemsErrorTable.html
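
To show why a small frequency summary is enough for heavy items, here is a self-contained Misra-Gries summary, a classic frequent-items algorithm. It is written from scratch for illustration: it is not the library's Frequent Items sketch, does not use its API, and its error bound is not the one in the table linked above. The class name `MisraGries` and its methods are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Illustrative Misra-Gries summary (not the datasketches implementation).
// Keeps at most `max_counters` counters regardless of stream length.
class MisraGries {
 public:
  explicit MisraGries(std::size_t max_counters) : max_counters_(max_counters) {
    assert(max_counters > 0);
  }

  void update(const std::string& item) {
    ++total_;
    auto it = counters_.find(item);
    if (it != counters_.end()) {
      ++it->second;
      return;
    }
    if (counters_.size() < max_counters_) {
      counters_.emplace(item, 1);
      return;
    }
    // Table is full: decrement every counter, drop those that reach zero,
    // and discard the new item.
    for (auto c = counters_.begin(); c != counters_.end();) {
      if (--c->second == 0) {
        c = counters_.erase(c);
      } else {
        ++c;
      }
    }
  }

  // Never exceeds the true count; it undercounts by at most
  // total() / (max_counters + 1).
  std::uint64_t lower_bound(const std::string& item) const {
    auto it = counters_.find(item);
    return it == counters_.end() ? 0 : it->second;
  }

  std::uint64_t total() const { return total_; }

 private:
  std::size_t max_counters_;
  std::uint64_t total_ = 0;
  std::unordered_map<std::string, std::uint64_t> counters_;
};
```

Dividing `lower_bound(item)` by `total()` gives a lower bound on the item's fraction of the stream. Any item heavier than 1 / (max_counters + 1) of the input is guaranteed to still hold a counter at the end.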
   

