Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2022/07/20 13:46:07 UTC

[GitHub] [hbase] bbeaudreault commented on pull request #4635: HBASE-27224 HFile tool statistic sampling produces misleading results

bbeaudreault commented on PR #4635:
URL: https://github.com/apache/hbase/pull/4635#issuecomment-1190308613

   Closing this PR -- going to go in a different direction.
   
   I realized that MutableRangeHistogram's buckets are actually very inaccurate on the first call to `histogram.snapshot()`. The initial bins are configured for very large ranges, and as `snapshot()` is called over time they are resized to fit the actual data, based on the distribution at that time. The `HistogramImpl.getQuantiles` method does some complicated math to estimate the quantiles despite the incorrect bins, but `getCountAtOrBelow` does not. I thought about trying to account for that, but it seems overly complicated for something that is used pretty much everywhere.
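
To illustrate why a count-at-or-below query suffers when bins are much wider than the data, here is a minimal sketch. It is not HBase's histogram code: `CoarseBinSketch` and its methods are hypothetical names, and it assumes fixed-width bins, non-negative values, and a query that sums every bin up to and including the one containing the queried value.

```java
import java.util.stream.LongStream;

// Hypothetical illustration only; this is not HBase's histogram implementation.
class CoarseBinSketch {

  // Estimates how many values are <= query by summing whole fixed-width bins,
  // up to and including the bin that contains the query. Assumes values >= 0.
  static long binnedCountAtOrBelow(long[] values, long binWidth, long query) {
    long max = LongStream.of(values).max().orElse(0);
    long[] counts = new long[(int) (Math.max(max, query) / binWidth) + 1];
    for (long v : values) {
      counts[(int) (v / binWidth)]++;
    }
    long total = 0;
    int last = (int) (query / binWidth);
    for (int i = 0; i <= last; i++) {
      total += counts[i];
    }
    return total;
  }

  // Ground truth for comparison: counts the values that are actually <= query.
  static long exactCountAtOrBelow(long[] values, long query) {
    return LongStream.of(values).filter(v -> v <= query).count();
  }

  public static void main(String[] args) {
    long[] values = LongStream.rangeClosed(1, 100).toArray();
    System.out.println("exact:      " + exactCountAtOrBelow(values, 9));
    System.out.println("width 10:   " + binnedCountAtOrBelow(values, 10, 9));
    System.out.println("width 1000: " + binnedCountAtOrBelow(values, 1000, 9));
  }
}
```

With the values 1 through 100 and a query of 9, a bin width of 10 matches the exact count, while a single bin of width 1000 attributes every value to the query.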
   
   Instead, I'm going to revert to using codahale metrics, fix it to use UniformDistribution, and add some supplemental range tracking just for the HFilePrettyPrinter.
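
As a rough idea of what supplemental range tracking could look like, the sketch below keeps an exact counter per fixed range alongside whatever sampled histogram is in use. This is an assumption about the general approach, not the actual HFilePrettyPrinter change: `RangeCounts` and its methods are hypothetical names, and the bounds are assumed to be distinct.

```java
import java.util.Arrays;

// Hypothetical sketch; these names do not come from HBase or codahale metrics.
class RangeCounts {
  private final long[] upperBounds; // sorted, distinct, inclusive upper bound of each range
  private final long[] counts;      // one extra slot for values above the last bound

  RangeCounts(long... upperBounds) {
    this.upperBounds = upperBounds.clone();
    Arrays.sort(this.upperBounds);
    this.counts = new long[upperBounds.length + 1];
  }

  // Records a value in the first range whose upper bound is >= value.
  void update(long value) {
    int idx = Arrays.binarySearch(upperBounds, value);
    if (idx < 0) {
      idx = -idx - 1; // insertion point: index of the first bound greater than value
    }
    counts[idx]++;
  }

  // Returns the exact number of values recorded in the given range.
  long countInRange(int rangeIndex) {
    return counts[rangeIndex];
  }
}
```

Because every value increments exactly one counter, these counts stay exact regardless of how the sampled histogram picks its bins or reservoir.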

