
Posted to issues@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2022/07/21 00:06:00 UTC

[jira] [Updated] (HBASE-27225) Add BucketAllocator bucket size statistic logging

     [ https://issues.apache.org/jira/browse/HBASE-27225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Beaudreault updated HBASE-27225:
--------------------------------------
    Labels: patch-available  (was: )
    Status: Patch Available  (was: Open)

[https://github.com/apache/hbase/pull/4637]

This new output gives a useful glimpse into the block size distribution for a cluster, as well as how many buckets remain free for redistribution to other bucket sizes.

One interesting addition is {{wastedBytes}}: I realized that if you choose bucket sizes that don't evenly divide the bucketCapacity, you end up with leftover space in each bucket, which can add up to a lot in aggregate. One can use this information to choose bucket sizes that divide the bucketCapacity more evenly and so reduce waste.
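
To make the arithmetic concrete, here is a minimal sketch of the leftover-space calculation. It is not the code from the patch: the class and method names are hypothetical, and the capacity of 4 x 513 KB used in main is only an assumed example value. The idea is simply that a bucket carved into fixed-size slots loses whatever remainder is left after integer division.

```java
/**
 * Hypothetical helper, not part of HBase: estimates the space lost when a
 * bucket of a fixed capacity is carved into equally sized slots.
 */
final class BucketWasteSketch {

  /** Bytes left unusable in ONE bucket when it is divided into itemSize slots. */
  static long wastedPerBucket(long bucketCapacity, long itemSize) {
    if (bucketCapacity <= 0 || itemSize <= 0) {
      throw new IllegalArgumentException("capacity and item size must be positive");
    }
    // Only whole slots fit, so the remainder of the division is unusable.
    return bucketCapacity % itemSize;
  }

  /** Aggregate leftover across bucketCount buckets that share the same item size. */
  static long totalWasted(long bucketCapacity, long itemSize, long bucketCount) {
    return wastedPerBucket(bucketCapacity, itemSize) * bucketCount;
  }

  public static void main(String[] args) {
    // Assumed example capacity: four items of the largest size (513 KB).
    long bucketCapacity = 4L * 513 * 1024;
    // Assumed example item sizes, in bytes.
    long[] itemSizes = {5 * 1024, 9 * 1024, 17 * 1024, 513 * 1024};
    for (long itemSize : itemSizes) {
      System.out.printf("itemSize=%d slotsPerBucket=%d wastedPerBucket=%d%n",
          itemSize, bucketCapacity / itemSize, wastedPerBucket(bucketCapacity, itemSize));
    }
  }
}
```

Under these assumptions, an item size that divides the capacity exactly wastes nothing, while one that does not leaves the same remainder in every bucket of that size, which is why the total grows with the number of buckets.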

> Add BucketAllocator bucket size statistic logging
> -------------------------------------------------
>
>                 Key: HBASE-27225
>                 URL: https://issues.apache.org/jira/browse/HBASE-27225
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Major
>              Labels: patch-available
>
> BucketCache places blocks into configurably sized buckets based on the block size. The default sizes aim to provide broad coverage for common block sizes, but should probably be tuned for certain use-cases. However, we provide no way for operators to gain insight into the distribution of buckets.
> There already exists a BucketAllocator#logStatistics method, but it is not called anywhere. I suggest that we hook that up in BucketCache#logStats (which is called periodically by a stats thread). We can go from there.
> Looking at the IndexStatistics used in that method, it looks like a good start. One thing I'd like to add is a count of freeBuckets and completelyFreeBuckets per index. I think this will be useful for indicating how much more wiggle room we have for redistributing buckets among the various block sizes.


