Posted to issues@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2022/08/01 12:24:00 UTC

[jira] [Commented] (HBASE-27264) Add options to consider compressed size when delimiting blocks during hfile writes

    [ https://issues.apache.org/jira/browse/HBASE-27264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573733#comment-17573733 ] 

Bryan Beaudreault commented on HBASE-27264:
-------------------------------------------

My only thought here is that the existing unified.encoded.blocksize.ratio config is already a bit hard to configure, and now we're adding two more configs in a similar area. I wonder if there's some simplification we can do here to make this easier on users. Block encoding and compression often go hand in hand, so can we have a single unified config for them? Or is there some other, easier way to auto-tune these for users, or at least logs/metrics that make it clearer what values to set?
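
For context, this is roughly what tuning ends up looking like if someone wants to use all of these knobs together. The two compressed-size property names are the ones proposed in this issue's description and may still change, and the values below are just examples, not recommendations:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class BlockSizingConfigExample {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // Existing knob from HBASE-27232: lets the encoded size be considered
        // when delimiting blocks. The 0.5 here is only an example value.
        conf.setDouble("hbase.writer.unified.encoded.blocksize.ratio", 0.5);

        // The two additional knobs proposed in this issue (names taken from the
        // description below, subject to change during review): switch block
        // delimiting to compressed size, and cap the block at 320KB.
        conf.setBoolean("hbase.block.size.limit.compressed", true);
        conf.setInt("hbase.block.size.max.compressed", 320 * 1024);
      }
    }

That's three related properties a user has to reason about at once, which is what makes me wonder about a single unified setting or some auto-tuning.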

> Add options to consider compressed size when delimiting blocks during hfile writes
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-27264
>                 URL: https://issues.apache.org/jira/browse/HBASE-27264
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>
> In HBASE-27232 we modified the "hbase.writer.unified.encoded.blocksize.ratio" property so that the encoded size can be considered when delimiting hfile blocks during writes.
> Here we propose two additional properties, "hbase.block.size.limit.compressed" and "hbase.block.size.max.compressed", that would allow the compressed size (if compression is in use) to be considered when delimiting blocks during hfile writes. When compression is enabled, certain datasets can have very high compression efficiency, so that the default 64KB block size and 10GB max file size can lead to hfiles with a very large number of blocks.
> In this proposal, "hbase.block.size.limit.compressed" is a boolean flag that switches block delimiting to the compressed size, and "hbase.block.size.max.compressed" is an int giving a limit, in bytes, for the compressed block size, in order to avoid very large uncompressed blocks (defaulting to 320KB). (A sketch of one possible reading of these two properties follows just below this description.)
>  
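
Below is a minimal, self-contained sketch of one possible reading of the two proposed properties. The class and method names are illustrative only, not from the HBase code base; in particular, treating hbase.block.size.max.compressed as a cap on the raw bytes accumulated in a block is an assumption drawn from the "avoid very large uncompressed blocks" rationale above, not from any actual patch:

    // Illustrative sketch only; none of these names come from HBase itself.
    public class CompressedBlockBoundarySketch {

      // Proposed property names and default as given in the issue description.
      static final String LIMIT_COMPRESSED_KEY = "hbase.block.size.limit.compressed";
      static final String MAX_COMPRESSED_KEY = "hbase.block.size.max.compressed";
      static final int DEFAULT_MAX_COMPRESSED = 320 * 1024; // 320KB default from the description

      private final boolean limitCompressed; // switch block delimiting to compressed size
      private final int blockSize;           // usual HFile block size, e.g. 64KB
      private final int maxCompressed;       // cap that keeps blocks from growing too large

      CompressedBlockBoundarySketch(boolean limitCompressed, int blockSize, int maxCompressed) {
        this.limitCompressed = limitCompressed;
        this.blockSize = blockSize;
        this.maxCompressed = maxCompressed;
      }

      /**
       * Decide whether the block currently being written should be closed.
       * Assumption: when compressed sizing is on, the 320KB limit is applied to the
       * bytes accumulated in the block so far, which is what would keep uncompressed
       * block sizes bounded on highly compressible data.
       */
      boolean shouldFinishBlock(long uncompressedBytes, long compressedBytes) {
        if (!limitCompressed) {
          // Existing behaviour: delimit on the uncompressed (or encoded) size only.
          return uncompressedBytes >= blockSize;
        }
        // Proposed behaviour: delimit on the compressed size, but never let the
        // block grow past the configured maximum.
        return compressedBytes >= blockSize || uncompressedBytes >= maxCompressed;
      }

      public static void main(String[] args) {
        // With roughly 10:1 compression, 64KB of raw data compresses to ~6.4KB, so the
        // compressed check keeps the block open well past 64KB of raw data and the
        // 320KB cap is what finally closes it.
        CompressedBlockBoundarySketch sketch =
          new CompressedBlockBoundarySketch(true, 64 * 1024, DEFAULT_MAX_COMPRESSED);
        System.out.println(sketch.shouldFinishBlock(64 * 1024, 6 * 1024));   // false: keep filling
        System.out.println(sketch.shouldFinishBlock(320 * 1024, 32 * 1024)); // true: cap reached
      }
    }

In this reading, the compressed check alone would let a highly compressible block absorb an unbounded amount of raw data, and the 320KB cap is what guards against that.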



--
This message was sent by Atlassian Jira
(v8.20.10#820010)