Posted to dev@hbase.apache.org by "Wellington Chevreuil (Jira)" <ji...@apache.org> on 2022/09/23 10:02:00 UTC

[jira] [Resolved] (HBASE-27386) Use encoded size for calculating compression ratio in block size predicator

     [ https://issues.apache.org/jira/browse/HBASE-27386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wellington Chevreuil resolved HBASE-27386.
------------------------------------------
    Resolution: Fixed

Merged into master and branch-2. Thanks for reviewing, [~ankit.singhal]!

> Use encoded size for calculating compression ratio in block size predicator
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-27386
>                 URL: https://issues.apache.org/jira/browse/HBASE-27386
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha-3
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>             Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> In HBASE-27264 we introduced the notion of block size predicators to define hfile block boundaries when writing a new hfile, and provided the PreviousBlockCompressionRatePredicator implementation for calculating block sizes based on a compression ratio. It used the raw data size written to the block so far to calculate the compression ratio, but when encoding is enabled, this could lead to a very high compression ratio and, therefore, larger block sizes. We should use the encoded size to calculate the compression ratio instead.
> Here's an example scenario:
> 1) Sample block size when not using the PreviousBlockCompressionRatePredicator as implemented by HBASE-27264:
> {noformat}
> onDiskSizeWithoutHeader=6613, uncompressedSizeWithoutHeader=32928
> {noformat}
> 2) Sample block size when using PreviousBlockCompressionRatePredicator as implemented by HBASE-27264 (uses raw data size to calculate compression rate):
> {noformat}
> onDiskSizeWithoutHeader=126920, uncompressedSizeWithoutHeader=655393
> {noformat}
> 3) Sample block size when using PreviousBlockCompressionRatePredicator with encoded size for calculating compression rate:
> {noformat}
> onDiskSizeWithoutHeader=54299, uncompressedSizeWithoutHeader=328051
> {noformat}
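
To make the effect described above concrete, here is a minimal sketch of how the choice of numerator changes the compression ratio and, with it, the size limit for the next block. It is not the HBase implementation: the class name, both helper methods, the rule "limit = configured block size * ratio", and every number in main are assumptions chosen only for illustration.

```java
// Illustrative sketch only; NOT the HBase implementation.
// Class name, method names and all numbers are hypothetical.
final class CompressionRatioSketch {

  // Integer ratio between a block's size before compression and its size on disk.
  // Falls back to 1 when the on-disk size is not positive, and never goes below 1.
  static int compressionRatio(int sizeBeforeCompression, int onDiskSize) {
    if (onDiskSize <= 0) {
      return 1;
    }
    return Math.max(1, sizeBeforeCompression / onDiskSize);
  }

  // Assumed rule: the size limit for the next block scales with the ratio.
  static int adjustedBlockSize(int configuredBlockSize, int ratio) {
    return configuredBlockSize * ratio;
  }

  public static void main(String[] args) {
    int configuredBlockSize = 64 * 1024; // hypothetical configured block size
    int rawSize = 400_000;               // hypothetical: cell data before encoding
    int encodedSize = 200_000;           // hypothetical: same data after encoding
    int onDiskSize = 40_000;             // hypothetical: encoded data after compression

    // Ratio computed from the raw size (the behaviour the issue describes as a bug).
    int rawRatio = compressionRatio(rawSize, onDiskSize);
    // Ratio computed from the encoded size (the behaviour the issue asks for).
    int encodedRatio = compressionRatio(encodedSize, onDiskSize);

    System.out.println("limit from raw size:     "
        + adjustedBlockSize(configuredBlockSize, rawRatio));
    System.out.println("limit from encoded size: "
        + adjustedBlockSize(configuredBlockSize, encodedRatio));
  }
}
```

With these made-up inputs, the raw-size ratio is 10 and the encoded-size ratio is 5, so the limit derived from the encoded size is half the one derived from the raw size. The encoder has already shrunk the data before the compressor sees it, so dividing the raw size by the on-disk size credits the compressor with savings it did not produce.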



--
This message was sent by Atlassian Jira
(v8.20.10#820010)