Posted to issues@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2021/10/07 02:21:00 UTC

[jira] [Comment Edited] (HBASE-26316) Per-table or per-CF compression codec setting overrides

    [ https://issues.apache.org/jira/browse/HBASE-26316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425313#comment-17425313 ] 

Andrew Kyle Purtell edited comment on HBASE-26316 at 10/7/21, 2:20 AM:
-----------------------------------------------------------------------

Confirmed functionality on a single-host cluster:

{noformat}
hbase> create "IntegrationTestLoadCommonCrawl", \
    { NAME => 'c', VERSIONS => 1000, COMPRESSION => 'LZ4', COMPRESSION_COMPACT => 'ZSTD', BLOCKSIZE => 131072 }, \
    { NAME => 'i', VERSIONS => 1000, COMPRESSION => 'LZ4', COMPRESSION_COMPACT => 'ZSTD', BLOCKSIZE => 8192 }, \
    CONFIGURATION => { 'hbase.io.compress.zstd.level' => '1' }
{noformat}

Loaded one WARC from common crawl.

Major compaction takes 11 seconds.

{noformat}
hbase> alter "IntegrationTestLoadCommonCrawl", \
    CONFIGURATION => { 'hbase.io.compress.zstd.level' => '10' }
{noformat}

Major compaction now takes 42 seconds. Total size on disk reduced by 11.6%.

{noformat}
hbase> alter "IntegrationTestLoadCommonCrawl", \
    CONFIGURATION => { 'hbase.io.compress.zstd.level' => '22' }
{noformat}

Major compaction now takes 17 minutes 15 seconds. Total size on disk reduced by 13.3% vs level 1. (Sure, this level is crazy in practice.) 
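
To put the three measurements side by side, here is a minimal arithmetic sketch. The class and method names are hypothetical, and it assumes the 11.6% figure for level 10 is also relative to level 1 (the comment only says so explicitly for the 13.3% figure).

```java
import java.util.Locale;

/** Back-of-the-envelope comparison of the ZSTD levels measured above (hypothetical helper). */
final class ZstdLevelTradeoff {

    /** How many times longer a compaction took than the baseline run. */
    static double slowdown(double baselineSeconds, double seconds) {
        return seconds / baselineSeconds;
    }

    /**
     * Extra size reduction, in percent, gained by moving from one level to another,
     * given each level's reduction relative to the same baseline.
     */
    static double marginalReductionPercent(double fromReductionPercent, double toReductionPercent) {
        double fromSize = 1.0 - fromReductionPercent / 100.0;
        double toSize = 1.0 - toReductionPercent / 100.0;
        return (1.0 - toSize / fromSize) * 100.0;
    }

    public static void main(String[] args) {
        double level1Seconds = 11;
        double level10Seconds = 42;
        double level22Seconds = 17 * 60 + 15;

        System.out.printf(Locale.ROOT, "level 10 vs 1: %.1fx compaction time%n",
            slowdown(level1Seconds, level10Seconds));
        System.out.printf(Locale.ROOT, "level 22 vs 1: %.1fx compaction time%n",
            slowdown(level1Seconds, level22Seconds));
        // Assumes both percentages are measured against the level 1 output.
        System.out.printf(Locale.ROOT, "level 22 vs 10: %.1f%% additional size reduction%n",
            marginalReductionPercent(11.6, 13.3));
    }
}
```

Under that assumption, level 10 costs roughly 3.8x the compaction time of level 1 and level 22 roughly 94x, while level 22 shrinks the level 10 output by only about 1.9% more.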


> Per-table or per-CF compression codec setting overrides
> -------------------------------------------------------
>
>                 Key: HBASE-26316
>                 URL: https://issues.apache.org/jira/browse/HBASE-26316
>             Project: HBase
>          Issue Type: Sub-task
>          Components: HFile, Operability
>    Affects Versions: 2.5.0, 3.0.0-alpha-2
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Minor
>             Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> This won't work as expected today...
> {noformat}
> hbase> create 'sometable', \
>   { NAME => 'somefamily', VERSIONS => 1000, COMPRESSION => 'ZSTD' }, \
>   CONFIGURATION => { 'hbase.io.compress.zstd.level' => '9' }
> {noformat}
> ... but it should. We get and retain Compressor instances in HFileBlockDefaultEncodingContext, and could in theory call Compressor#reinit when setting up the context to update compression parameters, such as compression level and buffer size, per the ambient configuration. However, we do not plumb the CompoundConfiguration from the Store through to HFileBlockDefaultEncodingContext. Instead, codec parameters can only be updated globally, in the system site configuration files.
> This is actually pretty important for algorithms like ZSTD, which offers more than 20 different compression levels. At level 1 it compresses almost as fast as LZ4, while at levels > 19 it uses computationally expensive techniques to rival LZMA in compression ratio (and in poor compression speed). It is very likely that the ZSTD level you'd want to employ for a given table's data will vary by use case.
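
The override behavior requested in the description amounts to a layered lookup: ask the most specific scope first and fall back to the site-wide value. The sketch below illustrates that idea with plain maps. It is not HBase's CompoundConfiguration, the class and method names are hypothetical, and the level values are made up for illustration; only the key name comes from the issue.

```java
import java.util.List;
import java.util.Map;

/** Illustrative layered configuration lookup (hypothetical, not an HBase class). */
final class LayeredConf {
    // Layers ordered from most specific (column family) to least specific (site).
    private final List<Map<String, String>> layers;

    LayeredConf(List<Map<String, String>> layersMostSpecificFirst) {
        this.layers = List.copyOf(layersMostSpecificFirst);
    }

    /** Returns the first value found walking the layers in order, or the fallback. */
    int getInt(String key, int fallback) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(key);
            if (value != null) {
                return Integer.parseInt(value.trim());
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String key = "hbase.io.compress.zstd.level";
        Map<String, String> site = Map.of(key, "3");     // illustrative site-wide value
        Map<String, String> table = Map.of(key, "10");   // illustrative per-table override
        Map<String, String> family = Map.of();           // no per-CF override set

        LayeredConf storeConf = new LayeredConf(List.of(family, table, site));
        // The table-level value wins because the family layer does not set the key.
        System.out.println(key + " = " + storeConf.getInt(key, 1));
    }
}
```

With a view like this available where the compressor is set up, the resolved level could in principle be handed to Compressor#reinit, which is the plumbing the description says is missing today.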


