Posted to issues@hbase.apache.org by "Viraj Jasani (Jira)" <ji...@apache.org> on 2019/12/05 05:04:00 UTC

[jira] [Comment Edited] (HBASE-23279) Switch default block encoding to ROW_INDEX_V1

    [ https://issues.apache.org/jira/browse/HBASE-23279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981598#comment-16981598 ] 

Viraj Jasani edited comment on HBASE-23279 at 12/5/19 5:03 AM:
---------------------------------------------------------------

TestDataBlockEncodingTool has 3 tests.

Output of test testHFileAllCellsWithTags (ROW_INDEX_V1 is the only encoding with negative savings):
{code:java}
INFO  [Time-limited test] regionserver.DataBlockEncodingTool(329): Starting a throughput benchmark for data block encoding codecs
PREFIX:
  Encoding performance:           37.69 MB/s (+/- 1.26 MB/s)
  Decoding performance:          51.97 MB/s (+/- 20.91 MB/s)
DIFF:
  Encoding performance:           41.34 MB/s (+/- 2.10 MB/s)
  Decoding performance:           65.79 MB/s (+/- 5.72 MB/s)
FAST_DIFF:
  Encoding performance:           41.28 MB/s (+/- 1.55 MB/s)
  Decoding performance:           48.65 MB/s (+/- 8.98 MB/s)
ROW_INDEX_V1:
  Encoding performance:           51.68 MB/s (+/- 6.04 MB/s)
  Decoding performance:           23.70 MB/s (+/- 6.11 MB/s)
GZ:
  Compression performance:        32.56 MB/s (+/- 0.22 MB/s)
  Decompression performance:    213.64 MB/s (+/- 38.99 MB/s)
Raw data size:
  Raw bytes:                                          26,364
  Key bytes:                                11,492 (42.50 %)
  Value bytes:                                1,352 (5.00 %)
  KV infrastructure:                        13,520 (50.00 %)
  CF overhead:                                1,352 (5.00 %)
  Total key redundancy:                       2,000 (7.40 %)
  GZ only size:                                        5,123
  GZ only savings:                          80.57 % (5.15 x)
PREFIX
  Encoded bytes:                                      21,023
  Key encoding savings:                     21.35 % (1.27 x)
  Total encoding savings:                   20.26 % (1.25 x)
  Encoding + GZ size:                                  4,231
  Encoding + GZ savings:                    83.95 % (6.23 x)
  Encoding with GZ savings:                 17.41 % (1.21 x)
DIFF
  Encoded bytes:                                      12,922
  Key encoding savings:                     53.74 % (2.16 x)
  Total encoding savings:                   50.99 % (2.04 x)
  Encoding + GZ size:                                  3,824
  Encoding + GZ savings:                    85.50 % (6.89 x)
  Encoding with GZ savings:                 25.36 % (1.34 x)
FAST_DIFF
  Encoded bytes:                                      12,924
  Key encoding savings:                     53.73 % (2.16 x)
  Total encoding savings:                   50.98 % (2.04 x)
  Encoding + GZ size:                                  3,826
  Encoding + GZ savings:                    85.49 % (6.89 x)
  Encoding with GZ savings:                 25.32 % (1.34 x)
ROW_INDEX_V1
  Encoded bytes:                                      29,787
  Key encoding savings:                    -13.69 % (0.88 x)
  Total encoding savings:                  -12.98 % (0.89 x)
  Encoding + GZ size:                                  7,214
  Encoding + GZ savings:                    72.64 % (3.65 x)
  Encoding with GZ savings:                -40.82 % (0.71 x)
{code}
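
To make the savings figures easier to check, here is a minimal sketch that recomputes them from the byte counts in the log above. The formulas are inferred from the reported numbers, not taken from the DataBlockEncodingTool source, and the class and method names (EncodingSavings, savingsPercent) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class EncodingSavings {

    /** Percentage saved going from baselineBytes to actualBytes; negative means the data grew. */
    static double savingsPercent(long baselineBytes, long actualBytes) {
        return 100.0 * (baselineBytes - actualBytes) / baselineBytes;
    }

    public static void main(String[] args) {
        long rawBytes = 26_364;    // "Raw bytes" from the log
        long gzOnlyBytes = 5_123;  // "GZ only size" from the log

        // encoding -> {"Encoded bytes", "Encoding + GZ size"}, copied from the log
        Map<String, long[]> sizes = new LinkedHashMap<>();
        sizes.put("PREFIX", new long[] {21_023, 4_231});
        sizes.put("DIFF", new long[] {12_922, 3_824});
        sizes.put("FAST_DIFF", new long[] {12_924, 3_826});
        sizes.put("ROW_INDEX_V1", new long[] {29_787, 7_214});

        for (Map.Entry<String, long[]> entry : sizes.entrySet()) {
            long encoded = entry.getValue()[0];
            long encodedGz = entry.getValue()[1];
            System.out.println(String.format(Locale.ROOT,
                "%-13s total: %7.2f %%  encoding+GZ: %6.2f %%  vs GZ only: %7.2f %%",
                entry.getKey(),
                savingsPercent(rawBytes, encoded),        // assumed: "Total encoding savings"
                savingsPercent(rawBytes, encodedGz),      // assumed: "Encoding + GZ savings"
                savingsPercent(gzOnlyBytes, encodedGz))); // assumed: "Encoding with GZ savings"
        }
    }
}
```

Under these assumed formulas, ROW_INDEX_V1 is the only encoding whose encoded size (29,787 bytes) exceeds the raw size (26,364 bytes), which is why its savings come out negative both with and without GZ.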



> Switch default block encoding to ROW_INDEX_V1
> ---------------------------------------------
>
>                 Key: HBASE-23279
>                 URL: https://issues.apache.org/jira/browse/HBASE-23279
>             Project: HBase
>          Issue Type: Wish
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Lars Hofhansl
>            Assignee: Viraj Jasani
>            Priority: Minor
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: HBASE-23279.master.000.patch, HBASE-23279.master.001.patch, HBASE-23279.master.002.patch, HBASE-23279.master.003.patch
>
>
> Currently we set both block encoding and compression to NONE.
> ROW_INDEX_V1 has many advantages and (almost) no disadvantages (the hfiles are slightly larger, about 3% or so). I think that would be a better default than NONE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)