Posted to issues@hbase.apache.org by "Viraj Jasani (Jira)" <ji...@apache.org> on 2019/12/05 05:04:00 UTC
[jira] [Comment Edited] (HBASE-23279) Switch default block encoding to ROW_INDEX_V1
[ https://issues.apache.org/jira/browse/HBASE-23279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981598#comment-16981598 ]
Viraj Jasani edited comment on HBASE-23279 at 12/5/19 5:03 AM:
---------------------------------------------------------------
We have TestDataBlockEncodingTool with 3 tests. Output of test testHFileAllCellsWithTags (ROW_INDEX_V1 is the only encoding with negative savings):
{code:java}
INFO [Time-limited test] regionserver.DataBlockEncodingTool(329): Starting a throughput benchmark for data block encoding codecs
PREFIX:
Encoding performance: 37.69 MB/s (+/- 1.26 MB/s)
Decoding performance: 51.97 MB/s (+/- 20.91 MB/s)
DIFF:
Encoding performance: 41.34 MB/s (+/- 2.10 MB/s)
Decoding performance: 65.79 MB/s (+/- 5.72 MB/s)
FAST_DIFF:
Encoding performance: 41.28 MB/s (+/- 1.55 MB/s)
Decoding performance: 48.65 MB/s (+/- 8.98 MB/s)
ROW_INDEX_V1:
Encoding performance: 51.68 MB/s (+/- 6.04 MB/s)
Decoding performance: 23.70 MB/s (+/- 6.11 MB/s)
GZ:
Compression performance: 32.56 MB/s (+/- 0.22 MB/s)
Decompression performance: 213.64 MB/s (+/- 38.99 MB/s)
Raw data size:
Raw bytes: 26,364
Key bytes: 11,492 (42.50 %)
Value bytes: 1,352 (5.00 %)
KV infrastructure: 13,520 (50.00 %)
CF overhead: 1,352 (5.00 %)
Total key redundancy: 2,000 (7.40 %)
GZ only size: 5,123
GZ only savings: 80.57 % (5.15 x)
PREFIX
Encoded bytes: 21,023
Key encoding savings: 21.35 % (1.27 x)
Total encoding savings: 20.26 % (1.25 x)
Encoding + GZ size: 4,231
Encoding + GZ savings: 83.95 % (6.23 x)
Encoding with GZ savings: 17.41 % (1.21 x)
DIFF
Encoded bytes: 12,922
Key encoding savings: 53.74 % (2.16 x)
Total encoding savings: 50.99 % (2.04 x)
Encoding + GZ size: 3,824
Encoding + GZ savings: 85.50 % (6.89 x)
Encoding with GZ savings: 25.36 % (1.34 x)
FAST_DIFF
Encoded bytes: 12,924
Key encoding savings: 53.73 % (2.16 x)
Total encoding savings: 50.98 % (2.04 x)
Encoding + GZ size: 3,826
Encoding + GZ savings: 85.49 % (6.89 x)
Encoding with GZ savings: 25.32 % (1.34 x)
ROW_INDEX_V1
Encoded bytes: 29,787
Key encoding savings: -13.69 % (0.88 x)
Total encoding savings: -12.98 % (0.89 x)
Encoding + GZ size: 7,214
Encoding + GZ savings: 72.64 % (3.65 x)
Encoding with GZ savings: -40.82 % (0.71 x)
{code}
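The percentage and ratio pairs in the output above can be reproduced from the logged byte counts. The formulas here are inferred from the printed numbers, not taken from the tool's source. Savings appear to be 1 - size/baseline, and the ratio appears to be baseline/size. The baseline is "Raw bytes" for "Total encoding savings" and "Encoding + GZ savings", and "GZ only size" for "Encoding with GZ savings". A negative saving (ratio below 1.0 x) means the result is larger than the baseline, which is what happens for ROW_INDEX_V1 here. "Key encoding savings" is left out because its baseline cannot be recovered from the printed values. The class EncodingSavings and its methods are hypothetical names used only for this sketch.

{code:java}
import java.util.Locale;

/**
 * Hypothetical helper that recomputes the savings/ratio pairs printed by
 * DataBlockEncodingTool from the byte counts in the log.
 * Assumption (inferred from the logged numbers): savings = 1 - size/baseline,
 * ratio = baseline/size.
 */
public class EncodingSavings {

    /** Percentage saved relative to a baseline; negative means the data grew. */
    static double savingsPercent(long baseline, long size) {
        return 100.0 * (1.0 - (double) size / baseline);
    }

    /** Size ratio relative to a baseline; below 1.0 means the data grew. */
    static double ratio(long baseline, long size) {
        return (double) baseline / size;
    }

    /** Formats a pair the same way the tool prints it, e.g. "20.26 % (1.25 x)". */
    static String describe(long baseline, long size) {
        return String.format(Locale.ROOT, "%.2f %% (%.2f x)",
                savingsPercent(baseline, size), ratio(baseline, size));
    }

    public static void main(String[] args) {
        long raw = 26_364;    // "Raw bytes"
        long gzOnly = 5_123;  // "GZ only size"
        String[] names = {"PREFIX", "DIFF", "FAST_DIFF", "ROW_INDEX_V1"};
        long[] encoded = {21_023, 12_922, 12_924, 29_787};  // "Encoded bytes"
        long[] encodedGz = {4_231, 3_824, 3_826, 7_214};    // "Encoding + GZ size"

        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i]);
            // Encoded block compared with the raw cells
            System.out.println("  Total encoding savings:   " + describe(raw, encoded[i]));
            // Encoding followed by GZ compared with the raw cells
            System.out.println("  Encoding + GZ savings:    " + describe(raw, encodedGz[i]));
            // Encoding followed by GZ compared with GZ alone
            System.out.println("  Encoding with GZ savings: " + describe(gzOnly, encodedGz[i]));
        }
    }
}
{code}

For ROW_INDEX_V1, main prints "-12.98 % (0.89 x)" for total encoding savings and "-40.82 % (0.71 x)" for encoding with GZ savings. Both values match the tool output.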
> Switch default block encoding to ROW_INDEX_V1
> ---------------------------------------------
>
> Key: HBASE-23279
> URL: https://issues.apache.org/jira/browse/HBASE-23279
> Project: HBase
> Issue Type: Wish
> Affects Versions: 3.0.0, 2.3.0
> Reporter: Lars Hofhansl
> Assignee: Viraj Jasani
> Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-23279.master.000.patch, HBASE-23279.master.001.patch, HBASE-23279.master.002.patch, HBASE-23279.master.003.patch
>
>
> Currently we set both block encoding and compression to NONE.
> ROW_INDEX_V1 has many advantages and (almost) no disadvantages (the hfiles are slightly larger, about 3% or so). I think that would be a better default than NONE.
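The trade-off described above (slightly larger files in exchange for ROW_INDEX_V1's advantages) can be illustrated with a toy model. As far as I understand the design, ROW_INDEX_V1 keeps cells unencoded and appends an index of row start offsets to each data block. This lets a reader binary-search rows inside the block instead of scanning it linearly.

The Java code is a simplified, hypothetical model of that idea and is not HBase's on-disk format or API. The following are all assumptions made for illustration:
* the layout: a 4-byte length prefix per row, a 4-byte offset per row, and a 4-byte row-count trailer
* the names RowIndexBlockSketch, buildBlock and seek

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical toy model of a row-indexed block (NOT the real ROW_INDEX_V1 format):
 * [len][row bytes] ... [offset of row 0] ... [offset of row n-1] [row count]
 */
public class RowIndexBlockSketch {

    /** Builds a block from rows that must already be sorted by key. */
    static byte[] buildBlock(List<String> sortedRows) {
        List<byte[]> rowBytes = new ArrayList<>();
        int dataSize = 0;
        for (String row : sortedRows) {
            byte[] b = row.getBytes(StandardCharsets.UTF_8);
            rowBytes.add(b);
            dataSize += 4 + b.length;                   // length prefix + unencoded row
        }
        int indexSize = 4 * rowBytes.size() + 4;        // one offset per row + row count
        ByteBuffer buf = ByteBuffer.allocate(dataSize + indexSize);
        int[] offsets = new int[rowBytes.size()];
        for (int i = 0; i < rowBytes.size(); i++) {
            offsets[i] = buf.position();                // remember where each row starts
            buf.putInt(rowBytes.get(i).length);
            buf.put(rowBytes.get(i));
        }
        for (int off : offsets) {
            buf.putInt(off);                            // the row index: the size overhead
        }
        buf.putInt(offsets.length);
        return buf.array();
    }

    /** Reads the row stored at the given offset. */
    static String rowAt(ByteBuffer block, int offset) {
        int len = block.getInt(offset);
        byte[] b = new byte[len];
        ByteBuffer dup = block.duplicate();
        dup.position(offset + 4);
        dup.get(b);
        return new String(b, StandardCharsets.UTF_8);
    }

    /** Binary search over the row index; returns the row's ordinal, or -1 if absent. */
    static int seek(byte[] blockBytes, String key) {
        ByteBuffer block = ByteBuffer.wrap(blockBytes);
        int rowCount = block.getInt(blockBytes.length - 4);
        int indexStart = blockBytes.length - 4 - 4 * rowCount;
        int lo = 0;
        int hi = rowCount - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            String row = rowAt(block, block.getInt(indexStart + 4 * mid));
            // String order equals byte order for the ASCII keys used here
            int cmp = row.compareTo(key);
            if (cmp == 0) {
                return mid;
            } else if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row-a", "row-b", "row-c", "row-d");
        byte[] block = buildBlock(rows);
        System.out.println("block size: " + block.length + " bytes");
        System.out.println("index overhead: " + (4 * rows.size() + 4) + " bytes");
        System.out.println("seek(row-c) = " + seek(block, "row-c"));
        System.out.println("seek(row-z) = " + seek(block, "row-z"));
    }
}
{code}

With the four tiny rows in main, the block is 56 bytes, of which 20 are index. The index cost per row is fixed in this model, so its relative share shrinks as the stored cells get larger.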
--
This message was sent by Atlassian Jira
(v8.3.4#803005)