Posted to dev@hbase.apache.org by "Kadir Ozdemir (Jira)" <ji...@apache.org> on 2021/06/04 23:47:00 UTC

[jira] [Created] (HBASE-25972) Single and Multi-version HFiles

Kadir Ozdemir created HBASE-25972:
-------------------------------------

Summary: Single and Multi-version HFiles
Key: HBASE-25972
URL: https://issues.apache.org/jira/browse/HBASE-25972
Project: HBase
Issue Type: Improvement
Reporter: Kadir Ozdemir

HBase stores tables row by row in its files, called HFiles. An HFile is composed of blocks, and the number of rows stored in a block depends on the row sizes. The number of rows per block drops when rows have more than one version, because HBase stores all versions of a row sequentially in the same HFile after compaction. However, applications (e.g., Phoenix) mostly query the most recent row versions.
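
As a rough illustration of the density effect, the arithmetic can be sketched as follows. The 64 KB block size, the 1 KB row-version size, and the helper name rows_per_block are assumptions chosen for the example, not values taken from this issue.

```python
def rows_per_block(block_bytes: int, row_version_bytes: int, versions_per_row: int) -> int:
    """Distinct rows that fit in one block when every version of a row
    is stored contiguously (simplified: ignores block and key overhead)."""
    return block_bytes // (row_version_bytes * versions_per_row)


BLOCK_BYTES = 64 * 1024    # assumed block size
ROW_VERSION_BYTES = 1024   # hypothetical size of one row version

# One version per row versus four versions per row.
single_version_rows = rows_per_block(BLOCK_BYTES, ROW_VERSION_BYTES, 1)
multi_version_rows = rows_per_block(BLOCK_BYTES, ROW_VERSION_BYTES, 4)

print(single_version_rows, multi_version_rows)
```

Under these assumed numbers, a query for the latest versions of the same set of rows touches four times as many blocks when four versions are kept per row.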

Let us assume that compaction generates two HFiles instead of one. One of these files stores only the most recent cell versions; let’s call this a single-version HFile. The other stores all the previous cell versions; let’s call this a multi-version HFile. The files generated by memstore flushes will be of the multi-version type. Both major and minor compactions will generate single-version files as well as multi-version files. This means that, for queries on the most recent row versions, HBase does not need to look into multi-version HFiles that are older than the latest single-version HFiles.
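
A minimal sketch of the proposed split at compaction time, under simplifying assumptions: the Cell tuple and split_by_version are hypothetical names, not HBase APIs, and delete markers, TTLs, and max-versions settings are ignored.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Cell(NamedTuple):
    # Hypothetical, simplified stand-in for an HBase cell.
    row: str
    column: str
    timestamp: int
    value: str


def split_by_version(cells: Iterable[Cell]) -> Tuple[List[Cell], List[Cell]]:
    """Split cells into (single_version, multi_version) outputs.

    The newest cell of each (row, column) goes to the single-version
    output; every older version goes to the multi-version output.
    """
    # Row and column ascending, timestamp descending, so the newest
    # version of each (row, column) comes first within its group.
    ordered = sorted(cells, key=lambda c: (c.row, c.column, -c.timestamp))
    single_version: List[Cell] = []
    multi_version: List[Cell] = []
    previous_key = None
    for cell in ordered:
        key = (cell.row, cell.column)
        if key != previous_key:
            single_version.append(cell)   # first cell seen for this key is the newest
            previous_key = key
        else:
            multi_version.append(cell)    # an older version of the same key
    return single_version, multi_version


cells = [
    Cell("r1", "c1", 1, "a"),
    Cell("r1", "c1", 3, "c"),
    Cell("r1", "c1", 2, "b"),
    Cell("r2", "c1", 5, "x"),
]
single_version, multi_version = split_by_version(cells)
```

Here single_version holds the timestamp-3 cell of r1 and the only cell of r2, while multi_version holds the two older cells of r1.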

The blocks of single-version HFiles will generally be denser than those of current HFiles, which will improve query times for queries on the most recent row versions.
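
One possible reading of the file-skipping rule for latest-version queries is sketched below. HFileInfo, its kind and seq_id fields, and files_for_latest_read are hypothetical names for illustration; how HBase would actually track file type and age is not specified in this issue.

```python
from typing import List, NamedTuple


class HFileInfo(NamedTuple):
    # Hypothetical file descriptor for the sketch.
    name: str
    kind: str     # "single" or "multi"
    seq_id: int   # assumed ordering: a larger value means newer data


def files_for_latest_read(files: List[HFileInfo]) -> List[HFileInfo]:
    """Files a most-recent-version query would consult under this reading.

    Multi-version files that are not newer than the latest single-version
    file are skipped; single-version files are always kept.
    """
    single_ids = [f.seq_id for f in files if f.kind == "single"]
    if not single_ids:
        # No single-version file exists yet, so nothing can be skipped.
        return list(files)
    newest_single = max(single_ids)
    return [f for f in files if f.kind == "single" or f.seq_id > newest_single]


files = [
    HFileInfo("old-flush", "multi", 1),
    HFileInfo("compacted-latest", "single", 2),
    HFileInfo("compacted-history", "multi", 2),
    HFileInfo("new-flush", "multi", 3),
]
selected = [f.name for f in files_for_latest_read(files)]
```

In this example only compacted-latest and new-flush are consulted; the older flush file and the history file written by the same compaction are skipped.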

--
This message was sent by Atlassian Jira
(v8.3.4#803005)