Posted to issues@hbase.apache.org by "Xiaolin Ha (Jira)" <ji...@apache.org> on 2021/11/04 12:46:00 UTC

[jira] [Updated] (HBASE-25322) Redundant Reference file in bottom region of split

     [ https://issues.apache.org/jira/browse/HBASE-25322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaolin Ha updated HBASE-25322:
-------------------------------
    Priority: Blocker  (was: Major)

> Redundant Reference file in bottom region of split
> --------------------------------------------------
>
>                 Key: HBASE-25322
>                 URL: https://issues.apache.org/jira/browse/HBASE-25322
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Blocker
>
> When we split a region whose key range is (,), the bottom region should contain keys in (, split key) and the top region should contain keys in [split key, ).
> Currently, if we do the following operations:
>  # put rowkeys 100,101,102,103,104,105 into a table, and flush the memstore to create an hfile with rowkeys 100,101,102,103,104,105;
>  # put rowkeys 200,201,202,203,204,205 into the table, and flush the memstore to create an hfile with rowkeys 200,201,202,203,204,205;
>  # split the table region using split key 200;
>  # the bottom region then has two Reference files, while the top region has only one.
> But we expect the bottom region to have only one Reference file, like the top region.
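The expected outcome can be checked with a small model: a daughter region needs a Reference file for an hfile only if at least one row of that hfile falls inside the daughter's key range. This is a minimal sketch, not HBase code; the class and method names are hypothetical, hfiles are reduced to lists of row keys, and rows are compared as Java strings, which matches byte order only for ASCII keys like the ones in this report.

```java
import java.util.List;

// Hypothetical model (not HBase API): decides which daughter regions should
// reference each hfile, based purely on the rows the hfile actually contains.
final class ExpectedReferences {

    // Bottom daughter covers (start, splitKey): rows strictly below the split key.
    static boolean hasRowInBottom(List<String> rows, String splitKey) {
        return rows.stream().anyMatch(r -> r.compareTo(splitKey) < 0);
    }

    // Top daughter covers [splitKey, end): the split key itself and everything above.
    static boolean hasRowInTop(List<String> rows, String splitKey) {
        return rows.stream().anyMatch(r -> r.compareTo(splitKey) >= 0);
    }

    // Returns {bottomCount, topCount}: how many hfiles each daughter should reference.
    static int[] count(List<List<String>> hfiles, String splitKey) {
        int bottom = 0;
        int top = 0;
        for (List<String> rows : hfiles) {
            if (hasRowInBottom(rows, splitKey)) bottom++;
            if (hasRowInTop(rows, splitKey)) top++;
        }
        return new int[] {bottom, top};
    }

    public static void main(String[] args) {
        List<List<String>> hfiles = List.of(
                List.of("100", "101", "102", "103", "104", "105"),
                List.of("200", "201", "202", "203", "204", "205"));
        int[] c = count(hfiles, "200");
        System.out.println("bottom=" + c[0] + " top=" + c[1]); // bottom=1 top=1
    }
}
```

Row 200 is the split key, so the second hfile lies entirely in the top daughter and the bottom daughter has no reason to reference it.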
> That's because, when generating Reference files in the child regions, the bottom region compares the `PrivateCellUtil.createLastOnRow(splitRow)` cell with the first keys of the hfiles, while the top region compares the `PrivateCellUtil.createFirstOnRow(splitRow)` cell with the last keys of the hfiles.
> `LastOnRow(splitRow)` is the largest key that can be formed on the split row, while `FirstOnRow(splitRow)` is the smallest. Since the split row belongs to the top region, we should compare `FirstOnRow(splitRow)` with the hfile first and last keys in both the bottom and the top region.
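The difference between the two comparisons can be reproduced with a simplified key model. This sketch is an illustration under stated assumptions, not the actual HBase split code: a key is (row, rank), where a rank of -1 stands in for a `FirstOnRow` cell, 0 for a real cell, and 1 for a `LastOnRow` cell, so that sentinel cells sort before or after every real cell on the same row. The class, the helpers, and the exact branch conditions are hypothetical and only follow the behaviour described in this issue.

```java
// Hypothetical model (not HBase code) of deciding whether a daughter region
// gets a Reference file, given an hfile's first and last rows.
final class SplitReferenceSketch {

    static final int FIRST_ON_ROW = -1; // sorts before every real cell of the row
    static final int REAL_CELL = 0;
    static final int LAST_ON_ROW = 1;   // sorts after every real cell of the row

    static final class Key {
        final String row;
        final int rank;

        Key(String row, int rank) {
            this.row = row;
            this.rank = rank;
        }
    }

    // Order by row first, then by rank within the same row.
    static int compare(Key a, Key b) {
        int c = a.row.compareTo(b.row);
        return c != 0 ? c : Integer.compare(a.rank, b.rank);
    }

    // useLastOnRowForBottom = true models the current behaviour described above,
    // false models the proposed fix (FirstOnRow for both daughters).
    static boolean createsReference(boolean top, String splitRow, String firstRow,
                                    String lastRow, boolean useLastOnRowForBottom) {
        if (top) {
            // Skip when the whole hfile lies below the split: splitKey > lastKey.
            Key splitKey = new Key(splitRow, FIRST_ON_ROW);
            return compare(splitKey, new Key(lastRow, REAL_CELL)) <= 0;
        }
        // Skip when the whole hfile lies at or above the split: splitKey < firstKey.
        Key splitKey = new Key(splitRow, useLastOnRowForBottom ? LAST_ON_ROW : FIRST_ON_ROW);
        return compare(splitKey, new Key(firstRow, REAL_CELL)) >= 0;
    }

    // Counts Reference files for the two hfiles from the report, split at row 200.
    static int count(boolean top, boolean useLastOnRowForBottom) {
        String[][] hfiles = {{"100", "105"}, {"200", "205"}};
        int n = 0;
        for (String[] f : hfiles) {
            if (createsReference(top, "200", f[0], f[1], useLastOnRowForBottom)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("current:  bottom=" + count(false, true) + " top=" + count(true, true));
        System.out.println("proposed: bottom=" + count(false, false) + " top=" + count(true, false));
    }
}
```

Running `main` prints `bottom=2 top=1` for the current comparison and `bottom=1 top=1` for the proposed one: `LastOnRow(200)` sorts after the first real cell of row 200, so the second hfile is not recognised as lying entirely in the top region, whereas `FirstOnRow(200)` sorts before it.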
> Although the bottom region never reads the redundant Reference file, compacting it produces an empty file if it is the only Reference file participating in a compaction.
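Why the compaction output is empty can be shown with a toy model, assuming that a bottom-half Reference exposes only the rows of the referenced hfile that sort strictly before the split row. The class and method below are hypothetical and stand in for HBase's half-file reader; they are not its API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model: reading through a bottom-half Reference yields only
// the rows below the split row; a compaction rewrites exactly those rows.
final class RedundantReferenceCompaction {

    static List<String> readBottomHalf(List<String> hfileRows, String splitRow) {
        return hfileRows.stream()
                .filter(r -> r.compareTo(splitRow) < 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The second hfile from the report: every row is at or above split row 200.
        List<String> secondHFile = List.of("200", "201", "202", "203", "204", "205");
        List<String> compacted = readBottomHalf(secondHFile, "200");
        System.out.println("cells written by compaction: " + compacted.size()); // 0
    }
}
```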



--
This message was sent by Atlassian Jira
(v8.3.4#803005)