Posted to issues@hbase.apache.org by "Tak-Lon (Stephen) Wu (Jira)" <ji...@apache.org> on 2022/11/18 17:46:00 UTC

[jira] [Updated] (HBASE-27495) Improve HFileLinkCleaner to validate back reference links ahead the next traverse

     [ https://issues.apache.org/jira/browse/HBASE-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tak-Lon (Stephen) Wu updated HBASE-27495:
-----------------------------------------
    Description: 
We found a race in the CleanerChore related to back reference links. When the HFileLinkCleaner runs for a file, it makes one of two decisions depending on the file type:
 - HFiles: the cleaner only checks whether the .links-<> directory is present and contains files.
 - Back reference links: the cleaner checks whether the forward link still exists in the data directory.
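The two decision rules above can be modeled with a small, self-contained Java sketch. It does not use HBase's HFileLinkCleaner API. The class `LinkCleanerModel`, its fields, and the simplification that a back reference and its forward link share one identifier are all hypothetical, chosen only to make the rules concrete.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the two checks; not the real HFileLinkCleaner API.
class LinkCleanerModel {
    // Archived HFile name -> back references stored in its ".links-<>" directory.
    final Map<String, Set<String>> backRefDirs = new HashMap<>();
    // Forward links still present in the data directory. For illustration only, a back
    // reference and the forward link it points to share the same identifier.
    final Set<String> forwardLinks = new HashSet<>();

    // Rule for HFiles: deletable only when the links directory is absent or empty.
    boolean isHFileDeletable(String hfile) {
        Set<String> refs = backRefDirs.get(hfile);
        return refs == null || refs.isEmpty();
    }

    // Rule for back reference links: deletable once the forward link is gone.
    boolean isBackRefDeletable(String backRef) {
        return !forwardLinks.contains(backRef);
    }
}
```

Note that neither rule looks at the other file type's state at a later point in time; each decision only sees what exists when it runs.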

The order in which the cleaner checks these two file types matters. When the back reference is checked first, the cleaner can remove both the reference and the HFile from the archive in the same run. When the HFile is checked first, however, only the back reference is removed, and the HFile is not deleted until the next iteration of the CleanerChore. That delay can be very long when the file listing is huge, for example when the archive is stored on an object store.
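The effect of the visiting order can be shown with a toy simulation, again a hypothetical sketch rather than HBase code. Each call to `runPass` stands in for one CleanerChore iteration over a snapshot of the archive listing, and `refsFirst` chooses whether back references or HFiles are visited first. All class, method, and file names here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of CleanerChore passes; names do not come from HBase.
class OrderingRaceDemo {
    final Set<String> hfiles = new HashSet<>();                 // archived HFiles
    final Map<String, Set<String>> backRefs = new HashMap<>();  // HFile -> its back references
    final Set<String> liveForwardLinks = new HashSet<>();       // forward links still in use

    // One pass over a snapshot of the listing taken at the start of the pass.
    void runPass(boolean refsFirst) {
        List<String> hfileSnapshot = new ArrayList<>(hfiles);
        List<String[]> refSnapshot = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : backRefs.entrySet()) {
            for (String ref : e.getValue()) {
                refSnapshot.add(new String[] {e.getKey(), ref});
            }
        }
        if (refsFirst) {
            cleanRefs(refSnapshot);
            cleanHFiles(hfileSnapshot);
        } else {
            cleanHFiles(hfileSnapshot);
            cleanRefs(refSnapshot);
        }
    }

    // HFile rule: delete only if no back references remain at this moment.
    private void cleanHFiles(List<String> snapshot) {
        for (String hfile : snapshot) {
            if (backRefs.getOrDefault(hfile, Set.of()).isEmpty()) {
                hfiles.remove(hfile);
            }
        }
    }

    // Back reference rule: delete when the forward link no longer exists.
    private void cleanRefs(List<String[]> snapshot) {
        for (String[] pair : snapshot) {
            if (!liveForwardLinks.contains(pair[1])) {
                backRefs.get(pair[0]).remove(pair[1]);
            }
        }
    }

    // Counts passes until the HFile is gone, or returns -1 after maxPasses.
    int passesUntilDeleted(String hfile, boolean refsFirst, int maxPasses) {
        for (int pass = 1; pass <= maxPasses; pass++) {
            runPass(refsFirst);
            if (!hfiles.contains(hfile)) {
                return pass;
            }
        }
        return -1;
    }

    // Builds one archived HFile with a single stale back reference.
    static OrderingRaceDemo withStaleRef() {
        OrderingRaceDemo demo = new OrderingRaceDemo();
        demo.hfiles.add("hfile-1");
        demo.backRefs.put("hfile-1", new HashSet<>(Set.of("ref-1")));
        return demo;
    }
}
```

With one HFile whose only back reference is stale, `OrderingRaceDemo.withStaleRef().passesUntilDeleted("hfile-1", true, 5)` returns 1, while the same call with `refsFirst` set to `false` returns 2. In a real cluster, each extra pass means another full listing of the archive.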

The goal of this task is to improve the traversal of archived HFiles: reuse the list of back reference files already found and apply the back reference link checks immediately.
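A sketch of the proposed direction, under the same kind of hypothetical model: when the cleaner visits an archived HFile, it already has the listing of that HFile's back references, so it can check each one against the forward links right away instead of waiting for a later traversal. `ImprovedHFileCheck` and its methods are illustrative names only, not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the improvement; not the code from HBASE-27495.
class ImprovedHFileCheck {
    // Reuses the back references already listed for this HFile and returns the
    // ones whose forward link is gone, so they can be deleted in this pass.
    static List<String> staleBackRefs(List<String> listedBackRefs, Set<String> liveForwardLinks) {
        List<String> stale = new ArrayList<>();
        for (String ref : listedBackRefs) {
            if (!liveForwardLinks.contains(ref)) {
                stale.add(ref);
            }
        }
        return stale;
    }

    // The HFile becomes deletable in the same pass when every listed back reference
    // is stale. A real cleaner would still have to delete those references first.
    static boolean deletableInSamePass(List<String> listedBackRefs, Set<String> liveForwardLinks) {
        return staleBackRefs(listedBackRefs, liveForwardLinks).size() == listedBackRefs.size();
    }
}
```

For example, `deletableInSamePass(List.of("ref-1"), Set.of())` is true, while `deletableInSamePass(List.of("ref-1", "ref-2"), Set.of("ref-2"))` is false and `staleBackRefs` returns `[ref-1]` for the same input. The point of reusing the listing is to avoid a second traversal of the archive, which is the expensive step on an object store.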

  was:
We found a race in the CleanerChore related to back reference links. When the HFileLinkCleaner runs for a file, it makes one of two decisions depending on the file type:
 - HFiles: the cleaner only checks whether the .links-<> directory is present and contains files.
 - Back reference links: the cleaner checks whether the forward link still exists in the data directory.

The order in which the cleaner checks these two file types matters. When the back reference is checked first, the cleaner can remove both the reference and the HFile from the archive in the same run. When the HFile is checked first, however, only the back reference is removed, and the HFile is not deleted until the next iteration of the CleanerChore. That delay can be very long when the file listing is huge, for example when the archive is stored on an object store.


> Improve HFileLinkCleaner to validate back reference links ahead the next traverse 
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-27495
>                 URL: https://issues.apache.org/jira/browse/HBASE-27495
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.0, 3.0.0-alpha-4, 2.5.2
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)