Posted to issues@hbase.apache.org by "Peter Somogyi (Jira)" <ji...@apache.org> on 2023/01/25 09:19:00 UTC

[jira] [Commented] (HBASE-27590) Change Iterable to List in CleanerChore

    [ https://issues.apache.org/jira/browse/HBASE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680512#comment-17680512 ] 

Peter Somogyi commented on HBASE-27590:
---------------------------------------

The attached flame-1.html shows that most of the time inside SnapshotFileCache.getUnreferencedFiles is spent in S3 listing.

> Change Iterable to List in CleanerChore
> ---------------------------------------
>
>                 Key: HBASE-27590
>                 URL: https://issues.apache.org/jira/browse/HBASE-27590
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Peter Somogyi
>            Assignee: Peter Somogyi
>            Priority: Minor
>         Attachments: flame-1.html
>
>
> The HFileCleaners can perform poorly on a large /archive area when used with slow storage such as S3. The snapshot write lock in SnapshotFileCache is held while the file metadata is fetched from S3, so even with multiple cleaner threads only a single cleaner can effectively delete files from the archive.
> Simply by changing the parameter type passed to FileCleanerDelegate from Iterable to List, the file metadata collection is performed before SnapshotHFileCleaner runs.
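> A minimal sketch of that signature change, assuming a simplified FileCleanerDelegate (the real interface extends further HBase interfaces and has more methods):
> {code:java}
> import java.util.List;
> import org.apache.hadoop.fs.FileStatus;
>
> // Simplified for illustration; not the full FileCleanerDelegate interface.
> public interface FileCleanerDelegate {
>   // Before: a lazy Iterable lets the slow S3 metadata fetch happen while a
>   // delegate iterates, per the description above under the snapshot lock.
>   // Iterable<FileStatus> getDeletableFiles(Iterable<FileStatus> files);
>
>   // After: a List is fully materialized by the caller, so the metadata is
>   // collected once, before any delegate-specific locking.
>   Iterable<FileStatus> getDeletableFiles(List<FileStatus> files);
> }
> {code}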
> Running with the below cleaner configurations, I observed that the time the lock was held in SnapshotFileCache went down from 45000ms to 100ms when running against 1000 files in a directory. The complete evaluation and deletion for this folder took the same time, but since the file metadata fetch from S3 was done outside of the lock, multiple cleaner threads were able to run concurrently (see the sketch after the configuration block).
> {noformat}
> hbase.cleaner.directory.sorting=false
> hbase.cleaner.scan.dir.concurrent.size=0.75
> hbase.regionserver.hfilecleaner.small.thread.count=16
> hbase.regionserver.hfilecleaner.large.thread.count=8
> {noformat}
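> To illustrate why the lock hold time drops, here is a hypothetical sketch of the pattern; the class, method, and helper names are made up and this is not the real SnapshotFileCache code:
> {code:java}
> import java.util.Collections;
> import java.util.List;
> import java.util.concurrent.locks.ReentrantLock;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.Path;
>
> // Hypothetical sketch only; not the real SnapshotFileCache implementation.
> class UnreferencedFileSketch {
>   private final ReentrantLock lock = new ReentrantLock();
>
>   List<FileStatus> getUnreferencedFiles(Path archiveDir) {
>     // The slow S3 listing / metadata fetch happens up front, outside the lock.
>     List<FileStatus> files = listArchiveFiles(archiveDir);
>     lock.lock();
>     try {
>       // Only the in-memory check against the snapshot cache runs under the
>       // lock, so other cleaner threads wait milliseconds instead of seconds.
>       return filterAgainstSnapshotCache(files);
>     } finally {
>       lock.unlock();
>     }
>   }
>
>   // Placeholder stand-ins for the real listing and cache lookup.
>   private List<FileStatus> listArchiveFiles(Path dir) { return Collections.emptyList(); }
>   private List<FileStatus> filterAgainstSnapshotCache(List<FileStatus> files) { return files; }
> }
> {code}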
> The files to evaluate are already passed as a List to CleanerChore.checkAndDeleteFiles, but they are converted to an Iterable to run the checks on the configured cleaners.
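> A rough sketch of the caller side under this change (simplified and with made-up names; the real CleanerChore.checkAndDeleteFiles also performs the deletions and bookkeeping):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.hadoop.fs.FileStatus;
>
> // Simplified caller-side sketch; assumes the FileCleanerDelegate sketch above.
> class CleanerChoreSketch {
>   List<FileStatus> collectDeletableFiles(List<FileStatus> files, List<FileCleanerDelegate> cleaners) {
>     List<FileStatus> deletable = files;
>     for (FileCleanerDelegate cleaner : cleaners) {
>       // The incoming List is handed to each delegate as-is instead of being
>       // narrowed to an Iterable, so the metadata is already materialized.
>       List<FileStatus> next = new ArrayList<>();
>       cleaner.getDeletableFiles(deletable).forEach(next::add);
>       deletable = next;
>     }
>     return deletable;
>   }
> }
> {code}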



--
This message was sent by Atlassian Jira
(v8.20.10#820010)