Posted to issues@hbase.apache.org by "Pankaj Kumar (Jira)" <ji...@apache.org> on 2021/10/04 18:14:00 UTC

[jira] [Commented] (HBASE-26320) Separate Log Cleaner DirScanPool to prevent the OLDWALs from filling up the disk when archive is large

    [ https://issues.apache.org/jira/browse/HBASE-26320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424087#comment-17424087 ] 

Pankaj Kumar commented on HBASE-26320:
--------------------------------------

Thanks [~zyork] for bringing this up. We recently faced this problem in one of our production environments, where the archived and old WAL files grew to several TBs due to slow directory scans; we had to modify multiple configs to speed up the cleaning.

> Separate Log Cleaner DirScanPool to prevent the OLDWALs from filling up the disk when archive is large
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26320
>                 URL: https://issues.apache.org/jira/browse/HBASE-26320
>             Project: HBase
>          Issue Type: Improvement
>          Components: Operability
>    Affects Versions: 1.7.1, 2.4.6
>            Reporter: Zach York
>            Assignee: Zach York
>            Priority: Major
>
> We currently share the DirScanPool (threadpool for scanning for files to delete in the OldLogs and archive directories) between the LogCleaner and HFileCleaner. This means that if the archive directory is large/has lots of files/directories, the threads can get stuck scanning through the archive directory, starving the LogCleaner. This is especially apparent on S3 where list can be slower than on HDFS.
> This JIRA creates separate DirScanPools for the LogCleaner and HFileCleaner.
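The idea in the description above can be sketched roughly as follows. This is a minimal, hypothetical illustration (the class and field names here are invented for clarity, not the actual HBase patch): instead of both cleaner chores submitting directory scans to one shared pool, each cleaner gets its own pool, so a slow listing of a large archive directory cannot starve the oldWALs cleanup.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: previously both cleaners shared one scan pool, so
// threads stuck listing a huge archive directory starved WAL cleaning.
// Separating the pools isolates the two workloads from each other.
final class CleanerScanPools {
    final ExecutorService logCleanerPool;   // scans the oldWALs directory
    final ExecutorService hfileCleanerPool; // scans the archive directory

    CleanerScanPools(int logScanThreads, int hfileScanThreads) {
        this.logCleanerPool = Executors.newFixedThreadPool(logScanThreads);
        this.hfileCleanerPool = Executors.newFixedThreadPool(hfileScanThreads);
    }

    void shutdown() throws InterruptedException {
        logCleanerPool.shutdown();
        hfileCleanerPool.shutdown();
        logCleanerPool.awaitTermination(10, TimeUnit.SECONDS);
        hfileCleanerPool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

With separate pools, even if every thread in `hfileCleanerPool` is blocked on a slow S3 listing of the archive, `logCleanerPool` keeps deleting eligible oldWALs, which is exactly the starvation the issue describes.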



--
This message was sent by Atlassian Jira
(v8.3.4#803005)