Posted to dev@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2020/05/27 16:17:00 UTC

[jira] [Created] (HBASE-24445) Improve default thread pool size for opening store files

Andrew Kyle Purtell created HBASE-24445:
-------------------------------------------

             Summary: Improve default thread pool size for opening store files
                 Key: HBASE-24445
                 URL: https://issues.apache.org/jira/browse/HBASE-24445
             Project: HBase
          Issue Type: Improvement
            Reporter: Andrew Kyle Purtell


For each store being opened, we create a CompletionService and also a thread pool for opening and closing its store files. See HStore#openStoreFiles and HRegion#getStoreFileOpenAndCloseThreadPool. By default this pool has only one thread. It can be increased with "hbase.hstore.open.and.close.threads.max", but that value is then divided by the number of stores in the region.
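
To make the division concrete, here is a minimal Java sketch of the per-store sizing rule as described above. It is not the HBase source: the class and method names are hypothetical, and the floor of one thread is an assumption consistent with the pool having a single thread by default.

```java
// Hypothetical sketch of the per-store sizing rule described above; not HBase source.
public class StoreFilePoolSizing {

    // Default for "hbase.hstore.open.and.close.threads.max" implied by the text (one thread).
    static final int DEFAULT_OPEN_AND_CLOSE_THREADS_MAX = 1;

    /**
     * Threads available to one store for opening and closing its files: the shared
     * maximum is divided by the number of stores in the region, with an assumed floor of 1.
     */
    static int storeFileThreadsPerStore(int openAndCloseThreadsMax, int numStores) {
        int stores = Math.max(1, numStores); // guard against a region reporting no stores
        return Math.max(1, openAndCloseThreadsMax / stores);
    }

    public static void main(String[] args) {
        // Default setting, 3 column families: 1 / 3 == 0, clamped to 1 thread per store.
        System.out.println(storeFileThreadsPerStore(DEFAULT_OPEN_AND_CLOSE_THREADS_MAX, 3)); // 1
        // Raising the shared setting to 16 gives a 4-store region only 4 threads per store.
        System.out.println(storeFileThreadsPerStore(16, 4)); // 4
    }
}
```

The second case shows the problem: the shared setting has to be multiplied by the store count to gain any per-store parallelism, and raising it also enlarges the other pools it sizes.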

"hbase.hstore.open.and.close.threads.max" is also used to size other thread pools for opening and closing the stores themselves, so it's an unfortunate overloading.

We should have a configuration parameter that directly and simply tunes the size of the thread pool for opening store files. Introduce a new configuration parameter, "hbase.hstore.hfile.open.threads.max", which will define the upper bound for a thread pool shared by the entire store for opening hfiles. The default should be 1 to preserve the current default behavior.
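
A minimal sketch of how the proposed setting could be resolved, assuming a plain Map<String, String> stands in for HBase's Configuration and that values below 1 are clamped to 1. The class and method names are hypothetical. The point it illustrates is that the value is used directly as the pool bound: it is not divided by the store count and does not depend on "hbase.hstore.open.and.close.threads.max".

```java
import java.util.Map;

// Hypothetical sketch: resolving the proposed per-store hfile open pool size.
public class HFileOpenThreadsConfig {

    // Key proposed in this issue.
    static final String HFILE_OPEN_THREADS_MAX = "hbase.hstore.hfile.open.threads.max";
    // Proposed default, preserving the current single-threaded behavior.
    static final int DEFAULT_HFILE_OPEN_THREADS_MAX = 1;

    /**
     * Returns the upper bound for a store's hfile open pool. The value is used as-is:
     * it is not divided by the number of stores, and the older
     * "hbase.hstore.open.and.close.threads.max" key is never consulted.
     * Values below 1 are clamped to 1 (an assumption of this sketch).
     */
    static int hfileOpenThreads(Map<String, String> conf) {
        String value = conf.get(HFILE_OPEN_THREADS_MAX);
        int n = (value == null) ? DEFAULT_HFILE_OPEN_THREADS_MAX : Integer.parseInt(value.trim());
        return Math.max(1, n);
    }
}
```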

Once this is done, we could raise this value to 2, 4, 8, or more for greater parallelism when opening store files, without affecting the pools used for other open and close activities. The time required to open all store files often dominates the total time for bringing a region online. The thread pool will be shut down and become eligible for garbage collection once all files are loaded and the store is online.
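
The Java sketch below shows the intended lifecycle using only java.util.concurrent: a fixed pool sized from the new setting, a CompletionService that collects opened files as they finish, and a shutdown once every file is loaded. OpenedFile and openStoreFile are hypothetical placeholders for the real per-file open work behind HStore#openStoreFiles; this is an illustration under those assumptions, not the eventual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of a store-level pool for opening hfiles; not HBase source.
public class ParallelStoreFileOpen {

    /** Placeholder for an opened store file (reader, metadata, etc.). */
    static final class OpenedFile {
        final String path;

        OpenedFile(String path) {
            this.path = path;
        }
    }

    /** Placeholder for the expensive per-file open work. */
    static OpenedFile openStoreFile(String path) {
        return new OpenedFile(path);
    }

    /** Opens all files of one store using at most {@code threads} threads. */
    static List<OpenedFile> openAll(List<String> paths, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            CompletionService<OpenedFile> completion = new ExecutorCompletionService<>(pool);
            for (String path : paths) {
                completion.submit(() -> openStoreFile(path));
            }
            List<OpenedFile> opened = new ArrayList<>(paths.size());
            for (int i = 0; i < paths.size(); i++) {
                // take() blocks until the next open finishes, so results arrive in completion order.
                opened.add(completion.take().get());
            }
            return opened;
        } finally {
            // The pool is only needed while the store comes online; after shutdown its
            // worker threads exit once any queued tasks finish, and the pool can be collected.
            pool.shutdown();
        }
    }
}
```

With the default of 1 this reduces to opening files one at a time; raising the setting widens only this store's pool.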

The number of file open threads should scale with the number of stores, so allocating the pool at the store level continues to make sense.

Longer term, we might try recursively decomposing the region open task with a fork-join pool so that the opening of store files can be dynamically parallelized, probably in a superior way (conjecture pending a real attempt with metrics).
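
As an illustration of the fork-join idea only (speculative, not measured and not a proposed implementation), the Java sketch below recursively splits one store's list of files until each piece is small enough to open directly. The splitting threshold and the openStoreFile placeholder are assumptions; a region-level version would add one more level above this, with one subtask per store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative fork-join decomposition of opening a store's files; speculative, not HBase source.
public class ForkJoinStoreFileOpen {

    /** Hypothetical placeholder for opening a single store file. */
    static String openStoreFile(String path) {
        return "opened:" + path;
    }

    /** Opens paths[from, to), splitting the range in half until it is small enough. */
    static final class OpenFilesTask extends RecursiveTask<List<String>> {
        private static final int THRESHOLD = 2; // assumed cutoff; would need tuning against real metrics
        private final List<String> paths;
        private final int from;
        private final int to;

        OpenFilesTask(List<String> paths, int from, int to) {
            this.paths = paths;
            this.from = from;
            this.to = to;
        }

        @Override
        protected List<String> compute() {
            if (to - from <= THRESHOLD) {
                // Small enough: open sequentially in the current worker thread.
                List<String> out = new ArrayList<>(to - from);
                for (int i = from; i < to; i++) {
                    out.add(openStoreFile(paths.get(i)));
                }
                return out;
            }
            int mid = (from + to) >>> 1;
            OpenFilesTask left = new OpenFilesTask(paths, from, mid);
            left.fork(); // left half may be stolen by another worker
            List<String> right = new OpenFilesTask(paths, mid, to).compute(); // right half in this thread
            List<String> result = new ArrayList<>(left.join()); // wait for left half, keeping input order
            result.addAll(right);
            return result;
        }
    }

    static List<String> openAll(List<String> paths, ForkJoinPool pool) {
        return pool.invoke(new OpenFilesTask(paths, 0, paths.size()));
    }
}
```

Work stealing lets idle workers pick up halves from busy stores, which is the dynamic balancing the conjecture hopes for; whether it actually beats a fixed per-store pool is the open question.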



--
This message was sent by Atlassian Jira
(v8.3.4#803005)