Posted to dev@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2022/03/25 02:07:00 UTC

[jira] [Resolved] (HBASE-26878) TableInputFormatBase should cache RegionSizeCalculator

     [ https://issues.apache.org/jira/browse/HBASE-26878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell resolved HBASE-26878.
-----------------------------------------
    Fix Version/s: 2.5.0
                   2.6.0
                   3.0.0-alpha-3
                   2.4.12
     Hadoop Flags: Reviewed
       Resolution: Fixed

> TableInputFormatBase should cache RegionSizeCalculator
> ------------------------------------------------------
>
>                 Key: HBASE-26878
>                 URL: https://issues.apache.org/jira/browse/HBASE-26878
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Minor
>             Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.12
>
>
> TableInputFormatBase's getSplits() method instantiates a new RegionSizeCalculator on every call. Instantiating a RegionSizeCalculator involves scanning meta for all region locations of the given table. This can be costly for large tables, and we don't know how often a subclass will call getSplits().
> When initializeTable() is called, we already cache the RegionLocator and Admin that are passed into the RegionSizeCalculator. We should similarly cache the RegionSizeCalculator itself at that same time to avoid unnecessary meta scans on repeated getSplits() calls.
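
The caching pattern described above can be illustrated with a minimal, self-contained Java sketch. It is not the actual HBase patch: SketchSizeCalculator and SketchInputFormat are hypothetical stand-ins for RegionSizeCalculator and TableInputFormatBase, and the "meta scan" is simulated with a counter. The only point is that the expensive object is built once in initializeTable() and reused by every getSplits() call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for RegionSizeCalculator: the constructor is the
// expensive step, so it counts how many simulated meta scans have happened.
class SketchSizeCalculator {
    static final AtomicInteger metaScans = new AtomicInteger();
    private final List<String> regions;

    SketchSizeCalculator(List<String> regions) {
        metaScans.incrementAndGet();          // simulates one scan of meta
        this.regions = new ArrayList<>(regions);
    }

    List<String> regions() {
        return regions;
    }
}

// Hypothetical stand-in for TableInputFormatBase.
class SketchInputFormat {
    private SketchSizeCalculator sizeCalculator; // cached alongside the table state

    // Build the calculator once, when the table is initialized.
    void initializeTable(List<String> regions) {
        this.sizeCalculator = new SketchSizeCalculator(regions);
    }

    // Reuse the cached calculator instead of constructing a new one per call.
    List<String> getSplits() {
        if (sizeCalculator == null) {
            throw new IllegalStateException("initializeTable() must be called first");
        }
        // One split per region in this simplified model.
        return new ArrayList<>(sizeCalculator.regions());
    }
}
```

In this sketch, calling getSplits() any number of times after a single initializeTable() increments the scan counter only once. A real implementation would also need to decide when the cached sizes become stale, which the sketch does not address.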



--
This message was sent by Atlassian Jira
(v8.20.1#820001)