Posted to dev@hbase.apache.org by "Yu Li (Jira)" <ji...@apache.org> on 2020/04/08 01:31:00 UTC

[jira] [Resolved] (HBASE-16393) Improve computeHDFSBlocksDistribution

     [ https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu Li resolved HBASE-16393.
---------------------------
    Resolution: Implemented

Closing since all sub-tasks are resolved.

> Improve computeHDFSBlocksDistribution
> -------------------------------------
>
>                 Key: HBASE-16393
>                 URL: https://issues.apache.org/jira/browse/HBASE-16393
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Lijin Bin
>            Assignee: Lijin Bin
>            Priority: Major
>         Attachments: HBASE-16393.patch
>
>
> Our cluster is large, and I can see that the balancer is slow from time to time. The balancer is also called on master startup, so startup is slow as well.
> The first idea is to compute the HDFSBlocksDistribution of different regions in parallel.
> The second is to speed up computing a single region's HDFSBlocksDistribution.
> To compute a storefile's HDFSBlocksDistribution, we first call FileSystem#getFileStatus(path) and then FileSystem#getFileBlockLocations(status, start, length), which means two namenode RPC calls for every storefile. Instead, we can use FileSystem#listLocatedStatus to get a LocatedFileStatus that carries the information we need, reducing this to one namenode RPC call per storefile. This speeds up computeHDFSBlocksDistribution and also sends fewer RPC calls to the namenode.
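
The RPC saving described in the issue can be illustrated with a small, self-contained toy model. This is only a sketch, not the HBase patch: FakeNameNode, Block, and the method names below are hypothetical stand-ins, and each lookup on the fake namenode is simply counted as one RPC. The comments note which real Hadoop FileSystem call each method is meant to model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (not HBase code): compares namenode round trips for two lookup styles. */
public class BlockDistributionSketch {

    /** Hypothetical stand-in for one HDFS block: its length and the hosts holding a replica. */
    public static final class Block {
        final long length;
        final List<String> hosts;

        public Block(long length, List<String> hosts) {
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Hypothetical in-memory "namenode"; every lookup method counts as one RPC. */
    public static final class FakeNameNode {
        private final Map<String, List<Block>> files = new HashMap<>();
        public int rpcCount = 0;

        public void addFile(String path, List<Block> blocks) {
            files.put(path, new ArrayList<>(blocks));
        }

        // Models FileSystem#getFileStatus(path): returns only the file length.
        public long getFileLength(String path) {
            rpcCount++;
            long total = 0;
            for (Block b : files.get(path)) {
                total += b.length;
            }
            return total;
        }

        // Models FileSystem#getFileBlockLocations(status, start, length).
        // The toy ignores the range because callers always ask for the whole file.
        public List<Block> getBlockLocations(String path, long start, long length) {
            rpcCount++;
            return files.get(path);
        }

        // Models FileSystem#listLocatedStatus: status and block locations in one reply.
        public List<Block> listLocated(String path) {
            rpcCount++;
            return files.get(path);
        }
    }

    /** Sums, per host, the bytes of every block that host holds a replica of. */
    public static Map<String, Long> weigh(List<Block> blocks) {
        Map<String, Long> weights = new TreeMap<>();
        for (Block b : blocks) {
            for (String host : b.hosts) {
                weights.merge(host, b.length, Long::sum);
            }
        }
        return weights;
    }

    /** Current pattern from the issue: status first, then block locations (2 RPCs). */
    public static Map<String, Long> twoCalls(FakeNameNode nn, String path) {
        long length = nn.getFileLength(path);
        return weigh(nn.getBlockLocations(path, 0, length));
    }

    /** Proposed pattern: one located-status lookup (1 RPC). */
    public static Map<String, Long> oneCall(FakeNameNode nn, String path) {
        return weigh(nn.listLocated(path));
    }
}
```

Calling twoCalls and oneCall on the same file yields the same per-host weights, while rpcCount grows by two in the first case and by one in the second. With many storefiles per region and many regions per cluster, that difference is what the issue expects to reduce namenode load.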



--
This message was sent by Atlassian Jira
(v8.3.4#803005)