Posted to issues@hbase.apache.org by "Thiruvel Thirumoolan (JIRA)" <ji...@apache.org> on 2016/10/26 22:57:58 UTC

[jira] [Commented] (HBASE-16398) optimize HRegion computeHDFSBlocksDistribution

    [ https://issues.apache.org/jira/browse/HBASE-16398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609965#comment-15609965 ] 

Thiruvel Thirumoolan commented on HBASE-16398:
----------------------------------------------

[~aoxiang],

Are you still working on this? If not, do you mind if I pick it up? Sorry, I only ran into this jira now; I was on a long break and had not noticed it.

We ran into this problem some time back at Yahoo!, as the balancer was taking a really long time just to calculate the block distribution. We pushed the changes described here to all our production clusters maybe a quarter ago, and it's much better now, both in time taken and in the number of RPCs to the namenode. I can upload the client-side code, which lets someone measure the time taken and the RPC count, and also verify the new implementation against the old one for correctness. On average, the new implementation took half the time of the old one.

> optimize HRegion computeHDFSBlocksDistribution
> ----------------------------------------------
>
>                 Key: HBASE-16398
>                 URL: https://issues.apache.org/jira/browse/HBASE-16398
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16398.patch
>
>
> First, I assume there are no references or links in a region family's directory.
> Without the patch, computeHDFSBlocksDistribution for a region family makes 1+2*N RPC calls, where N is the number of hfiles. The first RPC call is DistributedFileSystem#listStatus to get the hfiles; then for every hfile there are two RPC calls, DistributedFileSystem#getFileStatus(path) and then DistributedFileSystem#getFileBlockLocations(status, start, length).
> With the patch, computeHDFSBlocksDistribution for a region family makes 2 RPC calls: DistributedFileSystem#getFileStatus(path) and DistributedFileSystem#listLocatedStatus(final Path p, final PathFilter filter).
> So if there is at least one hfile, the patch makes fewer RPC calls.
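
The RPC arithmetic in the description above can be sketched as a small standalone Java program. This is only an illustration of the counts stated in the description, not the patch itself: the class and method names (RpcCountSketch, rpcsWithoutPatch, rpcsWithPatch) are hypothetical, and it assumes, as the description does, that the family directory contains no references or links.

```java
// Hypothetical sketch: models the namenode RPC counts from the issue
// description. It does not call HDFS; it only encodes the stated arithmetic.
class RpcCountSketch {

    // Without the patch: one listStatus call for the family directory, then
    // getFileStatus + getFileBlockLocations for each of the N hfiles.
    static long rpcsWithoutPatch(long hfileCount) {
        return 1 + 2 * hfileCount;
    }

    // With the patch: one getFileStatus call plus one listLocatedStatus call,
    // independent of the number of hfiles.
    static long rpcsWithPatch(long hfileCount) {
        return 2;
    }

    public static void main(String[] args) {
        for (long n : new long[] {0, 1, 10, 100}) {
            System.out.printf("hfiles=%d without=%d with=%d%n",
                    n, rpcsWithoutPatch(n), rpcsWithPatch(n));
        }
    }
}
```

For N = 0 the patched path costs one more call (2 versus 1); from N = 1 onward the unpatched count (3, 5, 7, ...) is always higher, which matches the "at least one hfile" condition.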



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)