Posted to issues@hbase.apache.org by "Gary Helmling (JIRA)" <ji...@apache.org> on 2016/10/27 20:39:58 UTC
[jira] [Commented] (HBASE-16570) Compute region locality in parallel at startup
[ https://issues.apache.org/jira/browse/HBASE-16570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613146#comment-15613146 ]
Gary Helmling commented on HBASE-16570:
---------------------------------------
If I'm reading this correctly, this change completely bypasses the block location cache that was added in HBASE-14473, and calls FileSystem.getFileBlockLocations() for every store file each time the balancer runs.
In the LoadBalancer.balanceCluster() implementations (in StochasticLoadBalancer, SimpleLoadBalancer), we create a new Cluster instance.
In Cluster.<init>, we call registerRegion() on every HRegionInfo.
In registerRegion(), we do the following:
{code}
regionLocationFutures.set(regionIndex,
    regionFinder.asyncGetBlockDistribution(region));
{code}
Then, back in Cluster.<init>, we call get() on each ListenableFuture in a loop, which blocks until every lookup has completed.
So while we are doing the calls to get block locations in parallel with 5 threads, it looks like we're recomputing them every time balanceCluster() is called and not taking advantage of the cache at all. Am I misreading something here? This seems to be a major performance regression for clusters with large numbers of regions/store files.
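To illustrate the concern, here is a minimal, self-contained sketch. It is not the HBase code: LocalityCacheSketch, computeBlockDistribution() and simulate() are hypothetical names, regions are plain strings, CompletableFuture stands in for ListenableFuture, and a ConcurrentHashMap with no expiry stands in for the block location cache from HBASE-14473. It only counts how many expensive lookups happen over several balancer runs, with and without a cache in front of the parallel calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the region finder; not the real HBase class.
final class LocalityCacheSketch {
    // Counts simulated expensive lookups (think FileSystem.getFileBlockLocations()).
    private final AtomicInteger expensiveLookups = new AtomicInteger();
    // Five worker threads, matching the parallelism mentioned in the comment.
    private final ExecutorService pool = Executors.newFixedThreadPool(5);
    // Assumed cache: one future per region, never expired in this sketch.
    private final Map<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();
    private final boolean useCache;

    LocalityCacheSketch(boolean useCache) {
        this.useCache = useCache;
    }

    // Simulated expensive call; only the call count matters here.
    private String computeBlockDistribution(String region) {
        expensiveLookups.incrementAndGet();
        return "distribution-of-" + region;
    }

    // Uncached: every call schedules a fresh lookup.
    // Cached: the first call per region schedules a lookup, later calls reuse its future.
    CompletableFuture<String> asyncGetBlockDistribution(String region) {
        if (!useCache) {
            return CompletableFuture.supplyAsync(() -> computeBlockDistribution(region), pool);
        }
        return cache.computeIfAbsent(region,
            r -> CompletableFuture.supplyAsync(() -> computeBlockDistribution(r), pool));
    }

    // One balancer run: register every region, then wait on every future in a loop.
    void balanceCluster(List<String> regions) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String region : regions) {
            futures.add(asyncGetBlockDistribution(region));
        }
        for (CompletableFuture<String> future : futures) {
            future.join();
        }
    }

    // Runs the balancer several times and returns the number of expensive lookups made.
    static int simulate(boolean useCache, int regionCount, int balancerRuns) {
        LocalityCacheSketch finder = new LocalityCacheSketch(useCache);
        List<String> regions = new ArrayList<>();
        for (int i = 0; i < regionCount; i++) {
            regions.add("region-" + i);
        }
        try {
            for (int run = 0; run < balancerRuns; run++) {
                finder.balanceCluster(regions);
            }
            return finder.expensiveLookups.get();
        } finally {
            finder.pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("uncached lookups: " + simulate(false, 100, 3)); // 300
        System.out.println("cached lookups:   " + simulate(true, 100, 3));  // 100
    }
}
```

In this toy model the uncached path repeats every lookup on every run, so the cost grows with regions times balancer runs, while the cached path pays once per region. Parallelism shortens each run but does not reduce the total number of lookups.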
> Compute region locality in parallel at startup
> ----------------------------------------------
>
> Key: HBASE-16570
> URL: https://issues.apache.org/jira/browse/HBASE-16570
> Project: HBase
> Issue Type: Sub-task
> Reporter: binlijin
> Assignee: binlijin
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: HBASE-16570-master_V1.patch, HBASE-16570-master_V2.patch, HBASE-16570-master_V3.patch, HBASE-16570-master_V4.patch
>
>