Posted to user@hbase.apache.org by Rahul Ravindran <ra...@yahoo.com.INVALID> on 2014/12/26 08:37:01 UTC

Determining regions with low HDFS locality index

Hi, when an HBase region server (RS) goes down (possibly because of hardware issues), its regions get moved off that machine to other region servers. However, since the new region servers do not hold the backing HFiles locally, data locality for the newly transitioned regions is poor, and some of our jobs are a lot slower on these regions. Is there an API that lets me determine which regions within an RS are responsible for its low HDFS locality, so that I can trigger a compaction on them to improve locality?
I took a look at HDFSBlocksDistribution, from which I can determine which RSs have low HDFS locality. But going from the RS level to the specific regions responsible seems harder. I could look at the backing HFiles and determine locality using HDFS directly, but that seems roundabout. Any suggestions?
I am running HBase 0.94.15 with CDH 4.6.
~Rahul. 
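
To make the HFile-based approach mentioned above concrete: a region's locality on a host can be taken as the fraction of its store-file bytes that have an HDFS block replica on that host. The sketch below computes only that ratio from block lengths and replica host names. The names `RegionLocality`, `Block`, and `localityIndex`, and the host names, are hypothetical and not part of HBase. On a real cluster the block list would have to be fetched from HDFS, for example with Hadoop's `FileSystem.getFileBlockLocations` for each HFile under the region's directory; that step is assumed and not shown.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: locality of one region on one host, from per-block replica hosts. */
class RegionLocality {

    /** One HDFS block of a store file: its length and the hosts holding a replica. */
    static final class Block {
        final long length;
        final List<String> hosts;

        Block(long length, String... hosts) {
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Fraction of block bytes with a replica on the given host; 0.0 if there are no blocks. */
    static double localityIndex(List<Block> blocks, String host) {
        long total = 0;
        long local = 0;
        for (Block b : blocks) {
            total += b.length;
            if (b.hosts.contains(host)) {
                local += b.length;
            }
        }
        return total == 0 ? 0.0 : (double) local / total;
    }

    public static void main(String[] args) {
        // Hypothetical block layout for the store files of a single region.
        List<Block> blocks = Arrays.asList(
                new Block(128, "rs1", "rs2", "rs3"),
                new Block(128, "rs2", "rs3", "rs4"),
                new Block(64, "rs1", "rs4", "rs5"));
        System.out.println("locality on rs1: " + localityIndex(blocks, "rs1"));
        System.out.println("locality on rs5: " + localityIndex(blocks, "rs5"));
    }
}
```

Running this per region, with the hosting region server as `host`, would give a per-region figure rather than the per-server figure discussed above.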

Re: Determining regions with low HDFS locality index

Posted by Rahul Ravindran <ra...@yahoo.com.INVALID>.
Thanks for the response, Lars.
My question is less about cluster or master startup than about a running cluster. In my scenario, a machine goes down in a running cluster and its regions get moved to other machines, which hurts locality.
I want a way to query, from a client, which regions have poor locality, and possibly to trigger a manual compaction of those regions from the client to improve it. I found HDFSBlocksDistribution, which indicates which region servers have bad locality, but not which regions on a given region server are responsible. Is there any way to do that?
Thanks,
~Rahul.
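
The client-side selection step described above could look roughly like the following, assuming a per-region locality value (between 0.0 and 1.0) has already been obtained somehow. The class `CompactionCandidates`, the method `belowThreshold`, the region names, and the 0.5 threshold are all hypothetical. The sketch only picks and orders the regions. Actually requesting a compaction is left out; in 0.94 that would presumably go through `HBaseAdmin.majorCompact`, which should be verified against the API docs for the version in use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: choose regions whose locality is below a threshold, worst first. */
class CompactionCandidates {

    /** Returns the names of regions with locality strictly below the threshold, lowest locality first. */
    static List<String> belowThreshold(Map<String, Double> localityByRegion, double threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : localityByRegion.entrySet()) {
            if (e.getValue() < threshold) {
                result.add(e.getKey());
            }
        }
        // Order the candidates so the least local regions are handled first.
        result.sort(Comparator.comparingDouble((String r) -> localityByRegion.get(r)));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical locality values for the regions hosted on one region server.
        Map<String, Double> locality = new LinkedHashMap<>();
        locality.put("region-a", 0.20);
        locality.put("region-b", 0.95);
        locality.put("region-c", 0.05);
        for (String region : belowThreshold(locality, 0.5)) {
            // A real client would request a major compaction of this region here.
            System.out.println("compaction candidate: " + region);
        }
    }
}
```

Compacting the worst regions first, and only those below a threshold, keeps the extra I/O bounded on a cluster that has just lost a machine.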

On Saturday, December 27, 2014 2:13 AM, lars hofhansl <la...@apache.org> wrote:

There should be logic that attempts to restore regions to the region servers that last hosted them.
Note that the master can only assign regions to region servers that have reported in. For that reason, after a master start, the master waits a bit (4.5s by default) for region servers to report in before it starts assigning regions. Maybe that time is too short in your case? You can also configure the master to wait for a certain number of region servers to report in.
If it still does not work after you have checked that, could you file a JIRA outlining the details and steps to reproduce?

In any case, if the master has to assign the regions to a subset of the region servers, it has no choice but to break locality. Then, when the remaining region servers sign in, 0.94 has no logic to maintain locality when the cluster is balanced. In 0.98 the stochastic balancer uses locality as one of its parameters, although I have personally seen issues with that which I still need to investigate.
-- Lars


Re: Determining regions with low HDFS locality index

Posted by lars hofhansl <la...@apache.org>.
There should be logic that attempts to restore regions to the region servers that last hosted them.
Note that the master can only assign regions to region servers that have reported in. For that reason, after a master start, the master waits a bit (4.5s by default) for region servers to report in before it starts assigning regions. Maybe that time is too short in your case? You can also configure the master to wait for a certain number of region servers to report in.
If it still does not work after you have checked that, could you file a JIRA outlining the details and steps to reproduce?

In any case, if the master has to assign the regions to a subset of the region servers, it has no choice but to break locality. Then, when the remaining region servers sign in, 0.94 has no logic to maintain locality when the cluster is balanced. In 0.98 the stochastic balancer uses locality as one of its parameters, although I have personally seen issues with that which I still need to investigate.
-- Lars
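
For reference, the master's wait on startup mentioned above is controlled through hbase-site.xml. The fragment below is a sketch only: the property names are given as I recall them from the 0.94 master code and should be checked against the version in use, and the values are illustrative, not recommendations. The timeout is in milliseconds, so the 4.5s default corresponds to 4500.

```xml
<!-- Assumed property names; verify against your HBase version. Values are examples only. -->
<property>
  <name>hbase.master.wait.on.regionservers.timeout</name>
  <value>30000</value>
</property>
<property>
  <name>hbase.master.wait.on.regionservers.mintostart</name>
  <value>10</value>
</property>
```

The first setting lengthens the wait, and the second asks the master to hold off assigning regions until at least that many region servers have reported in.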
