Posted to dev@hbase.apache.org by Praveen Sripati <pr...@gmail.com> on 2012/02/20 14:03:01 UTC

Re: HBase and Data Locality

Looking at DefaultLoadBalancer.balance(), the balancing is purely based
on the number of regions hosted per region server and not on the resource
usage. HBASE-57 suggests taking data locality into consideration when
regions are assigned to region servers. It would be nice to consider both
the resource usage of the regions and data locality, not just the number
of regions per region server as currently implemented.
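
For reference, the count-based approach amounts to something like the
following (a simplified sketch in plain Java, not the actual
DefaultLoadBalancer code):

```java
import java.util.*;

// Simplified sketch of count-based balancing (not the real HBase code):
// move regions from overloaded servers to underloaded ones until every
// server holds between floor(avg) and ceil(avg) regions.
public class CountBalancerSketch {
    public static Map<String, Integer> balance(Map<String, Integer> regionsPerServer) {
        int total = regionsPerServer.values().stream().mapToInt(Integer::intValue).sum();
        int servers = regionsPerServer.size();
        int floor = total / servers;
        int ceil = (total % servers == 0) ? floor : floor + 1;

        Map<String, Integer> result = new HashMap<>(regionsPerServer);
        Deque<String> overloaded = new ArrayDeque<>();
        Deque<String> underloaded = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : result.entrySet()) {
            if (e.getValue() > ceil) overloaded.add(e.getKey());
            else if (e.getValue() < floor) underloaded.add(e.getKey());
        }
        // Shed one region at a time from an overloaded server to an underloaded one.
        while (!overloaded.isEmpty() && !underloaded.isEmpty()) {
            String from = overloaded.peek(), to = underloaded.peek();
            result.put(from, result.get(from) - 1);
            result.put(to, result.get(to) + 1);
            if (result.get(from) <= ceil) overloaded.poll();
            if (result.get(to) >= floor) underloaded.poll();
        }
        return result;
    }
}
```

Note that nothing in this loop looks at request counts, memory, cpu, or
where the regions' blocks live, which is exactly the limitation above.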

The file-to-block mapping can be obtained from the HDFS NameNode, but how
do we find out which regions are loaded (from a # of requests, cpu and
memory perspective) and which are not? I could not see any resource
utilization metrics on the region server pages.
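
On the locality side, once the block locations of a region's HFiles have
been fetched from the NameNode, a per-server locality index could be
computed along these lines (illustrative sketch; the data structures are
hypothetical, not HBase or HDFS API):

```java
import java.util.*;

// Illustrative sketch: the locality index of a region on a host is the
// fraction of the region's HDFS blocks that have a replica on that host.
// In a real deployment the replica hosts per block would come from the
// NameNode; here they are passed in directly.
public class LocalityIndex {
    // blockReplicaHosts: for each block of the region, the set of hosts
    // holding a replica of that block.
    public static double localityOn(String host, List<Set<String>> blockReplicaHosts) {
        if (blockReplicaHosts.isEmpty()) return 0.0;
        long local = blockReplicaHosts.stream()
                                      .filter(hosts -> hosts.contains(host))
                                      .count();
        return (double) local / blockReplicaHosts.size();
    }
}
```

A balancer could then prefer assigning a region to the live server with
the highest index for it.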

I am also curious whether HBASE-57 makes sense, since major compaction
runs every 24 hrs and the HFiles are all local to the region servers
after a major compaction. I think the HDFS balancer has to be run
manually, and there can be a window of up to 24 hrs between an HDFS
balancer execution and a major compaction during which data locality
might be lost.

I am interested in working on this JIRA, but need some help from the HBase
community.

Regards,
Praveen

On Tue, Feb 14, 2012 at 7:34 PM, Mikael Sitruk <mi...@gmail.com> wrote:

> Region allocation is kept in the next restart (
> https://issues.apache.org/jira/browse/HBASE-2896 ). This is also present
> in
> the CDH3 code.
> Nevertheless, if you have a server that did not start correctly, regions
> will move off it and locality will not be preserved (even after you
> restart the problematic node, since it will get random regions).
> The best solution would effectively be
> https://issues.apache.org/jira/browse/HBASE-57
>
>
> Mikael.S
>
> On Tue, Feb 14, 2012 at 3:19 PM, Brock Noland <br...@cloudera.com> wrote:
>
> > Hi,
> >
> > On Tue, Feb 14, 2012 at 7:13 AM, Praveen Sripati
> > <pr...@gmail.com> wrote:
> > > Lars blog (1) mentions that data locality for the region servers is
> lost
> > > when HBase cluster is restarted. It's also mentioned at the end that
> work
> > > is going in HBase to assign regions to RS taking data locality into
> > > consideration. The blog entry is 18 months old and so I would like to
> > know
> > > if this has been incorporated into the latest HBase release or data
> > > locality is lost till a compaction is complete.
> >
> > JIRA is down for me, but here is the JIRA:
> >
> > https://issues.apache.org/jira/browse/HBASE-2896
> >
> > I am pretty sure it's been included in the latest HBase release as it's
> in
> > CDH3.
> >
> > Brock
> >
> > --
> > Apache MRUnit - Unit testing MapReduce -
> > http://incubator.apache.org/mrunit/
> >
>
>
>
> --
> Mikael.S
>

Re: HBase and Data Locality

Posted by Stack <st...@duboce.net>.
On Mon, Feb 20, 2012 at 1:09 PM, Stack <st...@duboce.net> wrote:
> On locality, the fb lads are working on a primitive that makes it so
> the hbase dfsclient will tell hdfs where to place blocks.

Here is the hdfs issue: https://issues.apache.org/jira/browse/HDFS-2576
St.Ack

Re: HBase and Data Locality

Posted by Nicolas Spiegelberg <ns...@fb.com>.
>>It's recommended that you run major compactions yourself at down times.
>
>Can we change the `hbase.hregion.majorcompaction` value from 86400000 to
>-1 along with the required code changes and make a note of it in the
>hbase-default.xml? Also, the hbase.master.loadbalancer.class is not
>specified in the hbase-default.xml. Should I open a JIRA and make those
>two changes?

Note that the importance of non-peak major compactions should be mitigated
in 0.92 with multi-threaded compactions.  This config recommendation is
not a general solution but a way to cope with the existing
production-stable feature set.


Re: HBase and Data Locality

Posted by Praveen Sripati <pr...@gmail.com>.
Stack,

> It's recommended that you run major compactions yourself at down times.

Can we change the `hbase.hregion.majorcompaction` value from 86400000 to -1
along with the required code changes and make a note of it in the
hbase-default.xml? Also, the hbase.master.loadbalancer.class is not
specified in the hbase-default.xml. Should I open a JIRA and make those two
changes?
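
For reference, the corresponding override in hbase-site.xml would look
something like this (in existing releases a value of 0 already disables
the time-based trigger; documenting -1 as above is the proposal):

```xml
<!-- hbase-site.xml: disable the periodic (time-based) major compaction
     trigger so major compactions can instead be run manually at
     off-peak times. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```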

> In 0.92 there is hits per region and this gets reported to the master as
part of ClusterStatus as does memory usage.  This could be factored into a
new balance algorithm.

I just looked at the code for ClusterStatus, HServerLoad and RegionLoad.
As a first cut, I was thinking we could use only the resource usage
(memory, cpu, # of hits, and not block location): sort the region servers
in decreasing order of resource usage, shed regions from the top until a
region server with average usage is reached, and then assign the shed
regions to the region servers at the bottom of the list. This is more or
less similar to the DefaultLoadBalancer, but based on resource usage
rather than on the # of regions.

The question is: should memory, cpu and # of hits be given equal
weightage, or should the weights be configurable?
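
A minimal sketch of that ordering (names, metrics and weights here are
hypothetical, not HBase API):

```java
import java.util.*;

// Hypothetical sketch of the proposed resource-based ordering. Each
// server's load is a weighted sum of its memory, cpu and request-hit
// metrics; the weights could come from configuration. Servers are
// returned most-loaded first, ready for the shed-from-the-top pass.
public class WeightedLoadSketch {
    public static double score(double mem, double cpu, double hits,
                               double wMem, double wCpu, double wHits) {
        return wMem * mem + wCpu * cpu + wHits * hits;
    }

    // metrics: server name -> {memUsed, cpuUsed, requestHits}
    public static List<String> byDecreasingLoad(Map<String, double[]> metrics,
                                                double wMem, double wCpu, double wHits) {
        List<String> servers = new ArrayList<>(metrics.keySet());
        servers.sort((a, b) -> Double.compare(
            score(metrics.get(b)[0], metrics.get(b)[1], metrics.get(b)[2], wMem, wCpu, wHits),
            score(metrics.get(a)[0], metrics.get(a)[1], metrics.get(a)[2], wMem, wCpu, wHits)));
        return servers;
    }
}
```

One practical wrinkle: the three metrics have very different ranges, so
they would need normalizing before any fixed weights make sense.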

> On locality, the fb lads are working on a primitive that makes it so the
hbase dfsclient will tell hdfs where to place blocks.

Is there any JIRA for it?

So using this functionality HBase can specify the locations of the 2nd
and 3rd replicas (the 1st is local) and later use those locations during
a region server crash or a region movement. What happens if the HDFS
balancer is run and the blocks are moved again?

> This primitive that the lads are working on needs to be done I believe
before hbase-57 can be done (properly).

Again, the question is: should we mix memory, cpu and # of hits with
block location and give them configurable weightage, or keep data
locality out of the first cut, since a major compaction (although manual,
as per the recommendation) will pull all the blocks local anyway?

Thanks,
Praveen

On Tue, Feb 21, 2012 at 2:39 AM, Stack <st...@duboce.net> wrote:

> On Mon, Feb 20, 2012 at 5:03 AM, Praveen Sripati
> <pr...@gmail.com> wrote:
> > It would be nice to consider
> > both the resource usage of the region and the data locality into
> > consideration, not just purely based on the number of regions in the
> region
> > server as implemented currently.
> >
>
> Yes.
>
> > The file to block mapping can be found from the HDFS NameNode, but how to
> > find out which regions are loaded (# of requests, cpu and memory
> > perspective) and which are not? I could not see any resource utilization
> in
> > the region server pages.
> >
>
> In 0.92 there is hits per region and this gets reported to the master
> as part of ClusterStatus as does memory usage.  This could be factored
> into a new balance algorithm.  Could also send over cpu and hardware
> profile for factoring (though much of this is available via JMX --
> either we get these into clusterstatus or master does poll on jmx
> after it sees new server to get server profile)
>
> > Also, curious if HBASE-57 makes sense, since the major compaction runs
> > every 24 hrs
>
> It's recommended that you run major compactions yourself at down times.
>
>  I think that the balancer has to be run manually in HDFS and
> > there will be a maximum of 24 hrs window between a HDFS balancer
> execution
> > and a major compaction during which data locality might be lost.
> >
>
> Yes the hdfs balancer needs to be run manually and yes it knows
> nothing of how hbase has ordered the blocks and will not respect
> region locality when it goes about its business.
>
> I'm not sure, though, that I follow the rest of what you are saying above.
>
> On locality, the fb lads are working on a primitive that makes it so
> the hbase dfsclient will tell hdfs where to place blocks.  The favored
> replica locations will be kept up in .META. in a new column. When a
> regionserver crashes, or if we want to move a region, we'll move it or
> reopen it on one of the locations that has had region blocks
> replicated to it.  This should help improve the locality story on
> failover/move.
>
> Without this functionality, we're left with the current behavior where
> blocks for regions are scattered and it's only by chance you'd have
> good locality opening a region in any location other than the current
> deploy, where the gentle waves of compaction have been nudging data
> local.
>
> I don't believe there is an issue for the above yet.  Let me chase the
> lads to file one.
>
> This primitive that the lads are working on needs to be done I believe
> before hbase-57 can be done (properly).  What you reckon Praveen?
>
> St.Ack
>

Re: HBase and Data Locality

Posted by Stack <st...@duboce.net>.
On Mon, Feb 20, 2012 at 5:03 AM, Praveen Sripati
<pr...@gmail.com> wrote:
> It would be nice to consider
> both the resource usage of the region and the data locality into
> consideration, not just purely based on the number of regions in the region
> server as implemented currently.
>

Yes.

> The file to block mapping can be found from the HDFS NameNode, but how to
> find out which regions are loaded (# of requests, cpu and memory
> perspective) and which are not? I could not see any resource utilization in
> the region server pages.
>

In 0.92 there is hits per region and this gets reported to the master
as part of ClusterStatus as does memory usage.  This could be factored
into a new balance algorithm.  Could also send over cpu and hardware
profile for factoring (though much of this is available via JMX --
either we get these into clusterstatus or master does poll on jmx
after it sees new server to get server profile)
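
As an illustration of how per-region hit counts reported to the master
could be rolled up into a per-server figure for a balancer (the record
names here are made up, not the actual ClusterStatus/HServerLoad/RegionLoad
API):

```java
import java.util.*;

// Illustrative only: aggregate per-region request hits (as reported to
// the master in a ClusterStatus-like structure) into a per-server total
// that a balancing algorithm could sort on.
public class ServerLoadSketch {
    // regionHits: region name -> request count
    // regionToServer: region name -> hosting server
    public static Map<String, Long> hitsPerServer(Map<String, Long> regionHits,
                                                  Map<String, String> regionToServer) {
        Map<String, Long> perServer = new HashMap<>();
        for (Map.Entry<String, Long> e : regionHits.entrySet()) {
            String server = regionToServer.get(e.getKey());
            perServer.merge(server, e.getValue(), Long::sum);
        }
        return perServer;
    }
}
```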

> Also, curious if HBASE-57 makes sense, since the major compaction runs
> every 24 hrs

It's recommended that you run major compactions yourself at down times.

> I think that the balancer has to be run manually in HDFS and
> there will be a maximum of 24 hrs window between a HDFS balancer execution
> and a major compaction during which data locality might be lost.
>

Yes the hdfs balancer needs to be run manually and yes it knows
nothing of how hbase has ordered the blocks and will not respect
region locality when it goes about its business.

I'm not sure, though, that I follow the rest of what you are saying above.

On locality, the fb lads are working on a primitive that makes it so
the hbase dfsclient will tell hdfs where to place blocks.  The favored
replica locations will be kept up in .META. in a new column. When a
regionserver crashes, or if we want to move a region, we'll move it or
reopen it on one of the locations that has had region blocks
replicated to it.  This should help improve the locality story on
failover/move.
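
That failover logic can be sketched as follows (purely illustrative; the
favored-locations column and its consumers did not exist yet at the time
of this thread, and all names are hypothetical):

```java
import java.util.*;

// Illustrative failover sketch: given the favored replica hosts recorded
// for a region, reopen the region on the first favored host that is
// still live, falling back to an arbitrary live server otherwise.
public class FavoredReopenSketch {
    public static String pickServer(List<String> favoredHosts, Set<String> liveServers) {
        for (String host : favoredHosts) {
            // A favored host already holds the region's block replicas,
            // so reopening there preserves locality.
            if (liveServers.contains(host)) return host;
        }
        // No favored host alive: locality is lost either way.
        return liveServers.iterator().next();
    }
}
```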

Without this functionality, we're left with the current behavior where
blocks for regions are scattered and it's only by chance you'd have
good locality opening a region in any location other than the current
deploy, where the gentle waves of compaction have been nudging data
local.

I don't believe there is an issue for the above yet.  Let me chase the
lads to file one.

This primitive that the lads are working on needs to be done I believe
before hbase-57 can be done (properly).  What you reckon Praveen?

St.Ack