Posted to user@hbase.apache.org by Wayne <wa...@gmail.com> on 2011/02/02 20:51:12 UTC

Region Balancing

I know there were some changes in 0.90 in how region balancing works. Is
there a resource somewhere that describes the configuration options? Per
Jonathan Gray's recommendation we are trying to keep our region count down
to 100 per region server (we are up to a 5 GB region size). This has given
us a lot more hot spots than we saw with 1000 regions per node. Scanning
through #3373, it sounded like there is a round-robin load balancer we can
turn on instead of keeping regions pinned, as I believe they are in 0.90 by
default. We have many tables, and certain region servers seem to get "stuck"
with a single table's data in almost contiguous ranges, which really throws
a wrench in our load balancing.

Thanks.

Re: Region Balancing

Posted by Wayne <wa...@gmail.com>.
hbase.master.startup.retainassign=false works like a charm. After a restart
all tables are scattered across all region servers.

Thanks!

On Wed, Feb 2, 2011 at 4:06 PM, Stack <st...@duboce.net> wrote:

> On Wed, Feb 2, 2011 at 8:41 PM, Wayne <wa...@gmail.com> wrote:
> > The region counts are the same per region server, which is good. My
> > problem is that I have 5 tables and several region servers only serve
> > 1 table's regions.
>
> I wonder if this is an effect of our deploying splits to the same server
> as the split parent?  Once the phenomenon goes into effect, we'll not
> break it that I can see (on restart, we try our best in 0.90.0 to
> redeploy regions to where they were running pre-shutdown so we don't
> lose locality).
>
> > I would like to round robin and scatter all tables across all
> > region servers. Basically the distribution is not round-robin enough.
> > Manually moving regions is not going to help me. Frankly this goes against
> > the concept of bigger/fewer regions. Given what I am seeing, without an
> > alternative I will reduce the max size of the regions, and once I get into
> > the 100s of regions per region server this problem will be resolved. Fewer
> > regions are dangerous in terms of avoiding hot spots.
> >
> > Is there a way to turn off the memory across restarts of where a region
> > lives? This might help re-balance from scratch.
> >
>
> From AssignmentManager:
>
>     // Determine what type of assignment to do on startup
>     boolean retainAssignment = master.getConfiguration().
>       getBoolean("hbase.master.startup.retainassign", true);
>
> It looks like you could set the above flag in your hbase-site.xml to
> false and that should do it (it's on by default).
>
> You could knock a few of your regionservers out of the cluster.  Wait
> till the regions are assigned elsewhere, then bring them back up again.
> Force running of assignment.  That might mess stuff up enough?
>
> St.Ack
>

Re: Region Balancing

Posted by Sebastian Bauer <ad...@ugame.net.pl>.
I have a small question: is there any method to get the number of requests
for a region? The master has server.getLoad().getNumberOfRequests() per
server, but I cannot find anything similar for a region.



On 02.02.2011 22:06, Stack wrote:
> On Wed, Feb 2, 2011 at 8:41 PM, Wayne<wa...@gmail.com>  wrote:
>> The region counts are the same per region server, which is good. My problem
>> is that I have 5 tables and several region servers only serve 1 table's
>> regions.
> I wonder if this is an effect of our deploying splits to the same server
> as the split parent?  Once the phenomenon goes into effect, we'll not
> break it that I can see (on restart, we try our best in 0.90.0 to
> redeploy regions to where they were running pre-shutdown so we don't
> lose locality).
>
>> I would like to round robin and scatter all tables across all
>> region servers. Basically the distribution is not round-robin enough.
>> Manually moving regions is not going to help me. Frankly this goes against
>> the concept of bigger/fewer regions. Given what I am seeing, without an
>> alternative I will reduce the max size of the regions, and once I get into
>> the 100s of regions per region server this problem will be resolved. Fewer
>> regions are dangerous in terms of avoiding hot spots.
>>
>> Is there a way to turn off the memory across restarts of where a region
>> lives? This might help re-balance from scratch.
>>
> From AssignmentManager:
>
>     // Determine what type of assignment to do on startup
>     boolean retainAssignment = master.getConfiguration().
>       getBoolean("hbase.master.startup.retainassign", true);
>
> It looks like you could set the above flag in your hbase-site.xml to
> false and that should do it (it's on by default).
>
> You could knock a few of your regionservers out of the cluster.  Wait
> till the regions are assigned elsewhere, then bring them back up again.
> Force running of assignment.  That might mess stuff up enough?
>
> St.Ack
>


-- 

Regards
Sebastian Bauer
-----------------------------------------------------
http://tikecik.pl


RE: doing a scan that will return random columns in a table's family

Posted by Peter Haidinyak <ph...@local.com>.
Thanks 

-----Original Message-----
From: Jonathan Gray [mailto:jgray@fb.com] 
Sent: Thursday, February 03, 2011 10:19 AM
To: user@hbase.apache.org
Subject: RE: doing a scan that will return random columns in a table's family

Result is just the client-side class which wraps whatever the server returns.  The ability to do this query is not really about whether Result has the methods to get at this data, but rather whether Scan supports this type of query (it does).

Scan.addFamily(family) will make it so that every column in the specified family is returned.

The Result that gets returned can be used in many different ways, including those described by David.  However, I generally recommend using Result.raw() and just operating directly on the KeyValues that are returned.  There is a non-zero cost to generating the maps via the convenience methods in Result.  Some of the more recent javadoc is a bit more explicit about that.

JG

> -----Original Message-----
> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> Sent: Thursday, February 03, 2011 8:59 AM
> To: user@hbase.apache.org
> Subject: RE: doing a scan that will return random columns in a table's family
> 
> Thanks
> 
> -----Original Message-----
> From: Buttler, David [mailto:buttler1@llnl.gov]
> Sent: Thursday, February 03, 2011 8:53 AM
> To: user@hbase.apache.org
> Subject: RE: doing a scan that will return random columns in a table's family
> 
> By default that is what you get.  You do have to navigate through the results:
> 
> Result.getFamilyMap():
> 
> /**
>    * Map of qualifiers to values.
>    * <p>
>    * Returns a Map of the form: <code>Map&lt;qualifier,value></code>
>    * @param family column family to get
>    * @return map of qualifiers to values
>    */
>   public NavigableMap<byte[], byte[]> getFamilyMap(byte [] family)
> 
> 
> Or if you don't know the family name even, use the Result.getMap() function
> 
> /**
>    * Map of families to all versions of its qualifiers and values.
>    * <p>
>    * Returns a three level Map of the form:
>    * <code>Map<family,Map&lt;qualifier,Map&lt;timestamp,value>>></code>
>    * <p>
>    * Note: All other map returning methods make use of this map internally.
>    * @return map from families to qualifiers to versions
>    */
>   public NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> getMap()
> 
> -----Original Message-----
> From: Pete Haidinyak [mailto:javamann@cox.net]
> Sent: Thursday, February 03, 2011 12:01 AM
> To: user@hbase.apache.org
> Subject: doing a scan that will return random columns in a table's family
> 
> Hi,
> 	If I have a table:family where I add new columns with computer
> generated column name (I won't know what they are to add them to a scan)
> is it possible to do a scan that returns every column in a row?
> 
> Thanks
> 
> -Pete


RE: doing a scan that will return random columns in a table's family

Posted by Jonathan Gray <jg...@fb.com>.
Result is just the client-side class which wraps whatever the server returns.  The ability to do this query is not really about whether Result has the methods to get at this data, but rather whether Scan supports this type of query (it does).

Scan.addFamily(family) will make it so that every column in the specified family is returned.

The Result that gets returned can be used in many different ways, including those described by David.  However, I generally recommend using Result.raw() and just operating directly on the KeyValues that are returned.  There is a non-zero cost to generating the maps via the convenience methods in Result.  Some of the more recent javadoc is a bit more explicit about that.

JG

> -----Original Message-----
> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> Sent: Thursday, February 03, 2011 8:59 AM
> To: user@hbase.apache.org
> Subject: RE: doing a scan that will return random columns in a table's family
> 
> Thanks
> 
> -----Original Message-----
> From: Buttler, David [mailto:buttler1@llnl.gov]
> Sent: Thursday, February 03, 2011 8:53 AM
> To: user@hbase.apache.org
> Subject: RE: doing a scan that will return random columns in a table's family
> 
> By default that is what you get.  You do have to navigate through the results:
> 
> Result.getFamilyMap():
> 
> /**
>    * Map of qualifiers to values.
>    * <p>
>    * Returns a Map of the form: <code>Map&lt;qualifier,value></code>
>    * @param family column family to get
>    * @return map of qualifiers to values
>    */
>   public NavigableMap<byte[], byte[]> getFamilyMap(byte [] family)
> 
> 
> Or if you don't know the family name even, use the Result.getMap() function
> 
> /**
>    * Map of families to all versions of its qualifiers and values.
>    * <p>
>    * Returns a three level Map of the form:
>    * <code>Map<family,Map&lt;qualifier,Map&lt;timestamp,value>>></code>
>    * <p>
>    * Note: All other map returning methods make use of this map internally.
>    * @return map from families to qualifiers to versions
>    */
>   public NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> getMap()
> 
> -----Original Message-----
> From: Pete Haidinyak [mailto:javamann@cox.net]
> Sent: Thursday, February 03, 2011 12:01 AM
> To: user@hbase.apache.org
> Subject: doing a scan that will return random columns in a table's family
> 
> Hi,
> 	If I have a table:family where I add new columns with computer
> generated column name (I won't know what they are to add them to a scan)
> is it possible to do a scan that returns every column in a row?
> 
> Thanks
> 
> -Pete


RE: doing a scan that will return random columns in a table's family

Posted by Peter Haidinyak <ph...@local.com>.
Thanks

-----Original Message-----
From: Buttler, David [mailto:buttler1@llnl.gov] 
Sent: Thursday, February 03, 2011 8:53 AM
To: user@hbase.apache.org
Subject: RE: doing a scan that will return random columns in a table's family

By default that is what you get.  You do have to navigate through the results:

Result.getFamilyMap():

/**
   * Map of qualifiers to values.
   * <p>
   * Returns a Map of the form: <code>Map&lt;qualifier,value></code>
   * @param family column family to get
   * @return map of qualifiers to values
   */
  public NavigableMap<byte[], byte[]> getFamilyMap(byte [] family)


Or if you don't know the family name even, use the Result.getMap() function

/**
   * Map of families to all versions of its qualifiers and values.
   * <p>
   * Returns a three level Map of the form:
   * <code>Map<family,Map&lt;qualifier,Map&lt;timestamp,value>>></code>
   * <p>
   * Note: All other map returning methods make use of this map internally.
   * @return map from families to qualifiers to versions
   */
  public NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> getMap()

-----Original Message-----
From: Pete Haidinyak [mailto:javamann@cox.net] 
Sent: Thursday, February 03, 2011 12:01 AM
To: user@hbase.apache.org
Subject: doing a scan that will return random columns in a table's family

Hi,
	If I have a table:family where I add new columns with computer generated  
column name (I won't know what they are to add them to a scan) is it  
possible to do a scan that returns every column in a row?

Thanks

-Pete


RE: doing a scan that will return random columns in a table's family

Posted by "Buttler, David" <bu...@llnl.gov>.
By default that is what you get.  You do have to navigate through the results:

Result.getFamilyMap():

/**
   * Map of qualifiers to values.
   * <p>
   * Returns a Map of the form: <code>Map&lt;qualifier,value></code>
   * @param family column family to get
   * @return map of qualifiers to values
   */
  public NavigableMap<byte[], byte[]> getFamilyMap(byte [] family)


Or, if you don't even know the family name, use the Result.getMap() function:

/**
   * Map of families to all versions of its qualifiers and values.
   * <p>
   * Returns a three level Map of the form:
   * <code>Map<family,Map&lt;qualifier,Map&lt;timestamp,value>>></code>
   * <p>
   * Note: All other map returning methods make use of this map internally.
   * @return map from families to qualifiers to versions
   */
  public NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> getMap()
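
To make the three-level shape above concrete, here is a minimal, self-contained Java sketch of walking such a map without knowing any column names in advance. It is only a stand-in: the map is built by hand from plain TreeMaps rather than obtained from a real Result, and the class and helper names (ResultMapWalkSketch, put, latestValues) are hypothetical. Ordering versions newest-first is an assumption of this model.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Stand-in for the family -> qualifier -> timestamp -> value shape; no HBase classes used.
class ResultMapWalkSketch {

    // byte[] has no natural ordering, so the maps need an explicit comparator.
    static final Comparator<byte[]> BYTES = Arrays::compareUnsigned;

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String str(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Record one cell version in the nested map (hypothetical helper for building test data).
    static void put(NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> row,
                    String family, String qualifier, long timestamp, String value) {
        row.computeIfAbsent(bytes(family),
                f -> new TreeMap<byte[], NavigableMap<Long, byte[]>>(BYTES))
           .computeIfAbsent(bytes(qualifier),
                q -> new TreeMap<Long, byte[]>(Comparator.reverseOrder()))
           .put(timestamp, bytes(value));
    }

    // Collect the latest value of every "family:qualifier" found in the row.
    static Map<String, String> latestValues(
            NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> row) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> fam : row.entrySet()) {
            for (Map.Entry<byte[], NavigableMap<Long, byte[]>> col : fam.getValue().entrySet()) {
                // Versions are modeled newest-first, so the first entry is the latest.
                byte[] latest = col.getValue().firstEntry().getValue();
                out.put(str(fam.getKey()) + ":" + str(col.getKey()), str(latest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> row = new TreeMap<>(BYTES);
        put(row, "f", "c-17", 1L, "a");
        put(row, "f", "c-17", 2L, "b");
        put(row, "f", "c-42", 1L, "x");
        System.out.println(latestValues(row));
    }
}
```

The same two nested loops apply to any row: the qualifiers are discovered by iterating, not supplied by the caller.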

-----Original Message-----
From: Pete Haidinyak [mailto:javamann@cox.net] 
Sent: Thursday, February 03, 2011 12:01 AM
To: user@hbase.apache.org
Subject: doing a scan that will return random columns in a table's family

Hi,
	If I have a table:family where I add new columns with computer generated  
column name (I won't know what they are to add them to a scan) is it  
possible to do a scan that returns every column in a row?

Thanks

-Pete


doing a scan that will return random columns in a table's family

Posted by Pete Haidinyak <ja...@cox.net>.
Hi,
	If I have a table:family where I add new columns with computer-generated
column names (so I won't know what they are in order to add them to a scan),
is it possible to do a scan that returns every column in a row?

Thanks

-Pete


Re: Keyword schema

Posted by Ted Dunning <td...@maprtech.com>.
A small map-reduce program could do the updates to HBase or, if your
incremental data is relatively small, you can do the updates one by one.
This can work fine, but it doesn't really solve the top-100 term problem.
For that, it may be nice to have an occasional MR program that over-produces
a list of top items.  Then each update can go against your top item list,
which will be approximately the right list until the next MR update.
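
The over-produced top list idea can be sketched in a few lines of self-contained Java. This is only an in-memory illustration under stated assumptions: the class TopTermsSketch and its methods refresh, update and top are hypothetical names, the periodic full pass is represented by a plain Map of counts instead of an MR job, and nothing here touches HBase.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of "over-produce candidates, then update incrementally".
class TopTermsSketch {

    private final Map<String, Long> candidates = new HashMap<>();

    // Terms sorted by count descending (ties alphabetical), truncated to n entries.
    private static List<String> topOf(Map<String, Long> counts, int n) {
        List<String> terms = new ArrayList<>(counts.keySet());
        terms.sort((a, b) -> {
            int byCount = Long.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    // Periodic full pass: keep more candidates than will ever be displayed.
    void refresh(Map<String, Long> fullCounts, int candidateCount) {
        candidates.clear();
        for (String term : topOf(fullCounts, candidateCount)) {
            candidates.put(term, fullCounts.get(term));
        }
    }

    // Incremental update between refreshes: only tracked candidates are adjusted.
    void update(String term, long delta) {
        candidates.computeIfPresent(term, (t, count) -> count + delta);
    }

    // Approximate top k, as of the last refresh plus the updates since then.
    List<String> top(int k) {
        return topOf(candidates, k);
    }

    public static void main(String[] args) {
        Map<String, Long> full = new HashMap<>();
        full.put("hbase", 50L);
        full.put("hadoop", 40L);
        full.put("pig", 30L);
        full.put("hive", 29L);
        full.put("zookeeper", 5L);

        TopTermsSketch sketch = new TopTermsSketch();
        sketch.refresh(full, 4);          // track 4 candidates to serve a top 3
        sketch.update("hive", 5);         // tracked: hive overtakes pig
        sketch.update("zookeeper", 100);  // untracked: ignored until the next refresh
        System.out.println(sketch.top(3));
    }
}
```

The trade-off is visible in the last update: a term outside the candidate list stays invisible until the next full pass, which is why the list is only approximately right between refreshes.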

On Wed, Feb 2, 2011 at 11:04 PM, Pete Haidinyak <ja...@cox.net> wrote:

> I will be updating the keywords and their frequency every X minutes so I
> don't believe M/R would work well but I could be wrong. I've been doing this
> approach with other data and have been receiving sub 1 second on my queries
> with two overtaxed servers. I figured this problem has been solved many
> times before and I was just looking for guidance.
>
> Thanks
>
> -Pete
>
>
> On Wed, 02 Feb 2011 16:44:39 -0800, Jean-Daniel Cryans <
> jdcryans@apache.org> wrote:
>
>> I don't think HBase is really needed here, unless you somehow need
>> random read/write to those search queries.
>>
>> J-D
>>
>> On Wed, Feb 2, 2011 at 1:27 PM, Peter Haidinyak <ph...@local.com>
>> wrote:
>>
>>> Hi all,
>>>       I was just tasked to take the keywords used for a search and put
>>> them in HBase so we can slice and dice them. They are interested in standard
>>> stuff like highest frequency word, word pairs, etc.
>>>       I know I'm not the first to do this so does anyone have a
>>> recommendation on how to setup a schema for this sort of task?
>>>
>>> Thanks
>>>
>>> -Pete
>>>
>>>
>

Re: Keyword schema

Posted by Pete Haidinyak <ja...@cox.net>.
I will be updating the keywords and their frequencies every X minutes, so I
don't believe M/R would work well, but I could be wrong. I've been using
this approach with other data and have been getting sub-1-second responses
on my queries with two overtaxed servers. I figured this problem has been
solved many times before and I was just looking for guidance.

Thanks

-Pete


On Wed, 02 Feb 2011 16:44:39 -0800, Jean-Daniel Cryans  
<jd...@apache.org> wrote:

> I don't think HBase is really needed here, unless you somehow need
> random read/write to those search queries.
>
> J-D
>
> On Wed, Feb 2, 2011 at 1:27 PM, Peter Haidinyak <ph...@local.com>  
> wrote:
>> Hi all,
>>        I was just tasked to take the keywords used for a search and put  
>> them in HBase so we can slice and dice them. They are interested in  
>> standard stuff like highest frequency word, word pairs, etc.
>>        I know I'm not the first to do this so does anyone have a  
>> recommendation on how to setup a schema for this sort of task?
>>
>> Thanks
>>
>> -Pete
>>


Re: Keyword schema

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I don't think HBase is really needed here, unless you somehow need
random read/write to those search queries.

J-D

On Wed, Feb 2, 2011 at 1:27 PM, Peter Haidinyak <ph...@local.com> wrote:
> Hi all,
>        I was just tasked to take the keywords used for a search and put them in HBase so we can slice and dice them. They are interested in standard stuff like highest frequency word, word pairs, etc.
>        I know I'm not the first to do this so does anyone have a recommendation on how to setup a schema for this sort of task?
>
> Thanks
>
> -Pete
>

Keyword schema

Posted by Peter Haidinyak <ph...@local.com>.
Hi all,
	I was just tasked with taking the keywords used for a search and putting them in HBase so we can slice and dice them. They are interested in standard stuff like highest-frequency words, word pairs, etc.
	I know I'm not the first to do this, so does anyone have a recommendation on how to set up a schema for this sort of task?

Thanks

-Pete

Re: Region Balancing

Posted by Stack <st...@duboce.net>.
On Wed, Feb 2, 2011 at 8:41 PM, Wayne <wa...@gmail.com> wrote:
> The region counts are the same per region server, which is good. My problem
> is that I have 5 tables and several region servers only serve 1 table's
> regions.

I wonder if this is an effect of our deploying splits to the same server
as the split parent?  Once the phenomenon goes into effect, we'll not
break it that I can see (on restart, we try our best in 0.90.0 to
redeploy regions to where they were running pre-shutdown so we don't
lose locality).

> I would like to round robin and scatter all tables across all
> region servers. Basically the distribution is not round-robin enough.
> Manually moving regions is not going to help me. Frankly this goes against
> the concept of bigger/fewer regions. Given what I am seeing, without an
> alternative I will reduce the max size of the regions, and once I get into
> the 100s of regions per region server this problem will be resolved. Fewer
> regions are dangerous in terms of avoiding hot spots.
>
> Is there a way to turn off the memory across restarts of where a region
> lives? This might help re-balance from scratch.
>

From AssignmentManager:

    // Determine what type of assignment to do on startup
    boolean retainAssignment = master.getConfiguration().
      getBoolean("hbase.master.startup.retainassign", true);

It looks like you could set the above flag in your hbase-site.xml to
false and that should do it (it's on by default).

You could knock a few of your regionservers out of the cluster.  Wait
till the regions are assigned elsewhere, then bring them back up again.
Force running of assignment.  That might mess stuff up enough?

St.Ack
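
To illustrate why a fresh round-robin pass at startup breaks up a layout that retained assignment would preserve, here is a small self-contained Java sketch. It is a toy model, not HBase code: the class StartupAssignmentSketch, the method names, the "table,N" region naming and the server names are all hypothetical, and the real AssignmentManager logic is more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model only: these names are hypothetical and are not HBase APIs.
class StartupAssignmentSketch {

    // Deal regions out to servers in turn, ignoring where they ran before.
    static Map<String, List<String>> assignRoundRobin(List<String> regions, List<String> servers) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String server : servers) {
            plan.put(server, new ArrayList<>());
        }
        for (int i = 0; i < regions.size(); i++) {
            String server = servers.get(i % servers.size());
            plan.get(server).add(regions.get(i));
        }
        return plan;
    }

    // Count the distinct tables a server hosts; region names look like "table,N".
    static int tablesOn(List<String> regionsOnServer) {
        TreeSet<String> tables = new TreeSet<>();
        for (String region : regionsOnServer) {
            tables.add(region.substring(0, region.indexOf(',')));
        }
        return tables.size();
    }

    public static void main(String[] args) {
        List<String> servers = List.of("rs1", "rs2", "rs3");
        List<String> regions = new ArrayList<>();
        for (String table : List.of("t1", "t2", "t3")) {
            for (int i = 0; i < 3; i++) {
                regions.add(table + "," + i);
            }
        }
        // A retained layout with one table per server would show 1 table on each server;
        // dealing the same regions out in turn mixes the tables instead.
        Map<String, List<String>> plan = assignRoundRobin(regions, servers);
        for (Map.Entry<String, List<String>> e : plan.entrySet()) {
            System.out.println(e.getKey() + " hosts " + tablesOn(e.getValue())
                    + " table(s): " + e.getValue());
        }
    }
}
```

In this model every server ends up with the same region count either way; only the mix of tables per server changes.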

Re: Region Balancing

Posted by Wayne <wa...@gmail.com>.
The region counts are the same per region server, which is good. My problem
is that I have 5 tables and several region servers only serve 1 table's
regions. I would like to round robin and scatter all tables across all
region servers. Basically the distribution is not round-robin enough.
Manually moving regions is not going to help me. Frankly this goes against
the concept of bigger/fewer regions. Given what I am seeing, without an
alternative I will reduce the max size of the regions, and once I get into
the 100s of regions per region server this problem will be resolved. Fewer
regions are dangerous in terms of avoiding hot spots.

Is there a way to turn off the memory across restarts of where a region
lives? This might help re-balance from scratch.

Thanks.

On Wed, Feb 2, 2011 at 3:17 PM, Stack <st...@duboce.net> wrote:

> The shell has a move command.  You can also force a run of the balancer
> (or even turn it off).
>
> As for how the balancer works: in short, it runs every 5 minutes by
> default (configurable). When it runs, it uses its in-memory notion of
> how the cluster is balanced to create move plans, which are acted upon
> as soon as the balance calculation completes.
>
> The current balancer aims to keep the load of regions even across the
> cluster.
>
> If you would like to know more beyond what's written above, next up
> would be the LoadBalancer class's javadoc:
>
> http://hbase.apache.org/xref/org/apache/hadoop/hbase/master/LoadBalancer.html#42
> It's not bad.  Check it out.
>
> St.Ack
>
> On Wed, Feb 2, 2011 at 7:51 PM, Wayne <wa...@gmail.com> wrote:
> > I know there were some changes in .90 in terms of how region balancing
> > occurs. Is there a resource somewhere that describes the options for the
> > configuration? Per Jonathan Gray's recommendation we are trying to keep
> > our region count down to 100 per region server (we are up to 5gb region
> > size). This has caused us to get a lot of hot spots relative to what we
> > saw with 1000 regions per node. Scanning through #3373 it sounded like
> > there is a round-robin load balancer we can turn on instead of keeping
> > regions pinned as I believe they are in .90 by default. We have many
> > tables and certain region servers seem to get "stuck" with a single
> > table's data in almost contiguous ranges which really throws a wrench in
> > our load balancing.
> >
> > Thanks.
> >
>

Re: Region Balancing

Posted by Stack <st...@duboce.net>.
The shell has a move command.  You can also force a run of the balancer
(or even turn it off).

As for how the balancer works: in short, it runs every 5 minutes by
default (configurable). When it runs, it uses its in-memory notion of
how the cluster is balanced to create move plans, which are acted upon
as soon as the balance calculation completes.

The current balancer aims to keep the load of regions even across the
cluster.

If you would like to know more beyond what's written above, next up
would be the LoadBalancer class's javadoc:
http://hbase.apache.org/xref/org/apache/hadoop/hbase/master/LoadBalancer.html#42
It's not bad.  Check it out.

St.Ack
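
As a rough illustration of "creates move plans" to even out the load, here is a self-contained Java sketch that balances purely by region count. It is an assumption-laden toy, not the LoadBalancer implementation: CountBalancerSketch, MovePlan and balance are hypothetical names, and the strategy (repeatedly move one region from the fullest server to the emptiest) was chosen for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy count-based balancer; hypothetical names, not the HBase LoadBalancer API.
class CountBalancerSketch {

    // One planned move of a region from one server to another.
    static final class MovePlan {
        final String region;
        final String from;
        final String to;

        MovePlan(String region, String from, String to) {
            this.region = region;
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return region + ": " + from + " -> " + to;
        }
    }

    // Move one region at a time from the fullest to the emptiest server until
    // counts differ by at most one. Mutates the given map to reflect the plans.
    static List<MovePlan> balance(Map<String, Deque<String>> regionsByServer) {
        List<MovePlan> plans = new ArrayList<>();
        while (true) {
            String most = null;
            String least = null;
            for (Map.Entry<String, Deque<String>> e : regionsByServer.entrySet()) {
                int size = e.getValue().size();
                if (most == null || size > regionsByServer.get(most).size()) {
                    most = e.getKey();
                }
                if (least == null || size < regionsByServer.get(least).size()) {
                    least = e.getKey();
                }
            }
            if (most == null
                    || regionsByServer.get(most).size() - regionsByServer.get(least).size() <= 1) {
                return plans;
            }
            String region = regionsByServer.get(most).removeLast();
            regionsByServer.get(least).addLast(region);
            plans.add(new MovePlan(region, most, least));
        }
    }

    public static void main(String[] args) {
        Map<String, Deque<String>> cluster = new LinkedHashMap<>();
        cluster.put("rs1", new ArrayDeque<>(List.of("r1", "r2", "r3", "r4", "r5")));
        cluster.put("rs2", new ArrayDeque<>(List.of("r6")));
        cluster.put("rs3", new ArrayDeque<>());
        for (MovePlan plan : balance(cluster)) {
            System.out.println(plan);
        }
    }
}
```

A balancer that looks only at counts, as this sketch does, treats a cluster as balanced even when one server holds nothing but a single table's regions.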

On Wed, Feb 2, 2011 at 7:51 PM, Wayne <wa...@gmail.com> wrote:
> I know there were some changes in .90 in terms of how region balancing
> occurs. Is there a resource somewhere that describes the options for the
> configuration? Per Jonathan Gray's recommendation we are trying to keep our
> region count down to 100 per region server (we are up to 5gb region size).
> This has caused us to get a lot of hot spots relative to what we saw with
> 1000 regions per node. Scanning through #3373 it sounded like there is a
> round-robin load balancer we can turn on instead of keeping regions pinned
> as I believe they are in .90 by default. We have many tables and certain
> region servers seem to get "stuck" with a single table's data in almost
> contiguous ranges which really throws a wrench in our load balancing.
>
> Thanks.
>