Posted to user@hbase.apache.org by Zhou Wei <zh...@mails.tsinghua.edu.cn> on 2008/09/26 14:20:25 UTC

Load Balancing problem for HBase to host data in small size under high load

Hi,

I'm trying to use HBase 0.2.1 to host one table of data, about 200MB,
but under a very high load of reads and updates.
I found that all the data is assigned to one data node and can't be
scaled out to more nodes.

I understand that HBase is designed to store massive data.
Is there a way to balance the load to more data nodes?

Thanks,

Zhou

Re: Load Balancing problem for HBase to host data in small size under high load

Posted by stack <st...@duboce.net>.
Zhou Wei wrote:
> stack wrote:
>> To bring on a split, try the following:
>>
>> 1. Set down the hbase.region.memcache.flush.size on your table. There is
>
> I think you mean MAX_FILESIZE here, right?

Pardon me, yes. In this context, it should be MAX_FILESIZE.
St.Ack

Re: Load Balancing problem for HBase to host data in small size under high load

Posted by Zhou Wei <zh...@mails.tsinghua.edu.cn>.
stack wrote:
> Jean-Daniel Cryans wrote:
>   
>> Zhou,
>>
>> No sorry there is no guide tho I'm sure that searching through this mailing
>> list you would find some answers.
>>     
> I just happened to be looking into this last night (smile).
>
> To bring on a split, try the following:
>
> 1. Set down the hbase.region.memcache.flush.size on your table. There is
>   
I think you mean MAX_FILESIZE here, right?

> a bug that prevents you making this adjustment from the shell at the
> moment but for a workaround, see
> https://issues.apache.org/jira/browse/HBASE-903.
> 2. Then, wait till the optional flush runs
> (hbase.regionserver.optionalcacheflushinterval). Default is every 30
> minutes.
>
> If you want to run permanently with the smaller split size, you should
> do as you posit, and lower the flush size --
> hbase.hregion.memcache.flush.size -- in same proportion by which you
> lowered split size.
>
>   
I didn't know that I could do this for an individual table in HBase.
Great feature!
I will definitely need it in the next step, when I add more tables.
Thanks a lot!

> St.Ack
>
>
>   


Re: Load Balancing problem for HBase to host data in small size under high load

Posted by Zhou Wei <zh...@mails.tsinghua.edu.cn>.
Andrew Purtell wrote:
> If you make the table metadata change that Stack describes, it will permanently apply to that particular table, even after shutdown/restart, but only that one table.
>
> Changing the global config parameter 'hbase.hregion.max.filesize' affects any tables already existing or yet to be created. 
>
>
>   
Thanks! :)

>    - Andy


Re: Load Balancing problem for HBase to host data in small size under high load

Posted by Andrew Purtell <ap...@yahoo.com>.
If you make the table metadata change that Stack describes, it will permanently apply to that particular table, even after shutdown/restart, but only that one table.

Changing the global config parameter 'hbase.hregion.max.filesize' affects any tables already existing or yet to be created. 
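
To make the difference between the two settings concrete, here is a minimal Python sketch of how the effective split threshold for a table could be resolved. It assumes that a per-table MAX_FILESIZE, when set, takes precedence over the global 'hbase.hregion.max.filesize'; that precedence rule and the function name effective_max_filesize are assumptions for illustration, not something confirmed in this thread.

```python
from typing import Optional

MB = 1024 * 1024


def effective_max_filesize(global_value: int, table_override: Optional[int]) -> int:
    """Return the split threshold a table would use.

    Assumption: a per-table MAX_FILESIZE, if present, wins over the
    global hbase.hregion.max.filesize setting.
    """
    if table_override is not None:
        return table_override
    return global_value


# A table without its own setting follows the global value.
print(effective_max_filesize(256 * MB, None) // MB)     # 256
# A table with its own MAX_FILESIZE keeps it, whatever the global value is.
print(effective_max_filesize(256 * MB, 64 * MB) // MB)  # 64
```

Under that assumption, lowering the value on one busy table leaves every other table at the global threshold.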


   - Andy

> From: stack
> Subject: Re: Load Balancing problem for HBase to host data in
> small size under high load
> To: hbase-user@hadoop.apache.org
> Date: Friday, September 26, 2008, 8:18 AM
> Jean-Daniel Cryans wrote:
> > Zhou,
> >
> > No sorry there is no guide tho I'm sure that
> > searching through this mailing
> > list you would find some answers.
>
> I just happened to be looking into this last night (smile).
> 
> To bring on a split, try the following:
> 
> 1. Set down the hbase.region.memcache.flush.size on your
> table. There is a bug that prevents you making this adjustment
> from the shell at the moment but for a workaround, see
> https://issues.apache.org/jira/browse/HBASE-903.
> 2. Then, wait till the optional flush runs
> (hbase.regionserver.optionalcacheflushinterval). Default is
> every 30 minutes.
> 
> If you want to run permanently with the smaller split size,
> you should do as you posit, and lower the flush size --
> hbase.hregion.memcache.flush.size -- in same proportion by
> which you lowered split size.
> 
> St.Ack



      

Re: Load Balancing problem for HBase to host data in small size under high load

Posted by stack <st...@duboce.net>.
Jean-Daniel Cryans wrote:
> Zhou,
>
> No sorry there is no guide tho I'm sure that searching through this mailing
> list you would find some answers.
I just happened to be looking into this last night (smile).

To bring on a split, try the following:

1. Set down the hbase.region.memcache.flush.size on your table. There is
a bug that prevents you making this adjustment from the shell at the
moment but for a workaround, see
https://issues.apache.org/jira/browse/HBASE-903.
2. Then, wait till the optional flush runs
(hbase.regionserver.optionalcacheflushinterval). Default is every 30
minutes.

If you want to run permanently with the smaller split size, you should
do as you posit, and lower the flush size --
hbase.hregion.memcache.flush.size -- in same proportion by which you
lowered split size.
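
The "same proportion" rule can be sketched in a few lines of Python. This is only an illustration: the 256MB default split size is the figure given elsewhere in this thread, while the 64MB default flush size and the function name scaled_flush_size are assumptions, so check hbase-default.xml for the values in your version.

```python
MB = 1024 * 1024

# 256MB is the default split threshold mentioned in this thread.
DEFAULT_MAX_FILESIZE = 256 * MB
# Assumed default for hbase.hregion.memcache.flush.size; verify locally.
DEFAULT_FLUSH_SIZE = 64 * MB


def scaled_flush_size(new_max_filesize: int,
                      default_max_filesize: int = DEFAULT_MAX_FILESIZE,
                      default_flush_size: int = DEFAULT_FLUSH_SIZE) -> int:
    """Shrink the flush size by the same ratio as the split size."""
    return default_flush_size * new_max_filesize // default_max_filesize


# Lowering the split size from 256MB to 64MB is a factor of four,
# so the flush size drops by a factor of four as well.
print(scaled_flush_size(64 * MB) // MB)  # 16
```

With these assumed defaults the result is 16MB, in the same range as the 12MB suggested elsewhere in this thread for a 64MB split size.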

St.Ack

Re: Load Balancing problem for HBase to host data in small size under high load

Posted by Zhou Wei <zh...@mails.tsinghua.edu.cn>.
Jean-Daniel Cryans wrote:
> Zhou,
>
> No sorry there is no guide tho I'm sure that searching through this mailing
> list you would find some answers.
>
> More regions means more memory taken to manage them. 2.56MB is surely too
> low, try something like 128MB or even 64MB. Yes, like you saw, you also have
> to change the flush size if your value is very low. If using a 64MB maxsize,
> try using a 12MB for flushes.
>
>   
I changed the max file size to 64MB and the flush size to 12MB.
The table split into a number of regions, and they are balanced across
the data nodes quite well.
Thanks!

> What is your machine setup like? (how many, what cpu, mem, hdd, etc).
>
> Thanks,
>
> J-D


Re: Load Balancing problem for HBase to host data in small size under high load

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Zhou,

No, sorry, there is no guide, though I'm sure that by searching through this
mailing list you would find some answers.

More regions means more memory taken to manage them. 2.56MB is surely too
low; try something like 128MB or even 64MB. Yes, as you saw, you also have
to change the flush size if your value is very low. If using a 64MB maxsize,
try using 12MB for flushes.
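
For reference, the 64MB and 12MB values can be turned into hbase-site.xml entries with a small Python sketch. It assumes both properties take a size in bytes, as the defaults in hbase-default.xml do; the helper name property_xml is hypothetical, and the printed fragment still has to be placed inside the <configuration> element by hand.

```python
MB = 1024 * 1024


def property_xml(name: str, value_bytes: int) -> str:
    """Render one Hadoop-style <property> entry with a value in bytes."""
    return (
        "  <property>\n"
        f"    <name>{name}</name>\n"
        f"    <value>{value_bytes}</value>\n"
        "  </property>"
    )


# Property names as used in this thread; sizes as suggested above.
settings = {
    "hbase.hregion.max.filesize": 64 * MB,
    "hbase.hregion.memcache.flush.size": 12 * MB,
}

fragment = "\n".join(property_xml(name, size) for name, size in settings.items())
print(fragment)
```

Writing the sizes as multiples of MB and letting the script expand them avoids mistakes when typing long byte counts.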

What is your machine setup like? (how many, what cpu, mem, hdd, etc).

Thanks,

J-D

On Fri, Sep 26, 2008 at 8:57 AM, Zhou Wei
<zh...@mails.tsinghua.edu.cn> wrote:

> Should I also change the value of hbase.hregion.memcache.flush.size?
>
> Zhou Wei

Re: Load Balancing problem for HBase to host data in small size under high load

Posted by Zhou Wei <zh...@mails.tsinghua.edu.cn>.
Should I also change the value of hbase.hregion.memcache.flush.size?

Zhou Wei

Jean-Daniel Cryans wrote:
> Zhou,
>
> The data won't be distributed until your region passes the split threshold
> which is by default 256MB for a single family. If you really want
> distribution at your level, you should lower the hbase.hregion.max.filesize
> value in hbase-site.xml
>
> J-D


Re: Load Balancing problem for HBase to host data in small size under high load

Posted by Zhou Wei <zh...@mails.tsinghua.edu.cn>.
Is there any guide for setting the value of "hbase.hregion.max.filesize"?
Will setting it very small, e.g. 2.56MB, cause performance problems?
Are there any other parameters that I should also change?

Zhou

Jean-Daniel Cryans wrote:
> Zhou,
>
> The data won't be distributed until your region passes the split threshold
> which is by default 256MB for a single family. If you really want
> distribution at your level, you should lower the hbase.hregion.max.filesize
> value in hbase-site.xml
>
> J-D


Re: Load Balancing problem for HBase to host data in small size under high load

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Zhou,

The data won't be distributed until your region passes the split threshold,
which is 256MB by default for a single family. If you really want
distribution at your data size, you should lower the hbase.hregion.max.filesize
value in hbase-site.xml.
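
As a rough illustration of why a 200MB table stays in one region, here is a minimal Python sketch. It uses an idealized model that is an assumption, not a description of the actual split code: rows are spread evenly over the key space, there is a single family, and a region splits in half whenever it is larger than the threshold. The function name estimated_regions is hypothetical.

```python
MB = 1024 * 1024


def estimated_regions(table_bytes: int, split_threshold_bytes: int) -> int:
    """Estimate the region count by halving regions until each one
    fits under the split threshold (idealized, evenly spread data)."""
    regions = 1
    region_bytes = float(table_bytes)
    while region_bytes > split_threshold_bytes:
        region_bytes /= 2
        regions *= 2
    return regions


# 200MB is below the default 256MB threshold, so nothing splits.
print(estimated_regions(200 * MB, 256 * MB))  # 1
# With a lower threshold such as 64MB the same table splits twice.
print(estimated_regions(200 * MB, 64 * MB))   # 4
```

A single region is served by a single region server, which matches the symptom of all the load landing on one node.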

J-D

On Fri, Sep 26, 2008 at 8:20 AM, Zhou Wei
<zh...@mails.tsinghua.edu.cn> wrote:

> Hi,
>
> I've trying to use HBase 0.2.1 to host one table of data, about 200MB.
> But under very high load of read and update.
> I found out that all data is assigned to one data node, can't be scale
> to more nodes.
>
> I understand that HBase is designed to store massive data.
> Is there a way to balance the load to more data nodes?
>
> Thanks,
>
> Zhou
>