Posted to user@hbase.apache.org by "M. C. Srivas" <mc...@gmail.com> on 2011/01/08 22:11:01 UTC

Re: Region loadbalancing

If you did the change, can you share your experience/results?

On Wed, Dec 15, 2010 at 12:04 AM, Jan Lukavský <jan.lukavsky@firma.seznam.cz
> wrote:

> We can give it a try. Currently we use 512 MiB per region; is there an upper
> bound for this value that we should not cross? Are there any side-effects we
> should expect when we set this value to, say, 1 GiB? I suppose at least
> slightly slower random gets?
>
> Thanks,
>  Jan
>
>
> On 14.12.2010 18:50, Stack wrote:
>
>> Can you do w/ fewer regions?  1k plus per server is pushing it, I'd say.
>>  Can you up your region sizes, for instance?
>> St.Ack
>>
>> On Mon, Dec 13, 2010 at 8:36 AM, Jan Lukavský
>> <ja...@firma.seznam.cz>  wrote:
>>
>>> Hi all,
>>>
>>> we are using HBase 0.20.6 on a cluster of about 25 nodes with about 30k
>>> regions and are experiencing an issue which causes running M/R jobs to
>>> fail.
>>> When we restart a single RegionServer, the following happens:
>>>  1) all regions of that RS get reassigned to the remaining (say 24) nodes
>>>  2) when the restarted RegionServer comes back up, the HMaster closes about
>>> 60 regions on each of the 24 nodes and assigns them back to the restarted
>>> node
>>>
>>> Now, step 1) is usually very quick (if we can assign 10 regions per
>>> heartbeat, we get 240 regions per heartbeat across the whole cluster).
>>> Step 2) seems problematic, because first about 1200 regions get
>>> unassigned, and then they are slowly assigned to the single RS (again at
>>> only 10 regions per heartbeat, i.e. on the order of 120 heartbeats in
>>> total). During this time the HBase clients inside the map tasks connected
>>> to those regions throw RetriesExhaustedException.
>>>
>>> I'm aware that we can limit the number of regions closed per RegionServer
>>> heartbeat with hbase.regions.close.max, but this config option seems a bit
>>> unsatisfactory, because as we increase the size of the cluster we will get
>>> more and more regions unassigned in a single cluster heartbeat (say we
>>> limit it to 1: we then get 24 unassigned regions, but still only 10
>>> assigned per heartbeat). This led us to a solution which seems quite
>>> simple. We have introduced a new config option which limits the number of
>>> regions in transition. When regionsInTransition.size() crosses this
>>> boundary, we temporarily stop the load balancer. This seems to resolve our
>>> issue, because no region stays unassigned for a long time and clients
>>> manage to recover within their number of retries.
>>>
>>> My question is: is this a general issue for which a new config option
>>> should be proposed, or am I missing something and we could have resolved
>>> the issue by tuning some other existing config option?
>>>
>>> Thanks.
>>>  Jan
>>>
>>>
>>>
>
> --
>
> Jan Lukavský
> programátor
> Seznam.cz, a.s.
> Radlická 608/2
> 15000, Praha 5
>
> jan.lukavsky@firma.seznam.cz
> http://www.seznam.cz
>
>
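
For reference, the region size discussed above is controlled by hbase.hregion.max.filesize in hbase-site.xml: once any store file of a region grows past this value, the region is split. A minimal sketch using the 1 GiB figure from the thread (the value is simply the one under discussion, not a tested recommendation; raising it does not shrink or merge existing regions, it only makes future splits less frequent):

    <?xml version="1.0"?>
    <configuration>
      <!-- Split threshold per region: 1 GiB = 1073741824 bytes.
           This mirrors the figure discussed in the thread, not a
           recommended value. -->
      <property>
        <name>hbase.hregion.max.filesize</name>
        <value>1073741824</value>
      </property>
    </configuration>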
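
The per-heartbeat close cap and the proposed regions-in-transition limit from the message above would likewise be hbase-site.xml settings. hbase.regions.close.max is the existing option named in the thread; the second property name below is hypothetical, since the regions-in-transition limit is a custom patch rather than part of stock HBase 0.20.6, and it only sketches the idea (the master pauses load balancing while regionsInTransition.size() is above the limit). Both values are illustrative:

    <?xml version="1.0"?>
    <configuration>
      <!-- Existing option discussed above: caps how many regions the
           master asks one RegionServer to close per heartbeat.
           Illustrative value only. -->
      <property>
        <name>hbase.regions.close.max</name>
        <value>10</value>
      </property>
      <!-- Hypothetical property name for the custom patch described
           above: while the number of regions in transition exceeds
           this limit, the master temporarily stops the load balancer.
           Not part of stock HBase 0.20.6. -->
      <property>
        <name>hbase.regions.in.transition.max</name>
        <value>100</value>
      </property>
    </configuration>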

Re: Region loadbalancing

Posted by Jan Lukavský <ja...@firma.seznam.cz>.
Hi,
  sure, we are seeing the following:
   - regions are unavailable for a much shorter time, so clients are no
longer failing (in fact some of them still occasionally fail with
RetriesExhaustedException caused by "failed setting up proxy", but there
are only a few of them)
   - on the other hand, the cluster is a little imbalanced; this is caused
by the slower rebalancing, which stops as soon as the imbalance no longer
exceeds hbase.regions.slop
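
For reference, hbase.regions.slop is also an hbase-site.xml property: roughly speaking, the balancer only sheds regions from a server whose region count exceeds the cluster average by more than this fraction, so once balancing stops the cluster may stay slightly uneven within that margin. A minimal sketch with an illustrative value (not necessarily the cluster's actual setting):

    <?xml version="1.0"?>
    <configuration>
      <!-- Fraction by which a RegionServer's region count may exceed
           the cluster average before the balancer moves regions off it.
           Illustrative value, not a recommendation. -->
      <property>
        <name>hbase.regions.slop</name>
        <value>0.1</value>
      </property>
    </configuration>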

Jan

On 8.1.2011 22:11, M. C. Srivas wrote:
> If you did the change, can you share your experience/results?
>


-- 

Jan Lukavský
programátor
Seznam.cz, a.s.
Radlická 608/2
15000, Praha 5

jan.lukavsky@firma.seznam.cz
http://www.seznam.cz