Posted to user@hbase.apache.org by Yossi Ittach <yo...@gmail.com> on 2008/10/23 14:08:32 UTC

HBase region server - utilization extremely unbalanced

Hi

Using HBase with 2 region servers on similar machines, I see that one
machine is serving almost 400 requests per second, while the other one is
serving 0-10. This causes extreme overload on the first machine. Any idea
what causes it, or how it can be avoided?

Thanks!

Vale et me ama
Yossi

Re: HBase region server - utilization extremely unbalanced

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Yossi,

No, a region can have more than 1 file at times, since each cache flush
writes a new one. See "Cache Flushes" here:
http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion

Changing the max file size to 512MB will decrease memory consumption, but
it will also reduce how widely your data is distributed, since you will have
half the number of regions. Andrew Purtell did something like that with
his semi-production cluster. He started with a max size of 64MB to get
early distribution, then gradually upped that number to, IIRC, 512MB.
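
To put rough numbers on that trade-off, here is a minimal sketch in Python.
It assumes the 30GB, single-family table mentioned elsewhere in this thread,
and that every region fills to the max file size before it splits;
`regions_for` is a hypothetical helper for illustration, not an HBase API.

```python
MB = 1024 * 1024
GB = 1024 * MB


def regions_for(total_bytes, max_filesize_bytes):
    """Approximate region count: total data divided by the max file size, rounded up."""
    return -(-total_bytes // max_filesize_bytes)  # integer ceiling division


# Compare the three max file sizes mentioned above for 30GB of data.
for size_mb in (64, 256, 512):
    print(f"{size_mb}MB max file size -> ~{regions_for(30 * GB, size_mb * MB)} regions")
```

Under these assumptions, going from 256MB to 512MB halves the region count,
while starting at 64MB gives four times as many regions to spread across
servers early on.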

Something I forgot in my last post is that the namenode also eats a lot of
memory because it holds the whole namespace in memory, and there is nothing
to be done about it. See the Hadoop documentation about that.

J-D


Re: HBase region server - utilization extremely unbalanced

Posted by Yossi Ittach <yo...@gmail.com>.
J-D,

I have only one table (huge), with only 1 family, which means every region
has exactly 1 file. Does that mean I can significantly decrease the size
of the MapFile indexes in memory?
Also, what do you think will be the impact of increasing the region size
(from 256MB to 512MB, for example) in this scenario?



Vale et me ama
Yossi



Re: HBase region server - utilization extremely unbalanced

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Yossi,

Yeah, they will go up, because each datanode keeps its MapFile indexes in
memory and each regionserver keeps a Memcache of at most 64MB (configurable,
see hbase-default.xml) for each region it owns.

Rule of thumb? Well, in hbase-default the maximum a single family can grow
inside a single region is 256MB, so you can estimate the number of regions
you will have, but it also depends on the number of tables and families. For
example, if your 30GB goes into a single table with 10 equally filled
families, you should expect around 12 regions. Only one family? Roughly 120
regions.

So, based on that number of regions, you can extrapolate the memory needed
to host your system. Big nodes with 16GB of memory will host way more regions
than an EC2 small instance.
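
The arithmetic above can be sketched in a few lines of Python. This is only
an illustration under the figures quoted in this message: families fill
evenly, a region splits once any single family reaches 256MB, and each region
may hold up to 64MB of Memcache. The function names are hypothetical, and the
Memcache figure is a worst-case ceiling, not a prediction of actual heap use.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def estimate_regions(total_bytes, families=1, max_family_bytes=256 * MB):
    """Rough region count when all families fill evenly.

    A region splits once one family reaches max_family_bytes, so a region
    with N equally filled families holds about N * max_family_bytes.
    """
    per_region_bytes = families * max_family_bytes
    return math.ceil(total_bytes / per_region_bytes)


def memcache_ceiling_bytes(regions, servers, memcache_bytes=64 * MB):
    """Worst case per region server: every hosted region's Memcache is full."""
    regions_per_server = math.ceil(regions / servers)
    return regions_per_server * memcache_bytes


one_family = estimate_regions(30 * GB, families=1)
ten_families = estimate_regions(30 * GB, families=10)
print(one_family, ten_families)  # 120 12

# 120 regions over 2 region servers, 64MB Memcache each at most.
print(memcache_ceiling_bytes(one_family, servers=2) // MB, "MB")  # 3840 MB
```

The MapFile indexes mentioned above come on top of this ceiling and are not
modeled here.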

J-D


Re: HBase region server - utilization extremely unbalanced

Posted by Yossi Ittach <yo...@gmail.com>.
Thank you very much. I'll report my benchmark results when I have them;
maybe they will help someone.

Vale et me ama
Yossi



Re: HBase region server - utilization extremely unbalanced

Posted by Michael Stack <st...@duboce.net>.
Yossi Ittach wrote:
> Thanks for the quick reply.
>
> I'm following the jvm Memory consumption (using "top") , and what bothers me
> is that it seems the percentages are just going up and up , and it makes me
> kind of worried.
>   

'top' is an extremely crude tool for figuring out how the JVM is doing
memory-wise.  The JVM will generally tend to grow to fill the allotted space
and then do the extra GC'ing to keep within the bound.  If you want to watch
it in action, enable GC logging -- add "-Xloggc:/tmp/gc.log" to the
hbase JVM options in hbase-env.sh, or turn on JMX and connect to your
running regionserver with jconsole.

If you're just testing and worried about memory usage, just load till you
OOME.  I've been able to load 30+ million rows/~400 regions into a single
1G RegionServer before it OOME'd.  But I've heard of others with more than
one family who have only been able to load 30 or 40 into a 1G heap.

St.Ack




Re: HBase region server - utilization extremely unbalanced

Posted by Yossi Ittach <yo...@gmail.com>.
Thanks for the quick reply.

I'm following the JVM memory consumption (using "top"), and what bothers me
is that the percentages seem to just go up and up, which makes me
kind of worried.

I'm trying to load the system with 30GB of data (this is a benchmark). I
estimate that my production environment will require at least 3 times that
size. Is there a rule of thumb for how many region servers I'll need?


Vale et me ama
Yossi



Re: HBase region server - utilization extremely unbalanced

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Yossi,

The META region is usually heavily used, and it's worse when you use the web
UI. Just for the lolz, go on the Master's page (the main page) and hit
"refresh" a couple of times; you should see that number go way up.

As for how to avoid it, well, the only way to split that load would be to
have the META region do a split, but that requires a lot of data, hence a
lot of user regions, which I doubt you have on 2 machines.

J-D
