Posted to user@hbase.apache.org by Bryan Beaudreault <bb...@hubspot.com> on 2012/12/07 22:01:12 UTC

Bulk loading (and/or major compaction) causing OOM

We have a couple tables that had thousands of regions due to the size of
the data in them.  We recently changed them to have larger regions (nearly
4GB).  We are trying to bulk load these in now, but every time we do, our
servers die with OOM.

The logs seem to show that there is always a major compaction happening
when the OOM happens.  This is among other normal usage from a variety of
apps in our product, so the memstores, block cache, etc are all active
during this time.

I was reading through the compaction code and it doesn't look like it
should take up much memory (depending on how the Reader class works).
Does anyone with more knowledge of these internals know how bulk load
and major compaction work with regard to memory?

We are running on ec2 c1.xlarge servers with 5GB of heap, and on hbase
version 0.90.4 (I know, I know, we're working to upgrade).

Thanks.
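For reference, the client side of a bulk load in 0.90.x goes through
LoadIncrementalHFiles (the class behind the completebulkload tool), which
moves the prepared HFiles into place rather than writing through the
memstore, so the load step itself should be cheap on regionserver heap.
A minimal sketch; the table name and HDFS path here are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoadDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table_v2");  // hypothetical table name
        // doBulkLoad moves each HFile into the matching region's store
        // directory (splitting client-side any file that straddles a region
        // boundary), so the load itself barely touches regionserver heap.
        new LoadIncrementalHFiles(conf).doBulkLoad(
            new Path("/user/hbase/bulkout"), table);   // hypothetical path
      }
    }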

Re: Bulk loading (and/or major compaction) causing OOM

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Merging is not an option for us, because we cannot afford to bring our
cluster down.  Also, we are not yet convinced that our cluster can handle
such large regions due to all the OOM issues we are seeing when trying to
bring new, bigger regions online.


On Sat, Dec 8, 2012 at 3:42 PM, Marcos Ortiz <ml...@uci.cu> wrote:

> On 12/08/2012 11:50 AM, Bryan Beaudreault wrote:
>> Thanks for the responses, guys.  Responses inline
>>
>>> When you are doing the bulk load, are you pre-splitting your regions?
>>> What OS are you using and what version of Java?
>> Yes, regions are pre-split.  We calculated them using M/R before attempting
>> to bulk load the data.  We've done this before with smaller sizes and it
>> has worked fine.
>>
>> Centos5, java 1.6.0_27
>>
>>> Yes, my friend. You should look at all the benefits in the new stable
>>> release (0.94.3), so this is the first piece of advice.
>> We use CDH currently, so we are working to move to cdh4.1.2, which is on
>> the 0.92.x branch.
> Great to hear.
>>
>> On Fri, Dec 7, 2012 at 4:48 PM, Stack <st...@duboce.net> wrote:
>>
>>> On Fri, Dec 7, 2012 at 1:01 PM, Bryan Beaudreault
>>> <bb...@hubspot.com> wrote:
>>>
>>>> We have a couple tables that had thousands of regions due to the size of
>>>> the data in them.  We recently changed them to have larger regions (nearly
>>>> 4GB).  We are trying to bulk load these in now, but every time we do, our
>>>> servers die with OOM.
>>>
>>> You mean, you are reloading the data that once was in thousands of regions
>>> instead into new regions of 4GB in size?
>>>
>>> I'd be surprised if the actual bulk load brings on the OOME.
>>>
>> That's correct.  The exact same data is currently live in an older table
>> with thousands of smaller regions.  Once we get these loaded we will swap
>> in the new table and delete the old.
>>
>>>> The logs seem to show that there is always a major compaction happening
>>>> when the OOM happens.  This is among other normal usage from a variety of
>>>> apps in our product, so the memstores, block cache, etc are all active
>>>> during this time.
>>>
>>> Could you turn off major compaction during the bulk load to see if that
>>> helps?
>>>
>> Automatic major compactions are actually off for our cluster; it looks
>> like they start doing minor compactions as data is loaded in, and that is
>> where we first saw the OOM issues.  So we tried forcing major compactions
>> earlier instead.
>>
>>>> I was reading through the compaction code and it doesn't look like it
>>>> should take up much memory (depending on how the Reader class works).
>>>
>>> Yes.
>>>
>>> Are there lots of storefiles under each region?
>>>
>> Yes actually, the bulk loaded data usually seems to contain approximately
>> 5-10 files per region.  Likely due to the output settings of the M/R job
>> that creates this data.
>>
>>>> Does anyone with more knowledge of these internals know how bulk load
>>>> and major compaction work with regard to memory?
>>>>
>>>> We are running on ec2 c1.xlarge servers with 5GB of heap, and on hbase
>>>> version 0.90.4 (I know, I know, we're working to upgrade).
>>>
>>> How much have you given hbase?
>>>
>>> If you look at your cluster monitoring, are you swapping?
>>>
>>> The regionservers are carrying how many regions per server?
>>>
>> The RegionServers have 5GB of heap (7.5GB total memory on a c1.xlarge, of
>> which 1GB goes to DN and rest to OS)
>> Swapping is disabled.
>> We have around 350 regions per RS currently. What we're doing now with this
>> table is part of our effort to decrease the number of regions across all
>> tables.  We need to do it with minimal downtime though so it is slow going.
>> We are aiming for around 200 regions per RS.
> Yes, it would be nice to see fewer regions per server. Have you considered
> merging some adjacent regions?
>
>>> St.Ack
>
> --
> Marcos Luis Ortíz Valmaseda
> about.me/marcosortiz
> @marcosluis2186 <http://twitter.com/marcosluis2186>
>

Re: Bulk loading (and/or major compaction) causing OOM

Posted by Marcos Ortiz <ml...@uci.cu>.
On 12/08/2012 11:50 AM, Bryan Beaudreault wrote:
> Thanks for the responses, guys.  Responses inline
>
>> When you are doing the bulk load, are you pre-splitting your regions?
>> What OS are you using and what version of Java?
> Yes, regions are pre-split.  We calculated them using M/R before attempting
> to bulk load the data.  We've done this before with smaller sizes and it
> has worked fine.
>
> Centos5, java 1.6.0_27
>
>> Yes, my friend. You should look at all the benefits in the new stable
>> release (0.94.3), so this is the first piece of advice.
> We use CDH currently, so we are working to move to cdh4.1.2, which is on
> the 0.92.x branch.
Great to hear.
>
> On Fri, Dec 7, 2012 at 4:48 PM, Stack <st...@duboce.net> wrote:
>
>> On Fri, Dec 7, 2012 at 1:01 PM, Bryan Beaudreault
>> <bb...@hubspot.com> wrote:
>>
>>> We have a couple tables that had thousands of regions due to the size of
>>> the data in them.  We recently changed them to have larger regions (nearly
>>> 4GB).  We are trying to bulk load these in now, but every time we do, our
>>> servers die with OOM.
>>>
>>>
>> You mean, you are reloading the data that once was in thousands of regions
>> instead into new regions of 4GB in size?
>>
>> I'd be surprised if the actual bulk load brings on the OOME.
>>
>>
> That's correct.  The exact same data is currently live in an older table
> with thousands of smaller regions.  Once we get these loaded we will swap
> in the new table and delete the old.
>
>
>>
>>> The logs seem to show that there is always a major compaction happening
>>> when the OOM happens.  This is among other normal usage from a variety of
>>> apps in our product, so the memstores, block cache, etc are all active
>>> during this time.
>>>
>>>
>> Could you turn off major compaction during the bulk load to see if that
>> helps?
>>
> Automatic major compactions are actually off for our cluster; it looks
> like they start doing minor compactions as data is loaded in, and that is
> where we first saw the OOM issues.  So we tried forcing major compactions
> earlier instead.
>
>>
>>> I was reading through the compaction code and it doesn't look like it
>>> should take up much memory (depending on how the Reader class works).
>>>
>>
>> Yes.
>>
>> Are there lots of storefiles under each region?
>>
> Yes actually, the bulk loaded data usually seems to contain approximately
> 5-10 files per region.  Likely due to the output settings of the M/R job
> that creates this data.
>
>
>>
>>> Does anyone with more knowledge of these internals know how bulk load
>>> and major compaction work with regard to memory?
>>>
>>> We are running on ec2 c1.xlarge servers with 5GB of heap, and on hbase
>>> version 0.90.4 (I know, I know, we're working to upgrade).
>>>
>> How much have you given hbase?
>>
>> If you look at your cluster monitoring, are you swapping?
>>
>> The regionservers are carrying how many regions per server?
>>
> The RegionServers have 5GB of heap (7.5GB total memory on a c1.xlarge, of
> which 1GB goes to DN and rest to OS)
> Swapping is disabled.
> We have around 350 regions per RS currently. What we're doing now with this
> table is part of our effort to decrease the number of regions across all
> tables.  We need to do it with minimal downtime though so it is slow going.
>   We are aiming for around 200 regions per RS.
Yes, it would be nice to see fewer regions per server. Have you
considered merging some adjacent
regions?

>
>> St.Ack
>>
>

-- 

Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>




Re: Bulk loading (and/or major compaction) causing OOM

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Thanks for the responses, guys.  Responses inline

> When you are doing the bulk load, are you pre-splitting your regions?
> What OS are you using and what version of Java?

Yes, regions are pre-split.  We calculated them using M/R before attempting
to bulk load the data.  We've done this before with smaller sizes and it
has worked fine.
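If it helps anyone else, pre-split creation against the 0.90.x client API
is just createTable with a set of split keys; a minimal sketch where the
table name, column family, and keys are all invented:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreatePreSplitTable {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        HTableDescriptor desc = new HTableDescriptor("my_table_v2"); // hypothetical
        desc.addFamily(new HColumnDescriptor("d"));                  // hypothetical family
        // Split keys computed up front (here, from the M/R sampling pass);
        // N keys produce N+1 regions.
        byte[][] splits = new byte[][] {
            Bytes.toBytes("row-025000000"),
            Bytes.toBytes("row-050000000"),
            Bytes.toBytes("row-075000000"),
        };
        admin.createTable(desc, splits);
      }
    }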

Centos5, java 1.6.0_27

> Yes, my friend. You should look at all the benefits in the new stable
> release (0.94.3), so this is the first piece of advice.

We use CDH currently, so we are working to move to cdh4.1.2, which is on
the 0.92.x branch.

On Fri, Dec 7, 2012 at 4:48 PM, Stack <st...@duboce.net> wrote:

> On Fri, Dec 7, 2012 at 1:01 PM, Bryan Beaudreault
> <bb...@hubspot.com> wrote:
>
> > We have a couple tables that had thousands of regions due to the size of
> > the data in them.  We recently changed them to have larger regions (nearly
> > 4GB).  We are trying to bulk load these in now, but every time we do, our
> > servers die with OOM.
> >
> >
> You mean, you are reloading the data that once was in thousands of regions
> instead into new regions of 4GB in size?
>
> I'd be surprised if the actual bulk load brings on the OOME.
>
>
That's correct.  The exact same data is currently live in an older table
with thousands of smaller regions.  Once we get these loaded we will swap
in the new table and delete the old.


>
>

> > The logs seem to show that there is always a major compaction happening
> > when the OOM happens.  This is among other normal usage from a variety of
> > apps in our product, so the memstores, block cache, etc are all active
> > during this time.
> >
> >
>
> Could you turn off major compaction during the bulk load to see if that
> helps?
>
Automatic major compactions are actually off for our cluster; it looks
like they start doing minor compactions as data is loaded in, and that is
where we first saw the OOM issues.  So we tried forcing major compactions
earlier instead.
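For anyone following along, the knobs that govern those load-time
compactions in 0.90.x live in hbase-site.xml; a sketch with illustrative
values (the defaults are 86400000ms, 3, and 7 respectively):

    <!-- hbase-site.xml sketch; values are illustrative, not recommendations -->
    <property>
      <name>hbase.hregion.majorcompaction</name>
      <value>0</value>  <!-- 0 disables time-based major compactions -->
    </property>
    <property>
      <name>hbase.hstore.compactionThreshold</name>
      <value>5</value>  <!-- minor compaction kicks in at this many storefiles -->
    </property>
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <value>12</value> <!-- updates block once a store has this many files -->
    </property>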

>
>
> > I was reading through the compaction code and it doesn't look like it
> > should take up much memory (depending on how the Reader class works).
> >
>
>
> Yes.
>
> Are there lots of storefiles under each region?
>
Yes actually, the bulk loaded data usually seems to contain approximately
5-10 files per region.  Likely due to the output settings of the M/R job
that creates this data.
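For what it's worth, when the HFiles come out of
HFileOutputFormat.configureIncrementalLoad, the reducer count is pinned to
the table's region count, so each job run adds roughly one file per region
(per family), and repeated runs multiply that. A sketch of the job wiring,
with invented names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class HFileJobSetup {
      public static Job createJob() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hfile-writer");         // hypothetical job name
        HTable table = new HTable(conf, "my_table_v2");  // hypothetical table
        // Installs TotalOrderPartitioner keyed on the table's region
        // boundaries and sets one reducer per region, so one run writes
        // about one HFile per region.
        HFileOutputFormat.configureIncrementalLoad(job, table);
        return job;
      }
    }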


>
>
> >  Does anyone with more knowledge of these internals know how bulk load
> > and major compaction work with regard to memory?
> >
> > We are running on ec2 c1.xlarge servers with 5GB of heap, and on hbase
> > version 0.90.4 (I know, I know, we're working to upgrade).
> >
>
> How much have you given hbase?
>
> If you look at your cluster monitoring, are you swapping?
>
> The regionservers are carrying how many regions per server?
>

The RegionServers have 5GB of heap (7.5GB total memory on a c1.xlarge, of
which 1GB goes to DN and rest to OS)
Swapping is disabled.
We have around 350 regions per RS currently. What we're doing now with this
table is part of our effort to decrease the number of regions across all
tables.  We need to do it with minimal downtime though so it is slow going.
 We are aiming for around 200 regions per RS.
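To put those numbers in perspective, assuming the 0.90.x defaults were
left alone (hbase.regionserver.global.memstore.upperLimit = 0.4,
hfile.block.cache.size = 0.2), the 5GB heap splits up roughly as:

    memstores, global upper limit:  5GB x 0.4 ≈ 2.0GB
    block cache:                    5GB x 0.2 ≈ 1.0GB
    everything else:                          ≈ 2.0GB  (RPC buffers,
                                    compaction readers/writers, GC headroom)

With ~350 regions sharing the 2GB memstore ceiling on top of live traffic,
a forced major compaction does not have much slack to work in before an
OOME.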

>
> St.Ack
>

Re: Bulk loading (and/or major compaction) causing OOM

Posted by Stack <st...@duboce.net>.
On Fri, Dec 7, 2012 at 1:01 PM, Bryan Beaudreault
<bb...@hubspot.com> wrote:

> We have a couple tables that had thousands of regions due to the size of
> the data in them.  We recently changed them to have larger regions (nearly
> 4GB).  We are trying to bulk load these in now, but every time we do, our
> servers die with OOM.
>
>
You mean, you are reloading the data that once was in thousands of regions
instead into new regions of 4GB in size?

I'd be surprised if the actual bulk load brings on the OOME.



> The logs seem to show that there is always a major compaction happening
> when the OOM happens.  This is among other normal usage from a variety of
> apps in our product, so the memstores, block cache, etc are all active
> during this time.
>
>

Could you turn off major compaction during the bulk load to see if that
helps?



> I was reading through the compaction code and it doesn't look like it
> should take up much memory (depending on how the Reader class works).
>


Yes.

Are there lots of storefiles under each region?



>  Does anyone with more knowledge of these internals know how bulk load
> and major compaction work with regard to memory?
>
> We are running on ec2 c1.xlarge servers with 5GB of heap, and on hbase
> version 0.90.4 (I know, I know, we're working to upgrade).
>

How much have you given hbase?

If you look at your cluster monitoring, are you swapping?

The regionservers are carrying how many regions per server?

St.Ack

Re: Bulk loading (and/or major compaction) causing OOM

Posted by Marcos Ortiz <ml...@uci.cu>.
On 12/07/2012 04:01 PM, Bryan Beaudreault wrote:
> We have a couple tables that had thousands of regions due to the size of
> the data in them.  We recently changed them to have larger regions (nearly
> 4GB).  We are trying to bulk load these in now, but every time we do, our
> servers die with OOM.
When you are doing the bulk load, are you pre-splitting your regions?
What OS are you using and what version of Java?
>
> The logs seem to show that there is always a major compaction happening
> when the OOM happens.  This is among other normal usage from a variety of
> apps in our product, so the memstores, block cache, etc are all active
> during this time.
There are a good number of improvements in the new releases with respect
to compactions.
>
> I was reading through the compaction code and it doesn't look like it
> should take up much memory (depending on how the Reader class works).
> Does anyone with more knowledge of these internals know how bulk load
> and major compaction work with regard to memory?
>
> We are running on ec2 c1.xlarge servers with 5GB of heap, and on hbase
> version 0.90.4 (I know, I know, we're working to upgrade).
Yes, my friend. You should look at all the benefits in the new stable
release (0.94.3), so this is the first piece of advice.
>
> Thanks.
>
>
>

