Posted to user@hbase.apache.org by Seraph Imalia <se...@eisp.co.za> on 2012/01/03 09:19:14 UTC
Re: RAM Problems - Keeps Crashing

Hi Guys,

I am quite desperate to resolve this issue. What is the next thing I can try? All I can think of is to downgrade back to 0.20.6 and write an app to migrate all the data :(

I hope everyone had a great new year :)

Regards,
Seraph


On 30 Dec 2011, at 7:41 PM, Seraph Imalia wrote:

> Setting that property to false has not made any difference. HBase has just crashed again (ran out of heap) and I am busy restarting it.  What do I do now?
> 
> 
> On 29 Dec 2011, at 5:56 PM, Seraph Imalia wrote:
> 
>> Thanks,
>> 
>> I will try disabling it to see if the memory is being taken up by MSLAB.
>> 
>> Regards,
>> Seraph
>> 
>> On 29 Dec 2011, at 5:47 PM, Ted Yu wrote:
>> 
>>> mslab was introduced after 0.20.6
>>> 
>>> Read Todd's series:
>>> http://www.cloudera.com/blog/2011/03/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-3/
>>> 
>>> Cheers
>>> 
>>> On Thu, Dec 29, 2011 at 12:19 AM, Seraph Imalia <se...@eisp.co.za> wrote:
>>> 
>>>> Region Servers
>>>> 
>>>> Address            Start Code     Load
>>>> dynobuntu10:60030  1325081250180  requests=43, regions=224, usedHeap=3946, maxHeap=4087
>>>> dynobuntu12:60030  1325081249966  requests=32, regions=224, usedHeap=3821, maxHeap=4087
>>>> dynobuntu17:60030  1325081248407  requests=39, regions=225, usedHeap=4016, maxHeap=4087
>>>> Total:  servers: 3                requests=114, regions=673
>>>> 
>>>> I restarted them yesterday and the number of regions increased from 667 to
>>>> 673 and they are about to run out of heap again :(.  Should I set that
>>>> property to false? - what does mslab do? - is it new after 0.20.6?
>>>> 
>>>> Regards,
>>>> Seraph
>>>> 
>>>> On 28 Dec 2011, at 5:46 PM, Ted Yu wrote:
>>>> 
>>>>> Can you tell me how many regions each region server hosts?
>>>>> 
>>>>> In 0.90.4 there is this parameter:
>>>>> <name>hbase.hregion.memstore.mslab.enabled</name>
>>>>> <value>true</value>
>>>>> mslab tends to consume heap if region count is high.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> On Wed, Dec 28, 2011 at 6:27 AM, Seraph Imalia <se...@eisp.co.za> wrote:
>>>>> 
>>>>>> Hi Guys,
>>>>>> 
>>>>>> After updating from 0.20.6 to 0.90.4, we have been having serious RAM
>>>>>> issues.  I had hbase-env.sh set to use 3 Gigs of RAM with 0.20.6 but with
>>>>>> 0.90.4 even 4.5 Gigs seems not enough.  It does not matter how much load
>>>>>> the hbase services are under, it just crashes after 24-48 hours.  The only
>>>>>> difference the load makes is how quickly the services crash.  Even over
>>>>>> this holiday season with our lowest load of the year, it crashes just after
>>>>>> 36 hours of being started.  To fix it, I have to run the stop-hbase.sh
>>>>>> command, wait a while and kill -9 any hbase processes that have stopped
>>>>>> outputting logs or stopped responding, and then run start-hbase.sh again.
>>>>>> 
>>>>>> Attached are my logs from the latest "start-to-crash".  There are 3
>>>>>> servers and hbase is being used for storing URLs - 7 client servers
>>>>>> connect to hbase and perform URL lookups at about 40 requests per second
>>>>>> (this is the low load over this holiday season).  If the URL does not
>>>>>> exist, it gets added.  The key on the HTable is the URL and there are a few
>>>>>> fields stored against it - e.g. DateDiscovered, Host, Script, QueryString,
>>>>>> etc.
>>>>>> 
>>>>>> Each server has a hadoop datanode and an hbase regionserver and 1 of the
>>>>>> servers additionally has the namenode, master and zookeeper.  On first
>>>>>> start, each regionserver uses 2 Gigs (usedHeap) and as soon as I restart
>>>>>> the clients, the usedHeap slowly climbs until it reaches the maxHeap and
>>>>>> shortly after that, the regionservers start crashing - sometimes they
>>>>>> actually shut down gracefully by themselves.
>>>>>> 
>>>>>> Originally, we had hbase.regionserver.handler.count set to 100 and I have
>>>>>> now removed that to leave it as default which has not helped.
>>>>>> 
>>>>>> We have not made any changes to the clients and we have a mirrored
>>>>>> instance of this in our UK Data Centre which is still running 0.20.6 and
>>>>>> servicing 10 clients currently at over 300 requests per second (again low
>>>>>> load over the holidays) and it is 100% stable.
>>>>>> 
>>>>>> What do I do now? - your website says I cannot downgrade?
>>>>>> 
>>>>>> Please help
>>>>>> 
>>>>>> Regards,
>>>>>> Seraph
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>
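
The heap figures in the region server table and the remark that "mslab tends to consume heap if region count is high" can be put into rough numbers. The sketch below is only a back-of-the-envelope check, not a diagnosis. The region counts and heap values are copied from the table in the thread and are assumed to be in MB. The 2 MB MSLAB chunk size and the single column family per region are assumptions that the thread does not state, and the function names are made up for this illustration.

```python
# Figures copied from the region server table quoted in the thread.
# Heap values are assumed to be in MB.
SERVERS = {
    "dynobuntu10": {"regions": 224, "used_heap_mb": 3946, "max_heap_mb": 4087},
    "dynobuntu12": {"regions": 224, "used_heap_mb": 3821, "max_heap_mb": 4087},
    "dynobuntu17": {"regions": 225, "used_heap_mb": 4016, "max_heap_mb": 4087},
}

# ASSUMPTION: each MSLAB chunk is 2 MB (not stated in the thread).
MSLAB_CHUNK_MB = 2
# ASSUMPTION: one column family per region (not stated in the thread).
FAMILIES_PER_REGION = 1


def heap_used_fraction(used_mb, max_mb):
    """Fraction of the maximum heap currently in use."""
    return used_mb / max_mb


def mslab_floor_mb(regions, families=FAMILIES_PER_REGION, chunk_mb=MSLAB_CHUNK_MB):
    """Rough lower bound on heap pinned by MSLAB: one chunk per memstore
    (one memstore per region per column family) that has received a write."""
    return regions * families * chunk_mb


def report(servers=SERVERS):
    """Build one summary line per region server."""
    lines = []
    for name, s in servers.items():
        used = heap_used_fraction(s["used_heap_mb"], s["max_heap_mb"])
        floor = mslab_floor_mb(s["regions"])
        share = floor / s["max_heap_mb"]
        lines.append(
            f"{name}: heap {used:.1%} used, "
            f"MSLAB floor ~{floor} MB ({share:.1%} of max heap)"
        )
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

Under these assumptions the MSLAB floor comes to a few hundred MB per server, which is noticeable on a 4 GB heap but does not by itself account for heaps that are almost completely full. That would be consistent with the later report in this thread that setting the property to false made no difference.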