Posted to user@hbase.apache.org by Seraph Imalia <se...@eisp.co.za> on 2012/01/03 09:19:14 UTC
Re: RAM Problems - Keeps Crashing
Hi Guys,
I am quite desperate to resolve this issue. What is the next thing I can try? All I can think of is to downgrade back to 0.20.6 and write an app to migrate all the data :(
I hope everyone had a great new year :)
Regards,
Seraph
On 30 Dec 2011, at 7:41 PM, Seraph Imalia wrote:
> Setting that property to false has not made any difference. HBase has just crashed again (it ran out of heap) and I am busy restarting it. What do I do now?
>
>
> On 29 Dec 2011, at 5:56 PM, Seraph Imalia wrote:
>
>> Thanks,
>>
>> I will try disabling it to see if the memory is being taken up by MSLAB.
>>
>> Regards,
>> Seraph
>>
>> On 29 Dec 2011, at 5:47 PM, Ted Yu wrote:
>>
>>> mslab was introduced after 0.20.6
>>>
>>> Read Todd's series:
>>> http://www.cloudera.com/blog/2011/03/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-3/
>>>
>>> Cheers
>>>
>>> On Thu, Dec 29, 2011 at 12:19 AM, Seraph Imalia <se...@eisp.co.za> wrote:
>>>
>>>> Region Servers
>>>>
>>>> Address            Start Code     Load
>>>> dynobuntu10:60030  1325081250180  requests=43, regions=224, usedHeap=3946, maxHeap=4087
>>>> dynobuntu12:60030  1325081249966  requests=32, regions=224, usedHeap=3821, maxHeap=4087
>>>> dynobuntu17:60030  1325081248407  requests=39, regions=225, usedHeap=4016, maxHeap=4087
>>>> Total: servers: 3 requests=114, regions=673
>>>>
>>>> I restarted them yesterday and the number of regions increased from 667 to
>>>> 673 and they are about to run out of heap again :(. Should I set that
>>>> property to false? - what does mslab do? - is it new after 0.20.6?
>>>>
>>>> Regards,
>>>> Seraph
>>>>
>>>> On 28 Dec 2011, at 5:46 PM, Ted Yu wrote:
>>>>
>>>>> Can you tell me how many regions each region server hosts ?
>>>>>
>>>>> In 0.90.4 there is this parameter:
>>>>> <property>
>>>>>   <name>hbase.hregion.memstore.mslab.enabled</name>
>>>>>   <value>true</value>
>>>>> </property>
>>>>> mslab tends to consume heap if region count is high.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Wed, Dec 28, 2011 at 6:27 AM, Seraph Imalia <se...@eisp.co.za> wrote:
>>>>>
>>>>>> Hi Guys,
>>>>>>
>>>>>> After updating from 0.20.6 to 0.90.4, we have been having serious RAM
>>>>>> issues. I had hbase-env.sh set to use 3 Gigs of RAM with 0.20.6 but with
>>>>>> 0.90.4 even 4.5 Gigs seems not enough. It does not matter how much load
>>>>>> the hbase services are under, it just crashes after 24-48 hours. The only
>>>>>> difference the load makes is how quickly the services crash. Even over
>>>>>> this holiday season with our lowest load of the year, it crashes just after
>>>>>> 36 hours of being started. To fix it, I have to run the stop-hbase.sh
>>>>>> command, wait a while and kill -9 any hbase processes that have stopped
>>>>>> outputting logs or stopped responding, and then run start-hbase.sh again.
>>>>>>
>>>>>> Attached are my logs from the latest "start-to-crash". There are 3
>>>>>> servers and hbase is being used for storing URLs - 7 client servers
>>>>>> connect to hbase and perform URL Lookups at about 40 requests per second
>>>>>> (this is the low load over this holiday season). If the URL does not
>>>>>> exist, it gets added. The Key on the HTable is the URL and there are a few
>>>>>> fields stored against it - e.g. DateDiscovered, Host, Script, QueryString,
>>>>>> etc.
>>>>>>
>>>>>> Each server has a hadoop datanode and an hbase regionserver and 1 of the
>>>>>> servers additionally has the namenode, master and zookeeper. On first
>>>>>> start, each regionserver uses 2 Gigs (usedHeap) and as soon as I restart
>>>>>> the clients, the usedHeap slowly climbs until it reaches the maxHeap and
>>>>>> shortly after that, the regionservers start crashing - sometimes they
>>>>>> actually shut down gracefully by themselves.
>>>>>>
>>>>>> Originally, we had hbase.regionserver.handler.count set to 100 and I have
>>>>>> now removed that to leave it as default which has not helped.
>>>>>>
>>>>>> We have not made any changes to the clients and we have a mirrored
>>>>>> instance of this in our UK Data Centre which is still running 0.20.6 and
>>>>>> servicing 10 clients currently at over 300 requests per second (again low
>>>>>> load over the holidays) and it is 100% stable.
>>>>>>
>>>>>> What do I do now? - your website says I cannot downgrade?
>>>>>>
>>>>>> Please help
>>>>>>
>>>>>> Regards,
>>>>>> Seraph
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>
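
For anyone reading this thread later: Ted's remark that mslab consumes heap when the region count is high can be turned into a rough back-of-the-envelope number. The sketch below is only an estimate under assumptions that are not stated in the thread: one column family per region, every region having received at least one write, and the default MSLAB chunk size of 2 MB (hbase.hregion.memstore.mslab.chunksize). The function name mslab_floor_mb is made up for illustration. The region counts and maxHeap are taken from the region server table quoted above.

```python
# Rough lower bound on heap held by MSLAB chunks on one region server.
# Assumptions (NOT from the thread): one column family per region, every
# region has taken at least one write, and the chunk size is the 2 MB
# default of hbase.hregion.memstore.mslab.chunksize.

MSLAB_CHUNK_BYTES = 2 * 1024 * 1024


def mslab_floor_mb(regions, families_per_region=1, chunk_bytes=MSLAB_CHUNK_BYTES):
    """Return the minimum MB pinned if each memstore holds one chunk."""
    return regions * families_per_region * chunk_bytes / (1024 * 1024)


# Region counts and maxHeap (MB) as reported in the thread.
region_counts = {"dynobuntu10": 224, "dynobuntu12": 224, "dynobuntu17": 225}
max_heap_mb = 4087

for host, regions in region_counts.items():
    floor = mslab_floor_mb(regions)
    share = floor / max_heap_mb
    print(f"{host}: at least ~{floor:.0f} MB of {max_heap_mb} MB ({share:.0%})")
```

Under those assumptions the floor is about 448 MB for 224 regions, or roughly a tenth of the configured heap. That is noticeable but would not by itself fill a 4 GB heap, which fits the report earlier in the thread that setting the property to false made no difference. With more column families per region the floor scales linearly, so the estimate should be redone with the real schema.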