You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by "Jack Zhang(jian)" <ja...@huawei.com> on 2011/05/13 05:42:44 UTC
答复: 答复: Hmaster is OutOfMemory

Ok, I try to do it.

Jian Zhang(Jack)

-----邮件原件-----
发件人: Stack [mailto:saint.ack@gmail.com] 
发送时间: 2011年5月13日 9:48
收件人: user@hbase.apache.org
抄送: user@hbase.apache.org
主题: Re: 答复: Hmaster is OutOfMemory

Have a go at fixing it.  If you do it there is some hope it will make it into the code base soon.

My English isn't too good either but it's better than my Chinese 

Stack



On May 12, 2011, at 18:04, "Jack Zhang(jian)" <ja...@huawei.com> wrote:

> Maybe I didn't descript the issue clearly(I hate English ^_^).Yes, we need on branch. I don't have a patch yet, but if need I can try my best to fix this issue.
> 
> -----邮件原件-----
> 发件人: Stack [mailto:saint.ack@gmail.com] 
> 发送时间: 2011年5月13日 8:46
> 收件人: user@hbase.apache.org
> 抄送: user@hbase.apache.org; Chenjian
> 主题: Re: Hmaster is OutOfMemory
> 
> Pardon my being slow but I think I now understand what you are getting at.  I took a look at a heap dump on one of our production servers which is carrying 10k regions.  I see retention of an hserverload per online region.  The count of hserverload hregionload instances retained can be regions * regions.   On my cluster I see retention of about .5 gigs.   On a cluster of 100k regions it would be a good bit worse.   This should be fixed in trunk.  Do you need a fix on branch?   Maybe you have a patch that mulls out load when hsi is added to online regions.  Would need to make sure we did not break balancer if we did this
> 
> Thanks for digging in here
> 
> Stack
> 
> 
> 
> On May 12, 2011, at 5:44, "Jack Zhang(jian)" <ja...@huawei.com> wrote:
> 
>> In our test cluster, 1 hmaster, 3 regionserver, but there are 1,481 HServerInfo instance.
>> So in hbase 0.90.2 , I think there are memory leak in hmaster.
>> 1. Regionsever reports it's load to hmaster period, so the corresponding Hserverinfo will be changed in ServerManager.onlineServers(line 244, 337 in ServerManager) .
>> 2. When hbase cluster start up, AssignmentManager will receive RS_ZK_REGION_OPENED event, then it will construct OpenedRegionHandler(line 431 in AssignmentManager) by the corresponding Hserverinfo instance refreshed by the Regionserver latest.
>> 3. Then OpenedRegionHandler will store this Hserverinfo into assignmentManager.regionOnline(line 97 in OpenedRegionHandler).
>> 
>> After regionserver reported load, hmaster always store the new hserverinfo instance to assignmentManager.regionOnline if region opened meanwhile.
>> So the more the region opened, the more memory will leak.
>> 
>> -----邮件原件-----
>> 发件人: Gaojinchao [mailto:gaojinchao@huawei.com] 
>> 发送时间: 2011年5月12日 15:24
>> 收件人: user@hbase.apache.org
>> 主题: Re: Hmaster is OutOfMemory
>> 
>> Thanks, Stack.
>> The heap dump is so big. I try to dig it and share the result. 
>> Then ,you can give me some suggestion.
>> 
>> 
>> Do you give me more suggestion about my cluster?
>> 
>> My application:
>> write operation with api put<list> is about 75k Puts/s( one put about 400 Bytes)
>> read opreation is rarely, but the latency time demands lower than 5s
>> 
>> machine:
>> cpu:    8 core 2.GHz
>> memory: 24G, Hbase use 8G
>> Disk:   2T*8 = 16T
>> 
>> node number: 13 nodes
>> 
>> dfs configure：
>> 
>> dfs.block.size 256M 
>> dfs.datanode.handler.count 10 
>> dfs.namenode.handler.count 30 
>> dfs.datanode.max.xcievers 2047 
>> dfs.support.append True 
>> 
>> hbase configure：
>> 
>> hbase.regionserver.handler.count 50 
>> hbase.regionserver.global.memstore.upperLimit 0.4 
>> hbase.regionserver.global.memstore.lowerLimit 0.35 
>> hbase.hregion.memstore.flush.size 128M 
>> hbase.hregion.max.filesize 512M 
>> hbase.client.scanner.caching 1 
>> hfile.block.cache.size 0.2 
>> hbase.hregion.memstore.block.multiplier 3 
>> hbase.hstore.blockingStoreFiles 10 
>> hbase.hstore.compaction.min.size 64M 
>> 
>> compress: gz
>> 
>> I am afraid of some problem:
>> 1, One region server has about 12k reigons or more, if up the hregion.max.filesize , parallel scalability will be low and affect scan latency .
>> 2, If a region server crashed, It could bring on the other region server crashed?
>> 
>> Can you give some suggestion about the hbase parameter and my cluster ?
>> 
>> -----邮件原件-----
>> 发件人: saint.ack@gmail.com [mailto:saint.ack@gmail.com] 代表 Stack
>> 发送时间: 2011年5月10日 23:21
>> 收件人: user@hbase.apache.org
>> 主题: Re: Hmaster is OutOfMemory
>> 
>> 2011/5/9 Gaojinchao <ga...@huawei.com>:
>>> My first cluster needs save 147 TB data. If one region has 512M or 1 GB, It will be 300 K regions or 147K regions.
>>> In the future, If store several PB, It will more regions.
>> 
>> You might want to up the size of your regions to 4G (FB run w/ big
>> regions IIUC).
>> 
>> Do you want to put up a heap dump for me to take a look at.  That'd be
>> easier than me figuring time to try and replicate your scenario.
>> 
>> Thanks Gao,
>> St.Ack