Posted to issues@hbase.apache.org by "Zephyr Guo (JIRA)" <ji...@apache.org> on 2018/11/05 11:25:00 UTC
[jira] [Updated] (HBASE-21436) Getting OOM frequently if hold many regions
[ https://issues.apache.org/jira/browse/HBASE-21436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zephyr Guo updated HBASE-21436:
-------------------------------
Attachment: HBASE-21436-UT.patch
> Getting OOM frequently if hold many regions
> --------------------------------------------
>
> Key: HBASE-21436
> URL: https://issues.apache.org/jira/browse/HBASE-21436
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 3.0.0, 1.4.8, 2.0.2
> Reporter: Zephyr Guo
> Priority: Major
> Attachments: HBASE-21436-UT.patch
>
>
> Recently, a customer reported that NotServingRegionException was being thrown at intervals. I examined the cluster and found quite a lot of OOM logs, even though the "readDataPerSecondKB/writeDataPerSecondKB" metrics were quite low. In this customer's case, each RS hosted 3k regions with a heap size of 4G. I dumped the heap when an OOM took place and found a lot of Chunk objects there (as many as 1700).
> Eventually, piecing all this evidence together, I came to the conclusion that: 1. The root cause is that the global flush is triggered by the size of all memstores, rather than by the size of all chunks. 2. A chunk is always allocated for each region, even if we write only a small amount of data to the region.
> In this case, a total of 3.4G of memory was consumed by 1700 chunks, although throughput was very low.
> Although 3k regions is too many for an RS with 4G of memory, it is still wise to improve RS stability in such a scenario (in fact, many customers buy a small-sized HBase on the cloud side).
>
> I provide a patch (containing only a UT) to reproduce this case (just send a batch).
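
The memory arithmetic behind the conclusion above can be sketched as follows. This is a minimal illustration, not HBase code: it assumes the default MSLAB chunk size of 2 MB (`hbase.hregion.memstore.mslab.chunksize`) and that roughly one chunk ends up allocated per actively written region, as described in the report; the class and method names are hypothetical.

```java
public class ChunkMemoryEstimate {
    // Default MSLAB chunk size in HBase (hbase.hregion.memstore.mslab.chunksize): 2 MB.
    static final long CHUNK_SIZE_BYTES = 2L * 1024 * 1024;

    // One chunk held per actively written region: memory pinned by chunks can
    // far exceed the data actually sitting in the memstores, and the global
    // flush never fires because it watches memstore size, not chunk size.
    static long chunkMemoryBytes(int regionsWithChunks) {
        return regionsWithChunks * CHUNK_SIZE_BYTES;
    }

    public static void main(String[] args) {
        long held = chunkMemoryBytes(1700);
        // 1700 chunks x 2 MB = 3400 MB, i.e. most of a 4G heap.
        System.out.println(held / (1024 * 1024) + " MB held by chunks");
    }
}
```

Under these assumptions, 1700 chunks pin about 3400 MB regardless of how little data each region actually buffers, which matches the 3.4G figure observed in the heap dump.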
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)