Posted to issues@hbase.apache.org by "Zephyr Guo (JIRA)" <ji...@apache.org> on 2018/11/05 11:23:00 UTC

[jira] [Created] (HBASE-21436) Getting OOM frequently if hold many regions

Zephyr Guo created HBASE-21436:
----------------------------------

             Summary:  Getting OOM frequently if hold many regions
                 Key: HBASE-21436
                 URL: https://issues.apache.org/jira/browse/HBASE-21436
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
    Affects Versions: 2.0.2, 1.4.8, 3.0.0
            Reporter: Zephyr Guo


Recently, a customer reported NotServingRegionException being thrown at intervals. I examined the cluster and found many OOM entries in the logs, even though the "readDataPerSecondKB/writeDataPerSecondKB" metrics were quite low. In this customer's case, each RS holds 3k regions and has a heap size of 4G. I dumped the heap when an OOM occurred and found a large number of Chunk objects (as many as 1700).
Piecing the evidence together, I came to the following conclusions: 1. The root cause is that a global flush is triggered by the total size of all memstores rather than by the total size of all chunks. 2. A chunk is always allocated for each region, even if only a little data is written to that region.
In this case, 1700 chunks consumed a total of 3.4G of memory, although throughput was very low.
Although 3K regions is too many for an RS with 4G of memory, it is still worth improving RS stability in such a scenario (in fact, most customers buy a small HBase deployment in the cloud).
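
To make the arithmetic concrete, here is a rough, standalone sketch (not HBase code) that compares what a data-size-based flush check sees with what the chunks actually occupy. It assumes a 2 MB chunk size, which is what 3.4G / 1700 implies and matches the default of hbase.hregion.memstore.mslab.chunksize, and a global memstore limit of 0.4 of the heap (the default of hbase.regionserver.global.memstore.size). The 100 KB of data per region is a purely hypothetical figure chosen to represent "low throughput"; the class and method names are illustrative only.

```java
public class ChunkFootprint {
    // Assumption: 2 MB MSLAB chunks (inferred from 3.4G / 1700 chunks).
    static final long CHUNK_BYTES = 2L * 1024 * 1024;

    // Heap actually pinned by allocated chunks, regardless of how full they are.
    static long chunkHeapBytes(int chunks) {
        return chunks * CHUNK_BYTES;
    }

    // Bytes of cell data held in the memstores (what the flush check counts).
    static long memstoreDataBytes(int regions, long dataPerRegionBytes) {
        return regions * dataPerRegionBytes;
    }

    // Simplified global flush check: compare an accounted size to the limit.
    static boolean wouldTriggerGlobalFlush(long accountedBytes, long limitBytes) {
        return accountedBytes >= limitBytes;
    }

    public static void main(String[] args) {
        long heapBytes = 4L * 1024 * 1024 * 1024;      // 4G heap, from the report
        long globalLimit = (long) (heapBytes * 0.4);   // assumption: default 0.4 fraction
        int chunks = 1700;                             // from the heap dump
        long dataPerRegion = 100L * 1024;              // hypothetical: 100 KB per region

        long data = memstoreDataBytes(chunks, dataPerRegion);
        long pinned = chunkHeapBytes(chunks);

        System.out.println("memstore data (MiB): " + data / (1024 * 1024));
        System.out.println("chunk heap (MiB):    " + pinned / (1024 * 1024));
        System.out.println("flush by data size:  " + wouldTriggerGlobalFlush(data, globalLimit));
        System.out.println("flush by chunk size: " + wouldTriggerGlobalFlush(pinned, globalLimit));
    }
}
```

Under these assumptions the data-size check stays well below the limit and never asks for a flush, while the chunks alone already exceed it, which is consistent with the OOMs observed at low throughput.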
 
I have provided a patch (containing only a UT) that reproduces this case (it just sends a batch).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)