Posted to dev@hbase.apache.org by "Sean Busbey (JIRA)" <ji...@apache.org> on 2018/12/07 01:25:00 UTC
[jira] [Resolved] (HBASE-21551) Memory leak when use scan with STREAM at server side

     [ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Busbey resolved HBASE-21551.
---------------------------------
      Resolution: Fixed
    Release Note: 
<!-- markdown -->
### Summary
HBase clusters can experience Region Server failures caused by out-of-memory errors. The underlying memory leak is triggered by any of the following:

* User initiates Scan operations set to use the STREAM reading type
* User initiates Scan operations set to use the default reading type that read more than 4 times the block size of the column families involved in the scan (e.g. by default 4 * 64 KiB = 256 KiB)
* Compactions run

### Root cause

When there are long-running scans, the Region Server process attempts to optimize access by using a different API geared towards sequential access. Due to an error in HBASE-20704, for HBase 2.0+ the Region Server fails to release related resources when those scans finish. That same optimization path is always used for the HBase internal file compaction process.
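
The leak pattern can be illustrated in isolation. According to the issue description quoted further down, each stream scan registers a reader in a concurrent map (`streamReaders`) that is only emptied when the store file closes. The sketch below is a minimal stand-alone model of that behavior, not HBase code: `SketchStoreFile`, `SketchReader`, and `StreamReaderLeakSketch` are hypothetical names, and the "releasing" variant shows one plausible remedy (deregistering on scan close) rather than the contents of the actual patch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a store file; NOT the real HStoreFile.
class SketchStoreFile {
    // Tracks every stream reader opened against this file.
    private final Set<SketchReader> streamReaders = ConcurrentHashMap.newKeySet();

    // Hypothetical stand-in for a per-scan stream reader.
    final class SketchReader implements AutoCloseable {
        private final boolean releaseOnClose;

        SketchReader(boolean releaseOnClose) {
            this.releaseOnClose = releaseOnClose;
        }

        @Override
        public void close() {
            // Leaky variant: the reader stays registered until the file itself closes.
            // Releasing variant: the reader deregisters itself as soon as the scan ends.
            if (releaseOnClose) {
                streamReaders.remove(this);
            }
        }
    }

    SketchReader openStreamReader(boolean releaseOnClose) {
        SketchReader reader = new SketchReader(releaseOnClose);
        streamReaders.add(reader);
        return reader;
    }

    int trackedReaders() {
        return streamReaders.size();
    }

    // Closing the file is the only point at which the leaky variant lets go.
    void closeFile() {
        streamReaders.clear();
    }
}

class StreamReaderLeakSketch {
    // Runs `scans` short stream scans and reports how many readers remain tracked.
    static int readersLeftAfter(int scans, boolean releaseOnClose) {
        SketchStoreFile file = new SketchStoreFile();
        for (int i = 0; i < scans; i++) {
            try (SketchStoreFile.SketchReader reader = file.openStreamReader(releaseOnClose)) {
                // A real scan would read cells here.
            }
        }
        return file.trackedReaders();
    }

    public static void main(String[] args) {
        System.out.println("leaky:     " + readersLeftAfter(1000, false));
        System.out.println("releasing: " + readersLeftAfter(1000, true));
    }
}
```

In the leaky variant the tracked set grows by one entry per scan for as long as the file stays open, which matches the unbounded heap growth reported in the issue. In the releasing variant the set is empty again after every scan.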

### Workaround

The impact of this error can be minimized by setting the config value “hbase.storescanner.pread.max.bytes” to MAX_INT, which avoids the optimization for default user scans. Clients should also be checked to ensure they do not pass the STREAM read type to the Scan API. Disabling the optimization in this way will have a severe impact on performance for long scans.
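
As a rough illustration, the setting might look like the following in `hbase-site.xml` on the Region Servers. This is a sketch under one assumption: that MAX_INT refers to Java's `Integer.MAX_VALUE`, i.e. 2147483647. Verify the value and the deployment procedure against your own HBase version before relying on it.

```xml
<!-- Assumption: MAX_INT is taken to mean Java's Integer.MAX_VALUE (2147483647). -->
<property>
  <name>hbase.storescanner.pread.max.bytes</name>
  <value>2147483647</value>
</property>
```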

Compactions always use this sequential optimized reading mechanism, so downstream users will need to periodically restart Region Server roles after compactions have happened.

> Memory leak when use scan with STREAM at server side
> ----------------------------------------------------
>
>                 Key: HBASE-21551
>                 URL: https://issues.apache.org/jira/browse/HBASE-21551
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Blocker
>             Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
>
>         Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, HBASE-21551.v3.patch, heap-dump.jpg
>
>
> We open the RegionServerScanner with STREAM as follows:
> {code}
> RegionScannerImpl#initializeScanners
>       |---> HStore#getScanner
>                     |----------> StoreScanner()
>                                         |-------> StoreFileScanner#getScannersForStoreFiles
>                                                           |------> HStoreFile#getStreamScanner      #1
> {code}
> In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, but do not remove the StreamReader from streamReaders until the store file is closed.
> So if we scan with STREAM many times, the streamReaders hash map grows without bound. This can be seen in the heap dump in the attached heap-dump.jpg.
> I found this bug while benchmarking scan performance with YCSB in a cluster (heap size of RS is 50g): the RS frequently went into long full GCs (~110 sec)....


