
Posted to issues@hbase.apache.org by "Zheng Hu (Jira)" <ji...@apache.org> on 2019/08/26 02:05:00 UTC

[jira] [Resolved] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

     [ https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Hu resolved HBASE-22867.
------------------------------
    Hadoop Flags: Reviewed
    Release Note: Replaces the ForkJoinPool in CleanerChore with a ThreadPoolExecutor, which limits the number of spawned threads and avoids frequent GC on the master. The replacement is internal to CleanerChore, so no config keys change; upstream users can simply upgrade the HBase master without any other change.
            Tags: master
      Resolution: Fixed
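The release note describes the fix only at a high level: a ThreadPoolExecutor caps the number of threads CleanerChore can spawn. The sketch below is not the HBase patch. It only illustrates the general bounding technique with plain java.util.concurrent. The class name `BoundedDirScanPool`, the factory `newBoundedPool`, the thread name prefix and the pool size are all hypothetical. With core size equal to max size and an unbounded queue, surplus tasks wait in the queue instead of triggering new threads.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedDirScanPool {
    // Hypothetical factory: a pool that never runs more than maxThreads workers,
    // however many tasks (e.g. one per table directory) are queued.
    static ThreadPoolExecutor newBoundedPool(int maxThreads) {
        AtomicInteger seq = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "dir-scan-" + seq.incrementAndGet());
            t.setDaemon(true); // do not keep the JVM alive on exit
            return t;
        };
        // core == max plus an unbounded queue: extra tasks wait in the queue
        // instead of causing additional threads to be created.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(), factory);
        pool.allowCoreThreadTimeOut(true); // idle workers exit between chore runs
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newBoundedPool(4);
        // Submit far more tasks than threads, as a chore scanning thousands of tables might.
        for (int i = 0; i < 1000; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(1); // stand-in for inspecting/deleting one directory
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("largest pool size = " + pool.getLargestPoolSize()); // prints 4
    }
}
```

One design caveat: in a bounded pool, a task that blocks waiting on subtasks submitted to the same pool can exhaust all workers and deadlock. A recursive directory scan on such a pool therefore has to avoid blocking waits, for example by chaining CompletableFutures, rather than joining the way ForkJoinTask code often does.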

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-22867
>                 URL: https://issues.apache.org/jira/browse/HBASE-22867
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Critical
>             Fix For: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>
>         Attachments: 191318.stack, 191318.stack.1, 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint take 80+ s in our Master JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket connection and attempting reconnect
> {code}
> The JVM stdout shows 9126 threads and a safepoint sync time of 80+ s:
> {code}
> vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint              [    9126         67            474    ]      [     1    28 86596    87   101    ]  0
> {code}
> We also took a jstack:
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so I'm marking it as a blocker.
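The "80+ s" figure comes from the sync column of the safepoint statistics row quoted in the issue. The sketch below pulls the numbers out of that row. It assumes the JDK 8 `-XX:+PrintSafepointStatistics` layout shown in the header (threads: total, initially_running, wait_to_block; time: spin, block, sync, cleanup, vmop; then page_trap_count), with times in milliseconds. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SafepointStatLine {
    private static final Pattern INT = Pattern.compile("\\d+");

    // Hypothetical helper: returns every integer after the first '[' of a row, i.e.
    // [total, initially_running, wait_to_block, spin, block, sync, cleanup, vmop, page_trap_count].
    static List<Long> fields(String row) {
        List<Long> out = new ArrayList<>();
        int start = row.indexOf('[');
        if (start < 0) {
            return out; // not a statistics row
        }
        Matcher m = INT.matcher(row.substring(start));
        while (m.find()) {
            out.add(Long.parseLong(m.group()));
        }
        return out;
    }

    public static void main(String[] args) {
        String row = "32358.859: ForceAsyncSafepoint              [    9126         67            474    ]"
                + "      [     1    28 86596    87   101    ]  0";
        List<Long> f = fields(row);
        long totalThreads = f.get(0);
        long syncMs = f.get(5); // assumed to be milliseconds
        // prints "threads=9126 sync=86.596 s"
        System.out.println(String.format(Locale.ROOT, "threads=%d sync=%.3f s", totalThreads, syncMs / 1000.0));
    }
}
```

Read this way, nearly all of the pause is spent in the sync phase, waiting for 9126 threads to reach the safepoint. That fits the 82260 ms ZooKeeper session timeout in the first log excerpt.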

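The jstack check above counts worker threads by name with `grep ... | wc -l`. As a rough in-process analogue, a JVM can count its own live threads by name prefix through `Thread.getAllStackTraces()`. This is only an illustrative sketch. The class `ThreadPrefixCount`, the method `countThreadsWithPrefix` and the `demo-worker-` prefix are hypothetical, and the demo starts its own threads instead of reproducing the ForkJoinPool behavior.

```java
import java.util.concurrent.CountDownLatch;

public class ThreadPrefixCount {
    // Hypothetical helper: counts live threads in this JVM whose name starts with prefix,
    // similar to grepping a jstack dump for a thread-name prefix and counting matches.
    static long countThreadsWithPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        Thread[] workers = new Thread[5];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    release.await(); // park until the count has been taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "demo-worker-" + i);
            workers[i].start();
        }
        // prints "demo-worker threads: 5"
        System.out.println("demo-worker threads: " + countThreadsWithPrefix("demo-worker-"));
        release.countDown();
        for (Thread t : workers) {
            t.join();
        }
    }
}
```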


--
This message was sent by Atlassian Jira
(v8.3.2#803003)