Posted to dev@hbase.apache.org by "Anoop Sam John (Jira)" <ji...@apache.org> on 2021/07/21 12:09:00 UTC
[jira] [Resolved] (HBASE-26088) conn.getBufferedMutator(tableName) leaks thread executors and other problems

     [ https://issues.apache.org/jira/browse/HBASE-26088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anoop Sam John resolved HBASE-26088.
------------------------------------
    Fix Version/s: 2.3.6
     Hadoop Flags: Reviewed
       Resolution: Fixed

Pushed to branch-2.3, branch-2.4 and branch-2.
Thanks for the patch [~shahrs87].
Thanks for the great find [~whitney13]

> conn.getBufferedMutator(tableName) leaks thread executors and other problems
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-26088
>                 URL: https://issues.apache.org/jira/browse/HBASE-26088
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.4.13, 2.4.4
>            Reporter: Whitney Jackson
>            Assignee: Rushabh Shah
>            Priority: Critical
>             Fix For: 2.5.0, 2.3.6, 2.4.5
>
>
> TL;DR: {{conn.getBufferedMutator(tableName)}} is dangerous in hbase client 2.4.4 and doesn't match documented behavior in 1.4.13.
> To work around the problems until they are fixed, do this:
> {code:java}
> var mySingletonPool = HTable.getDefaultExecutor(hbaseConf);
> var params = new BufferedMutatorParams(tableName);
> params.pool(mySingletonPool);
> var myMutator = conn.getBufferedMutator(params);
> {code}
> And avoid code like this:
> {code:java}
> var myMutator = conn.getBufferedMutator(tableName);
> {code}
> The full story:
> My application started leaking threads after upgrading from hbase client 1.4.13 to 2.4.4. So much so that, after less than a minute of runtime, more than 30k threads are leaked and all available virtual memory on the box (> 50 GB) is consumed. Other processes on the box start crashing with memory allocation errors. Even running {{ls}} at the shell fails with OS resource allocation failures.
> A thread dump after just a few seconds of runtime shows thousands of threads like this:
> {code:java}
> "htable-pool-0" #8841 prio=5 os_prio=0 cpu=0.15ms elapsed=7.49s tid=0x00007efb6d2a1000 nid=0x57d2 waiting on condition [0x00007ef8a6c38000]
>  java.lang.Thread.State: TIMED_WAITING (parking)
>  at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>  - parking to wait for <0x00000007e7cd6188> (a java.util.concurrent.SynchronousQueue$TransferStack)
>  at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
>  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@11.0.6/SynchronousQueue.java:462)
>  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@11.0.6/SynchronousQueue.java:361)
>  at java.util.concurrent.SynchronousQueue.poll(java.base@11.0.6/SynchronousQueue.java:937)
>  at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.6/ThreadPoolExecutor.java:1053)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1114)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
>  at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> {code}
>  
> Note: All the threads are labeled {{htable-pool-0}}. That suggests we're leaking thread executors, not just threads. The {{htable-pool}} part indicates the problem has to do with {{HTable.getDefaultExecutor(conf)}}, and the only part of my code that interacts with that is a call to {{conn.getBufferedMutator(tableName)}}.
>  
> Looking at the hbase client code shows a few problems:
> 1) Neither 1.4.13 nor 2.4.4's behavior matches the documentation for {{conn.getBufferedMutator(tableName)}} which says:
> {quote}This BufferedMutator will use the Connection's ExecutorService.
> {quote}
> That suggests some singleton thread executor is being used, which is not the case.
>  
> 2) Under 1.4.13 you get a new {{ThreadPoolExecutor}} for every {{BufferedMutator}}. That's probably not what you want but you likely won't notice. I didn't. It's a code path I hadn't profiled much.
>  
> 3) Under 2.4.4 you get a new {{ThreadPoolExecutor}} for every {{BufferedMutator}} *and* that {{ThreadPoolExecutor}} *is not* cleaned up after the {{BufferedMutator}} is closed. Each abandoned {{ThreadPoolExecutor}} carries with it one thread, which hangs around until a timeout that defaults to 60 seconds.
> My application creates one {{BufferedMutator}} for every incoming stream, and there are lots of streams, some of them short-lived, so my code leaks threads fast under 2.4.4.
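
The lingering-thread effect described in point 3 can be reproduced with plain JDK classes, without HBase on the classpath. The sketch below is only an illustration: ExecutorLeakDemo, newPool and lingeringWorkers are hypothetical names, and the pool parameters (no core threads, 60 second keep-alive, SynchronousQueue) are assumptions modeled loosely on the thread dump above, not copied from the HBase client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecutorLeakDemo {

    // Hypothetical stand-in for a per-mutator pool: threads are created on
    // demand and an idle worker exits only after a 60 second keep-alive.
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
            0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());
    }

    // Creates one pool per simulated mutator, runs a single task on it, and
    // returns how many worker threads are still alive after every "close".
    static int lingeringWorkers(int mutators, boolean shutdownOnClose) {
        List<ThreadPoolExecutor> pools = new ArrayList<>();
        try {
            for (int i = 0; i < mutators; i++) {
                ThreadPoolExecutor pool = newPool();
                pools.add(pool);
                pool.submit(() -> { }).get(); // simulate one flush
                if (shutdownOnClose) {
                    pool.shutdown();
                    pool.awaitTermination(5, TimeUnit.SECONDS);
                }
            }
            int live = 0;
            for (ThreadPoolExecutor pool : pools) {
                live += pool.getPoolSize();
            }
            return live;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Clean up so the demo itself does not keep the JVM alive.
            for (ThreadPoolExecutor pool : pools) {
                pool.shutdownNow();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("no shutdown on close: " + lingeringWorkers(5, false));
        System.out.println("shutdown on close:    " + lingeringWorkers(5, true));
    }
}
```

Without a shutdown on close, each simulated mutator should leave one idle worker behind (five in total here); with a shutdown, none should remain. In a real application those idle workers would disappear only once the keep-alive expires, which is why many short-lived mutators add up so quickly.
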
> Here's the part where a new executor is created for every {{BufferedMutator}} (it's similar for 1.4.13):
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L420]
>  
> The reason for the leak in 2.4.4 is the should-we/shouldn't-we cleanup logic added here:
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/BufferedMutatorImpl.java#L104]
> That might be OK if {{pool}} were initialized there, but in the {{conn.getBufferedMutator(tableName)}} code path it's not. {{pool}} is initialized in {{conn.getBufferedMutator}} itself, so the executor cleanup code never runs.
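
For readers who want the ownership problem in isolation, here is a minimal sketch of the "clean up only what you created" pattern described above. Everything in it is a simplified, hypothetical stand-in (PoolOwnershipSketch, SketchMutator, getMutatorLeaky, getMutatorOwning); it is not the HBase source and not necessarily how the committed patch works.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolOwnershipSketch {

    // Hypothetical, simplified mutator: it shuts the pool down on close
    // only when it created the pool itself.
    static final class SketchMutator implements AutoCloseable {
        final ExecutorService pool;
        final boolean cleanupPoolOnClose;

        SketchMutator(ExecutorService suppliedPool) {
            if (suppliedPool == null) {
                this.pool = Executors.newCachedThreadPool();
                this.cleanupPoolOnClose = true;
            } else {
                this.pool = suppliedPool;
                this.cleanupPoolOnClose = false;
            }
        }

        @Override
        public void close() {
            if (cleanupPoolOnClose) {
                pool.shutdown();
            }
        }
    }

    // Mirrors the reported code path: the pool is created at the connection
    // level, so the mutator treats it as externally owned and never shuts
    // it down.
    static SketchMutator getMutatorLeaky() {
        return new SketchMutator(Executors.newCachedThreadPool());
    }

    // One possible repair (an assumption for illustration): let the mutator
    // create the pool so that it also owns the cleanup.
    static SketchMutator getMutatorOwning() {
        return new SketchMutator(null);
    }
}
```

Closing a mutator from getMutatorLeaky() leaves its pool running, while closing one from getMutatorOwning() shuts the pool down. Passing a single shared pool through BufferedMutatorParams, as in the workaround at the top of the description, avoids the question entirely because the caller owns the pool's lifecycle.
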



--
This message was sent by Atlassian Jira
(v8.3.4#803005)