You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2018/03/06 07:06:11 UTC

[GitHub] merlimat opened a new pull request #1345: Cleanup failed connection from background thread

merlimat opened a new pull request #1345: Cleanup failed connection from background thread
URL: https://github.com/apache/incubator-pulsar/pull/1345
 
 
   This is a fix for #1138
   
   ### Motivation
   
   There is a race condition when shutting down the client instance while a reconnection is happening. 
   
   The symptom is that the `client.close()` operation is stuck due to a deadlock.
   
   This is the timer thread: 
   
   ```
   
   "pulsar-timer-210-1" #889 prio=5 os_prio=0 tid=0x00007f52bc027000 nid=0x25a6 runnable [0x00007f51ef15b000]
      java.lang.Thread.State: RUNNABLE
   	at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1172)
   	at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1546)
   	at org.apache.pulsar.client.impl.ConnectionPool.cleanupConnection(ConnectionPool.java:291)
   	at org.apache.pulsar.client.impl.ConnectionPool.lambda$createConnection$6(ConnectionPool.java:202)
   	at org.apache.pulsar.client.impl.ConnectionPool$$Lambda$178/536662449.apply(Unknown Source)
   	at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
   	at java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:884)
   	at java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2196)
   	at org.apache.pulsar.client.impl.ConnectionPool.createConnection(ConnectionPool.java:199)
   	at org.apache.pulsar.client.impl.ConnectionPool.lambda$getConnection$1(ConnectionPool.java:143)
   	at org.apache.pulsar.client.impl.ConnectionPool$$Lambda$171/814839376.apply(Unknown Source)
   	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
   	- locked <0x00000000dca51c30> (a java.util.concurrent.ConcurrentHashMap$ReservationNode)
   	at org.apache.pulsar.client.impl.ConnectionPool.getConnection(ConnectionPool.java:143)
   	at org.apache.pulsar.client.impl.ConnectionPool.getConnection(ConnectionPool.java:113)
   	at org.apache.pulsar.client.impl.BinaryProtoLookupService.findBroker(BinaryProtoLookupService.java:95)
   	at org.apache.pulsar.client.impl.BinaryProtoLookupService.getBroker(BinaryProtoLookupService.java:80)
   	at org.apache.pulsar.client.impl.PulsarClientImpl.getConnection(PulsarClientImpl.java:631)
   	at org.apache.pulsar.client.impl.PulsarClientImpl$$EnhancerByMockitoWithCGLIB$$ed5c6031.CGLIB$getConnection$11(<generated>)
   	at org.apache.pulsar.client.impl.PulsarClientImpl$$EnhancerByMockitoWithCGLIB$$ed5c6031$$FastClassByMockitoWithCGLIB$$f1654aef.invoke(<generated>)
   	at org.mockito.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:216)
   	at org.powermock.api.mockito.repackaged.DelegatingMockitoMethodProxy.invokeSuper(DelegatingMockitoMethodProxy.java:20)
   	at org.mockito.internal.invocation.realmethod.DefaultRealMethod.invoke(DefaultRealMethod.java:21)
   	at org.mockito.internal.invocation.realmethod.CleanTraceRealMethod.invoke(CleanTraceRealMethod.java:30)
   	at org.mockito.internal.invocation.InvocationImpl.callRealMethod(InvocationImpl.java:112)
   	at org.mockito.internal.stubbing.answers.CallsRealMethods.answer(CallsRealMethods.java:41)
   	at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:93)
   	at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)
   	at org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:38)
   	at org.powermock.api.mockito.repackaged.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:60)
   	at org.apache.pulsar.client.impl.PulsarClientImpl$$EnhancerByMockitoWithCGLIB$$ed5c6031.getConnection(<generated>)
   	at org.apache.pulsar.client.impl.HandlerBase.grabCnx(HandlerBase.java:75)
   	at org.apache.pulsar.client.impl.HandlerBase.lambda$reconnectLater$0(HandlerBase.java:108)
   	at org.apache.pulsar.client.impl.HandlerBase$$Lambda$332/279702165.run(Unknown Source)
   	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663)
   	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738)
   	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466)
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   	at java.lang.Thread.run(Thread.java:748)
   ```
   
   And this is the thread closing the client:
   
   ```
   "ForkJoinPool.commonPool-worker-1" #892 daemon prio=5 os_prio=0 tid=0x00007f52f8a06800 nid=0x2472 in Object.wait() [0x00007f523af45000]
      java.lang.Thread.State: TIMED_WAITING (on object monitor)
   	at java.lang.Object.wait(Native Method)
   	at java.lang.Thread.join(Thread.java:1260)
   	- locked <0x000000008629ff28> (a io.netty.util.concurrent.FastThreadLocalThread)
   	at io.netty.util.HashedWheelTimer.stop(HashedWheelTimer.java:380)
   	at org.apache.pulsar.client.impl.PulsarClientImpl.shutdown(PulsarClientImpl.java:620)
   ```
   
   ### Modifications
   
   Do not cleanup connection from same thread where the connection open operation is failed,  because that might be the timer thread which is already owning a mutex on the hashmap. Concurrent hash map is non-blocking most of the time but it resort to mutex in some occasions. In this case, since we're doing the `computeIfAbsent()` from same thread, it is holding the mutex.
   
   ### Result
   
   No deadlock on client close.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services