You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2018/03/06 07:06:11 UTC
[GitHub] merlimat opened a new pull request #1345: Cleanup failed connection from background thread
merlimat opened a new pull request #1345: Cleanup failed connection from background thread
URL: https://github.com/apache/incubator-pulsar/pull/1345
This is a fix for #1138
### Motivation
There is a race condition when shutting down the client instance while a reconnection is happening.
The symptom is that the `client.close()` operation is stuck due to a deadlock.
This is the timer thread:
```
"pulsar-timer-210-1" #889 prio=5 os_prio=0 tid=0x00007f52bc027000 nid=0x25a6 runnable [0x00007f51ef15b000]
java.lang.Thread.State: RUNNABLE
at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1172)
at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1546)
at org.apache.pulsar.client.impl.ConnectionPool.cleanupConnection(ConnectionPool.java:291)
at org.apache.pulsar.client.impl.ConnectionPool.lambda$createConnection$6(ConnectionPool.java:202)
at org.apache.pulsar.client.impl.ConnectionPool$$Lambda$178/536662449.apply(Unknown Source)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:884)
at java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2196)
at org.apache.pulsar.client.impl.ConnectionPool.createConnection(ConnectionPool.java:199)
at org.apache.pulsar.client.impl.ConnectionPool.lambda$getConnection$1(ConnectionPool.java:143)
at org.apache.pulsar.client.impl.ConnectionPool$$Lambda$171/814839376.apply(Unknown Source)
at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
- locked <0x00000000dca51c30> (a java.util.concurrent.ConcurrentHashMap$ReservationNode)
at org.apache.pulsar.client.impl.ConnectionPool.getConnection(ConnectionPool.java:143)
at org.apache.pulsar.client.impl.ConnectionPool.getConnection(ConnectionPool.java:113)
at org.apache.pulsar.client.impl.BinaryProtoLookupService.findBroker(BinaryProtoLookupService.java:95)
at org.apache.pulsar.client.impl.BinaryProtoLookupService.getBroker(BinaryProtoLookupService.java:80)
at org.apache.pulsar.client.impl.PulsarClientImpl.getConnection(PulsarClientImpl.java:631)
at org.apache.pulsar.client.impl.PulsarClientImpl$$EnhancerByMockitoWithCGLIB$$ed5c6031.CGLIB$getConnection$11(<generated>)
at org.apache.pulsar.client.impl.PulsarClientImpl$$EnhancerByMockitoWithCGLIB$$ed5c6031$$FastClassByMockitoWithCGLIB$$f1654aef.invoke(<generated>)
at org.mockito.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:216)
at org.powermock.api.mockito.repackaged.DelegatingMockitoMethodProxy.invokeSuper(DelegatingMockitoMethodProxy.java:20)
at org.mockito.internal.invocation.realmethod.DefaultRealMethod.invoke(DefaultRealMethod.java:21)
at org.mockito.internal.invocation.realmethod.CleanTraceRealMethod.invoke(CleanTraceRealMethod.java:30)
at org.mockito.internal.invocation.InvocationImpl.callRealMethod(InvocationImpl.java:112)
at org.mockito.internal.stubbing.answers.CallsRealMethods.answer(CallsRealMethods.java:41)
at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:93)
at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)
at org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:38)
at org.powermock.api.mockito.repackaged.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:60)
at org.apache.pulsar.client.impl.PulsarClientImpl$$EnhancerByMockitoWithCGLIB$$ed5c6031.getConnection(<generated>)
at org.apache.pulsar.client.impl.HandlerBase.grabCnx(HandlerBase.java:75)
at org.apache.pulsar.client.impl.HandlerBase.lambda$reconnectLater$0(HandlerBase.java:108)
at org.apache.pulsar.client.impl.HandlerBase$$Lambda$332/279702165.run(Unknown Source)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:663)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:738)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:466)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
```
And this is the thread closing the client:
```
"ForkJoinPool.commonPool-worker-1" #892 daemon prio=5 os_prio=0 tid=0x00007f52f8a06800 nid=0x2472 in Object.wait() [0x00007f523af45000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1260)
- locked <0x000000008629ff28> (a io.netty.util.concurrent.FastThreadLocalThread)
at io.netty.util.HashedWheelTimer.stop(HashedWheelTimer.java:380)
at org.apache.pulsar.client.impl.PulsarClientImpl.shutdown(PulsarClientImpl.java:620)
```
### Modifications
Do not cleanup connection from same thread where the connection open operation is failed, because that might be the timer thread which is already owning a mutex on the hashmap. Concurrent hash map is non-blocking most of the time but it resort to mutex in some occasions. In this case, since we're doing the `computeIfAbsent()` from same thread, it is holding the mutex.
### Result
No deadlock on client close.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services