Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/10/23 01:25:11 UTC

[GitHub] [pulsar] pkumar-singh opened a new pull request #8351: PLSR-1240 upgrade GRPC to 1.31 to avoid deadlock

pkumar-singh opened a new pull request #8351:
URL: https://github.com/apache/pulsar/pull/8351


   Motivation
   The current version of gRPC (1.18) deadlocks, as shown below.
   The deadlock currently appears only in the state store, but it might affect Pulsar in general as well.
   Found one Java-level deadlock:
   =============================
   "io-read-scheduler-OrderedScheduler-0-0":
   waiting to lock monitor 0x00007f10e804e100 (object 0x00000000e3b8e0a8, a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream),
   which is held by "grpc-default-executor-17"
   "grpc-default-executor-17":
   waiting to lock monitor 0x00007f107000ca00 (object 0x00000000e3b8e1d8, a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream),
   which is held by "io-read-scheduler-OrderedScheduler-0-0"
   Java stack information for the threads listed above:
   ===================================================
   "io-read-scheduler-OrderedScheduler-0-0":
   at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream.request(InProcessTransport.java:639)
   waiting to lock <0x00000000e3b8e0a8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream)
   at io.grpc.internal.ForwardingClientStream.request(ForwardingClientStream.java:32)
   at io.grpc.internal.ClientCallImpl.request(ClientCallImpl.java:369)
   at io.grpc.PartialForwardingClientCall.request(PartialForwardingClientCall.java:34)
   at io.grpc.ForwardingClientCall.request(ForwardingClientCall.java:22)
   at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.request(ForwardingClientCall.java:44)
   at io.grpc.PartialForwardingClientCall.request(PartialForwardingClientCall.java:34)
   at io.grpc.ForwardingClientCall.request(ForwardingClientCall.java:22)
   at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.request(ForwardingClientCall.java:44)
   at io.grpc.PartialForwardingClientCall.request(PartialForwardingClientCall.java:34)
   at io.grpc.ForwardingClientCall.request(ForwardingClientCall.java:22)
   at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.request(ForwardingClientCall.java:44)
   at org.apache.bookkeeper.common.grpc.proxy.ProxyCall$ResponseProxy.onMessage(ProxyCall.java:112)
   locked <0x00000000e3b8dfd8> (a org.apache.bookkeeper.common.grpc.proxy.ProxyCall$ResponseProxy)
   at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
   at io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
   at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519)
   at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
   at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
   at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.messagesAvailable(ClientCallImpl.java:536)
   at io.grpc.internal.ForwardingClientStreamListener.messagesAvailable(ForwardingClientStreamListener.java:44)
   at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream.writeMessage(InProcessTransport.java:455)
   locked <0x00000000e3b8e1d8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream)
   at io.grpc.internal.ServerCallImpl.sendMessage(ServerCallImpl.java:139)
   at io.grpc.ForwardingServerCall.sendMessage(ForwardingServerCall.java:32)
   at org.apache.bookkeeper.common.grpc.stats.MonitoringServerCall.sendMessage(MonitoringServerCall.java:47)
   at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:344)
   at org.apache.bookkeeper.stream.storage.impl.grpc.handler.ResponseHandler.accept(ResponseHandler.java:49)
   at org.apache.bookkeeper.stream.storage.impl.grpc.handler.ResponseHandler.accept(ResponseHandler.java:29)
   at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.8/CompletableFuture.java:859)
   at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.8/CompletableFuture.java:837)
   at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.8/CompletableFuture.java:506)
   at java.util.concurrent.CompletableFuture.complete(java.base@11.0.8/CompletableFuture.java:2073)
   at org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$executeIO$16(AbstractStateStoreWithJournal.java:472)
   at org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal$$Lambda$294/0x00000008404b7040.run(Unknown Source)
   at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.8/Executors.java:515)
   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
   at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
   at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
   at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.8/Executors.java:515)
   at java.util.concurrent.FutureTask.run(java.base@11.0.8/FutureTask.java:264)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.8/ScheduledThreadPoolExecutor.java:304)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.8/ThreadPoolExecutor.java:1128)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.8/ThreadPoolExecutor.java:628)
   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   at java.lang.Thread.run(java.base@11.0.8/Thread.java:834)
   "grpc-default-executor-17":
   at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream.isReady(InProcessTransport.java:466)
   waiting to lock <0x00000000e3b8e1d8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessServerStream)
   at io.grpc.internal.ServerCallImpl.isReady(ServerCallImpl.java:167)
   at io.grpc.PartialForwardingServerCall.isReady(PartialForwardingServerCall.java:43)
   at io.grpc.ForwardingServerCall.isReady(ForwardingServerCall.java:22)
   at io.grpc.ForwardingServerCall$SimpleForwardingServerCall.isReady(ForwardingServerCall.java:39)
   at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:173)
   at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
   at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
   at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
   at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
   at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
   at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
   at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener.halfClosed(ServerImpl.java:722)
   at io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream.halfClose(InProcessTransport.java:745)
   locked <0x00000000e3b8e0a8> (a io.grpc.inprocess.InProcessTransport$InProcessStream$InProcessClientStream)
   at io.grpc.internal.ForwardingClientStream.halfClose(ForwardingClientStream.java:67)
   at io.grpc.internal.ClientCallImpl.halfClose(ClientCallImpl.java:408)
   at io.grpc.PartialForwardingClientCall.halfClose(PartialForwardingClientCall.java:44)
   at io.grpc.ForwardingClientCall.halfClose(ForwardingClientCall.java:22)
   at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.halfClose(ForwardingClientCall.java:44)
   at io.grpc.PartialForwardingClientCall.halfClose(PartialForwardingClientCall.java:44)
   at io.grpc.ForwardingClientCall.halfClose(ForwardingClientCall.java:22)
   at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.halfClose(ForwardingClientCall.java:44)
   at io.grpc.PartialForwardingClientCall.halfClose(PartialForwardingClientCall.java:44)
   at io.grpc.ForwardingClientCall.halfClose(ForwardingClientCall.java:22)
   at io.grpc.ForwardingClientCall$SimpleForwardingClientCall.halfClose(ForwardingClientCall.java:44)
   at org.apache.bookkeeper.common.grpc.proxy.ProxyCall$RequestProxy.onHalfClose(ProxyCall.java:68)
   at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
   at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
   at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
   at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.8/ThreadPoolExecutor.java:1128)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.8/ThreadPoolExecutor.java:628)
   at java.lang.Thread.run(java.base@11.0.8/Thread.java:834)
   Found 1 deadlock.
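   The trace boils down to a classic lock-order inversion: the "io-read-scheduler" thread holds the server-stream monitor and wants the client-stream monitor, while "grpc-default-executor-17" holds the client-stream monitor and wants the server-stream monitor. The following is a minimal, stdlib-only sketch of the same shape (the lock names are illustrative stand-ins, not gRPC code), detected via `ThreadMXBean`, the same facility `jstack` uses to print the report above:

   ```java
   import java.lang.management.ManagementFactory;
   import java.lang.management.ThreadMXBean;
   import java.util.concurrent.CountDownLatch;

   public class LockInversionDemo {
       // Stand-ins for the two monitors in the trace; the real objects are
       // gRPC's InProcessClientStream and InProcessServerStream.
       static final Object clientStreamLock = new Object();
       static final Object serverStreamLock = new Object();

       // Returns true once the JVM reports the two threads as deadlocked.
       public static boolean detect() throws InterruptedException {
           CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

           // Mirrors "io-read-scheduler-OrderedScheduler-0-0": holds the
           // server-stream monitor, then wants the client-stream monitor.
           startDaemon(() -> {
               synchronized (serverStreamLock) {
                   bothHoldFirstLock.countDown();
                   awaitQuietly(bothHoldFirstLock);
                   synchronized (clientStreamLock) { /* unreachable */ }
               }
           });

           // Mirrors "grpc-default-executor-17": acquires the same two
           // monitors in the opposite order, producing the inversion.
           startDaemon(() -> {
               synchronized (clientStreamLock) {
                   bothHoldFirstLock.countDown();
                   awaitQuietly(bothHoldFirstLock);
                   synchronized (serverStreamLock) { /* unreachable */ }
               }
           });

           ThreadMXBean mx = ManagementFactory.getThreadMXBean();
           for (int i = 0; i < 100; i++) {
               Thread.sleep(50);
               long[] deadlocked = mx.findMonitorDeadlockedThreads();
               if (deadlocked != null && deadlocked.length == 2) {
                   return true;
               }
           }
           return false;
       }

       public static void main(String[] args) throws InterruptedException {
           System.out.println(detect() ? "Found 1 deadlock." : "no deadlock");
       }

       private static void startDaemon(Runnable r) {
           Thread t = new Thread(r);
           t.setDaemon(true);
           t.start();
       }

       private static void awaitQuietly(CountDownLatch latch) {
           try {
               latch.await();
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
           }
       }
   }
   ```

   The threads are daemons so the demo terminates; in the real transport the two monitors are acquired implicitly while delivering messages, which is why the inversion only shows up under concurrent in-process traffic.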
   Solution
   Upgrade gRPC to 1.31; verified that the issue no longer reproduces with gRPC 1.31.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] pkumar-singh commented on pull request #8351: Upgrade GRPC to 1.31 to avoid deadlock

Posted by GitBox <gi...@apache.org>.
pkumar-singh commented on pull request #8351:
URL: https://github.com/apache/pulsar/pull/8351#issuecomment-716705610


   Well, the context is: I was running the BookKeeper table service and encountered this deadlock, as can be seen from the call stack (org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$executeIO$16(AbstractStateStoreWithJournal.java:472)). I also came across the grpc/grpc-java#3084 issue while digging around.
   A natural question is: if the deadlock was reported while running the Apache BookKeeper table service, why upgrade Pulsar? The reason is that they all run in the same k8s deployment.
   
   How did I confirm that upgrading to 1.31 fixes the deadlock? I upgraded to 1.31 in the deployment and the deadlock never happened again, whereas before it used to occur within 2 minutes.
   
   Besides all that, sooner or later gRPC has to be updated in Apache Pulsar anyway. Apache Pulsar running with gRPC 1.18 does have a deadlock.





[GitHub] [pulsar] merlimat merged pull request #8351: PLSR-1240 upgrade GRPC to 1.31 to avoid deadlock

Posted by GitBox <gi...@apache.org>.
merlimat merged pull request #8351:
URL: https://github.com/apache/pulsar/pull/8351


   





[GitHub] [pulsar] pkumar-singh removed a comment on pull request #8351: Upgrade GRPC to 1.31 to avoid deadlock

Posted by GitBox <gi...@apache.org>.
pkumar-singh removed a comment on pull request #8351:
URL: https://github.com/apache/pulsar/pull/8351#issuecomment-716706956


   @lhotari 





[GitHub] [pulsar] lhotari commented on pull request #8351: Upgrade GRPC to 1.31 to avoid deadlock

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #8351:
URL: https://github.com/apache/pulsar/pull/8351#issuecomment-715497642


   @pkumar-singh @merlimat I have issued a PR to revert this grpc upgrade: please see #8363 for more details. The master branch is currently broken because of the grpc upgrade.





[GitHub] [pulsar] pkumar-singh commented on pull request #8351: Upgrade GRPC to 1.31 to avoid deadlock

Posted by GitBox <gi...@apache.org>.
pkumar-singh commented on pull request #8351:
URL: https://github.com/apache/pulsar/pull/8351#issuecomment-716706956


   @lhotari 





[GitHub] [pulsar] lhotari commented on pull request #8351: Upgrade GRPC to 1.31 to avoid deadlock

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #8351:
URL: https://github.com/apache/pulsar/pull/8351#issuecomment-715363630


   I opened https://github.com/apache/pulsar/pull/8361 since another PR job failed with this kind of exception:
   ```
   13:11:31.285 [main] ERROR org.apache.bookkeeper.common.component.AbstractLifecycleComponent - Failed to start Component: storage-service
   java.lang.NoSuchMethodError: io.grpc.internal.DnsNameResolverProvider.newNameResolver(Ljava/net/URI;Lio/grpc/Attributes;)Lio/grpc/internal/DnsNameResolver;
   	at org.apache.bookkeeper.common.resolver.ServiceNameResolverProvider.newNameResolver(ServiceNameResolverProvider.java:95) ~[org.apache.bookkeeper-stream-storage-java-client-4.10.0.jar:4.10.0]
   	at org.apache.bookkeeper.common.resolver.NameResolverProviderFactory.newNameResolver(NameResolverProviderFactory.java:45) ~[org.apache.bookkeeper-stream-storage-java-client-4.10.0.jar:4.10.0]
   	at io.grpc.NameResolver$Factory.newNameResolver(NameResolver.java:207) ~[io.grpc-grpc-api-1.31.0.jar:1.31.0]
   	at io.grpc.NameResolver$Factory.newNameResolver(NameResolver.java:235) ~[io.grpc-grpc-api-1.31.0.jar:1.31.0]
   	at io.grpc.internal.ManagedChannelImpl.getNameResolver(ManagedChannelImpl.java:701) ~[io.grpc-grpc-core-1.31.0.jar:1.31.0]
   	at io.grpc.internal.ManagedChannelImpl.<init>(ManagedChannelImpl.java:606) ~[io.grpc-grpc-core-1.31.0.jar:1.31.0]
   ```
   in https://github.com/apache/pulsar/runs/1297863577?check_suite_focus=true
   
   I'd assume that protoc-gen-grpc-java.version should match grpc.version. I also added a change in the PR to use the grpc-bom in dependencyManagement.
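   The version alignment and BOM import described above might look roughly like this in the root pom.xml (a sketch only; grpc.version and protoc-gen-grpc-java.version are the property names mentioned in this thread, and the actual PR #8361 may differ in detail):

   ```xml
   <properties>
     <grpc.version>1.31.0</grpc.version>
     <!-- keep the code generator in lockstep with the runtime,
          otherwise generated stubs can call removed internal APIs
          such as DnsNameResolverProvider.newNameResolver -->
     <protoc-gen-grpc-java.version>${grpc.version}</protoc-gen-grpc-java.version>
   </properties>

   <dependencyManagement>
     <dependencies>
       <!-- import the gRPC BOM so every io.grpc artifact resolves
            to the same version across all modules -->
       <dependency>
         <groupId>io.grpc</groupId>
         <artifactId>grpc-bom</artifactId>
         <version>${grpc.version}</version>
         <type>pom</type>
         <scope>import</scope>
       </dependency>
     </dependencies>
   </dependencyManagement>
   ```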
   





[GitHub] [pulsar] lhotari commented on pull request #8351: Upgrade GRPC to 1.31 to avoid deadlock

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #8351:
URL: https://github.com/apache/pulsar/pull/8351#issuecomment-716374674


   @pkumar-singh regarding the deadlock, did you find a reference elsewhere that gRPC 1.31 contains a fix for a deadlock bug?
   
   I found an open issue, https://github.com/grpc/grpc-java/issues/3084, which is not resolved. There the workaround seems to be to execute callbacks on a different thread (executor). This is why I'm wondering how the upgrade to gRPC 1.31 fixes the issue. It would be useful to understand the full context.
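   For illustration, the workaround pattern from that issue can be sketched in plain Java (this is a hypothetical shape, not gRPC's actual code; the method and field names are invented for the example). The fix is to stop invoking user callbacks while a stream monitor is held, and instead hand them to a separate executor:

   ```java
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.atomic.AtomicReference;
   import java.util.function.Consumer;

   public class CallbackOffloadDemo {
       private final Object streamLock = new Object();
       private final ExecutorService callbackExecutor = Executors.newSingleThreadExecutor();

       // Deadlock-prone shape: the user callback runs while streamLock is
       // held. If the callback re-enters the transport and needs another
       // stream's monitor, two threads can wait on each other's locks.
       void deliverInline(String msg, Consumer<String> callback) {
           synchronized (streamLock) {
               callback.accept(msg);
           }
       }

       // Workaround shape discussed in grpc/grpc-java#3084: update
       // per-stream state under the lock, but dispatch the user callback
       // on a separate executor so it never runs inside the monitor.
       void deliverOffloaded(String msg, Consumer<String> callback) {
           synchronized (streamLock) {
               // ...update per-stream state here...
           }
           callbackExecutor.execute(() -> callback.accept(msg));
       }

       // Demonstrates that the offloaded callback does not hold streamLock.
       public static boolean demoRunsOutsideLock() throws InterruptedException {
           CallbackOffloadDemo demo = new CallbackOffloadDemo();
           CountDownLatch done = new CountDownLatch(1);
           AtomicReference<Boolean> heldLock = new AtomicReference<>();
           demo.deliverOffloaded("msg", m -> {
               heldLock.set(Thread.holdsLock(demo.streamLock));
               done.countDown();
           });
           done.await(5, TimeUnit.SECONDS);
           demo.callbackExecutor.shutdown();
           return Boolean.FALSE.equals(heldLock.get());
       }

       public static void main(String[] args) throws InterruptedException {
           System.out.println("callback ran outside lock: " + demoRunsOutsideLock());
       }
   }
   ```

   The trade-off is an extra thread hop per message, which is presumably why the in-process transport preferred direct execution in the first place.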





[GitHub] [pulsar] pkumar-singh edited a comment on pull request #8351: Upgrade GRPC to 1.31 to avoid deadlock

Posted by GitBox <gi...@apache.org>.
pkumar-singh edited a comment on pull request #8351:
URL: https://github.com/apache/pulsar/pull/8351#issuecomment-716705610


   Well, the context is: I was running the BookKeeper table service and encountered this deadlock, as can be seen from the call stack (org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$executeIO$16(AbstractStateStoreWithJournal.java:472)). I also came across the grpc/grpc-java#3084 issue while digging around.
   A natural question is: if the deadlock was reported while running the Apache BookKeeper table service, why upgrade Pulsar? The reason is that they all run in the same k8s deployment.
   
   How did I confirm that upgrading to 1.31 fixes the deadlock? I upgraded to 1.31 in the deployment and the deadlock never happened again, whereas before it used to occur within 2 minutes.
   
   Besides all that, sooner or later gRPC has to be updated in Apache Pulsar anyway. Apache Pulsar running with gRPC 1.18 does have a deadlock. @lhotari 

