You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/02/08 17:37:41 UTC

[GitHub] [pulsar] hadesy opened a new pull request #14176: [metadata-store] Fixed deadlock on metadata listener

hadesy opened a new pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176


   ### Motivation
   In Pulsar 2.9.1,updating namespace policy can cause deadlock if authorization is enabled,
   because we can't submit tasks in the thread of SingleThreadPoolExecutor,see test case.
   https://github.com/apache/pulsar/blob/9165aed72ceb5511c575aae96d6002acec01e59b/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java#L79
   
   Log
   ```
   # Metadata listener was triggered at the same time.
   level:INFO
   logger:org.apache.pulsar.broker.resourcegroup.ResourceGroupNamespaceConfigListener
   thread:metadata-store-6-1
   time:2022-02-08T06:26:43,527+0000
   message:Metadata store notification: Path /admin/policies/xxx/xxx, Type Modified
   
   # First partition succeeded.
   level:INFO
   logger:org.apache.pulsar.broker.admin.impl.PersistentTopicsBase
   thread:pulsar-web-36-8
   time:2022-02-08T06:26:43,528+0000
   message:[admin] Successfully granted access for role xxxxx: [produce] - topic persistent://xxx/xxx/xxx-partition-0
   
   # After 30 seconds(org.apache.pulsar.broker.ServiceConfiguration#zooKeeperOperationTimeoutSeconds), second partition timeout.
   level:ERROR
   logger:org.apache.pulsar.broker.admin.impl.PersistentTopicsBase
   thread:pulsar-web-36-8
   time:2022-02-08T06:27:13,528+0000
   message:[admin] Failed to grant permissions for topic persistent://xxx/xxx/xxxxx-partition-1
   org.apache.pulsar.metadata.api.MetadataStoreException: Failed to set data for /admin/policies/xxx/xxx
   	at org.apache.pulsar.broker.resources.BaseResources.set(BaseResources.java:106) ~[org.apache.pulsar-pulsar-broker-common-2.9.1.jar:2.9.1]
   	at org.apache.pulsar.broker.resources.NamespaceResources.setPolicies(NamespaceResources.java:113) ~[org.apache.pulsar-pulsar-broker-common-2.9.1.jar:2.9.1]
   	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.grantPermissions(PersistentTopicsBase.java:262) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
   	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalGrantPermissionsOnTopic(PersistentTopicsBase.java:291) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
   	at org.apache.pulsar.broker.admin.v2.PersistentTopics.grantPermissionsOnTopic(PersistentTopics.java:174) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
   	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
   	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
   	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
   	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
   	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[org.glassfish.jersey.core-jersey-server-2.34.jar:?]
   	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) ~[org.glassfish.jersey.core-jersey-server-2.34.jar:?]
   	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) ~[org.glassfish.jersey.core-jersey-server-2.34.jar:?]
   	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159) ~[org.glassfish.jersey.core-jersey-server-2.34.jar:?]
   	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) ~[org.glassfish.jersey.core-jersey-server-2.34.jar:?]
   	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475) ~[org.glassfish.jersey.core-jersey-server-2.34.jar:?]
   	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397) ~[org.glassfish.jersey.core-jersey-server-2.34.jar:?]
   	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81) ~[org.glassfish.jersey.core-jersey-server-2.34.jar:?]
   	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255) ~[org.glassfish.jersey.core-jersey-server-2.34.jar:?]
   	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) ~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
   	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) ~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
   	at org.glassfish.jersey.internal.Errors.process(Errors.java:292) ~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
   	at org.glassfish.jersey.internal.Errors.process(Errors.java:274) ~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
   	at org.glassfish.jersey.internal.Errors.process(Errors.java:244) ~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
   	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) ~[org.glassfish.jersey.core-jersey-common-2.34.jar:?]
   	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234) ~[org.glassfish.jersey.core-jersey-server-2.34.jar:?]
   	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680) ~[org.glassfish.jersey.core-jersey-server-2.34.jar:?]
   	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394) ~[org.glassfish.jersey.containers-jersey-container-servlet-core-2.34.jar:?]
   	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) ~[org.glassfish.jersey.containers-jersey-container-servlet-core-2.34.jar:?]
   	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366) ~[org.glassfish.jersey.containers-jersey-container-servlet-core-2.34.jar:?]
   	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319) ~[org.glassfish.jersey.containers-jersey-container-servlet-core-2.34.jar:?]
   	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205) ~[org.glassfish.jersey.containers-jersey-container-servlet-core-2.34.jar:?]
   	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) ~[org.eclipse.jetty-jetty-servlet-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626) ~[org.eclipse.jetty-jetty-servlet-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.apache.pulsar.broker.web.ResponseHandlerFilter.doFilter(ResponseHandlerFilter.java:67) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
   	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[org.eclipse.jetty-jetty-servlet-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) ~[org.eclipse.jetty-jetty-servlet-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.apache.pulsar.broker.web.AuthenticationFilter.doFilter(AuthenticationFilter.java:84) ~[org.apache.pulsar-pulsar-broker-common-2.9.1.jar:2.9.1]
   	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[org.eclipse.jetty-jetty-servlet-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) ~[org.eclipse.jetty-jetty-servlet-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548) ~[org.eclipse.jetty-jetty-servlet-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) ~[org.eclipse.jetty-jetty-servlet-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:179) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) ~[org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) [org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) [org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) [org.eclipse.jetty-jetty-server-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [org.eclipse.jetty-jetty-io-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [org.eclipse.jetty-jetty-io-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [org.eclipse.jetty-jetty-io-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) [org.eclipse.jetty-jetty-util-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) [org.eclipse.jetty-jetty-util-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) [org.eclipse.jetty-jetty-util-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) [org.eclipse.jetty-jetty-util-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386) [org.eclipse.jetty-jetty-util-9.4.43.v20210629.jar:9.4.43.v20210629]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final]
   	at java.lang.Thread.run(Thread.java:829) [?:?]
   Caused by: java.util.concurrent.TimeoutException
   	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886) ~[?:?]
   	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021) ~[?:?]
   	at org.apache.pulsar.broker.resources.BaseResources.set(BaseResources.java:101) ~[org.apache.pulsar-pulsar-broker-common-2.9.1.jar:2.9.1]
   	... 70 more
   	
   # Causes all consumers/producers throw timeout exceptions.
   level:WARN
   logger:org.apache.pulsar.broker.service.Producer
   thread:metadata-store-6-1
   time:2022-02-08T06:27:13,550+0000
   message:- [xxxx] Get unexpected error while autorizing [persistent://biz/xxx/xxxx-partition-1]  null
   java.util.concurrent.TimeoutException: null
   	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886) ~[?:?]
   	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021) ~[?:?]
   	at org.apache.pulsar.broker.authorization.AuthorizationService.canProduce(AuthorizationService.java:230) ~[org.apache.pulsar-pulsar-broker-common-2.9.1.jar:2.9.1]
   	at org.apache.pulsar.broker.service.Producer.checkPermissions(Producer.java:654) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
   	at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$onPoliciesUpdate$106(PersistentTopic.java:2419) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
   	at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4772) ~[?:?]
   	at org.apache.pulsar.broker.service.persistent.PersistentTopic.onPoliciesUpdate(PersistentTopic.java:2418) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
   	at org.apache.pulsar.broker.service.BrokerService.lambda$null$84(BrokerService.java:1830) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
   	at java.util.Optional.ifPresent(Optional.java:183) ~[?:?]
   	at org.apache.pulsar.broker.service.BrokerService.lambda$null$85(BrokerService.java:1830) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
   	at java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:753) ~[?:?]
   	at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:731) ~[?:?]
   	at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2108) ~[?:?]
   	at org.apache.pulsar.broker.service.BrokerService.lambda$null$86(BrokerService.java:1825) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
   	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387) ~[org.apache.pulsar-pulsar-common-2.9.1.jar:2.9.1]
   	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159) ~[org.apache.pulsar-pulsar-common-2.9.1.jar:2.9.1]
   	at org.apache.pulsar.broker.service.BrokerService.lambda$handlePoliciesUpdates$87(BrokerService.java:1821) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
   	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:714) ~[?:?]
   	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
   	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ~[?:?]
   	at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$null$7(ZKMetadataStore.java:139) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final]
   	at java.lang.Thread.run(Thread.java:829) [?:?]
   
   # After 5 minutes(org.apache.pulsar.metadata.cache.impl.MetadataCacheImpl#CACHE_REFRESH_TIME_MILLIS), all cache expired and zk callback still stuck.
   level:ERROR
   logger:rg.apache.bookkeeper.mledger.impl.ManagedLedgerImpl
   thread:BookKeeperClientWorker-OrderedExecutor-2-0
   time:2022-02-08T06:33:24,517+0000
   message:- [xxx/xxx/xxx/xxxx-partition-0] Error creating ledger rc=-23 Bookie operation timeout
   ```
   
   Jstack
   ```
   "metadata-store-6-1" Id=58 TIMED_WAITING on java.util.concurrent.CompletableFuture$Signaller@571e7c01
       at java.base@11.0.11/jdk.internal.misc.Unsafe.park(Native Method)
       -  waiting on java.util.concurrent.CompletableFuture$Signaller@571e7c01
       at java.base@11.0.11/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
       at java.base@11.0.11/java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1798)
       at java.base@11.0.11/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3128)
       at java.base@11.0.11/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1868)
       at java.base@11.0.11/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
       at app//org.apache.pulsar.broker.authorization.AuthorizationService.canProduce(AuthorizationService.java:230)
       at app//org.apache.pulsar.broker.service.Producer.checkPermissions(Producer.java:654)
       at app//org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$onPoliciesUpdate$106(PersistentTopic.java:2419)
       at app//org.apache.pulsar.broker.service.persistent.PersistentTopic$$Lambda$1298/0x000000084081dc40.accept(Unknown Source)
       at java.base@11.0.11/java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4772)
       at app//org.apache.pulsar.broker.service.persistent.PersistentTopic.onPoliciesUpdate(PersistentTopic.java:2418)
       at app//org.apache.pulsar.broker.service.BrokerService.lambda$null$84(BrokerService.java:1830)
       at app//org.apache.pulsar.broker.service.BrokerService$$Lambda$1297/0x000000084081d840.accept(Unknown Source)
       at java.base@11.0.11/java.util.Optional.ifPresent(Optional.java:183)
       at app//org.apache.pulsar.broker.service.BrokerService.lambda$null$85(BrokerService.java:1830)
       at app//org.apache.pulsar.broker.service.BrokerService$$Lambda$1296/0x000000084081d440.accept(Unknown Source)
       at java.base@11.0.11/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:753)
       at java.base@11.0.11/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:731)
       at java.base@11.0.11/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2108)
       at app//org.apache.pulsar.broker.service.BrokerService.lambda$null$86(BrokerService.java:1825)
       at app//org.apache.pulsar.broker.service.BrokerService$$Lambda$1295/0x000000084081d040.accept(Unknown Source)
       at app//org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387)
       at app//org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159)
       at app//org.apache.pulsar.broker.service.BrokerService.lambda$handlePoliciesUpdates$87(BrokerService.java:1821)
       at app//org.apache.pulsar.broker.service.BrokerService$$Lambda$1294/0x000000084081cc40.accept(Unknown Source)
       at java.base@11.0.11/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:714)
       at java.base@11.0.11/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
       at java.base@11.0.11/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
       at app//org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$null$7(ZKMetadataStore.java:139)
       at app//org.apache.pulsar.metadata.impl.ZKMetadataStore$$Lambda$196/0x000000084033c440.run(Unknown Source)
       at java.base@11.0.11/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
       at java.base@11.0.11/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
       at app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
       at java.base@11.0.11/java.lang.Thread.run(Thread.java:829)
   ```
   
   ### Modifications
   
   add new ThreadPoolExecutor for listener


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on a change in pull request #14176: [metadata-store] Fixed deadlock on metadata listener

Posted by GitBox <gi...@apache.org>.
merlimat commented on a change in pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176#discussion_r801957011



##########
File path: pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java
##########
@@ -77,6 +78,8 @@
     protected AbstractMetadataStore() {
         this.executor = Executors
                 .newSingleThreadScheduledExecutor(new DefaultThreadFactory("metadata-store"));
+        this.listenerExecutor = Executors
+                .newSingleThreadScheduledExecutor(new DefaultThreadFactory("metadata-store-listener"));

Review comment:
       In particular, this deadlock should already be fixed by #13885




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hadesy closed pull request #14176: [metadata-store] Fixed deadlock on metadata listener

Posted by GitBox <gi...@apache.org>.
hadesy closed pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] michaeljmarshall commented on a change in pull request #14176: [metadata-store] Fixed deadlock on metadata listener

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on a change in pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176#discussion_r801966315



##########
File path: pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java
##########
@@ -77,6 +78,8 @@
     protected AbstractMetadataStore() {
         this.executor = Executors
                 .newSingleThreadScheduledExecutor(new DefaultThreadFactory("metadata-store"));
+        this.listenerExecutor = Executors
+                .newSingleThreadScheduledExecutor(new DefaultThreadFactory("metadata-store-listener"));

Review comment:
       Thanks for that reference Matteo. Is there any value in adding an extra thread just in case a blocking method is inadvertently added back onto this calling thread? The risk seems high, considering we'd be blocking the zk notification handling thread.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on a change in pull request #14176: [metadata-store] Fixed deadlock on metadata listener

Posted by GitBox <gi...@apache.org>.
merlimat commented on a change in pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176#discussion_r801954176



##########
File path: pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java
##########
@@ -77,6 +78,8 @@
     protected AbstractMetadataStore() {
         this.executor = Executors
                 .newSingleThreadScheduledExecutor(new DefaultThreadFactory("metadata-store"));
+        this.listenerExecutor = Executors
+                .newSingleThreadScheduledExecutor(new DefaultThreadFactory("metadata-store-listener"));

Review comment:
       I'm not sure this is the right approach because it just moves the problem to a different thread. Instead we should make sure that no blocking calls are happening from the callbacks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] github-actions[bot] commented on pull request #14176: [metadata-store] Fixed deadlock on metadata listener

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176#issuecomment-1072916811


   The pr had no activity for 30 days, mark with Stale label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] github-actions[bot] commented on pull request #14176: [metadata-store] Fixed deadlock on metadata listener

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176#issuecomment-1032882900


   @hadesy:Thanks for your contribution. For this PR, do we need to update docs?
   (The [PR template contains info about doc](https://github.com/apache/pulsar/blob/master/.github/PULL_REQUEST_TEMPLATE.md#documentation), which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #14176: [metadata-store] Fixed deadlock on metadata listener

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176#issuecomment-1033491553


   @hadesy
   thanks for your detailed explanation.
   This patch will mitigate the problem, and we could commit it in order to take time before eventually finding other problems.
   The downside is that we will hide other problems if we commit this patch.
   
   we should prevent any blocking call to the metadata service on the metadata executor (IIUC here we are no more talking about the ZK client thread, but our own Metadata listener thread). 
   
   We could add some check about the fact that the "current thread" is not the "metadata-store" thread in blocking calls to the MetadataStore. IIRC in Netty they are doing so in order to prevent this kind of errors.
   
   Overall I am leaning towards accepting this patch as we cannot know all of the problems like the one you found here (and we should fix the specific case)
   
   Probably a broader discussion on dev@ mailing is better, as not everybody is following GH notifications about PRs and this topic is very important


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hadesy commented on pull request #14176: [metadata-store] Fixed deadlock on metadata listener

Posted by GitBox <gi...@apache.org>.
hadesy commented on pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176#issuecomment-1033483339


   **update deadlock details**
   I think it is risky to operate in ZK callback thread, just like the test case, it is necessary to use different thread pools
   ![1](https://user-images.githubusercontent.com/6346047/153154130-8b1cd242-81db-4327-b56f-3032ca3ae48d.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on a change in pull request #14176: [metadata-store] Fixed deadlock on metadata listener

Posted by GitBox <gi...@apache.org>.
merlimat commented on a change in pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176#discussion_r802071733



##########
File path: pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java
##########
@@ -77,6 +78,8 @@
     protected AbstractMetadataStore() {
         this.executor = Executors
                 .newSingleThreadScheduledExecutor(new DefaultThreadFactory("metadata-store"));
+        this.listenerExecutor = Executors
+                .newSingleThreadScheduledExecutor(new DefaultThreadFactory("metadata-store-listener"));

Review comment:
       In most cases, I think the added thread is just moving the problem somewhere else but will not avoid the deadlock scenarios.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Jason918 commented on a change in pull request #14176: [metadata-store] Fixed deadlock on metadata listener

Posted by GitBox <gi...@apache.org>.
Jason918 commented on a change in pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176#discussion_r802234696



##########
File path: pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java
##########
@@ -192,7 +195,7 @@ public void registerListener(Consumer<Notification> listener) {
                 });
 
                 return null;
-            }, executor);
+            }, listenerExecutor);

Review comment:
       I think this would introduce a lot of wired race conditions. Some of our current implementations relay on this single thread to keep data consistence and thread safe, like the `existsCache` and `childrenCache`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on a change in pull request #14176: [metadata-store] Fixed deadlock on metadata listener

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #14176:
URL: https://github.com/apache/pulsar/pull/14176#discussion_r802388359



##########
File path: pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java
##########
@@ -77,6 +78,8 @@
     protected AbstractMetadataStore() {
         this.executor = Executors
                 .newSingleThreadScheduledExecutor(new DefaultThreadFactory("metadata-store"));
+        this.listenerExecutor = Executors
+                .newSingleThreadScheduledExecutor(new DefaultThreadFactory("metadata-store-listener"));

Review comment:
       I agree with @merlimat 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org