You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/11/19 07:12:32 UTC

[GitHub] [pulsar] hadesy opened a new issue #8625: ZooKeeperDataCache TimeoutException

hadesy opened a new issue #8625:
URL: https://github.com/apache/pulsar/issues/8625


   **Describe the bug**
   
   when we grant permission, exception occurs on the broker:
   ```
   [ForkJoinPool.commonPool-worker-1] ERROR org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://xxx/xxx/xxxx-partition-1] Policies update failed org.apache.pulsar.broker.service.BrokerServiceException$ServerMetadataException: java.util.concurrent.TimeoutException, scheduled retry in 60 seconds
   java.util.concurrent.CompletionException: org.apache.pulsar.broker.service.BrokerServiceException$ServerMetadataException: java.util.concurrent.TimeoutException
   	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:661) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010) ~[?:1.8.0_252]
   	at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplicationAndRetryOnFailure(PersistentTopic.java:1027) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.broker.service.persistent.PersistentTopic.onPoliciesUpdate(PersistentTopic.java:1844) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.broker.service.BrokerService.lambda$null$59(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at java.util.Optional.ifPresent(Optional.java:159) ~[?:1.8.0_252]
   	at org.apache.pulsar.broker.service.BrokerService.lambda$null$60(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010) ~[?:1.8.0_252]
   	at org.apache.pulsar.broker.service.BrokerService.lambda$onUpdate$61(BrokerService.java:1532) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:1528) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:171) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.zookeeper.ZooKeeperDataCache.lambda$4(ZooKeeperDataCache.java:138) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
   	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
   	at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$18(ZooKeeperCache.java:385) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
   	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
   	at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$14(ZooKeeperCache.java:365) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
   	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) [?:1.8.0_252]
   	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_252]
   	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_252]
   	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_252]
   	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_252]
   Caused by: org.apache.pulsar.broker.service.BrokerServiceException$ServerMetadataException: java.util.concurrent.TimeoutException
   	at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplication(PersistentTopic.java:1081) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	... 29 more
   Caused by: java.util.concurrent.TimeoutException
   	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
   	at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplication(PersistentTopic.java:1077) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	... 29 more
   ```
   ```
   WARN  org.apache.pulsar.broker.service.Producer - [xxx] Get unexpected error while autorizing [persistent://xx/xxx/xxx-partition-0]  null
   java.util.concurrent.TimeoutException: null
   	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
   	at org.apache.pulsar.broker.authorization.AuthorizationService.canProduce(AuthorizationService.java:226) ~[org.apache.pulsar-pulsar-broker-common-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.broker.service.Producer.checkPermissions(Producer.java:593) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$onPoliciesUpdate$62(PersistentTopic.java:1830) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707) ~[?:1.8.0_252]
   	at org.apache.pulsar.broker.service.persistent.PersistentTopic.onPoliciesUpdate(PersistentTopic.java:1829) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.broker.service.BrokerService.lambda$null$59(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at java.util.Optional.ifPresent(Optional.java:159) ~[?:1.8.0_252]
   	at org.apache.pulsar.broker.service.BrokerService.lambda$null$60(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010) ~[?:1.8.0_252]
   	at org.apache.pulsar.broker.service.BrokerService.lambda$onUpdate$61(BrokerService.java:1532) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:1528) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:171) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
   	at org.apache.pulsar.zookeeper.ZooKeeperDataCache.lambda$4(ZooKeeperDataCache.java:138) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
   	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
   	at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$18(ZooKeeperCache.java:385) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
   	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
   	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
   	at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$14(ZooKeeperCache.java:365) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
   	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) [?:1.8.0_252]
   	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_252]
   	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_252]
   	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_252]
   	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_252]
   ```
   and all producer and consumer also throw exceptions:
   ```
   4:48:10,988 WARN  BinaryProtoLookupService - [persistent://xxx/xx/xxx] failed to get Partitioned metadata : org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at test-pulsar-broker.pulsar.svc.cluster.local/172.16.83.46:6650
   java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$LookupException: org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at test-pulsar-broker.pulsar.svc.cluster.local/172.16.83.46:6650
   	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_212]
   	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_212]
   	at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) ~[?:1.8.0_212]
   	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_212]
   	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [?:1.8.0_212]
   	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) [?:1.8.0_212]
   	at org.apache.pulsar.client.impl.ClientCnx.handleLookupResponse(ClientCnx.java:497) [maxwell.jar:?]
   	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:138) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [maxwell.jar:?]
   	at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [maxwell.jar:?]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
   Caused by: org.apache.pulsar.client.api.PulsarClientException$LookupException: org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at test-pulsar-broker.pulsar.svc.cluster.local/172.16.83.46:6650
   	at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:972) ~[maxwell.jar:?]
   	at org.apache.pulsar.client.impl.ClientCnx.handleLookupResponse(ClientCnx.java:498) ~[maxwell.jar:?]
   	... 20 more
   ```
   
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Create a standard pulsar cluster in k8s
   2. Create 20 partitioned topics, and set the number of partitions to 2
   3. Randomly grant permissions  to these topics
   4. See error
   
   **Expected behavior**
   No error
   
   **Additional context**
   we try [#8304](https://github.com/apache/pulsar/pull/8304) ,but it doesn't work
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hadesy commented on issue #8625: ZooKeeperDataCache TimeoutException

Posted by GitBox <gi...@apache.org>.
hadesy commented on issue #8625:
URL: https://github.com/apache/pulsar/issues/8625#issuecomment-730178805


   and we can reproduce this problem in version 2.6.1


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] alexku7 commented on issue #8625: ZooKeeperDataCache TimeoutException

Posted by GitBox <gi...@apache.org>.
alexku7 commented on issue #8625:
URL: https://github.com/apache/pulsar/issues/8625#issuecomment-731618031


   I thin this issue is related to this one https://github.com/apache/pulsar/issues/8419
   
   We see the same behavior 
   
   Currently in order to prevent this issue we simply insert a sleep of 500 ms between each topics set permission request


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org