You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/11/19 07:12:32 UTC
[GitHub] [pulsar] hadesy opened a new issue #8625: ZooKeeperDataCache TimeoutException
hadesy opened a new issue #8625:
URL: https://github.com/apache/pulsar/issues/8625
**Describe the bug**
when we grant permission, exception occurs on the broker:
```
[ForkJoinPool.commonPool-worker-1] ERROR org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://xxx/xxx/xxxx-partition-1] Policies update failed org.apache.pulsar.broker.service.BrokerServiceException$ServerMetadataException: java.util.concurrent.TimeoutException, scheduled retry in 60 seconds
java.util.concurrent.CompletionException: org.apache.pulsar.broker.service.BrokerServiceException$ServerMetadataException: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:661) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010) ~[?:1.8.0_252]
at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplicationAndRetryOnFailure(PersistentTopic.java:1027) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at org.apache.pulsar.broker.service.persistent.PersistentTopic.onPoliciesUpdate(PersistentTopic.java:1844) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at org.apache.pulsar.broker.service.BrokerService.lambda$null$59(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at java.util.Optional.ifPresent(Optional.java:159) ~[?:1.8.0_252]
at org.apache.pulsar.broker.service.BrokerService.lambda$null$60(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010) ~[?:1.8.0_252]
at org.apache.pulsar.broker.service.BrokerService.lambda$onUpdate$61(BrokerService.java:1532) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:1528) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:171) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at org.apache.pulsar.zookeeper.ZooKeeperDataCache.lambda$4(ZooKeeperDataCache.java:138) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$18(ZooKeeperCache.java:385) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$14(ZooKeeperCache.java:365) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) [?:1.8.0_252]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_252]
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_252]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_252]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_252]
Caused by: org.apache.pulsar.broker.service.BrokerServiceException$ServerMetadataException: java.util.concurrent.TimeoutException
at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplication(PersistentTopic.java:1081) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
... 29 more
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplication(PersistentTopic.java:1077) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
... 29 more
```
```
WARN org.apache.pulsar.broker.service.Producer - [xxx] Get unexpected error while autorizing [persistent://xx/xxx/xxx-partition-0] null
java.util.concurrent.TimeoutException: null
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_252]
at org.apache.pulsar.broker.authorization.AuthorizationService.canProduce(AuthorizationService.java:226) ~[org.apache.pulsar-pulsar-broker-common-2.6.2.jar:2.6.2]
at org.apache.pulsar.broker.service.Producer.checkPermissions(Producer.java:593) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$onPoliciesUpdate$62(PersistentTopic.java:1830) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707) ~[?:1.8.0_252]
at org.apache.pulsar.broker.service.persistent.PersistentTopic.onPoliciesUpdate(PersistentTopic.java:1829) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at org.apache.pulsar.broker.service.BrokerService.lambda$null$59(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at java.util.Optional.ifPresent(Optional.java:159) ~[?:1.8.0_252]
at org.apache.pulsar.broker.service.BrokerService.lambda$null$60(BrokerService.java:1537) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010) ~[?:1.8.0_252]
at org.apache.pulsar.broker.service.BrokerService.lambda$onUpdate$61(BrokerService.java:1532) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:388) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:160) ~[org.apache.pulsar-pulsar-common-2.6.2.jar:2.6.2]
at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:1528) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at org.apache.pulsar.broker.service.BrokerService.onUpdate(BrokerService.java:171) ~[org.apache.pulsar-pulsar-broker-2.6.2.jar:2.6.2]
at org.apache.pulsar.zookeeper.ZooKeeperDataCache.lambda$4(ZooKeeperDataCache.java:138) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$18(ZooKeeperCache.java:385) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_252]
at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$14(ZooKeeperCache.java:365) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.6.2.jar:2.6.2]
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) [?:1.8.0_252]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_252]
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_252]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_252]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_252]
```
and all producer and consumer also throw exceptions:
```
4:48:10,988 WARN BinaryProtoLookupService - [persistent://xxx/xx/xxx] failed to get Partitioned metadata : org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at test-pulsar-broker.pulsar.svc.cluster.local/172.16.83.46:6650
java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$LookupException: org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at test-pulsar-broker.pulsar.svc.cluster.local/172.16.83.46:6650
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_212]
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_212]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) ~[?:1.8.0_212]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_212]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [?:1.8.0_212]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) [?:1.8.0_212]
at org.apache.pulsar.client.impl.ClientCnx.handleLookupResponse(ClientCnx.java:497) [maxwell.jar:?]
at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:138) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [maxwell.jar:?]
at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [maxwell.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: org.apache.pulsar.client.api.PulsarClientException$LookupException: org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at test-pulsar-broker.pulsar.svc.cluster.local/172.16.83.46:6650
at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:972) ~[maxwell.jar:?]
at org.apache.pulsar.client.impl.ClientCnx.handleLookupResponse(ClientCnx.java:498) ~[maxwell.jar:?]
... 20 more
```
**To Reproduce**
Steps to reproduce the behavior:
1. Create a standard pulsar cluster in k8s
2. Create 20 partitioned topics, and set the number of partitions to 2
3. Randomly grant permissions to these topics
4. See error
**Expected behavior**
No error
**Additional context**
we try [#8304](https://github.com/apache/pulsar/pull/8304) ,but it doesn't work
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] hadesy commented on issue #8625: ZooKeeperDataCache TimeoutException
Posted by GitBox <gi...@apache.org>.
hadesy commented on issue #8625:
URL: https://github.com/apache/pulsar/issues/8625#issuecomment-730178805
and we can reproduce this problem in version 2.6.1
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] alexku7 commented on issue #8625: ZooKeeperDataCache TimeoutException
Posted by GitBox <gi...@apache.org>.
alexku7 commented on issue #8625:
URL: https://github.com/apache/pulsar/issues/8625#issuecomment-731618031
I thin this issue is related to this one https://github.com/apache/pulsar/issues/8419
We see the same behavior
Currently in order to prevent this issue we simply insert a sleep of 500 ms between each topics set permission request
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org