Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/03/21 21:26:45 UTC

[GitHub] [pulsar] cerebrotecnologico opened a new issue #14779: BadVersionException , unable to consume topic, failing to reconnect

cerebrotecnologico opened a new issue #14779:
URL: https://github.com/apache/pulsar/issues/14779


   **Describe the bug**
   I am using Pulsar 2.8.0.
   - The topic is partitioned, with 3 partitions.
   - The backlog quota is set to 50h.
   - The TTL is set to 48h.
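   For reference, the two retention windows above can be compared in seconds, which is the unit Pulsar uses for a namespace message TTL. This is a minimal sketch that assumes the 50h backlog quota is a time-based quota. The class and method names are hypothetical and only illustrate the arithmetic; they are not Pulsar API.

```java
import java.time.Duration;

// Hypothetical helper: compares the TTL and backlog-quota windows from the report.
class RetentionWindows {
    // Message TTL from the report (48h).
    static final Duration MESSAGE_TTL = Duration.ofHours(48);
    // Backlog quota from the report (50h); ASSUMED to be a time-based quota.
    static final Duration BACKLOG_QUOTA_TIME = Duration.ofHours(50);

    /** The TTL expressed as whole seconds, the unit used for Pulsar's message TTL. */
    static long ttlSeconds() {
        return MESSAGE_TTL.getSeconds();
    }

    /** True if messages would hit the TTL before reaching the time-based quota. */
    static boolean ttlExpiresFirst() {
        return MESSAGE_TTL.compareTo(BACKLOG_QUOTA_TIME) < 0;
    }

    public static void main(String[] args) {
        System.out.println("TTL seconds: " + ttlSeconds());
        System.out.println("Backlog quota seconds: " + BACKLOG_QUOTA_TIME.getSeconds());
        System.out.println("TTL expires first: " + ttlExpiresFirst());
    }
}
```

   With these values, the TTL (172800 s) is shorter than the quota window (180000 s). If the quota is time-based, unacknowledged messages would therefore expire before the quota is reached.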
   
   This is the second time we have observed this error. The application reads from 5 different topics, but the same topic has been affected both times.
   
   This is the highest-traffic topic processed by the application. That said, our application is not especially data-intensive compared with other use cases: it processes fewer than 1 million events per hour.
   
   My application suddenly stopped processing messages. It tries to reconnect but keeps getting ZooKeeper BadVersion errors:
   
   [persistent://tenant/namespace/mytopic-partition-2] [platform_persistent://tenant/namespace/mytopic] Could not get connection to broker: java.util.concurrent.CompletionException: org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.pulsar.metadata.api.MetadataStoreException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /managed-ledgers/tenant/namespace/persistent/my-topic-partition-2/platform_persistent%3A%2F%2Ftenant%2Fnamespace%2Fmytopic -- Will try again in 0.1 s
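   `BadVersion` is ZooKeeper's response to a conditional (compare-and-set) write. The writer passes the znode version it last read, and the server rejects the update if the znode has been modified since then. The path in this error is a managed-ledger cursor (subscription) znode. That suggests the broker attempted to update the cursor metadata using a stale version, for example because another writer modified it first. The sketch below is a hypothetical in-memory model of that versioning rule. `VersionedStore` is illustrative only, not ZooKeeper or Pulsar API, and the sketch is not a diagnosis of this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a ZooKeeper-like store: each node has a
// version that is incremented on every successful write.
class VersionedStore {

    /** Mirrors "KeeperErrorCode = BadVersion"; unchecked here to keep the sketch short. */
    static class BadVersionException extends RuntimeException {
        BadVersionException(String path, int expected, int actual) {
            super("BadVersion for " + path + " (expected " + expected + ", actual " + actual + ")");
        }
    }

    private final Map<String, byte[]> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    /** Creates a node at version 0. */
    synchronized void create(String path, byte[] value) {
        data.put(path, value);
        versions.put(path, 0);
    }

    /** Returns the current version of an existing node. */
    synchronized int version(String path) {
        return versions.get(path);
    }

    /** Conditional write: succeeds only if expectedVersion matches, then bumps the version. */
    synchronized int setData(String path, byte[] value, int expectedVersion) {
        int actual = versions.get(path);
        if (actual != expectedVersion) {
            throw new BadVersionException(path, expectedVersion, actual);
        }
        data.put(path, value);
        versions.put(path, actual + 1);
        return actual + 1;
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        String cursorPath = "/managed-ledgers/tenant/namespace/persistent/my-topic-partition-2/"
                + "platform_persistent%3A%2F%2Ftenant%2Fnamespace%2Fmytopic";
        store.create(cursorPath, new byte[0]);

        // Two writers read the same version (0).
        int seenByA = store.version(cursorPath);
        int seenByB = store.version(cursorPath);

        // Writer B updates first, so the version becomes 1.
        store.setData(cursorPath, "from B".getBytes(), seenByB);

        // Writer A still holds version 0, so its update is rejected.
        try {
            store.setData(cursorPath, "from A".getBytes(), seenByA);
        } catch (BadVersionException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

   A retry that re-reads the same stale version would fail again, which matches the repeating "Will try again" pattern. The retry can only succeed after the version is re-read from the store, and unloading the topic forces that.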
   
   We unloaded the topic, and the application resumed processing messages (the application itself did not need to be restarted).
   
   Any idea what causes this, and how can we prevent it?
   
   **To Reproduce**
   Unknown.
   
   **Expected behavior**
   I expected the topic to be unloaded automatically, so that the clients could reconnect to a new broker.
   
   **Logs**
   Broker:
   09:13:45.822 [pulsar-io-4-2] WARN  org.apache.pulsar.broker.service.BrokerService - Namespace is being unloaded, cannot add topic persistent://platform/system/workflow-event-even-partition-0
   09:13:45.823 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN  org.apache.pulsar.broker.service.AbstractTopic - [persistent://platform/system/workflow-event-even-partition-0] Attempting to add producer to a fenced topic
   09:13:45.823 [BookKeeperClientWorker-OrderedExecutor-1-0] ERROR org.apache.pulsar.broker.service.ServerCnx - [/10.42.105.16:49756] Failed to add producer to topic persistent://platform/system/workflow-event-even-partition-0: producerId=289578, org.apache.pulsar.broker.service.BrokerServiceException$TopicFencedException: Topic is temporarily unavailable
   09:13:45.844 [ForkJoinPool.commonPool-worker-3] WARN  org.apache.pulsar.broker.service.BrokerService - Namespace bundle for topic (persistent://platform/system/workflow-event-even-partition-0) not served by this instance. Please redo the lookup. Request is denied: namespace=platform/system
   09:13:45.854 [ForkJoinPool.commonPool-worker-3] WARN  org.apache.pulsar.broker.service.BrokerService - Namespace bundle for topic (persistent://platform/system/workflow-event-even-partition-0) not served by this instance. Please redo the lookup. Request is denied: namespace=platform/system
   09:13:45.877 [pulsar-2-2] WARN  org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate - Broker pulsar-broker-2.pulsar-broker.pulsar.svc.cluster.local:8080 is overloaded: CPU: 96.51538%, MEMORY: 51.04191%, DIRECT MEMORY: 12.5%, BANDWIDTH IN: 2.6368966%, BANDWIDTH OUT: 0.3707455%
   09:13:45.877 [pulsar-2-2] WARN  org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate - Broker http://pulsar-broker-2.pulsar-broker.pulsar.svc.cluster.local:8080 is overloaded: max usage=0.9651538133621216
   
   09:28:46.277 [bookkeeper-ml-scheduler-OrderedScheduler-1-0] WARN  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [platform/system/persistent/workflow-event-even-partition-2] Failed to update consumer platform_persistent%3A%2F%2Fplatform%2Fsystem%2Fworkflow-event-even_queue
   	at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl$7.operationFailed(ManagedLedgerImpl.java:940) ~[io.streamnative-managed-ledger-2.8.2.0.jar:2.8.2.0]
   	at org.apache.bookkeeper.util.SafeRunnable$1.safeRun(SafeRunnable.java:43) [org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
   09:28:46.281 [bookkeeper-ml-scheduler-OrderedScheduler-1-0] ERROR org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://platform/system/workflow-event-even-partition-2] Failed to create subscription: platform_persistent://platform/system/workflow-event-even_queue
   java.util.concurrent.CompletionException: org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.pulsar.metadata.api.MetadataStoreException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /managed-ledgers/platform/system/persistent/workflow-event-even-partition-2/platform_persistent%3A%2F%2Fplatform%2Fsystem%2Fworkflow-event-even_queue
   	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:704) ~[?:?]
   	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
   	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
   	at org.apache.pulsar.broker.service.persistent.PersistentTopic$3.openCursorFailed(PersistentTopic.java:887) ~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]
   	at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$2.operationFailed(ManagedCursorImpl.java:567) ~[io.streamnative-managed-ledger-2.8.2.0.jar:2.8.2.0]
   	at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$23.operationFailed(ManagedCursorImpl.java:2348) ~[io.streamnative-managed-ledger-2.8.2.0.jar:2.8.2.0]
   
   At the time this occurred, I saw a long GC pause on one of the ZooKeeper nodes:
   [Screenshot: ZooKeeper node GC pause, https://user-images.githubusercontent.com/1891405/159366581-dc2a518b-f1b9-4ef1-bf5c-bf52b64aa6eb.png]
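   The overload warnings for pulsar-broker-2 in the broker log above suggest that the load manager takes the highest of the reported resource usages (here, CPU) and compares it with an overload threshold. The sketch below reproduces that arithmetic from the logged percentages. It assumes a threshold of 85%, which is believed to be the default of the broker setting `loadBalancerBrokerOverloadedThresholdPercentage`; check your own `broker.conf`. `OverloadCheck` is a hypothetical helper, not Pulsar code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that mirrors the "max usage" arithmetic in the log.
class OverloadCheck {
    // ASSUMPTION: 85% overload threshold (believed default; verify in broker.conf).
    static final double OVERLOAD_THRESHOLD = 0.85;

    /** Highest usage across all resources, converted from percent to a 0.0-1.0 fraction. */
    static double maxUsage(Map<String, Double> usagePercent) {
        double max = 0.0;
        for (double pct : usagePercent.values()) {
            max = Math.max(max, pct / 100.0);
        }
        return max;
    }

    /** A broker counts as overloaded if any single resource exceeds the threshold. */
    static boolean isOverloaded(Map<String, Double> usagePercent) {
        return maxUsage(usagePercent) > OVERLOAD_THRESHOLD;
    }

    public static void main(String[] args) {
        // Percentages copied from the pulsar-broker-2 warning in the broker log.
        Map<String, Double> usage = new LinkedHashMap<>();
        usage.put("CPU", 96.51538);
        usage.put("MEMORY", 51.04191);
        usage.put("DIRECT MEMORY", 12.5);
        usage.put("BANDWIDTH IN", 2.6368966);
        usage.put("BANDWIDTH OUT", 0.3707455);

        System.out.printf("max usage=%.7f overloaded=%b%n", maxUsage(usage), isOverloaded(usage));
    }
}
```

   With the logged values, the maximum is the CPU figure, 96.51538% (0.9651538). This matches the logged `max usage=0.9651538133621216` to within float rounding, so CPU alone is enough to mark the broker as overloaded.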
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on issue #14779: BadVersionException , unable to consume topic, failing to reconnect

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #14779:
URL: https://github.com/apache/pulsar/issues/14779#issuecomment-1074673985


   Hi @cerebrotecnologico, thanks for creating the issue. Could you share the broker log file? I want to check what happened before the error logs; this will help us investigate the problem.
   
   /cc @hangc0276 @zymap 

