You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/05/25 15:58:18 UTC

[GitHub] [pulsar] codelipenghui opened a new issue #10703: Managed ledger can't been initialized

codelipenghui opened a new issue #10703:
URL: https://github.com/apache/pulsar/issues/10703


   **Describe the bug**
   All the broker can't complete the managed ledger creation, the managed ledger will in the `None` state for 1min and after 1min to reconnect to the topic, the managed ledger been removed due to `too long pending time`
   
   Here is the logs:
   
   ```
   May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.579 [pulsar-io-4-7] INFO  org.apache.pulsar.broker.service.ServerCnx - [/10.0.0.179:46878][persistent://public/default/test-partition-0] Creating producer. producerId=0
   May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.580 [pulsar-ordered-OrderedExecutor-3-0] WARN  org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [public/default/persistent/test-partition-0] Managed ledger has been pending in initialize state more than 1899091 milliseconds, remove it from cache to retry ...
   May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.580 [pulsar-ordered-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/default/persistent/test-partition-0
   May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.581 [bookkeeper-ml-scheduler-OrderedScheduler-6-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/test-partition-0] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 100, 101, 102, 97, 117, 108, 116, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 116, 101, 115, 116, 45, 112, 97, 114, 116, 105, 116, 105, 111, 110, 45, 48], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
   May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.581 [bookkeeper-ml-scheduler-OrderedScheduler-6-0] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [10.0.0.179:3181, 10.0.0.104:3181] is not adhering to Placement Policy. quarantinedBookies: []
   May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.586 [main-EventThread] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [10.0.0.179:3181, 10.0.0.104:3181] for ledger: 42
   May 25 15:24:38 ip-10-0-0-179 pulsar[39322]: 15:24:38.762 [pulsar-web-40-5] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:38 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0
   May 25 15:24:38 ip-10-0-0-179 pulsar[39322]: 15:24:38.764 [prometheus-stats-41-1] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:38 +0000] "GET /metrics/ HTTP/1.1" 200 43130 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2
   May 25 15:24:43 ip-10-0-0-179 pulsar[39322]: 15:24:43.866 [pulsar-io-4-6] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /127.0.0.1:59372
   May 25 15:24:43 ip-10-0-0-179 pulsar[39322]: 15:24:43.866 [pulsar-io-4-7] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /10.0.0.179:46878
   May 25 15:24:48 ip-10-0-0-179 pulsar[39322]: 15:24:48.762 [pulsar-web-40-5] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:48 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0
   May 25 15:24:48 ip-10-0-0-179 pulsar[39322]: 15:24:48.764 [prometheus-stats-41-1] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:48 +0000] "GET /metrics/ HTTP/1.1" 200 43128 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2
   May 25 15:24:58 ip-10-0-0-179 pulsar[39322]: 15:24:58.761 [pulsar-web-40-5] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:58 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0
   May 25 15:24:58 ip-10-0-0-179 pulsar[39322]: 15:24:58.764 [prometheus-stats-41-1] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:58 +0000] "GET /metrics/ HTTP/1.1" 200 43126 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2
   May 25 15:25:01 ip-10-0-0-179 CRON[51146]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
   May 25 15:25:08 ip-10-0-0-179 pulsar[39322]: 15:25:08.761 [pulsar-web-40-5] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:08 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0
   May 25 15:25:08 ip-10-0-0-179 pulsar[39322]: 15:25:08.764 [prometheus-stats-41-1] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:08 +0000] "GET /metrics/ HTTP/1.1" 200 43128 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2
   May 25 15:25:18 ip-10-0-0-179 pulsar[39322]: 15:25:18.761 [pulsar-web-40-5] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:18 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0
   May 25 15:25:18 ip-10-0-0-179 pulsar[39322]: 15:25:18.764 [prometheus-stats-41-1] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:18 +0000] "GET /metrics/ HTTP/1.1" 200 43128 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2
   May 25 15:25:28 ip-10-0-0-179 pulsar[39322]: 15:25:28.762 [pulsar-web-40-5] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:28 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 1
   May 25 15:25:28 ip-10-0-0-179 pulsar[39322]: 15:25:28.764 [prometheus-stats-41-1] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:28 +0000] "GET /metrics/ HTTP/1.1" 200 43158 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: 15:25:35.580 [pulsar-io-4-8] ERROR org.apache.pulsar.broker.service.ServerCnx - [/10.0.0.179:46878] Failed to create topic persistent://public/default/test-partition-0, producerId=0
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: java.util.concurrent.CompletionException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) ~[?:?]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at org.apache.pulsar.common.util.FutureUtil.lambda$addTimeoutHandling$1(FutureUtil.java:129) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [io.netty-netty-transport-native-epoll-4.1.63.Final-linux-x86_64.jar:4.1.63.Final]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.lang.Thread.run(Thread.java:829) [?:?]
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: Caused by: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout
   May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at org.apache.pulsar.broker.service.BrokerService.futureWithDeadline(...)(Unknown Source) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
   May 25 15:25:38 ip-10-0-0-179 pulsar[39322]: 15:25:38.761 [pulsar-web-40-5] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:38 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0
   May 25 15:25:38 ip-10-0-0-179 pulsar[39322]: 15:25:38.764 [prometheus-stats-41-1] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:38 +0000] "GET /metrics/ HTTP/1.1" 200 43157 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2
   May 25 15:25:40 ip-10-0-0-179 systemd[1]: Started Session 115 of user ubuntu.
   May 25 15:25:48 ip-10-0-0-179 pulsar[39322]: 15:25:48.761 [pulsar-web-40-5] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:48 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0
   May 25 15:25:48 ip-10-0-0-179 pulsar[39322]: 15:25:48.764 [prometheus-stats-41-1] INFO  org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:48 +0000] "GET /metrics/ HTTP/1.1" 200 43157 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2
   May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.534 [pulsar-io-4-10] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /127.0.0.1:59376
   May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.686 [pulsar-io-4-11] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /10.0.0.179:46882
   May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.690 [pulsar-io-4-11] INFO  org.apache.pulsar.broker.service.ServerCnx - [/10.0.0.179:46882][persistent://public/default/test-partition-0] Creating producer. producerId=0
   May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.690 [pulsar-ordered-OrderedExecutor-3-0] WARN  org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [public/default/persistent/test-partition-0] Managed ledger has been pending in initialize state more than 78110 milliseconds, remove it from cache to retry ...
   May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.690 [pulsar-ordered-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/default/persistent/test-partition-0
   May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.691 [bookkeeper-ml-scheduler-OrderedScheduler-6-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/test-partition-0] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 100, 101, 102, 97, 117, 108, 116, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 116, 101, 115, 116, 45, 112, 97, 114, 116, 105, 116, 105, 111, 110, 45, 48], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
   May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.692 [bookkeeper-ml-scheduler-OrderedScheduler-6-0] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [10.0.0.179:3181, 10.0.0.104:3181] is not adhering to Placement Policy. quarantinedBookies: []
   May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.696 [main-EventThread] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [10.0.0.179:3181, 10.0.0.104:3181] for ledger: 43
   ```
   
   This will affect all topics of the broker and all the 3 brokers are in this state.
   I have make a thread dump and heap dump:
   
   
   
   [heap.hprof.zip](https://github.com/apache/pulsar/files/6540692/heap.hprof.zip)
   [jstack_v2.log](https://github.com/apache/pulsar/files/6540699/jstack_v2.log)
   
   
   **To Reproduce**
   It's difficult to reproduce
   
   **Additional context**
   current master branch
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat closed issue #10703: Managed ledger can't been initialized

Posted by GitBox <gi...@apache.org>.
merlimat closed issue #10703:
URL: https://github.com/apache/pulsar/issues/10703


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org