Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/07/06 09:11:03 UTC

Slack digest for #general - 2019-07-06

2019-07-05 15:21:31 UTC - Santiago Del Campo: @David Kjerrumgaard Do you have some troubleshooting steps in mind?

Lately I've been getting a better understanding of the BookKeeper output, and I might know how to solve some of those issues... but even after that, I can see other problems when I try to deploy the broker pods again:

*Broker pods logs*

``` org.apache.pulsar.broker.PulsarServerException: java.lang.RuntimeException: java.lang.RuntimeException: Can't create a producer on assignment topic <persistent://public/functions/assignments>  ```

``` Caused by: org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering ledger  ```
----
2019-07-05 15:23:31 UTC - Santiago Del Campo: I've been going through the Pulsar documentation trying to better understand the architecture... but sometimes I think I'm fighting more with the Kubernetes deployment design than with Pulsar itself :thinking_face:
----
2019-07-05 17:00:26 UTC - David Kjerrumgaard: We would have to have a more interactive debugging session, as I am not 100% clear on the environment you are working in and the steps taken to produce the issue.  It sounds like you spin up the Pulsar cluster with the standard Helm chart, and then "re-deploy" new bookie pods that are configured to point to the existing ZK node. Then you see "invalid cookie" and "bad segment" errors.
----
2019-07-05 17:01:50 UTC - David Kjerrumgaard: I don't think you can deploy a new "set" of bookies all at once. You would need to have at least one "old" bookie around to serve reads of the existing data. Otherwise all the ledger metadata in the ZK nodes (which references the old, now-deleted bookies) would be invalid.
----
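To see concretely what that ledger metadata looks like, here is a rough sketch using the BookKeeper shell (the ledger id is a placeholder): the ensembles in the output name the bookie addresses that hold each ledger segment, which is why deleting every bookie invalidates the metadata.

```
# List the ledger ids known to the metadata store, then dump one ledger's metadata.
# The ensemble entries in the output point at the (now deleted) bookie addresses.
bin/bookkeeper shell listledgers
bin/bookkeeper shell ledgermetadata -ledgerid 0
```
----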
2019-07-05 17:28:01 UTC - Santiago Del Campo: I'd appreciate that a lot; right now I'm kind of lost on how to troubleshoot from here.

How could we have a more interactive debugging session?
----
2019-07-05 17:44:47 UTC - David Kjerrumgaard: Let me ask you this: do you really replace all the bookies in the cluster at the same time?
----
2019-07-05 18:57:48 UTC - Santiago Del Campo: Yeah... so, we use Rancher 2 as our Kubernetes cluster administrator. It provides a UI to visualize everything that is deployed inside a specific cluster. I simply click the "redeploy" button for the bookie workload, which contains all the bookie pods, and all the current pods are replaced with new ones... that's all.
----
2019-07-05 19:04:16 UTC - David Kjerrumgaard: That is most likely the issue then. All the metadata that is kept in the ZK pod is for the old bookies. Therefore you will need to initialize the metadata again, since it is now essentially a new cluster.... <https://pulsar.apache.org/docs/en/admin-api-clusters/#initialize-cluster-metadata>
----
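For reference, a minimal sketch of the initialization step linked above, run against the existing ZooKeeper ensemble; every hostname and the cluster name below are placeholders to adjust to your Helm release's service names:

```
# Re-initialize the Pulsar cluster metadata in ZooKeeper (placeholder names).
bin/pulsar initialize-cluster-metadata \
  --cluster pulsar-cluster \
  --zookeeper zookeeper:2181 \
  --configuration-store zookeeper:2181 \
  --web-service-url http://broker:8080 \
  --broker-service-url pulsar://broker:6650
```
----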
2019-07-05 19:09:47 UTC - David Kjerrumgaard: Bookies have a strict cookie validation mechanism to ensure data consistency. If a bookie is added without proper initialization, the bookie will fail the cookie validation. The cookie is stored on the bookie's local disk and validated against the expected cookie value that is kept in ZK. Since you are adding new bookies, they don't have the proper cookie value.
----
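For anyone debugging the same error, a rough sketch of where the two copies of the cookie live; the data path and the /ledgers chroot below are assumptions based on common defaults, so adjust them to your deployment:

```
# Cookie persisted on the bookie's local disk (assumed default data path).
cat /pulsar/data/bookkeeper/journal/current/VERSION

# Expected cookie kept in ZooKeeper, one znode per bookie address (assumed /ledgers chroot).
bin/pulsar zookeeper-shell -server zookeeper:2181
# then, at the zk prompt:
ls /ledgers/cookies
```
----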
2019-07-05 19:12:16 UTC - Santiago Del Campo: Perfect, I understand that... so that means that whenever I redeploy the bookie pods, for whatever reason, I also have to make sure the ZKs are aware of this change by re-initializing the cluster metadata?
----
2019-07-05 19:27:25 UTC - David Kjerrumgaard: correct
----
2019-07-05 19:28:09 UTC - David Kjerrumgaard: Since no data is kept on the brokers (excluding the cache),  you are basically starting over with a new, empty cluster. HTH
----
2019-07-05 19:45:17 UTC - Santiago Del Campo: And how impossible would it be to do it the other way around... like forcing the new bookie pods to have the same metadata that ZK expects? That way it would be easier to power off a machine for some maintenance and turn it back on.
----
2019-07-05 19:49:10 UTC - Santiago Del Campo: as far i understand.. what is really breaking everything.. is not that old topics or messages are lost in the new redeploy... because i am not dealing with persistent data, and the default topics generated by Pulsar are created automatically with a new deploy..... the thing here is about the stricts validation mechanisms that bookie needs to be able to boot correctly.

In that case, what I would need is to be able to set up the bookies in a way that can cope with machines being turned off at some point... if that is possible, of course.
----
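One way to get that behaviour, sketched under the assumption that the bookies run as a Kubernetes StatefulSet with stable pod names: keep the journal and ledger directories (and therefore the cookie) on persistent volumes, so a restarted pod comes back as the same bookie and passes cookie validation instead of registering as a brand-new one. A rough check, with the StatefulSet name and data path as placeholders:

```
# 1) The bookie data dirs should be backed by PVCs, not emptyDir (placeholder names).
kubectl get statefulset bookkeeper -o jsonpath='{.spec.volumeClaimTemplates[*].metadata.name}'

# 2) After a restart, the on-disk cookie should still be present and unchanged.
kubectl exec bookkeeper-0 -- cat /pulsar/data/bookkeeper/journal/current/VERSION
```
----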
2019-07-05 22:42:14 UTC - t: @t has joined the channel
----
2019-07-06 05:39:57 UTC - vikash: Hello, I am facing continuous disconnects of my WebSocket producer ("Closing connection"). I am using the WebSocket API; is there any setting in Apache Pulsar to keep the connection alive longer?
----
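Not a definitive answer, but the usual suspects here are the proxy's idle timeout and the lack of client-side pings: sending periodic WebSocket ping frames from the client keeps the session from going idle, and the idle timeout itself can be raised. The setting name below is an assumption and should be verified against the conf/websocket.conf (or broker.conf, if the in-broker WebSocket service is used) shipped with your Pulsar version:

```
# Assumed setting: idle time (ms) before the WebSocket proxy drops a silent session.
webSocketSessionIdleTimeoutMillis=300000
```
----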
2019-07-06 05:56:57 UTC - vikash: I'm also getting this error:
----
2019-07-06 05:56:58 UTC - vikash: 05:55:23.260 [ForkJoinPool.commonPool-worker-3] ERROR org.apache.pulsar.broker.web.PulsarWebResource - Policies not found for c091e548-b45a-49b4-b8ec-2cb5e27c7af6/visurClients namespace
05:55:23.260 [ForkJoinPool.commonPool-worker-3] WARN  org.apache.pulsar.broker.service.ServerCnx - Failed to get Partitioned Metadata [/20.43.19.64:58104] <persistent://c091e548-b45a-49b4-b8ec-2cb5e27c7af6/visurClients/0370bc8f-a880-43b7-8121-930996c67e52>: Policies not found for c091e548-b45a-49b4-b8ec-2cb5e27c7af6/visurClients namespace
org.apache.pulsar.broker.web.RestException: Policies not found for c091e548-b45a-49b4-b8ec-2cb5e27c7af6/visurClients namespace
        at org.apache.pulsar.broker.web.PulsarWebResource.lambda$checkLocalOrGetPeerReplicationCluster$4(PulsarWebResource.java:679) ~[org.apache.pulsar-pulsar-broker-2.3.2.jar:2.3.2]
        at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_212]
        at org.apache.pulsar.zookeeper.ZooKeeperDataCache.lambda$0(ZooKeeperDataCache.java:67) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.2.jar:2.3.2]
        at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_212]
        at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$14(ZooKeeperCache.java:354) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.2.jar:2.3.2]
        at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_212]
        at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$12(ZooKeeperCache.java:339) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.2.jar:2.3.2]
        at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) [?:1.8.0_212]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_212]
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_212]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_212]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_212]
----
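A note on the error above: "Policies not found for <tenant>/<namespace> namespace" usually means that namespace was never created (the broker cannot find any policies for it). A minimal sketch of creating it with pulsar-admin, reusing the tenant and namespace from the log and a placeholder cluster name:

```
# Tenant and namespace copied from the error; the cluster name is a placeholder.
bin/pulsar-admin tenants create c091e548-b45a-49b4-b8ec-2cb5e27c7af6 --allowed-clusters pulsar-cluster
bin/pulsar-admin namespaces create c091e548-b45a-49b4-b8ec-2cb5e27c7af6/visurClients
```
----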