Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/06/03 09:11:05 UTC

Slack digest for #general - 2020-06-03

2020-06-02 10:12:10 UTC - Ebere Abanonu: @Sijie Guo can I suggest that for your next release, you create a separate section detailing features and changes for developers. Having all the information in one place will help.
----
2020-06-02 12:00:12 UTC - Ankush: Hi everyone,
I have a few questions regarding the KeyShared subscription, which I stumbled across while playing in dev:
1. When using the key shared policy `KeySharedPolicy.autoSplitHashRange()`, what is Pulsar's internal process for rebalancing hash keys when we add/remove a consumer?
2. When using the key shared policy `KeySharedPolicy.stickyHashRange()`, the documentation says that if we cannot cover the complete range [0, 65535], the cursor will rewind. What does that mean? How does Pulsar handle restarts of a node when using this policy (for us, we have 4 consumers in k8s and restarting 1 node can take around 1 minute)?
Thanks!
----
2020-06-02 13:48:53 UTC - Ebere Abanonu: @Sijie Guo remember the time I hit a schemaValidation exception even though the schema was correct? I have been able to reproduce it, with a better understanding this time. For instance, I had a topic named Students with a schema registered. When I tested a different schema with a different definition on a topic I called Students-Test, an exception was thrown, but if I changed the name to something that does not have Students in it, it succeeded. When I first encountered this, I was able to resolve it by restarting Pulsar. This time I resolved it by changing the name to something not containing the existing topic's name. Could it have to do with caching?
----
2020-06-02 15:56:47 UTC - Alexandre DUVAL: I think I misunderstand something around TLS and proxying:
----
2020-06-02 15:57:43 UTC - Alexandre DUVAL:
----
2020-06-02 15:58:44 UTC - Alexandre DUVAL: throwing:
----
2020-06-02 15:58:52 UTC - Alexandre DUVAL:
----
2020-06-02 16:00:06 UTC - Alexandre DUVAL: broker_url should not contain the scheme?
----
2020-06-02 16:00:44 UTC - Alexandre DUVAL: the throw occurs from this call <https://github.com/apache/pulsar/blob/master/pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/DirectProxyHandler.java#L126>
----
2020-06-02 16:00:59 UTC - Alexandre DUVAL: and this doesn't take care of ssl/notssl ? <https://github.com/apache/pulsar/blob/master/pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/DirectProxyHandler.java#L119>
----
2020-06-02 16:01:27 UTC - Alexandre DUVAL: WDYT? Issue?
----
2020-06-02 16:12:48 UTC - Chris DiGiovanni: This morning I went to expand storage on 3 of 4 of my bookies. These are the steps I followed:
1. Added a disk to the OS and mounted it to a new ledger dir
2. Added ledger dir to the bookkeeper.conf
3. Ran bin/bookkeeper shell updatecookie -expandstorage
4. Disabled Autorecovery
5. Restarted the bookie
6. Once the process connected and was stable, enabled autorecovery
I repeated this process 3 times for the 3 bookies and about 3-5 minutes in-between each bookie. All expanded their size appropriately, though now I have underreplicated ledgers that say they have 3 missing replicas. Unsure how this is possible since all my namespaces have set
```--bookkeeper-ack-quorum 2 --bookkeeper-ensemble 3 --bookkeeper-write-quorum 3```
Here is an example of an underreplicated ledger I'm seeing from listunderreplicated ledgers:
```3698657
Ctime : 1591105532954
MissingReplica : chhq-vudpulbk02.us.drwholdings.com:3181
MissingReplica : chhq-vudpulbk01.us.drwholdings.com:3181
MissingReplica : chhq-vudpulbk03.us.drwholdings.com:3181```
Not understanding how this is possible or how to fix. My bookies all show readwrite as well. Any help on steps for recovery would be helpful.
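
Assuming the `Ctime` field in the listing above is a Unix epoch timestamp in milliseconds (an assumption; the output itself does not say so), it can be converted to UTC and compared with the maintenance window. A minimal sketch; `ctime_to_utc` is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ctime_to_utc(ctime_ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    # Adding an integer timedelta avoids float rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=ctime_ms)


# Value copied from the listunderreplicated output above.
print(ctime_to_utc(1591105532954).isoformat())
```

Under that assumption, the value above corresponds to 2020-06-02 13:45:32 UTC.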
----
2020-06-02 16:23:48 UTC - Sijie Guo: Noted with thanks. @Penghui Li ^^
----
2020-06-02 16:30:09 UTC - Alexandre DUVAL: nvm, the issue is in pulsar-rs :slightly_smiling_face: the broker url must not contain the scheme
----
2020-06-02 16:33:41 UTC - Addison Higham: @Alexandre DUVAL that code path is using raw sockets (via netty). It doesn't really use the pulsar-client which is what uses the URI scheme to determine TLS. If you look below, it doesn't really do anything with that URI other than to pull out the host and port.

If you trace through that code you can see that based upon your settings, it will create the sslHandlerSupplier.

As for your issue, what sort of discovery are you using for brokers? However your targetBrokerUrl is being found, it appears to lack port information.
----
2020-06-02 16:37:25 UTC - Addison Higham: is the creation time of the missing ledgers before you performed the maintenance or during?
----
2020-06-02 16:39:33 UTC - Penghui Li: 1. The current implementation splits the largest hash range, and there is a new implementation based on consistent hashing. <https://github.com/apache/pulsar/pull/6791>
2. Rewind means resetting the read position to the last acknowledged position. If the broker can’t find any consumer to dispatch some messages to, the other consumers will stop consuming messages until those messages can be delivered.
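
The "split the largest hash range" idea in point 1 can be sketched as a toy model. This is not the broker's code: the class and method names are hypothetical, the slot is taken as an input rather than computed from a message key, and handing a removed consumer's range to a neighbour is an assumption about how freed slots might be reassigned.

```python
RANGE_SIZE = 65536  # slots 0..65535, the range quoted in the question


class AutoSplitSelector:
    """Toy model: each consumer owns one contiguous slot range."""

    def __init__(self):
        self.ranges = []  # sorted list of (start, end_inclusive, consumer)

    def add_consumer(self, consumer):
        if not self.ranges:
            self.ranges.append((0, RANGE_SIZE - 1, consumer))
            return
        # Find the widest range; max() keeps the first one on ties.
        idx = max(range(len(self.ranges)),
                  key=lambda i: self.ranges[i][1] - self.ranges[i][0])
        start, end, owner = self.ranges[idx]
        mid = (start + end) // 2
        # The old owner keeps the lower half, the newcomer takes the upper half.
        self.ranges[idx:idx + 1] = [(start, mid, owner), (mid + 1, end, consumer)]

    def remove_consumer(self, consumer):
        idx = next(i for i, r in enumerate(self.ranges) if r[2] == consumer)
        start, end, _ = self.ranges.pop(idx)
        if not self.ranges:
            return
        if idx > 0:
            # Assumption: the left-hand neighbour absorbs the freed slots.
            p_start, _, p_owner = self.ranges[idx - 1]
            self.ranges[idx - 1] = (p_start, end, p_owner)
        else:
            # No left neighbour, so the right-hand neighbour absorbs them.
            _, n_end, n_owner = self.ranges[0]
            self.ranges[0] = (start, n_end, n_owner)

    def select(self, slot):
        """Return the consumer owning a slot, or None if nothing covers it."""
        for start, end, owner in self.ranges:
            if start <= slot <= end:
                return owner
        return None


selector = AutoSplitSelector()
for name in ("c1", "c2", "c3"):
    selector.add_consumer(name)
print(selector.ranges)
```

In this model, adding a consumer only moves the upper half of one existing range, so keys owned by the other consumers stay where they were.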
----
2020-06-02 16:47:51 UTC - Chris DiGiovanni: Yes it looks to be during the maintenance.
----
2020-06-02 16:56:42 UTC - Alexandre DUVAL: targetBrokerUrl must not contain the scheme
----
2020-06-02 16:56:49 UTC - Alexandre DUVAL: then it works
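
A client-side guard for the scheme issue described in this thread might look like the sketch below. It only illustrates reducing a URL to a scheme-less `host:port` string; `to_host_port` and the example host name are hypothetical and are not taken from pulsar-rs or the proxy.

```python
from urllib.parse import urlsplit


def to_host_port(broker_url: str) -> str:
    """Return 'host:port' for a broker URL given with or without a scheme."""
    # urlsplit only recognises the host after "//", so add it for bare input.
    target = broker_url if "://" in broker_url else "//" + broker_url
    parts = urlsplit(target)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected host and port in {broker_url!r}")
    return f"{parts.hostname}:{parts.port}"


print(to_host_port("pulsar://broker-1.example.com:6650"))
print(to_host_port("broker-1.example.com:6650"))
```

Both calls produce the same scheme-less string, and a URL without a port raises `ValueError` instead of being passed on.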
----
2020-06-02 16:59:28 UTC - Ankush: Thanks a lot. This is good help.
----
2020-06-02 16:59:44 UTC - Chris DiGiovanni: When I try to recover that ledger using readledger bookie command I see lines like this:
`2020-06-02 11:58:56.936 [BookieClientScheduler-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.tools.cli.commands.bookie.ReadLedgerCommand - Failed to read entry 167 -- No such ledger exists on Bookies`
----
2020-06-02 17:00:10 UTC - Chris DiGiovanni: I ran that against all my bookies and it says the ledger doesn't exist.
----
2020-06-02 17:33:47 UTC - Addison Higham: do you have any logs from brokers or metadata from the ledgers to see where they came from?
----
2020-06-02 17:39:46 UTC - Chris DiGiovanni: I'll need to look... Curious what the course of action would be if I'm able to find the logs or metadata for these ledgers but still can't read from them.
----
2020-06-02 17:41:12 UTC - Addison Higham: can you look at the disks and see if there is any evidence of them there?
----
2020-06-02 17:47:13 UTC - Chris DiGiovanni: Here is the LedgerMetadata for a ledger that is missing replicas from all three:
```ledgerID: 2817853
LedgerMetadata{formatVersion=2, ensembleSize=3, writeQuorumSize=3, ackQuorumSize=2, state=CLOSED, length=692009, lastEntryId=168, digestType=CRC32C, password=base64:, ensembles={0=[chhq-vudpulbk03.us.drwholdings.com:3181, chhq-vudpulbk01.us.drwholdings.com:3181, chhq-vudpulbk02.us.drwholdings.com:3181]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:ZmlvL3NpZ25hbC9wZXJzaXN0ZW50L3ZvbGFy, application=base64:cHVsc2Fy}}```
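
The `customMetadata` values above are base64-encoded, and decoding them shows which managed ledger the BookKeeper ledger belongs to. A minimal sketch using only the Python standard library; the dictionary is copied by hand from the metadata above and `decode_value` is a hypothetical helper name:

```python
import base64

# Values copied from the customMetadata section of the ledger metadata above.
custom_metadata = {
    "component": "base64:bWFuYWdlZC1sZWRnZXI=",
    "pulsar/managed-ledger": "base64:ZmlvL3NpZ25hbC9wZXJzaXN0ZW50L3ZvbGFy",
    "application": "base64:cHVsc2Fy",
}


def decode_value(value: str) -> str:
    """Strip the 'base64:' prefix and decode the remainder as UTF-8 text."""
    prefix = "base64:"
    if value.startswith(prefix):
        return base64.b64decode(value[len(prefix):]).decode("utf-8")
    return value


decoded = {key: decode_value(value) for key, value in custom_metadata.items()}
for key, value in decoded.items():
    print(f"{key} = {value}")
```

The `pulsar/managed-ledger` value decodes to `fio/signal/persistent/volar`, which narrows down where the ledger came from.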
----
2020-06-02 17:55:36 UTC - Chris DiGiovanni: Unfortunately it looks like the logs rolled off as they are being spammed pretty hard because of the missing ledgers.
----
2020-06-02 18:03:32 UTC - Chris DiGiovanni: Brokers never showed anything in their logs and I have logs for the brokers going back 2 days.
----
2020-06-02 18:31:43 UTC - Raphael Enns: I was looking at <https://pulsar.apache.org/docs/en/deploy-bare-metal/>. We don't need any data redundancy as the data we're sending doesn't need to last long. We're also not pushing through a large amount or frequency of data. What would you recommend for a simple stable production setup? Would 1 zookeeper process, 1 bookkeeper process and 1 pulsar broker process all running on the same machine work?
----
2020-06-02 19:35:43 UTC - Frank Kelly: Newbie question - where in the Java client is the tenant / namespace specified in either the producer or the consumer <https://pulsar.apache.org/docs/en/client-libraries-java/>
----
2020-06-02 19:38:21 UTC - Addison Higham: via the `topic` setting. A topic can be given very simply as just a string like `my-topic`, which then uses your default tenant and namespace; a topic name of `my-namespace/my-topic` will be in the default tenant, but in the namespace `my-namespace`.

What most people do is use fully qualified topic names like:
`persistent://my-tenant/my-namespace/my-topic`
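
A small helper can build the fully qualified form so that the tenant and namespace are always explicit. This is only a sketch: `full_topic_name` is a hypothetical name, not part of any Pulsar client, and the resulting string is what would be passed as the `topic` setting of a producer or consumer.

```python
def full_topic_name(tenant: str, namespace: str, topic: str,
                    persistent: bool = True) -> str:
    """Compose '<domain>://<tenant>/<namespace>/<topic>'."""
    for label, part in (("tenant", tenant), ("namespace", namespace), ("topic", topic)):
        if not part:
            raise ValueError(f"{label} must not be empty")
    domain = "persistent" if persistent else "non-persistent"
    return f"{domain}://{tenant}/{namespace}/{topic}"


print(full_topic_name("my-tenant", "my-namespace", "my-topic"))
```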
----
2020-06-02 19:38:54 UTC - Frank Kelly: Gotcha - thanks - the examples in the doc were a little confusing.
----
2020-06-03 00:59:12 UTC - Hiroyuki Yamada: Hi, I’m testing Pulsar auto recovery feature in a k8s environment deployed with Helm.
I’m using the default configuration so the number of replicas of bookie is 4 and Ensemble/WQ/AQ = (3,3,2) and there is an auto recovery pod and auto recovery is on.

To test auto recovery behavior, after I created some data with pulsar-perf, I deleted all the files under ledgers (rm -rf /pulsar/data/bookkeeper/ledgers/current/*) of bookie-0 to simulate a disk failure. During the testing, I was watching all the logs of recovery and bookies with `kubectl logs`, but nothing really happened after the ledger data was deleted (so it looks like the deleted data was not recovered).

Is this the expected behavior? Am I missing something?
It would be great if anyone can help me out.
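
For the (3,3,2) setting mentioned above, the mapping of entries to bookies can be sketched as follows. The sketch assumes BookKeeper's round-robin striping, where entry `e` goes to `write_quorum` consecutive ensemble positions starting at `e % ensemble_size`; the function name and the bookie names are hypothetical.

```python
def write_set(entry_id: int, ensemble: list, write_quorum: int) -> list:
    """Return the bookies that should hold a copy of the given entry."""
    size = len(ensemble)
    if not 0 < write_quorum <= size:
        raise ValueError("write quorum must be between 1 and the ensemble size")
    # Start at entry_id % size and take write_quorum consecutive positions.
    return [ensemble[(entry_id + i) % size] for i in range(write_quorum)]


# One ledger's ensemble: 3 of the 4 bookies in the cluster.
ensemble = ["bookie-1", "bookie-2", "bookie-3"]
for entry_id in range(3):
    print(entry_id, write_set(entry_id, ensemble, write_quorum=3))
```

With ensemble size and write quorum both 3, every entry is written to all three bookies of the ledger's ensemble, and an ack quorum of 2 means a write is acknowledged once two of them confirm it.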
----
2020-06-03 01:43:37 UTC - Rounak Jaggi: We have deployed pulsar cluster using Terraform/ansible with 3 brokers, 3 bookies, 3 zookeeper and 2 proxy with pulsar version 2.5.0. Now I have two questions:
1. Now we want to add 2 more brokers, 2 more bookies, 2 more zookeepers and 1 more proxy to the existing cluster environment. I was able to build those new instances easily using Terraform, just by increasing the number of those components. How can we use ansible to configure only those new components with the latest version of pulsar, without affecting the existing components running on the older version? Is there a way to do this?
2. How do we do an upgrade/migration using Terraform/ansible in an AWS environment?
----
2020-06-03 04:07:50 UTC - Ken Huang: Hi, how do I set the IP address that bookkeeper uses to register with zookeeper?
----
2020-06-03 05:30:52 UTC - Pushkar Sawant: Is there a way to improve the distribution of ledgers across bookkeeper servers? I have a 6-node bookkeeper cluster with 1 TB of ledger storage on each node. At the moment 3 nodes are at around 60% storage utilization and the other 3 nodes are at around 87% utilization.
----
2020-06-03 06:49:16 UTC - Sijie Guo: auto recovery consists of two tasks. One is the bookie audit task, which detects bookies that are gone; the other is the ledger audit, which detects missing entries. The ledger audit is scheduled at a very long interval.

For the bookie-gone case, just kill one bookie and keep it down, and you will see how auto recovery works. For the entries-gone case, reduce the ledger audit interval.
----
2020-06-03 06:50:21 UTC - Sijie Guo: it seems like more of a terraform question?
----
2020-06-03 06:50:35 UTC - Sijie Guo: advertisedAddress
----
2020-06-03 06:51:25 UTC - Sijie Guo: How many partitions do you have?
----
2020-06-03 07:30:49 UTC - Dhakshin: Hi,
I'm unable to load consumer metrics in Prometheus after enabling the "exposeTopicLevelMetricsInPrometheus=true" and "exposeConsumerLevelMetricsInPrometheus=true" properties in the broker.conf file.
----
2020-06-03 07:31:59 UTC - Hiroyuki Yamada: Thank you !
----