Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/04/02 09:11:04 UTC

Slack digest for #general - 2020-04-02

2020-04-01 09:19:58 UTC - ikeda: Hi, pulsar engineers!

I’m interested in geo replication.
I wonder whether Pulsar supports both async and sync geo-replication.

The documentation says messages are “forwarded asynchronously to the remote clusters”, so is it correct that Pulsar supports only async geo-replication?
----
2020-04-01 09:21:18 UTC - Ali Ahmed: it supports both
----
2020-04-01 09:39:55 UTC - ikeda: thanks, ahmed!
How do I configure it?
----
2020-04-01 09:42:50 UTC - ikeda: producer’s send, sendAsync API?
----
2020-04-01 12:29:35 UTC - Jack Newbury: @Jack Newbury has joined the channel
----
2020-04-01 12:33:11 UTC - Fabien LD: <https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1585687939157900>
----
2020-04-01 12:36:39 UTC - Frank Kelly: @Frank Kelly has joined the channel
----
2020-04-01 12:44:18 UTC - Fabien LD: Did you even try to google it ?
----
2020-04-01 12:44:21 UTC - Fabien LD: <https://github.com/apache/pulsar/tree/master/deployment/kubernetes/helm>
----
2020-04-01 12:44:31 UTC - Fabien LD: Took me about 5s
----
2020-04-01 12:55:59 UTC - Vladimir Shchur: Hi! I wonder if the broker checks the schema for each message or only when a producer/consumer connects to it? Another question is - where do I set/change value of `isAllowAutoUpdateSchema`?
----
2020-04-01 13:21:04 UTC - Tanner Nilsson: yes, both of those commands work
----
2020-04-01 13:28:32 UTC - Tanner Nilsson: aaaaand I see the problem! running it in docker, `localhost` won't work. Since I'm using docker-compose, I updated `advertisedAddress=<pulsar container service name from docker-compose>` in my `standalone.conf` and it now works! Thank you!
----
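The fix described above could look like the following `standalone.conf` fragment. This is a sketch rather than the poster's actual file: `pulsar` is a hypothetical docker-compose service name, so substitute whatever name your compose file uses.

```properties
# standalone.conf (fragment)
# Assumption: the Pulsar container is declared as service "pulsar" in docker-compose.yml.
# Other containers on the same compose network can then resolve this hostname.
advertisedAddress=pulsar
```

Clients in other containers would then connect with a service URL such as `pulsar://pulsar:6650` (6650 being the default binary port) instead of `localhost`.
----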
2020-04-01 13:32:34 UTC - Ishan Bhatt: @Ishan Bhatt has joined the channel
----
2020-04-01 14:33:26 UTC - Evan Furman: 2 brokers, 3 bookies; I can use the kubernetes cluster which has grafana. What metrics/graphs would you want to see?
----
2020-04-01 15:16:23 UTC - Penghui Li: I reproduced the problem on my laptop and pushed a PR to improve the dispatch performance for Key_Shared subscriptions. <https://github.com/apache/pulsar/pull/6647>
----
2020-04-01 15:22:20 UTC - Penghui Li: @Evan Furman Does the producer have message batching enabled? I reproduced it with batching disabled. The main problem I hit is that as I add consumers, the CPU load of the thread that handles message writes and reads for the topic goes up.

So <https://github.com/apache/pulsar/pull/6647> can save some CPU cycles.
----
2020-04-01 15:24:41 UTC - Penghui Li: You can also try increasing `dispatcherMaxReadBatchSize` in the broker. At high throughput, this groups messages by key more effectively, so the broker has to look up the target consumer fewer times.
----
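As a sketch of the suggestion above, the setting lives in `broker.conf` (or `standalone.conf` for a standalone broker). The value below is purely illustrative and not a recommendation from the thread; tune it against your own throughput tests.

```properties
# broker.conf (fragment)
# Illustrative value only: upper bound on entries read per dispatch cycle.
dispatcherMaxReadBatchSize=500
```
----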
2020-04-01 15:26:16 UTC - Penghui Li: If you're using random message keys, I think it should be the same as the problem I reproduced.
----
2020-04-01 15:28:02 UTC - Evan Furman: We do not use batching so that sounds correct, when @Tim Corbett is online later he can provide more context
----
2020-04-01 15:28:26 UTC - Evan Furman: I do believe we are using random message key too
----
2020-04-01 15:35:42 UTC - Penghui Li: OK, that looks the same as the situation I encountered. This PR solves part of the problem; we need to test again to see if there is room for further optimization. In the meantime, I think you can use some fixed message keys and enable message batching (`.batcherBuilder(BatcherBuilder.KEY_BASED)`). After this PR is merged, we can trigger the weekly build to produce a new tarball, so that you can test it with random message keys and batching disabled.
----
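Roughly speaking, key-based batching packs only messages that share a key into the same batch, so a whole batch can be routed to a single Key_Shared consumer. The following is a toy model of that idea in plain Python, not the Pulsar client's implementation; `key_based_batches` and `max_batch_size` are hypothetical names introduced for illustration.

```python
from collections import OrderedDict


def key_based_batches(messages, max_batch_size):
    """Group (key, payload) pairs into batches that each hold a single key.

    Toy model only: a batch is flushed once it reaches max_batch_size,
    and any partially filled batches are flushed at the end.
    """
    open_batches = OrderedDict()  # key -> payloads collected so far
    flushed = []
    for key, payload in messages:
        batch = open_batches.setdefault(key, [])
        batch.append(payload)
        if len(batch) >= max_batch_size:
            flushed.append((key, open_batches.pop(key)))
    flushed.extend(open_batches.items())  # flush the leftovers
    return flushed


messages = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
batches = key_based_batches(messages, max_batch_size=2)
for key, payloads in batches:
    print(key, payloads)
```

Each emitted batch carries exactly one key, which is why this batching mode stays compatible with per-key routing.
----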
2020-04-01 15:50:17 UTC - Rattanjot Singh: Hi! In Pulsar Manager, can we see the messages in a topic's backlog?
----
2020-04-01 15:51:16 UTC - Matteo Merli: Schema is validated by broker when the producer/consumer session is established
----
2020-04-01 16:21:18 UTC - Evan Furman: Great, will give it a shot. Thank you!
----
2020-04-01 16:24:43 UTC - Sijie Guo: There is a new feature for this in the upcoming release.
----
2020-04-01 16:24:59 UTC - Sijie Guo: it is used for peeking the messages from the backlog
----
2020-04-01 17:26:00 UTC - Manjunath Ghargi: Is there a documentation or wiki page where we can look up the upcoming features or fixes?
----
2020-04-01 17:27:25 UTC - Tim Corbett: Wow, excellent work! Yes, we do not have batching enabled (it is not supported yet by the C# client, perhaps we can add this feature to it ourselves), and the keys are effectively random in our tests/production use case. I presume the `dispatcherMaxReadBatchSize` refers to some internal batching and does not require the messages to be batched?
----
2020-04-01 17:31:21 UTC - Evan Furman: I’m wondering if this will help the issue we encountered here <https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1584371459330600>

In the server running on Kubernetes, we see very very high latencies from consumers when there is a backlog.
----
2020-04-01 18:46:58 UTC - Manjunath Ghargi: Hi, is there sample code for connecting Camel with Pulsar and ActiveMQ? My use case is reading messages from an ActiveMQ topic and producing them to a Pulsar topic using Camel.
----
2020-04-01 22:42:09 UTC - Kai Levy: Hey, are there any indications in the Pulsar client metrics that the `ioThreads` or `listenerThreads` are underprovisioned? Or would any exception be thrown if one client was trying to process too many messages for its thread pools? Thanks
----
2020-04-01 23:09:13 UTC - Vladimir Shchur: I see, it means that only client ensures that messages correspond to schema. Could you please also answer the second question?
----
2020-04-01 23:51:21 UTC - Matteo Merli: Easiest way would be to do a check with `top` (in the per-thread mode, pressing capital `H`)
----
2020-04-01 23:52:10 UTC - Matteo Merli: That will show CPU % usage per thread. If there are threads getting close to 100% (e.g. saturating 1 core), then it might be a good idea to increase the thread count.
----
2020-04-01 23:57:29 UTC - Matteo Merli: > it means that only client ensures that messages correspond to schema.
The idea is to protect against mistakes rather than malicious attempts at publishing non-conforming data.

Checking the format of every single message would be very expensive on the broker side, for limited practical utility.

> Another question is - where do I set/change value of `isAllowAutoUpdateSchema`
you can use CLI/REST:

`pulsar-admin namespaces set-is-allow-auto-update-schema $NAMESPACE --enable`
----
2020-04-02 00:01:50 UTC - Kai Levy: interesting.. i'm hoping to find a way to monitor on deployed instances without shell access, but that is helpful to know, thanks
----
2020-04-02 02:26:29 UTC - Penghui Li: @Tim Corbett
> I presume the `dispatcherMaxReadBatchSize` refers to some internal batching and does not require the messages to be batched?
Yes. Even without message batching, increasing `dispatcherMaxReadBatchSize` can improve message dispatching performance. And with a Key_Shared subscription, the dispatcher groups messages by key hash, so reading a bigger batch can reduce the number of select-consumer calls.
----
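The effect of a larger read batch can be illustrated with a toy model. This is a sketch under stated assumptions and not the broker's dispatcher: `consumer_for` is a hypothetical stand-in for Key_Shared consumer selection (a CRC32 hash modulo the consumer count), and one "dispatch call" is counted per consumer that receives messages from a given read batch.

```python
import random
import zlib
from collections import defaultdict


def consumer_for(key, num_consumers):
    """Hypothetical selection rule: hash the key into one of the consumer slots."""
    return zlib.crc32(key.encode("utf-8")) % num_consumers


def count_dispatch_calls(keys, batch_size, num_consumers):
    """Count sends when each read batch is first grouped by target consumer."""
    calls = 0
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        groups = defaultdict(list)
        for key in batch:
            groups[consumer_for(key, num_consumers)].append(key)
        calls += len(groups)  # one send per consumer present in this batch
    return calls


rng = random.Random(42)
keys = [f"key-{rng.randrange(1_000_000)}" for _ in range(10_000)]  # effectively random keys
small = count_dispatch_calls(keys, batch_size=10, num_consumers=8)
large = count_dispatch_calls(keys, batch_size=500, num_consumers=8)
print(f"batch=10: {small} calls, batch=500: {large} calls")
```

In this model the number of calls per batch is capped by the number of consumers, so reading more entries at once spreads that fixed cost over more messages.
----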
2020-04-02 02:28:25 UTC - Penghui Li: > In the server running on Kubernetes, we see very very high latencies from consumers when there is a backlog.
I think yes
----
2020-04-02 02:29:04 UTC - Tim Corbett: Understood. I'm told we tried that on our k8s cluster but did not see much improvement, but it's likely we can revisit it. I saw your PR failed some automated tests, hopefully that gets resolved soon, we're excited to try it out!
----
2020-04-02 02:29:54 UTC - Penghui Li: Yes, I'm trying to rerun the tests; the failures don't look related to this PR.
----
2020-04-02 06:02:29 UTC - Sankararao Routhu: @Sankararao Routhu has joined the channel
----
2020-04-02 06:51:52 UTC - Vladimir Shchur: thank you!
----