Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/04/15 09:11:03 UTC

Slack digest for #general - 2020-04-15

2020-04-14 09:11:24 UTC - tuteng: You can try `mvn clean install -DskipTests`
----
2020-04-14 09:45:09 UTC - Sijie Guo: 1. For write availability, you need at least ensemble-size (E) bookies available.
2. There is no read quorum; a read can go to any available replica.
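A minimal sketch (not from the thread) of setting E/Qw/Qa on a namespace with the Java admin client; the service URL, namespace, and values are illustrative:
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class SetPersistenceExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // illustrative admin URL
                .build();
        // E=2, Qw=2, Qa=2 (last arg is the managed-ledger mark-delete rate).
        // Per the note above, writes still need at least E (=2) bookies available.
        admin.namespaces().setPersistence("public/default",
                new PersistencePolicies(2, 2, 2, 0));
        admin.close();
    }
}
```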
----
2020-04-14 10:16:30 UTC - Hiroyuki Yamada: @Sijie Guo Thank you! Let me ask additional questions. Sorry to keep asking …
1. So, should E basically be equal to Qw? Is the benefit of having E > Qw basically better performance? Also, if I want to tolerate 1 node down, is E=2, Qw=2, Qa=2 good enough?
2. How do reads always return consistent data? What if a read goes to a node that hasn't caught up?

----
2020-04-14 11:31:51 UTC - Aviram Webman: I tried the KEY_BASED batcher; now the messages are routed correctly.
However, the performance is lower than with the DEFAULT batcher, and it depends on the number of keys:
1 key -> 100,000 messages/sec
25 keys -> 60,000 messages/sec
100 keys -> 25,000 messages/sec

It seems that there is a batch per key, which explains this behavior.
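A minimal sketch of the producer setup being compared here; the topic name and service URL are illustrative:
```
import org.apache.pulsar.client.api.BatcherBuilder;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class KeyBasedBatcherExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")     // illustrative service URL
                .build();
        // KEY_BASED builds one batch per key, so batches shrink as the number of
        // distinct keys grows, which matches the throughput drop reported above.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/keyed-topic")  // illustrative topic
                .batcherBuilder(BatcherBuilder.KEY_BASED)           // vs. BatcherBuilder.DEFAULT
                .create();
        producer.close();
        client.close();
    }
}
```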
----
2020-04-14 11:59:37 UTC - Aravindhan: #general What is the recommended way of configuring a source connector/Pulsar function in a Kubernetes production deployment (where there are multiple brokers and function workers)? (It requires copying the jar and nar files into the container and running the source connector/function create command inside the container.)
----
2020-04-14 12:02:15 UTC - Xavier Levaux: @Xavier Levaux has joined the channel
----
2020-04-14 12:09:32 UTC - Xavier Levaux: Dear All,
I'm new to Apache Pulsar.  I've previously used Kafka.
What I liked: the ability to have multiple schemas per topic.
What I didn't like: the limit on the number of topics, and KSQL not supporting queries on topics with multiple schemas.
So, I see Pulsar supports a huge number of topics (limit?).  BUT apparently Pulsar does NOT support multiple schemas on the same topic? :disappointed_relieved:
Right or wrong?
Thanks!
----
2020-04-14 13:00:52 UTC - Ryan Slominski: <https://github.com/apache/pulsar/pull/6738>
----
2020-04-14 13:03:39 UTC - Greg: Hi, is there a way to create a source connector that will be deployed on every broker?
----
2020-04-14 13:13:14 UTC - Guilherme Perinazzo: @Nozomi Kurihara Yes, ThreadsafeFunction gives you a safe callback that you can call from the pulsar thread, and runs on the node thread. It's being used on the message listener. For the other things that currently use a worker thread, you can create the promise and resolve it once the threadsafe function is called.
----
2020-04-14 13:15:34 UTC - Ryan Slominski: Hi Greg, I believe if you set the parallelism parameter to match the number of brokers, that should work.
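A hedged sketch of that idea via the Java admin client; the tenant, names, archive path, and parallelism value are illustrative (`pulsar-admin sources create --parallelism N` is the CLI equivalent):
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.io.SourceConfig;

public class CreateSourceExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // illustrative admin URL
                .build();
        SourceConfig config = SourceConfig.builder()
                .tenant("public")
                .namespace("default")
                .name("my-source")                          // illustrative connector name
                .topicName("persistent://public/default/out")
                .parallelism(3)                             // e.g. one instance per broker
                .build();
        // The second argument is the NAR/JAR archive containing the connector.
        admin.sources().createSource(config, "/tmp/my-source.nar");
        admin.close();
    }
}
```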
----
2020-04-14 13:16:25 UTC - Greg: Hi, yes, I just tried this parameter and it works as expected, thanks.
----
2020-04-14 13:17:33 UTC - Ryan Slominski: I believe you can use a union to have something like multiple schemas on a single topic.
----
2020-04-14 14:49:09 UTC - Rattanjot Singh: Will this build a Docker image?
----
2020-04-14 15:07:54 UTC - Raman Gupta: PIP 43 covers the ability to send messages of different schemas from one producer: <https://github.com/apache/pulsar/wiki/PIP-43%3A-producer-send-message-with-different-schema>. This should be in 2.5.0 -- I haven't tried it yet.
----
2020-04-14 15:09:11 UTC - Ryan Slominski: Two oddities I've encountered with a custom source connector:
1. The "--destination-topic-name testing" parameter is required to run it, even though it never writes to that topic (per-Record destination topics are used instead).  Without this parameter, a "topic name must not be null" error is displayed.
2. Only the "localrun" command works.  If I use "create" I get the error "Source package does not have the correct format", which is odd since it works with localrun.
Ideas?
----
2020-04-14 15:12:33 UTC - Tanner Nilsson: I'm running Docker on Mac.
----
2020-04-14 15:20:59 UTC - Xavier Levaux: @Raman Gupta I might be wrong, but I don’t think it means a topic containing messages of different schema types
----
2020-04-14 15:22:11 UTC - Raman Gupta: You might be right. I have the same requirement to migrate from Kafka to Pulsar but haven't looked deeply into it yet.
----
2020-04-14 15:22:52 UTC - Raman Gupta: You think this PIP is just for different versions of the same schema?
----
2020-04-14 15:24:32 UTC - Raman Gupta: I brought this up on the dev list a while ago, and it seemed like this PIP was needed to get this feature, but it still needs the implementation: <https://lists.apache.org/thread.html/b04523a659bfcb23a1750ee675dc1882420eee6b072c18a3fc5d7cff@%3Cdev.pulsar.apache.org%3E>
----
2020-04-14 15:26:03 UTC - Xavier Levaux: It’s not very clear.  It could indeed serve the purpose of being able to produce messages with different schemas.
----
2020-04-14 15:26:33 UTC - Raman Gupta: From that thread I referenced above:
> I think we should address this feature in the
> future, and this PIP provides the essential ability to implement it.
----
2020-04-14 15:26:50 UTC - Raman Gupta: So for now, the union approach is probably the best workaround.
----
2020-04-14 15:27:02 UTC - Xavier Levaux: If the schema is set on the message, does the consumer then use that schema info from the metadata to deserialize messages corresponding to different schemas?  It would be logical.
----
2020-04-14 15:27:26 UTC - Xavier Levaux: I can't find anything about unions.  Do you have a page ref?
----
2020-04-14 15:27:43 UTC - Raman Gupta: I believe it does but I think the schema compatibility implementation in Pulsar is the remaining gap.
----
2020-04-14 15:28:46 UTC - Raman Gupta: For unions, use your underlying data system. If you are using Avro for example, you would create an Avro record type which is a union of all the possible records that can be produced to that topic. You then unpackage the union in the consumer.
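A minimal sketch of that Avro-level union approach; the record and field names are made up for illustration:
```
import java.util.Arrays;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class UnionSchemaExample {
    public static void main(String[] args) {
        // Two event types that should live on the same topic.
        Schema orderCreated = SchemaBuilder.record("OrderCreated").fields()
                .requiredString("orderId")
                .endRecord();
        Schema orderShipped = SchemaBuilder.record("OrderShipped").fields()
                .requiredString("orderId")
                .requiredString("carrier")
                .endRecord();
        // The topic's Avro schema is the union of the event records; the consumer
        // inspects each record's schema name to "unpackage" the right type.
        Schema topicSchema = Schema.createUnion(Arrays.asList(orderCreated, orderShipped));
        System.out.println(topicSchema.toString(true));
    }
}
```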
----
2020-04-14 15:30:26 UTC - Xavier Levaux: Oh, that is a very last-resort solution.  I'd prefer to hold off until it is supported.
----
2020-04-14 15:31:20 UTC - Raman Gupta: I don't believe there is even any PIP for this yet.
----
2020-04-14 15:34:34 UTC - tuteng: First you need to build the package /pulsar/distribution/server/target/apache-pulsar-2.6.0-SNAPSHOT-src.tar.gz using the command `mvn clean install -DskipTests`, then run `mvn package -Pdocker`.
----
2020-04-14 15:36:08 UTC - Raman Gupta: @Xavier Levaux FYI in case you want to subscribe to updates: <https://lists.apache.org/thread.html/r7286a29f5ae7c6043ac0cf63aef111cca04a5b66761d1ca7a2e349ec%40%3Cdev.pulsar.apache.org%3E>
----
2020-04-14 15:53:18 UTC - Rattanjot Singh: Getting this error in the Grafana pod:
```{"id":1,"message":"Datasource added","name":"Prometheus"}Done!
2020/04/14 15:40:35 http: proxy error: context canceled
t=2020-04-14T15:40:35+0000 lvl=info msg="Request Completed" logger=context userId=0 orgId=1 uname= method=GET path=/api/v1/series status=502 remote_addr=127.0.0.1 time
_ms=7038 size=0
2020/04/14 15:40:35 http: proxy error: context canceled
t=2020-04-14T15:40:35+0000 lvl=info msg="Request Completed" logger=context userId=0 orgId=1 uname= method=GET path=/api/v1/series status=502 remote_addr=127.0.0.1 time
_ms=7039 size=0
t=2020-04-14T15:47:00+0000 lvl=eror msg="Data source with same name already exists" logger=context userId=1 orgId=1 uname=admin error="Data source with same name alrea
dy exists"
t=2020-04-14T15:47:00+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=POST path=/api/datasources status=409 remote_addr=[::1]
time_ms=28 size=55
t=2020-04-14T15:47:22+0000 lvl=eror msg="Data source with same name already exists" logger=context userId=1 orgId=1 uname=admin error="Data source with same name alrea
dy exists"
t=2020-04-14T15:47:22+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=POST path=/api/datasources status=409 remote_addr=[::1]
time_ms=28 size=55```
----
2020-04-14 15:58:33 UTC - Rattanjot Singh: While running build.sh (<https://github.com/apache/pulsar/blob/master/docker/build.sh>)
I get the following error:
```+ auditwheel repair dist/pulsar_client-2.5.0-cp27-cp27mu-linux_x86_64.whl dist/pulsar_client-2.6.0-cp27-cp27mu-linux_x86_64.whl
usage: auditwheel [-h] [-V] [-v] command ...
auditwheel: error: unrecognized arguments: dist/pulsar_client-2.6.0-cp27-cp27mu-linux_x86_64.whl```
----
2020-04-14 16:11:34 UTC - Sijie Guo: If you are writing a source connector, it means you are reading events from a source and writing them to a Pulsar topic; that's why a destination topic name is required. Can you explain the use case where a source connector doesn't write any events to a Pulsar topic?

How did you package the file? In a NAR package?
----
2020-04-14 16:13:19 UTC - Sijie Guo: You have two different builds. One is from 2.5.0 and the other one is from 2.6.0. If you are building 2.6.0, then you can remove those 2.5.0 files.
----
2020-04-14 16:22:33 UTC - JG: this is strange
----
2020-04-14 16:28:34 UTC - Ryan Slominski: I'm using the org.apache.pulsar.functions.api.Record.getDestinationTopic() method to return a destination topic based on incoming data (it varies; this one connector writes to 100s of topics). I'm using the Gradle NAR plugin to create a NAR file.
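Roughly what that routing looks like; the class name and topic prefix here are illustrative, not the actual connector code:
```
import java.util.Optional;
import org.apache.pulsar.functions.api.Record;

// A source Record whose destination topic is derived from the incoming data,
// rather than the single topic passed via --destination-topic-name.
public class RoutedRecord implements Record<byte[]> {
    private final String channel;   // e.g. parsed from the source event
    private final byte[] payload;

    public RoutedRecord(String channel, byte[] payload) {
        this.channel = channel;
        this.payload = payload;
    }

    @Override
    public byte[] getValue() {
        return payload;
    }

    @Override
    public Optional<String> getDestinationTopic() {
        // One connector, hundreds of topics: route by channel name.
        return Optional.of("persistent://public/default/" + channel);
    }
}
```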
----
2020-04-14 16:48:15 UTC - rwaweber: Absolutely! I'm still just getting started with the greater Pulsar ecosystem, but I'll try to be as clear as I can.
----
2020-04-14 17:29:03 UTC - Rattanjot Singh: ```apache-pulsar-2.6.0-SNAPSHOT-src.tar.gz```
Where can I get this?
----
2020-04-14 17:34:06 UTC - Kirill Kosenko: Hi guys,
Functions have access to shared state (BookKeeper).
Is it possible to offload data from that state to cold storage (e.g. AWS S3)?
----
2020-04-14 17:46:12 UTC - Curtis Cook: I'm new to Pulsar & this Slack, but I think the current version of the client is 2.5?
----
2020-04-14 17:51:36 UTC - Frans Guelinckx: I guess you can always check out the code from GitHub and build it yourself: <https://github.com/apache/pulsar>
----
2020-04-14 18:54:20 UTC - Ryan Slominski: This might be a clue: if I use the --classname parameter the localrun continues to work as expected, but the error for the "create" command changes to "Source class xyz must be in class path".   It looks like Pulsar can't find the class?   It's a standalone instance of Pulsar.
----
2020-04-14 19:02:02 UTC - Ryan Slominski: Since the docs didn't say anything about the pulsar-io.yaml file, I wonder what else is missing from the docs that I need to do to make the custom connector work?
----
2020-04-14 19:38:40 UTC - Sijie Guo: Currently it is not supported yet. Can you create an issue for us?
----
2020-04-14 19:39:03 UTC - Sijie Guo: I think he is trying to build master.
----
2020-04-14 23:11:28 UTC - JG: Hey guys, I just tested the Pulsar Java client with a simple main class and a consumer, and the process takes 100% of the CPU! Is that normal? I am not running a heavy task and it is already taking all the CPU... Are you experiencing the same problem with CPU usage?
----
2020-04-14 23:57:09 UTC - Tolulope Awode: Thanks, I will get back to you on this
----
2020-04-14 23:58:14 UTC - JG: Problem fixed, I was instantiating many PulsarClient objects.
+1 : Penghui Li, Shivji Kumar Jha
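For anyone hitting the same thing, a sketch of the fixed pattern: one shared PulsarClient per process, reused for every consumer (topic and subscription names are illustrative):
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;

public class SharedClientExample {
    public static void main(String[] args) throws Exception {
        // A PulsarClient owns IO and listener thread pools, so creating many
        // clients multiplies threads and can peg the CPU.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")      // illustrative service URL
                .build();
        // Reuse the same client for every consumer/producer.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("my-sub")
                .subscribe();
        consumer.close();
        client.close();
    }
}
```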
----
2020-04-15 00:42:10 UTC - Tolulope Awode: @Sijie Guo Thanks so much
----
2020-04-15 00:42:21 UTC - Tolulope Awode: It has worked now
----
2020-04-15 00:42:58 UTC - Sijie Guo: cool
----
2020-04-15 00:43:09 UTC - Tolulope Awode: Hope you won't mind if I check back in case of anything else.
----
2020-04-15 00:43:24 UTC - Tolulope Awode: later
----
2020-04-15 00:45:41 UTC - Sijie Guo: No, I don't mind.
----
2020-04-15 00:57:54 UTC - Tolulope Awode: Thanks
----
2020-04-15 02:47:27 UTC - busykid: @busykid has joined the channel
----
2020-04-15 05:05:17 UTC - Shivji Kumar Jha: @Raman Gupta @Xavier Levaux How about this option?
<https://pulsar.apache.org/docs/en/pulsar-admin/#set-schema-autoupdate-strategy>

The NONE option, which I contributed some time back, lets you keep several schemas on the same topic. The ability to add multiple schemas to a topic is there, and we have been using this in production for a year now. It does work well with Avro.

The remaining piece, though, is that there is no compatibility check between the schemas in one topic! That is a feature I want to add.
----
2020-04-15 05:14:01 UTC - Raman Gupta: @Shivji Kumar Jha NONE is the same as just turning off schema compatibility checks, right? That isn't something I'd want to do.
----
2020-04-15 05:15:18 UTC - Raman Gupta: It should work like the "subject" concept in Kafka Schema Registry. A schema is associated with a "subject" in the registry, and the subject can be configured to be a combination of topic name and record type. Schema compat is managed at the subject level.
----
2020-04-15 05:15:56 UTC - Raman Gupta: The subject concept effectively decouples the compatibility check logic from the serdes logic.
----
2020-04-15 05:16:02 UTC - Shivji Kumar Jha: I agree. I have been wanting to add that for some time. Are you working on contributing that too?
----
2020-04-15 05:16:40 UTC - Raman Gupta: I wish I could, but sadly don't have the time now to do it.
----
2020-04-15 05:18:33 UTC - Shivji Kumar Jha: ok, I just started looking at that and plan to add it in possibly 5.6.2 (earliest) or 5.6.0 if the change is deemed risky by the community... :slightly_smiling_face:
----
2020-04-15 05:20:30 UTC - Raman Gupta: Awesome @Shivji Kumar Jha, are you creating a PIP for this?
----
2020-04-15 05:21:16 UTC - Shivji Kumar Jha: Shortly. I am collecting the details now. I will reach out to you if I need some help (review or otherwise) from your Kafka experience :slightly_smiling_face:
+1 : Raman Gupta
----
2020-04-15 05:41:06 UTC - plsr: @plsr has joined the channel
----
2020-04-15 07:13:52 UTC - Xavier Levaux: Indeed, Kafka has very useful support for multiple schemas on the same topic.  It is very much needed, as many want to store messages of different kinds that are all related to the same subject (topic) and need to be delivered in order.
----
2020-04-15 07:18:57 UTC - Xavier Levaux: One use case: if I want to store events for one Aggregate (DDD), I need all those events stored in the same topic to keep them ordered.
I also need a schema for each of those different event types.
I also need each schema to evolve with forward compatibility.
All these requirements are essential. Having only one kind of schema per topic makes Pulsar very limited and pretty much useless for scenarios where different kinds of messages have to be stored together for ordering reasons.
----
2020-04-15 07:52:48 UTC - Kirill Kosenko: Sure
----