Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/06/05 09:11:02 UTC

Slack digest for #general - 2018-06-05

2018-06-04 16:21:04 UTC - Igor Zubchenok: Hello, I get an exception when trying to compare two MessageIds of different types:
`IllegalArgumentException: expected BatchMessageIdImpl object. Got instance of org.apache.pulsar.client.impl.MessageIdImpl`

I think Pulsar client implementation should support comparison of any type of MessageIds.
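
For illustration, a minimal Python sketch of what comparing mixed MessageId types could look like. This is not the Java client's code: the class names, fields, and the `NO_BATCH` constant are hypothetical, and the ordering rule (a plain ID sorts before any batch entry of the same ledger entry) is an assumption of the sketch.

```python
from dataclasses import dataclass

# Hypothetical marker meaning "this ID does not point into a batch".
NO_BATCH = -1


@dataclass(frozen=True)
class MessageId:
    ledger_id: int
    entry_id: int
    partition: int = -1

    def sort_key(self):
        # Plain IDs get NO_BATCH so they can be compared with batch IDs.
        return (self.ledger_id, self.entry_id, self.partition, NO_BATCH)


@dataclass(frozen=True)
class BatchMessageId(MessageId):
    batch_index: int = 0

    def sort_key(self):
        return (self.ledger_id, self.entry_id, self.partition, self.batch_index)


def compare(a: MessageId, b: MessageId) -> int:
    """Return -1, 0, or 1, whatever the concrete types of a and b are."""
    ka, kb = a.sort_key(), b.sort_key()
    return (ka > kb) - (ka < kb)
```

Both types reduce to the same tuple shape, so no type check (and no `IllegalArgumentException`-style failure) is needed when the two arguments differ in type.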
----
2018-06-04 17:29:19 UTC - Sijie Guo: @Igor Zubchenok good point. Do you mind filing a GitHub issue?
----
2018-06-04 17:29:44 UTC - Igor Zubchenok: I'll do it, maybe tomorrow
ok_hand : Sijie Guo
----
2018-06-04 18:22:04 UTC - Sijie Guo: Thank you Igor
----
2018-06-04 18:25:58 UTC - Josh West: does anyone have an example of how to connect to Pulsar using TLS Client Authentication with the Python client?
----
2018-06-04 18:29:22 UTC - Matteo Merli: Hi @Josh West, there are no concrete examples yet. It’s similar to the C++ client configuration: <http://pulsar.apache.org/docs/latest/clients/Cpp/#Authentication-g5vab>. In Python it involves creating the `Authentication` object and passing it in when creating the client.
----
2018-06-04 18:29:38 UTC - Josh West: roger
----
2018-06-04 18:29:46 UTC - Josh West: i'm just not sure what to pass into the `Authentication` object yet
----
2018-06-04 18:29:53 UTC - Josh West: and i was being lazy and not figuring it out for myself
----
2018-06-04 18:30:56 UTC - Matteo Merli: the tricky part is that it requires `libtls.so`, which is not directly included in the Python wheel file
----
2018-06-04 18:32:55 UTC - Matteo Merli: it would have to be something like : 

```
Authentication('/path/to/tls.so', 'tlsCertFile=/path/to/client-cert.pem;tlsKeyFile=/path/to/client-key.pem')
```
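
A slightly fuller sketch of the same idea, assuming the `pulsar-client` package is installed and that `pulsar.Client` accepts the `authentication` and `tls_trust_certs_file_path` keyword arguments. The helper names, file paths, and the service URL in the usage note are placeholders, not values from this thread.

```python
def build_tls_auth_params(cert_file: str, key_file: str) -> str:
    """Build the 'key=value;key=value' string expected by the TLS auth plugin."""
    return f"tlsCertFile={cert_file};tlsKeyFile={key_file}"


def connect_with_tls(service_url: str, tls_lib_path: str,
                     cert_file: str, key_file: str, trust_certs_file: str):
    """Create a client that authenticates with a TLS client certificate.

    Assumption: tls_lib_path points at the TLS auth plugin shared library,
    which is not bundled in the Python wheel.
    """
    import pulsar  # imported lazily so the helper above works without it

    auth = pulsar.Authentication(
        tls_lib_path, build_tls_auth_params(cert_file, key_file))
    return pulsar.Client(
        service_url,
        authentication=auth,
        tls_trust_certs_file_path=trust_certs_file,
    )
```

Usage would look something like `connect_with_tls('pulsar+ssl://broker.example.com:6651', '/path/to/tls.so', '/path/to/client-cert.pem', '/path/to/client-key.pem', '/path/to/ca-cert.pem')`, where the broker host and CA file are hypothetical.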
----
2018-06-04 18:33:03 UTC - Josh West: that's easy enough
----
2018-06-04 18:33:05 UTC - Josh West: are there any hopes for a native Python client?  and a native Golang client?  (instead of the wrapping around the C++ client)
----
2018-06-04 18:34:33 UTC - Matteo Merli: ideally the “built-in” auth provider should be linked in, to make it easier to enable.
----
2018-06-04 18:35:30 UTC - Josh West: i'm interested in a native Golang client mostly, just because developers here have mentioned that incorporating the C++ libs changes the threading model used within their Golang apps
----
2018-06-04 18:37:04 UTC - Matteo Merli: > are there any hopes for a native Python client?  and a native Golang client?  (instead of the wrapping around the C++ client)

Not in the short term. Both would involve a significant amount of work. The best bang for the buck is with the C++ wrappers. For Python it is very easy to install with `pip`. For Go it’s going to be easy as well (with Homebrew, RPM, and DEB packages).
----
2018-06-04 18:41:22 UTC - Matteo Merli: > i’m interested in a native Golang client mostly, just because developers here have mentioned that incorporating the C++ libs changes the threading model used within their Golang apps

It should not affect the Go threading model. All the blocking operations wait on Go channels, so it integrates seamlessly with Go
----
2018-06-04 18:42:33 UTC - Matteo Merli: The main problem with native libraries is that it would take a lot of time to get on par with all the features and to debug and stabilize them. All that work was already done for C++, so we can go 0-100 very quickly.
----
2018-06-04 18:43:55 UTC - Josh West: interesting re: go & channels; i'll pass on the word
----
2018-06-04 18:44:07 UTC - Josh West: yeah that makes a lot of sense (why c++ and wrapping model)
----
2018-06-04 22:04:28 UTC - Bhargava: Guys, I am sorry, I am new to this channel. But I have decent experience in Python, distributed systems, and Java, and I wanted to help
----
2018-06-04 22:05:27 UTC - Bhargava: I would be happy to tag along with someone who needs help (even if it is writing test cases or documentation). All I want to do is contribute and learn in the process
----
2018-06-04 22:09:25 UTC - Matteo Merli: :+1:
----
2018-06-04 22:11:11 UTC - Jerry Peng: @Bhargava welcome! That sounds great! It is awesome that you want to contribute!
----
2018-06-04 22:11:29 UTC - Bhargava: Thank you Jerry. Lemme know if someone needs help
----
2018-06-04 22:11:50 UTC - Bhargava: I have experience in Python, web UIs (Django), distributed systems, and Java
----
2018-06-04 22:16:53 UTC - Jerry Peng: @Bhargava probably one of the first things you can do is 1) read up on the Pulsar architecture, 2) look at how to use the Pulsar CLI tools, e.g. to create topics, functions, etc., 3) look through the code for the CLI, understand it, and potentially improve upon it, i.e. make sure the number of parameters is right and that they are of the right type, etc., 4) improve the documentation where needed.
+1 : Sijie Guo
----
2018-06-04 22:17:12 UTC - Bhargava: Sure Jerry that seems like a good start
----
2018-06-04 22:17:28 UTC - Bhargava: I will start picking it up immediately.
+1 : Matteo Merli, Sijie Guo
----
2018-06-04 22:18:25 UTC - Jerry Peng: We need this kind of help and polishing especially for Pulsar Functions
----
2018-06-04 22:18:44 UTC - Jerry Peng: @Bhargava awesome! feel free to reach out if you have any problems or questions
----
2018-06-04 22:18:59 UTC - Bhargava: Sure Jerry thank you
----
2018-06-04 22:20:30 UTC - Sam Zeitlin: @Sam Zeitlin has joined the channel
----
2018-06-04 23:37:12 UTC - Sam Zeitlin: So I'm trying to test out the python API and it seems really really slow. I'm wondering if it's because it's defaulting to a single partition mode when really what I want is the round robin distribution, but I can't figure out how to configure that. Is the python API just lacking a lot of functionality still, or is it just the documentation that's really behind?
----
2018-06-04 23:48:35 UTC - Matteo Merli: @Sam Zeitlin The Python lib is just a tiny wrapper on top of the C++ lib, which is very close in terms of features to the Java client lib. I just realized the `partitions_routing_mode` is not being exposed in the Python wrapper. It is included in the Boost.Python wrapper but it’s not exposed in the Python class that wraps it.
----
2018-06-04 23:51:57 UTC - Sam Zeitlin: I found this in `create_producer()`:
> * `message_routing_mode`:
>   Set the message routing mode for the partitioned producer.
----
2018-06-04 23:52:05 UTC - Sam Zeitlin: but when I tried to use it I got an error that it wasn't a valid argument.
----
2018-06-04 23:52:36 UTC - Sam Zeitlin: and I'm not sure what format to use for the value anyway, just the same camel case 'RoundRobinDistribution' ?
----
2018-06-04 23:52:49 UTC - Matteo Merli: For the slow publishing, I’d check these items:
 * Use `send_async()` instead of `send()`; otherwise each message has to wait for the previous one to be acknowledged by the broker (throughput = 1 / latency; e.g., even at 1 ms latency, the throughput is capped at 1K msg/s)
 * Enable batching to improve efficiency

With these 2 items I was able to publish hundreds of thousands of msg/s from Python

Finally, as a temp workaround for the message routing not working, you can specify a “key” on the message, to have the message hashed into partitions
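
The "throughput = 1 / latency" cap above can be worked through with a few lines of arithmetic. This is a back-of-the-envelope sketch, not a benchmark: the function names are made up for illustration, and the pipelined figure is only an upper bound that ignores CPU, network, and broker limits.

```python
def sync_send_cap(latency_ms: float) -> float:
    """Upper bound on msg/s when each send() waits for the previous ack."""
    return 1000.0 / latency_ms


def pipelined_send_cap(latency_ms: float, in_flight: int) -> float:
    """Upper bound on msg/s when `in_flight` messages may await acks at once."""
    return in_flight * 1000.0 / latency_ms


print(sync_send_cap(1.0))            # 1000.0 msg/s with blocking send()
print(pipelined_send_cap(1.0, 100))  # 100000.0 msg/s with 100 messages in flight
```

With `send_async()` the producer does not wait for each acknowledgement, so the number of messages in flight (and with batching, the number of messages per request) is what lifts the cap.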
----
2018-06-04 23:53:34 UTC - Sam Zeitlin: So we are already using send_async and batching is enabled. Actually my real question is, it looks like async consumer is missing from the python API, I was just trying to figure out if having multiple consumers would be another way to test how fast it could go
----
2018-06-04 23:53:47 UTC - Sam Zeitlin: oh so how do  I specify a key on the message?
----
2018-06-04 23:53:48 UTC - Matteo Merli: >    Set the message routing mode for the partitioned producer.
> but when I tried to use it I got an error that it wasn’t a valid argument.
> and I’m not sure what format to use for the value anyway, just the same camel case ‘RoundRobinDistribution’ ?

Yes, it needs to be added to the `create_producer()` call, and the enum should be exposed too
----
2018-06-04 23:55:02 UTC - Matteo Merli: > oh so how do  I specify a key on the message?

On `send()` (or `send_async()`) call, pass the `partition_key='my-string'` : <http://pulsar.apache.org/api/python/#pulsar.Producer.send>
----
2018-06-04 23:55:34 UTC - Sam Zeitlin: thanks!
----
2018-06-04 23:55:39 UTC - Matteo Merli: We’ll fix the routing issue :slightly_smiling_face:
----
2018-06-04 23:56:14 UTC - Sam Zeitlin: awesome, thanks!
----
2018-06-05 00:04:15 UTC - Sam Zeitlin: this didn't work either:
`create_producer() got an unexpected keyword argument 'partition_key'`
----
2018-06-05 00:04:57 UTC - Matteo Merli: that would be on the `producer.send()`, when publishing the messages
----
2018-06-05 00:05:50 UTC - Matteo Merli: the key set on each message gets hashed and the messages get routed to a particular partition (essentially, with “per-key” ordering)
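
A small sketch of the hashing idea described above. It uses `zlib.crc32` purely as a stand-in hash, which is an assumption: the hash function the Pulsar client actually uses is not stated in this thread, so real partition numbers will differ. The function name and sample keys are hypothetical.

```python
import zlib


def partition_for_key(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Stand-in hash; deterministic, so equal keys always land together.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every message with the same key maps to the same partition,
# which is what gives "per-key" ordering.
for key in ["user-1", "user-2", "user-1", "user-3", "user-1"]:
    print(key, "->", partition_for_key(key, 4))
```

Because the mapping depends only on the key, throughput spreads across partitions only when the keys themselves are varied; a single constant key would send everything to one partition.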
----
2018-06-05 00:24:26 UTC - Sam Zeitlin: oh whoops I'll try that instead
----
2018-06-05 00:27:31 UTC - Sam Zeitlin: interesting. so the producer works but now I'm getting ConsumerBusy errors.
----
2018-06-05 00:31:32 UTC - Sam Zeitlin: ok weird I got it working. It's definitely faster but still not faster than Kafka.
----
2018-06-05 01:16:10 UTC - Matteo Merli: > interesting. so the producer works but now I’m getting ConsumerBusy errors.

`ConsumerBusy` is usually because the default subscription type is `Exclusive`. For behavior similar to Kafka’s, you should use `Failover`. If you don’t require ordering, the best option is to use `Shared` ( <http://pulsar.apache.org/docs/latest/getting-started/ConceptsAndArchitecture/#Subscriptionmodes-v795tf> )
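
To make the difference between the three subscription types concrete, here is a toy in-memory model. It is not Pulsar code: the class names are invented, and picking the first attached consumer as the active one for `Failover` is a simplifying assumption of the sketch.

```python
class ConsumerBusy(Exception):
    """Raised by this toy model when an Exclusive subscription is taken."""


class ToySubscription:
    def __init__(self, sub_type: str):
        if sub_type not in ("Exclusive", "Failover", "Shared"):
            raise ValueError(f"unknown subscription type: {sub_type}")
        self.sub_type = sub_type
        self.consumers = []

    def attach(self, consumer_name: str) -> None:
        # Exclusive allows a single consumer; a second attach is rejected.
        if self.sub_type == "Exclusive" and self.consumers:
            raise ConsumerBusy(f"{consumer_name}: subscription already in use")
        self.consumers.append(consumer_name)

    def receivers(self) -> list:
        # Shared spreads messages over every attached consumer.
        if self.sub_type == "Shared":
            return list(self.consumers)
        # Exclusive and Failover deliver to one consumer at a time.
        # Assumption: the first attached consumer is the active one.
        return self.consumers[:1]
```

Attaching a second consumer to a `ToySubscription("Exclusive")` raises `ConsumerBusy`, while `Failover` accepts it as a standby and `Shared` lets both receive. In the Python client the type should be selectable with the `consumer_type` argument of `client.subscribe()`, for example `consumer_type=pulsar.ConsumerType.Shared`.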
----
2018-06-05 01:16:47 UTC - Matteo Merli: > ok weird I got it working. It’s definitely faster but still not faster than Kafka.

There must be something else going on then, since it should be much faster in comparison :slightly_smiling_face:
----
2018-06-05 02:40:01 UTC - Matteo Merli: Did you enable batching & async publish?
----