Posted to dev@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/03/04 09:11:02 UTC

Slack digest for #dev - 2020-03-04

2020-03-03 17:29:25 UTC - John Duffie: We have a use case that does require this. We have generic logic that operates on GenericRecords of unknown schema, using keys that are passed as arguments into the job
----
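A minimal sketch of the kind of schema-agnostic logic described above, assuming plain Python dicts stand in for GenericRecord. The function name `group_by_keys` and the sample field names are hypothetical and are not part of the Pulsar or Flink APIs.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Sequence, Tuple


def group_by_keys(
    records: Iterable[Mapping[str, Any]], keys: Sequence[str]
) -> Dict[Tuple[Any, ...], List[Mapping[str, Any]]]:
    """Group records by field names that are only known at run time."""
    groups: Dict[Tuple[Any, ...], List[Mapping[str, Any]]] = defaultdict(list)
    for record in records:
        # The schema is unknown up front, so check for the key fields per record.
        missing = [k for k in keys if k not in record]
        if missing:
            raise KeyError(f"record is missing key field(s): {missing}")
        groups[tuple(record[k] for k in keys)].append(record)
    return dict(groups)


if __name__ == "__main__":
    # Records with different shapes; only the key field is required.
    sample = [
        {"region": "us", "user": "a", "count": 1},
        {"region": "eu", "user": "b"},
        {"region": "us", "user": "c", "extra": True},
    ]
    for key, members in group_by_keys(sample, ["region"]).items():
        print(key, len(members))
```

In a real job the key names would come from the job arguments rather than a literal list.
----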
2020-03-03 17:32:10 UTC - John Duffie: I looked at FlinkPulsarRowSource, PulsarRowFetcher, and RowReaderThread. I see how it is deserializing to a GenericRecord, using the record to create the Row object, and then emitting the row
----
2020-03-03 17:38:45 UTC - John Duffie: while I could craft a similar solution for GenericRecord, it looks like it would require a new Source, Fetcher, ReaderThread, and a deserializer similar to PulsarDeserializer
----
2020-03-03 17:48:03 UTC - Sijie Guo: I think it just needs a GenericRecordDeserializer. I don’t think you need to rewrite a new source. @yijie is it correct?
----
2020-03-03 17:51:01 UTC - John Duffie: was hoping this would just be a new deserializer passed to FlinkPulsarSource that would include client calls to the broker when the schema ID changes?
----
2020-03-03 18:12:56 UTC - Sijie Guo: correct
----
2020-03-03 18:13:11 UTC - Sijie Guo: similar to Pulsar's AUTO_CONSUME schema
----
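The deserializer idea discussed above (fetch a schema only when the schema ID changes) can be sketched roughly as follows. This is an illustration under stated assumptions, not the pulsar-flink connector's implementation: `SchemaCachingDeserializer`, `fetch_schema`, and the `SCHEMAS` table are hypothetical names, the lookup function stands in for a client call to the broker, and the payload is assumed to be a JSON array of field values. It also assumes the deserializer is handed the schema version alongside the payload bytes.

```python
import json
from typing import Any, Callable, Dict, List


class SchemaCachingDeserializer:
    """Hypothetical deserializer that resolves a schema per schema version."""

    def __init__(self, fetch_schema: Callable[[bytes], List[str]]):
        # fetch_schema stands in for a broker lookup; it returns field names.
        self._fetch_schema = fetch_schema
        self._cache: Dict[bytes, List[str]] = {}
        self.fetch_count = 0  # number of simulated broker calls

    def _schema_for(self, schema_version: bytes) -> List[str]:
        # Only call out when this schema version has not been seen before.
        if schema_version not in self._cache:
            self._cache[schema_version] = self._fetch_schema(schema_version)
            self.fetch_count += 1
        return self._cache[schema_version]

    def deserialize(self, schema_version: bytes, payload: bytes) -> Dict[str, Any]:
        fields = self._schema_for(schema_version)
        values = json.loads(payload.decode("utf-8"))
        if len(values) != len(fields):
            raise ValueError("payload does not match the schema's field count")
        return dict(zip(fields, values))


# In-memory stand-in for the schemas a broker would return (assumption).
SCHEMAS: Dict[bytes, List[str]] = {
    b"v1": ["id", "name"],
    b"v2": ["id", "name", "email"],
}

if __name__ == "__main__":
    deserializer = SchemaCachingDeserializer(SCHEMAS.__getitem__)
    print(deserializer.deserialize(b"v1", b'[1, "alice"]'))
    print(deserializer.deserialize(b"v2", b'[2, "bob", "bob@example.com"]'))
    print("lookups:", deserializer.fetch_count)
```

Caching by schema version keeps the lookup off the per-message path; messages that share a version reuse the cached field list.
----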
2020-03-03 18:36:02 UTC - Devin G. Bost: We created a Pulsar consumer that writes to the Admin API HTTP endpoint directly. We got much better performance that way than from the other approaches.
----
2020-03-03 18:36:59 UTC - Devin G. Bost: However, that's for bare-metal Docker. I imagine it would work just as well in k8s.
----
2020-03-03 22:10:50 UTC - John Duffie: would you send me a pointer?  I’m not seeing how this is doable if the deserializer can only access the byte array from the Message object and not the Message metadata.
----
2020-03-04 00:02:39 UTC - Derek Moore: @Derek Moore has joined the channel
----
2020-03-04 00:03:41 UTC - Derek Moore: Hi! I'm with Dell EMC & the Pravega.io team. We have some similarities with Apache Pulsar and some differences. I was wondering if anyone in here could speak to some of Pulsar's most important or impactful performance-related issues and fixes.
----
2020-03-04 00:04:13 UTC - Ali Ahmed: @Derek Moore what would you like to know?
----
2020-03-04 00:05:14 UTC - Derek Moore: well, I was digging through release notes trying to gather how Pulsar has tuned performance, and I have found a few things here and there ... but I wondered if someone here might be more informed
----
2020-03-04 00:07:26 UTC - Matteo Merli: the perf improvements have been in general spread out over a long period. so there's not a single release about it. in any case, what aspect of performance are you referring to?
----
2020-03-04 00:15:29 UTC - Derek Moore: Right, that's what I expected... A bunch of little things all adding up. We're doing some extensive performance testing here (vs. Kafka, etc.) that I'd like to talk about more (I'll try to get clearance to do that), and Pulsar performs well.
----
2020-03-04 00:17:13 UTC - Derek Moore: I guess I was hoping someone could recall a small handful of fixes that were most impactful. For example, I found this issue among others that appear promising: <https://github.com/apache/pulsar/pull/3732>
----
2020-03-04 00:18:19 UTC - Derek Moore: Pravega and Pulsar share common roots in BookKeeper
----
2020-03-04 00:19:41 UTC - Matteo Merli: Yes. Much of the effort has been on the BookKeeper side. We have made improvements to meet various perf goals, from publish latency to write throughput to read throughput
----
2020-03-04 00:21:28 UTC - Matteo Merli: that's why I was asking which aspect of performance you were thinking of
----
2020-03-04 00:23:24 UTC - Matteo Merli: In any case, for baseline measurements you can take a look at <http://openmessaging.cloud/docs/benchmarks/>. I think it should be straightforward to add a driver implementation for Pravega to it
----
2020-03-04 00:35:31 UTC - Jeon.DeukJin: @Jeon.DeukJin has joined the channel
----
2020-03-04 00:41:18 UTC - Derek Moore: we are looking at publish latency figures in particular right now
----
2020-03-04 00:44:15 UTC - Derek Moore: that's a great idea, thanks!
----