Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/08/23 09:11:04 UTC

Slack digest for #general - 2019-08-23

2019-08-22 09:14:46 UTC - Bon: I think you can read the geo-replication documentation again.
----
2019-08-22 10:08:12 UTC - Jan-Pieter George: Nice, checking it out. Do you use the websocket connection or a raw socket? I'm interested in having a multi-threaded message pump (especially interesting with the nack and delayed delivery capabilities, incredibly amazing).
----
2019-08-22 11:49:39 UTC - JoeyDeng: @JoeyDeng has joined the channel
----
2019-08-22 12:33:50 UTC - Alexandre DUVAL: Can't the Pulsar SQL worker be reached through the Pulsar proxy?
----
2019-08-22 13:40:14 UTC - Sijie Guo: the latter
----
2019-08-22 14:53:09 UTC - Addison Higham: they directly interact with bookkeeper
----
2019-08-22 14:59:04 UTC - Richard Sherman: Is there any documentation on setting up bookies to be rack aware?
----
2019-08-22 15:01:39 UTC - Alexandre DUVAL: What do you mean?
----
2019-08-22 15:03:38 UTC - Alexandre DUVAL: Does pulsar-proxy read SSL configurations periodically, or does it watch them? I ask because I updated my certificate files without reloading or restarting it, and it now serves the new certificate.
----
2019-08-22 15:03:50 UTC - Addison Higham: question about the configuration store (global zk): Has there been any consideration for making that pluggable with different storage backends? I looked through the code and AFAICT, it isn't using any more advanced ZK features (locks, leader election; it only uses watches for cache invalidation) for the global zk, and it is mostly just pretty straightforward storage of config data.

The reason I ask: I wonder how reasonable it would be if you could implement the config store on top of something like gcp cloud spanner or dynamodb global tables. For dynamodb global tables, you could have conflicting writes, but for the config store, I wonder if that would be okay?
----
2019-08-22 15:05:31 UTC - Addison Higham: maybe I misunderstood your question, are you asking if you can have sql workers connect to pulsar via the proxy? Or are you asking if you can interact with the workers API via the proxy
----
2019-08-22 15:06:22 UTC - Addison Higham: in 2.4.0 they added that yes
----
2019-08-22 15:08:42 UTC - Alexandre DUVAL: Second part :slightly_smiling_face:
----
2019-08-22 15:09:21 UTC - Addison Higham: ah, not sure of that
----
2019-08-22 15:09:44 UTC - Alexandre DUVAL: It watches? Do you have more information on the process used?
----
2019-08-22 15:10:27 UTC - Alexandre DUVAL: From the proxy.conf I'd say no (not yet :p). But okay.
----
2019-08-22 15:12:39 UTC - Addison Higham: trying to find the code, but IIRC, it polls periodically, will post a link to code here in a minute
----
2019-08-22 15:14:14 UTC - Alexandre DUVAL: Thanks. Also, is it the same behavior for the pulsar sql worker ssl conf? :smile:
----
2019-08-22 15:14:38 UTC - Alexandre DUVAL: because if pulsar-proxy can't front the sql workers, I need to have this on the pulsar sql workers too.
----
2019-08-22 15:15:13 UTC - Addison Higham: <https://github.com/apache/pulsar/blob/d3643a072c6dfd444974e0f8b864fc053cfdb4f8/pulsar-common/src/main/java/org/apache/pulsar/common/util/SslContextAutoRefreshBuilder.java>
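The "polls periodically" behavior described above can be illustrated with a minimal sketch. This is not the linked Pulsar class: `PollingReloader`, its fields, and the injected clock and modification-time suppliers are hypothetical names chosen for illustration, assuming the reload decision is based on a file's last-modified timestamp checked at most once per refresh interval.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of "poll periodically, reload if the file changed".
 * Illustrative only; this is NOT Pulsar's actual implementation.
 */
public class PollingReloader<T> {
    private final long refreshIntervalMillis;
    private final LongSupplier clock;        // current time in millis (injected so it can be faked)
    private final LongSupplier lastModified; // e.g. () -> certFile.lastModified()
    private final Supplier<T> loader;        // builds a fresh value, e.g. an SSL context

    private T cached;
    private long lastCheckMillis;
    private long loadedStamp;

    public PollingReloader(long refreshIntervalMillis, LongSupplier clock,
                           LongSupplier lastModified, Supplier<T> loader) {
        this.refreshIntervalMillis = refreshIntervalMillis;
        this.clock = clock;
        this.lastModified = lastModified;
        this.loader = loader;
        // Record the timestamp before loading so a change during the load is caught later.
        this.loadedStamp = lastModified.getAsLong();
        this.cached = loader.get();
        this.lastCheckMillis = clock.getAsLong();
    }

    /** Returns the cached value, re-checking the file at most once per interval. */
    public synchronized T get() {
        long now = clock.getAsLong();
        if (now - lastCheckMillis >= refreshIntervalMillis) {
            lastCheckMillis = now;
            long stamp = lastModified.getAsLong();
            if (stamp != loadedStamp) {
                loadedStamp = stamp;
                cached = loader.get();
            }
        }
        return cached;
    }

    public static void main(String[] args) {
        long[] now = {0};
        long[] mtime = {100};
        int[] loads = {0};
        PollingReloader<String> reloader = new PollingReloader<>(
                1000, () -> now[0], () -> mtime[0], () -> "cert-v" + (++loads[0]));

        System.out.println(reloader.get()); // cert-v1
        mtime[0] = 200;                     // certificate file replaced on disk
        System.out.println(reloader.get()); // cert-v1 (interval not yet elapsed)
        now[0] = 1000;
        System.out.println(reloader.get()); // cert-v2
    }
}
```

With this shape, a replaced certificate is picked up without a restart, but only after the next interval boundary, which would match what was observed with the proxy.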
----
2019-08-22 15:16:16 UTC - Alexandre DUVAL: Oh it's a generic one, so probably used in pulsar sql worker too :slightly_smiling_face:
----
2019-08-22 15:21:10 UTC - Addison Higham: was trying to track down its usage, haven't yet, but I would hope so!
----
2019-08-22 16:22:22 UTC - Raman Gupta: Can I "copy" the list of acked messages from one consumer to another? The use case would be creating a consumer that starts from where an existing consumer has left off. In Kafka, this would be done simply by setting the offsets of the new consumer to the offsets of the old one.
----
2019-08-22 16:23:25 UTC - Addison Higham: @Raman Gupta <https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/Consumer.html#seek-org.apache.pulsar.client.api.MessageId->
----
2019-08-22 16:24:33 UTC - Addison Higham: the seek basically resets the position on the server, so you just restart from there
----
2019-08-22 16:24:49 UTC - Raman Gupta: Yeah, that covers the equivalent of what Kafka does with offset, but it doesn't handle the per-message ack semantics of Pulsar.
----
2019-08-22 16:27:25 UTC - Addison Higham: not sure I understand, once you seek to that position, your subscription behaves pretty much like a normal subscription
----
2019-08-22 16:29:11 UTC - Raman Gupta: As I understand Pulsar consumer semantics (and I'm only researching at this point, so my understanding could be wrong), a consumer could receive messages 1, 2, 3, ack 2, and die. That leaves 1 and 3 unacked. At this point, if I use the seek method above to seek to ~2~ 3, I will miss message 1 right?
----
2019-08-22 16:31:00 UTC - Addison Higham: yes, but if you are just trying to reconnect to an existing subscription, why would you seek? pulsar will redeliver any unacknowledged messages when you reconnect
----
2019-08-22 16:31:50 UTC - Addison Higham: (this would be shared subscription behavior, with an exclusive or failover subscription, cumulative acking is used, so acking 2 would also ack 1)
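The scenario in this exchange (receive 1, 2, 3, ack only 2) can be traced with a toy model. The sketch below is plain Java with no Pulsar dependency; `AckModel` and its methods are hypothetical names, and the model only assumes the two acknowledgment modes discussed here: individual acks, and cumulative acks (which Pulsar does not allow on shared subscriptions).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of a subscription's unacked set; illustrative only, not Pulsar code. */
public class AckModel {
    private final TreeSet<Long> unacked = new TreeSet<>();

    /** A delivered message stays unacked until it is acknowledged. */
    public void deliver(long id) { unacked.add(id); }

    /** Individual ack: removes only this message. */
    public void ackIndividual(long id) { unacked.remove(id); }

    /** Cumulative ack: removes this message and every earlier one. */
    public void ackCumulative(long id) { unacked.headSet(id, true).clear(); }

    public List<Long> unacked() { return new ArrayList<>(unacked); }

    /** Unacked messages that a new subscription starting at seekId would never see. */
    public List<Long> missedIfSeekTo(long seekId) {
        return new ArrayList<>(unacked.headSet(seekId, false));
    }

    public static void main(String[] args) {
        AckModel individual = new AckModel();
        for (long id = 1; id <= 3; id++) individual.deliver(id);
        individual.ackIndividual(2);
        System.out.println(individual.unacked());         // [1, 3]
        System.out.println(individual.missedIfSeekTo(3)); // [1]

        AckModel cumulative = new AckModel();
        for (long id = 1; id <= 3; id++) cumulative.deliver(id);
        cumulative.ackCumulative(2);
        System.out.println(cumulative.unacked());         // [3]
    }
}
```

In the individual-ack case, a copy of the subscription made by seeking to message 3 would skip message 1, which is the gap being asked about; with cumulative acks the unacked set is always a suffix, so a single position describes it fully.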
----
2019-08-22 16:32:56 UTC - Raman Gupta: I'm just thinking through operational concerns. For example, I might want to rename a subscription. There might be other use cases as well for creating a new subscription that is "copy" of an existing one.
----
2019-08-22 16:44:03 UTC - Raman Gupta: Seek works, just have to make sure that all the messages prior to the seeked one have been consumed/acked, and that none of the messages after it have been, by the original subscription. Is there an easy way to verify that?
----
2019-08-22 17:12:45 UTC - jialin liu: Hi, what is the typical message size that pulsar can handle? ~KB or ~MB?
----
2019-08-22 17:34:57 UTC - Jon Bock: Both are possible with the right configuration, KBs are by far the most common but up to 1MB message sizes are regularly tested.
----
2019-08-22 17:35:39 UTC - jialin liu: Is it designed to handle image or video? if not, do you have any suggestions?
----
2019-08-22 17:46:14 UTC - Matteo Merli: @Addison Higham Yes, the plan was to make it pluggable. The only feature we use (other than pure get/put) is the notifications. The watches are used to make sure the policies caches are updated in all brokers.

That part needs to be abstracted out, since most key-value stores won’t offer that.
----
2019-08-22 17:48:38 UTC - Jon Bock: Pulsar is agnostic to the message content type.  For very large message payloads, you can either break up the payload into smaller pieces (e.g. break up a video into frames) or have the message body include a reference to the external location of the object.
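As a rough illustration of the first option (breaking a large payload into smaller pieces), the sketch below splits a byte array into fixed-size chunks and reassembles them. It is application-side code with no Pulsar dependency; `PayloadChunker` and the 1 MB chunk size are assumptions for the example, and a real producer would also need to attach ordering information (for instance a chunk index and total count) to each message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for splitting a large payload into message-sized pieces. */
public class PayloadChunker {

    /** Splits payload into consecutive chunks of at most maxChunkBytes each. */
    public static List<byte[]> split(byte[] payload, int maxChunkBytes) {
        if (maxChunkBytes <= 0) {
            throw new IllegalArgumentException("maxChunkBytes must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < payload.length; ) {
            int len = Math.min(payload.length - off, maxChunkBytes);
            chunks.add(Arrays.copyOfRange(payload, off, off + len));
            off += len;
        }
        return chunks;
    }

    /** Reassembles chunks that are already in their original order. */
    public static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] c : chunks) total += c.length;
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, pos, c.length);
            pos += c.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] video = new byte[2_500_000];            // stand-in for a large object
        List<byte[]> chunks = split(video, 1_000_000); // assumed 1 MB per message
        System.out.println(chunks.size() + " chunks, last has "
                + chunks.get(chunks.size() - 1).length + " bytes");
        // prints "3 chunks, last has 500000 bytes"
        System.out.println(Arrays.equals(video, join(chunks))); // true
    }
}
```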
----
2019-08-22 17:49:27 UTC - Addison Higham: :thinking_face: what are the consistency needs for that data? I couldn't see any immediate reason why the dynamo global tables (multi-master eventual consistency with last-write-wins conflict resolution) wouldn't be okay, as most changes are made to the local table and then just followed by other regions
----
2019-08-22 17:49:43 UTC - Matteo Merli: eventual consistency is perfectly fine
----
2019-08-22 17:50:04 UTC - Addison Higham: and the last-write-wins? is there any data that would be contended between different regions?
----
2019-08-22 17:50:21 UTC - Addison Higham: I couldn't see anything obvious in the geo-replication use case
----
2019-08-22 17:51:06 UTC - Matteo Merli: uhm, ideally there you’d want last-write-wins, to ensure that everyone will eventually reach the same state
----
2019-08-22 17:52:04 UTC - Matteo Merli: the advantage for ZK there is that we have a global quorum that is writable when 1 region is out, yet still strong-consistent on writes
----
2019-08-22 17:57:12 UTC - Addison Higham: :thinking_face: so dynamo global tables would seem to work fine from a consistency perspective, you could use dynamodb streams for notifications to get the cache updates as well
----
2019-08-22 18:00:06 UTC - Addison Higham: and you could mostly just change the current globalZk to a new interface of something like:
- put(fullPath, data)
- get(fullPath)
- listChildren(path)
- watch(path, callback)
?

Or would you want to make it more of a DAO/Model pattern and move most of the state changes into more explicit calls?
Like `configStore.addNamespace(namespace)` etc?
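The four-operation interface proposed above might look like the following in Java. This is only a sketch of the idea in the message, not an existing Pulsar interface: `ConfigStore`, `InMemoryConfigStore`, and the example paths are hypothetical, and the in-memory watch here is persistent, unlike ZooKeeper's one-shot watches.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Consumer;

public class ConfigStoreSketch {

    /** Hypothetical minimal contract, mirroring the four operations listed above. */
    interface ConfigStore {
        void put(String fullPath, byte[] data);
        Optional<byte[]> get(String fullPath);
        List<String> listChildren(String path);
        void watch(String path, Consumer<String> callback);
    }

    /** In-memory stand-in; a real backend would be ZooKeeper, DynamoDB, etc. */
    static class InMemoryConfigStore implements ConfigStore {
        private final TreeMap<String, byte[]> data = new TreeMap<>();
        private final Map<String, List<Consumer<String>>> watchers = new HashMap<>();

        @Override
        public void put(String fullPath, byte[] value) {
            data.put(fullPath, value.clone());
            // Notify watchers so that callers can invalidate their caches.
            for (Consumer<String> cb : watchers.getOrDefault(fullPath, List.of())) {
                cb.accept(fullPath);
            }
        }

        @Override
        public Optional<byte[]> get(String fullPath) {
            byte[] v = data.get(fullPath);
            return v == null ? Optional.empty() : Optional.of(v.clone());
        }

        @Override
        public List<String> listChildren(String path) {
            String prefix = path.endsWith("/") ? path : path + "/";
            TreeSet<String> children = new TreeSet<>();
            for (String key : data.keySet()) {
                if (key.startsWith(prefix)) {
                    String rest = key.substring(prefix.length());
                    int slash = rest.indexOf('/');
                    if (!rest.isEmpty()) {
                        children.add(slash < 0 ? rest : rest.substring(0, slash));
                    }
                }
            }
            return new ArrayList<>(children);
        }

        @Override
        public void watch(String path, Consumer<String> callback) {
            watchers.computeIfAbsent(path, k -> new ArrayList<>()).add(callback);
        }
    }

    public static void main(String[] args) {
        ConfigStore store = new InMemoryConfigStore();
        store.watch("/policies/tenant-a/ns1", p -> System.out.println("changed: " + p));
        store.put("/policies/tenant-a/ns1", "{}".getBytes(StandardCharsets.UTF_8));
        store.put("/policies/tenant-a/ns2", "{}".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.listChildren("/policies/tenant-a")); // [ns1, ns2]
    }
}
```

A DAO-style alternative such as `configStore.addNamespace(namespace)` would sit one layer above this, hiding paths and serialization from callers.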
----
2019-08-22 18:08:00 UTC - Karthik Ramasamy: Some users are using close to 5MB size messages
----
2019-08-22 18:11:01 UTC - Raman Gupta: A related question to my previous one: what would the best metric be to track that would be the equivalent of Kafka's consumer lag?
----
2019-08-22 18:18:48 UTC - Matteo Merli: I need to retake a look at that code. The current approach is caching objects, already deserialized from JSON after reading from ZK, with the watches to trigger the cache invalidation.
----
2019-08-22 18:20:17 UTC - Devin G. Bost: ```
org.apache.pulsar.client.impl.ClientCnx  : Error during handshake

javax.net.ssl.SSLException: SSLEngine closed already
	at org.apache.pulsar.shade.io.netty.handler.ssl.SslHandler.wrap(...)(Unknown Source) ~[pulsar-client-2.4.0.jar!/:2.4.0]

2019-08-22 12:14:34.068  WARN 7 --- [r-client-io-1-1] org.apache.pulsar.client.impl.ClientCnx  : [dec01.overstock.com/10.15.33.233:8080] Got exception DecoderException : javax.net.ssl.SSLHandshakeException: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
```

Have any of you guys seen this before?
----
2019-08-22 18:20:55 UTC - Karthik Ramasamy: @Raman Gupta There is a backlog metric for every subscription
----
2019-08-22 18:30:38 UTC - Addison Higham: I will look as well and try and do a write up, I am about to embark on either this or global zk, so motivated to at least do the research to see if it is worth pursuing
----
2019-08-22 18:31:00 UTC - Matteo Merli: :+1:
----
2019-08-22 18:38:19 UTC - Devin G. Bost: I'm looking at:

```
PulsarClient client = PulsarClient.builder()
    .serviceUrl("pulsar+ssl://broker.example.com:6651/")
    .enableTls(true)
    .tlsTrustCertsFilePath("/path/to/ca.cert.pem")
    .authentication("org.apache.pulsar.client.impl.auth.AuthenticationTls",
                    "tlsCertFile:/path/to/my-role.cert.pem,tlsKeyFile:/path/to/my-role.key-pk8.pem")
    .build();
```
in the docs (<https://pulsar.apache.org/docs/en/security-tls-authentication/>), and I'm noticing that `.enableTls(..)` is actually deprecated.
----
2019-08-22 18:38:24 UTC - Devin G. Bost: What's the reason for the deprecation?
----
2019-08-22 18:38:58 UTC - Devin G. Bost: We're thinking that the handshake exception has something to do with the TLS configuration.
----
2019-08-22 18:39:58 UTC - Devin G. Bost: @David Kjerrumgaard Have you seen this before?
----
2019-08-22 18:47:34 UTC - Sijie Guo: `pulsar+ssl` indicates it is a TLS secured service. so we don’t actually need to specify `enableTls(true)`.
+1 : Devin G. Bost, Ali Ahmed
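The point that the URL scheme already signals TLS can be sketched with the standard `java.net.URI` class. The helper below is hypothetical and is not how the Pulsar client is implemented; it only assumes the four service URL schemes Pulsar documents (`pulsar`, `pulsar+ssl`, `http`, `https`) and shows that the TLS decision can be derived from the scheme alone.

```java
import java.net.URI;

/** Hypothetical helper; illustrative only, not part of the Pulsar client API. */
public class ServiceUrlScheme {

    /** Returns true when the scheme itself indicates a TLS-secured endpoint. */
    public static boolean impliesTls(String serviceUrl) {
        String scheme = URI.create(serviceUrl).getScheme();
        return "pulsar+ssl".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        System.out.println(impliesTls("pulsar+ssl://broker.example.com:6651/")); // true
        System.out.println(impliesTls("pulsar://broker.example.com:6650/"));     // false
    }
}
```

A check like this can also help when debugging handshake errors: a TLS scheme pointed at a plaintext port (or the reverse) is a mismatch between the client and the endpoint.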
----
2019-08-22 18:50:26 UTC - Matteo Merli: Most of the time it’s a client with TLS connecting to a non encrypted endpoint or vice-versa
+1 : Devin G. Bost
----
2019-08-22 18:50:57 UTC - Matteo Merli: (or non valid certificates configured in brokers)
----
2019-08-22 19:11:08 UTC - Pete Tanski: @Pete Tanski has joined the channel
----
2019-08-22 20:30:21 UTC - Raman Gupta: Thanks. I did note when I fired up the sandbox the backlog shown in the dashboard was wildly incorrect.
----
2019-08-22 20:54:08 UTC - Igor Zubchenok: --
After update from build `Pulsar 2.2.0-streamlio-5` to `Pulsar 2.4.0`, I noticed ~3-4x slower* performance than we had with 2.2.0.
We investigated the changes a bit and found out that for every topic (we have around 50-100K alive topics) there is a new delayed delivery feature that uses some system resources, and this caused some performance degradation.
After I added `delayedDeliveryEnabled=false` it got a bit better, but it is still ~2x slower.
What else could be tuned to get better performance as it was in 2.2.0 or better?
P.S. * _slower_ — I mean we have slower time of delivery of a message from publisher to consumer.
----
2019-08-22 21:42:21 UTC - Luke Lu: You mean the latency between consumer time and producer time? Is throughput (messages per second) affected? Do you have any numbers?
----
2019-08-22 23:45:11 UTC - Matteo Merli: @Igor Zubchenok

> there is a new delayed delivery feature that uses some system resources, and this caused some performance degradation.

Do you have a heap dump that shows the diff between the 2? The expectation is that there should be no difference if the messages are not marked for delays.

> I mean we have slower time of delivery of a message from publisher to consumer.

Can you quantify it in absolute numbers (eg: 10 to 20ms? avg or 99pct?)
----
2019-08-22 23:57:02 UTC - Addison Higham: @Matteo Merli <https://docs.google.com/document/d/18HPgFN8LOsxSBIScrldKWTmJvFkUvRemMoit1KQjcik/edit#> first really rough pass at some research and my best approximation of what I think might work
----
2019-08-22 23:57:44 UTC - Addison Higham: right now, I am thinking we are going to move forward with global ZK, but we might be able to lend a hand if this is something that gets some traction
----
2019-08-22 23:58:20 UTC - Addison Higham: am headed out, but feel free to add comments there
----
2019-08-22 23:58:51 UTC - Matteo Merli: yes, we had this task planned for this year, both to make the conf store pluggable as well as the general metadata store
----
2019-08-22 23:59:14 UTC - Matteo Merli: (don’t have access to your doc yet)
----
2019-08-23 00:56:25 UTC - Igor Zubchenok: I need to prepare an answer. I'll be back with something.
----
2019-08-23 03:12:00 UTC - Igor Zubchenok: I didn't find anything in the Pulsar broker metrics, but found that the producer has unstable send latency. (this chart is for pulsar producer stats; I publish a small message every 100 ms to a topic in the production cluster and collect stats every second)
----
2019-08-23 03:14:33 UTC - Igor Zubchenok: @Matteo Merli I didn't find anything in the Pulsar broker metrics, but found that the producer has unstable send latency.
----
2019-08-23 03:15:29 UTC - Igor Zubchenok: > Do you have a heap dump that shows the diff between the 2
No, I don't have the old version running anymore.
----
2019-08-23 03:46:59 UTC - Igor Zubchenok: How can I find the cause of the latency?
----
2019-08-23 04:34:03 UTC - tuteng: hi jerry

any progress on this PR?
----
2019-08-23 06:40:44 UTC - Walter: @Walter has joined the channel
----
2019-08-23 07:17:13 UTC - Vladimir Ontikov: @Vladimir Ontikov has joined the channel
----
2019-08-23 07:44:27 UTC - Richard Sherman: It isn't so much wildly incorrect as just out of date, since the dashboard by default collects stats once a minute.
----
2019-08-23 08:38:30 UTC - Wenjian Jiang: @Wenjian Jiang has joined the channel
----