Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/01/06 09:11:03 UTC

Slack digest for #general - 2020-01-06

2020-01-05 10:16:31 UTC - Eugen: Question about multiple subscriptions. [The docs](<https://pulsar.apache.org/docs/en/concepts-messaging/#multi-topic-subscriptions>) read:
> *No ordering guarantees*
> When a consumer subscribes to multiple topics, all ordering guarantees normally provided by Pulsar on single topics do not hold. If your use case for Pulsar involves any strict ordering requirements, we would strongly recommend against using this feature.
`all ordering guarantees normally provided by Pulsar on single topics do not hold` sounds like even messages of a single topic can be received out of order, which is something I cannot really believe. So am I reading this right? And if so, do these out-of-order deliveries happen in failover scenarios only?
----
2020-01-05 12:14:42 UTC - juraj: two topics can run on two different brokers (processes), so naturally there's no built-in ordering guarantee, as the performance and complexity cost to organize this would be prohibitive.
to get guaranteed per-topic ordering, just use one consumer per topic.
these separate consumers still share the same pulsar client on the client side, so not a huge deal.
(you can also add ordering information into your data, and sort on the client.)
----
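A minimal sketch of the "add ordering information into your data, and sort on the client" suggestion above. The `Event` class, its `publishMillis` field, and `sortBatch` are hypothetical names, not part of the Pulsar client, and the sketch assumes producer clocks are close enough for an embedded timestamp to be meaningful across topics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical message shape: the producer embeds its own ordering field
// (here a publish timestamp in milliseconds) in the data it sends.
final class Event {
    final String topic;
    final long publishMillis;
    final String body;

    Event(String topic, long publishMillis, String body) {
        this.topic = topic;
        this.publishMillis = publishMillis;
        this.body = body;
    }
}

class ClientSideOrdering {
    // Returns a copy of the batch sorted by the embedded timestamp.
    // List.sort is stable, so events with equal timestamps keep arrival order.
    static List<Event> sortBatch(List<Event> received) {
        List<Event> sorted = new ArrayList<>(received);
        sorted.sort(Comparator.comparingLong((Event e) -> e.publishMillis));
        return sorted;
    }

    public static void main(String[] args) {
        // Arrival order as it might interleave across two topics.
        List<Event> received = List.of(
                new Event("topic-b", 20L, "b0"),
                new Event("topic-a", 10L, "a0"),
                new Event("topic-a", 30L, "a1"));
        for (Event e : sortBatch(received)) {
            System.out.println(e.topic + " " + e.body);
        }
    }
}
```

Sorting a batch only orders what has already arrived. A late message from a slower topic can still show up after the batch was handed on, so a real consumer would probably also need some buffering window.
----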
2020-01-05 19:27:40 UTC - Sean Carroll: is there anywhere to read more about this and why that is the case? I can understand why there would be no ordering guarantees across topics, but I would still expect an ordering guarantee for each individual topic
+1 : Eugen
----
2020-01-05 20:46:47 UTC - Devin Fee: I have a few questions regarding how Pulsar readers work:
1. is it possible to update the topics a reader is subscribed to, or would I create new readers on the client for each new topic i’d like to subscribe to? (e.g. for something like 20k topics on a single server that aren’t regex friendly)
2. i was reading about using pulsar with 1 topic per user – and that works fine enough if all of your messages can be handled by the same subscriber… but as soon as you want different services to handle different messages – this model seems to break. Unless… there’s a way to subscribe to topic-partitions?
3. (aka 2.b) is the general suggested architecture to have one topic per activity-type and use user-ids as keys? but then without scanning through all events, how would we get the activity feed for a particular user?
Thanks!
----
2020-01-05 22:22:35 UTC - Eugen: @juraj So to paraphrase: Per-topic ordering guarantees are not affected by multi topic subscriptions, but cross-topic ordering guarantees cannot be made
----
2020-01-05 22:23:16 UTC - Eugen: This is what I would have expected, but the wording of the docs suggests that "all bets are off", which did not make a lot of sense to me.
----
2020-01-05 22:44:34 UTC - juraj: yeah that's what i'd expect based on how i understand pulsar working inside... the docs have a lot of room for improvement
----
2020-01-06 00:04:29 UTC - markg: @Matteo Merli  - Thanks for running through those points from that medium post.
----
2020-01-06 00:47:05 UTC - David Kjerrumgaard: @Devin Fee For #1, I am going to assume you mean consumers, which have subscriptions, rather than readers, which read data from a fixed position based on how they are configured. There isn't a way to have a consumer dynamically subscribe to new topics other than the regex subscription. If you can't describe the new set of topics using a regex, how would you identify the topics you are interested in?
For #2, a couple of solutions come to mind. The first is key-shared subscriptions, which ensure that messages with the same key are consumed by the same consumer. If that doesn't work, it would be very easy to have a simple Pulsar function that consumes from the "main" topic and routes the messages to different topics based on the message content and/or properties.
For #3, I guess it depends on what you are trying to achieve. If you want to use the same logic for all events of the same activity-type, then one topic with one consumer that processes the activity-type data is the best approach. You can also have a second key-shared subscription that separates the activity by user-id.
----
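A rough sketch of the routing decision such a function would make for #2, written as plain Java so it runs without the Pulsar libraries. The `activity-type` property name, the `activity-` topic prefix, and the fallback topic are all assumptions for illustration. Inside an actual Pulsar function, the returned name would be the topic the message is republished to.

```java
import java.util.Map;

class ActivityRouter {
    // Hypothetical fallback topic for messages that carry no activity type.
    static final String DEFAULT_TOPIC = "activity-unrouted";

    // Picks the destination topic from a message's properties.
    // The "activity-type" property name is an assumption for this sketch.
    static String destinationTopic(Map<String, String> properties) {
        String type = properties.get("activity-type");
        if (type == null || type.isEmpty()) {
            return DEFAULT_TOPIC;
        }
        return "activity-" + type;
    }

    public static void main(String[] args) {
        System.out.println(destinationTopic(Map.of("activity-type", "comment")));
        System.out.println(destinationTopic(Map.of()));
    }
}
```

Keeping the decision in one small pure method makes it easy to test separately from the code that actually consumes and publishes.
----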
2020-01-06 02:50:47 UTC - Devin Fee: thanks for replying! a couple thoughts…

i _could_ describe it as a regex, but that could be a huge regex: `<user-id-1>|<user-id-2>|<user-id-3>|<...etc.>` and then my consumer wants to update the topics it’s interested in, rather than providing a change-set of incremental (un-)subs. my hunch is that this is not a good idea.
----
2020-01-06 03:27:44 UTC - Eugen: I've created <https://github.com/apache/pulsar/pull/5995> to improve the docs
----
2020-01-06 03:36:42 UTC - David Kjerrumgaard: @Devin Fee Why not use a regex like `user-id-*` ?
----
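A small check of how the pattern spelling matters, assuming the subscription pattern is evaluated as a Java regular expression against the whole topic name (in practice the name would also carry the `persistent://tenant/namespace/` prefix, omitted here). `matchesWholeName` is a hypothetical helper that uses only `java.util.regex`.

```java
import java.util.regex.Pattern;

class TopicPatternCheck {
    // True when the whole topic name matches the regular expression.
    static boolean matchesWholeName(String regex, String topic) {
        return Pattern.compile(regex).matcher(topic).matches();
    }

    public static void main(String[] args) {
        // In a regex, "-*" means "zero or more hyphens", so the glob-style
        // spelling does not match a name with a numeric suffix.
        System.out.println(matchesWholeName("user-id-*", "user-id-5"));  // false
        // ".*" means "any characters", which is usually what is intended.
        System.out.println(matchesWholeName("user-id-.*", "user-id-5")); // true
    }
}
```

Under that assumption, a regex spelling such as `user-id-.*` is the one that matches every `user-id-N` topic.
----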
2020-01-06 03:42:01 UTC - David Kjerrumgaard: The regex subscription would only pick up topics that are created AFTER the consumer is started (assuming they match the regex). Are you looking for a way to dynamically change the topics that the consumer is consuming from, i.e. add and remove topics? E.g. start consuming from `user-id-1`, `user-id-5`, and `user-id-7`, later add `user-id-9` & `user-id-10`, then later stop listening to `user-id-5`?
----
2020-01-06 03:43:47 UTC - Devin Fee: ^^ exactly
----
2020-01-06 03:45:59 UTC - David Kjerrumgaard: First question then is how would you determine / generate this list? If you can automate it, then there might be a way to have the code running the consumer dynamically scan a DB table, file, etc. for this list
----
2020-01-06 03:46:13 UTC - Devin Fee: you could even think of that as a basic chat app. a user might want to join a particular channel (e.g. slack channel), and register a subscription.
----
2020-01-06 03:48:05 UTC - Devin Fee: yeah, so what topics a user wants to follow… is up to the user. let’s assume it’s stored as many-to-many mapping in a sql-db. e.g. `[channels] >--< [channels_users] >--< [users]`
----
2020-01-06 03:48:51 UTC - David Kjerrumgaard: I think that is doable with a combination of some coding logic wrapped around the consumer that gets alerted to these changes and can start new subscriptions for additions and stop subscriptions for deletions.....
----
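One way to picture the "start new subscriptions for additions and stop subscriptions for deletions" step is a set difference between what a consumer currently follows and what the stored mapping says it should follow. This is only a sketch: `SubscriptionDiff` is a hypothetical helper, and how the desired set is loaded (SQL query, file, notification) is left out.

```java
import java.util.Set;
import java.util.TreeSet;

class SubscriptionDiff {
    final Set<String> toAdd;     // desired but not yet subscribed
    final Set<String> toRemove;  // subscribed but no longer desired

    SubscriptionDiff(Set<String> current, Set<String> desired) {
        toAdd = new TreeSet<>(desired);
        toAdd.removeAll(current);
        toRemove = new TreeSet<>(current);
        toRemove.removeAll(desired);
    }

    public static void main(String[] args) {
        Set<String> current = Set.of("user-id-1", "user-id-5", "user-id-7");
        Set<String> desired = Set.of("user-id-1", "user-id-7", "user-id-9", "user-id-10");
        SubscriptionDiff diff = new SubscriptionDiff(current, desired);
        System.out.println("add: " + diff.toAdd);
        System.out.println("remove: " + diff.toRemove);
    }
}
```

Only the topics in `toAdd` and `toRemove` need any work, so consumers for unchanged topics are left running.
----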
2020-01-06 03:49:11 UTC - Devin Fee: yeah, i actually thought that might be too complex.
----
2020-01-06 03:49:53 UTC - David Kjerrumgaard: It is a bit complex, which is why it isn't implemented directly inside Pulsar...  :smiley:
----
2020-01-06 03:50:21 UTC - Devin Fee: basically, i can get an ordered list of messages for any channel from SQL (`select * from … order by created_at desc limit 100`), but it’s the real-time component that makes this interesting…
----
2020-01-06 03:53:48 UTC - Devin Fee: i.e. if i have 20 websocket servers, each serving 1000 clients (browsers, smartphones, whatever) and each one of those can monitor a particular channel in realtime (e.g. you might be watching this thread between you and me right now)…
----
2020-01-06 03:55:19 UTC - David Kjerrumgaard: Yea, it would be best if there was a way to have the web-clients send these notifications to a "consumer-config" topic that notifies the consumer of changes to the subscriptions; then you could react accordingly....
----
2020-01-06 03:55:21 UTC - Devin Fee: the big gap in my understanding is whether pulsar supports this notion of drift… a user might want to switch the channel they’re watching (i.e. -1 / +1 subscription event at the user level)
----
2020-01-06 03:56:56 UTC - Devin Fee: so is your suggestion effectively to provide routing as a microservice itself? i.e. receiving the firehose of event data, then re-publishing to downstream (subscribed) consumers?
----
2020-01-06 03:57:08 UTC - David Kjerrumgaard: So the goal is to be able to dynamically configure a consumer to listen to N of the 20K channels?
----
2020-01-06 03:57:18 UTC - Devin Fee: exactly
----
2020-01-06 03:58:36 UTC - David Kjerrumgaard: Yes, I am suggesting something to that effect.  It solves the real-time update nature of the problem. Your consumer would ALWAYS be listening to a "control topic" to receive these requests. Then you can adjust your consumers accordingly
----
2020-01-06 03:59:05 UTC - David Kjerrumgaard: check if it is already subscribed, if it matches an existing regex sub, etc.
----
2020-01-06 03:59:32 UTC - David Kjerrumgaard: if not, start a new consumer thread.
----
2020-01-06 03:59:45 UTC - Devin Fee: “consumer thread” being a new “topic”?
----
2020-01-06 04:01:05 UTC - David Kjerrumgaard: potentially, depending on the request.  You could have requests that match a regex, etc
----
2020-01-06 04:02:04 UTC - David Kjerrumgaard: the problem would be scaling one process past 50+ topics.
----
2020-01-06 04:02:20 UTC - Devin Fee: ok yeah, you mean “thread” as in the operating system unit
----
2020-01-06 04:02:26 UTC - David Kjerrumgaard: so getting more topics / thread would be a big win
----
2020-01-06 04:03:47 UTC - Devin Fee: it’s really about where the filtering happens though… right?
----
2020-01-06 04:04:02 UTC - David Kjerrumgaard: Yes, I am envisioning the consumer being a Java / Python app that has a thread pool associated with it. One thread is always reading from the "control topic" and when a new message comes in, it decodes the command and either starts a new consumer in a thread or halts one for a delete etc.
----
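A minimal sketch of the control-topic idea described above, kept free of Pulsar dependencies so it runs on its own. The `ADD <topic>` / `REMOVE <topic>` command format and the `ControlledSubscriptions` class are assumptions for illustration. The `starter` function stands in for whatever really opens a consumer (for example in a pool thread) and returns a `Runnable` that stops it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class ControlledSubscriptions {
    // topic -> handle that stops the consumer started for that topic
    private final Map<String, Runnable> active = new HashMap<>();
    private final Function<String, Runnable> starter;

    ControlledSubscriptions(Function<String, Runnable> starter) {
        this.starter = starter;
    }

    // Applies one decoded control message, e.g. "ADD user-id-9".
    synchronized void apply(String command) {
        String[] parts = command.trim().split("\\s+", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("bad command: " + command);
        }
        String topic = parts[1];
        switch (parts[0]) {
            case "ADD": {
                // Skip topics that already have a running consumer.
                if (!active.containsKey(topic)) {
                    active.put(topic, starter.apply(topic));
                }
                break;
            }
            case "REMOVE": {
                Runnable stop = active.remove(topic);
                if (stop != null) {
                    stop.run();
                }
                break;
            }
            default:
                throw new IllegalArgumentException("unknown command: " + command);
        }
    }

    synchronized Set<String> activeTopics() {
        return new TreeSet<>(active.keySet());
    }

    public static void main(String[] args) {
        ControlledSubscriptions subs = new ControlledSubscriptions(
                topic -> () -> System.out.println("stopped " + topic));
        subs.apply("ADD user-id-1");
        subs.apply("ADD user-id-5");
        subs.apply("REMOVE user-id-5");
        System.out.println(subs.activeTopics());
    }
}
```

In a real app, the thread reading the control topic would call `apply` for each message it receives. The check against existing regex subscriptions mentioned earlier is not modelled here.
----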
2020-01-06 04:04:31 UTC - Devin Fee: e.g. each `websocket subscription server` could receive a firehose, and filter that firehose itself… or that could be pre-filtered.
----
2020-01-06 04:05:38 UTC - David Kjerrumgaard: What message type is in the firehose?
----
2020-01-06 04:05:48 UTC - Devin Fee: i guess the point i’m trying to clear up is that when you say “topic” you’re talking about a concept outside the domain of pulsar right?
----
2020-01-06 04:06:22 UTC - David Kjerrumgaard: no, I meant a pulsar topic. just used for a different purpose.
----
2020-01-06 04:09:38 UTC - David Kjerrumgaard: If I am understanding the use case correctly (big if), then when a web-client wants to start listening to a subset of the 20K channels (20 websocket servers, each serving 1000 clients), it would send a command that encodes that to a Pulsar topic that the consumer app is subscribed to. One consumer app per web-client
----
2020-01-06 04:10:30 UTC - David Kjerrumgaard: the consumer app would interpret the commands to establish a subscription on the proper channels, collect the messages from them and send them back to the web-client
----
2020-01-06 04:10:59 UTC - David Kjerrumgaard: web-client -----> consumer app ---> (multiple topics)
----
2020-01-06 04:17:23 UTC - Devin Fee: maybe a concrete example is `slack`. you and i are both in this `#general` channel in our web browser, (or electron apps, or mobile apps, etc.) and potentially have multiple concurrent sessions.

`slack` has (let’s say) 100 websocket-subscription servers running that we’re connecting to (to get real-time updates from these chat messages).

if they were going to use `pulsar`, then a dumb implementation would be that each websocket-subscription server would need to receive the firehose of all messages, filter messages relevant to its websocket-consumers, and then transform-and-forward them.

i.e. “it’s brokers all the way down, but a pulsar broker at the top”
----
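The "dumb implementation" above, where each websocket server filters the firehose itself, could be sketched as a per-server map from channel to the local sessions watching it. `ChannelFanOut` and its method names are hypothetical, and the class is not thread-safe as written.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ChannelFanOut {
    // channel -> ids of local websocket sessions currently watching it
    private final Map<String, Set<String>> watchers = new HashMap<>();

    // A session starts watching a channel (the "+1" event).
    void watch(String sessionId, String channel) {
        watchers.computeIfAbsent(channel, c -> new HashSet<>()).add(sessionId);
    }

    // A session stops watching a channel (the "-1" event).
    void unwatch(String sessionId, String channel) {
        Set<String> sessions = watchers.get(channel);
        if (sessions != null) {
            sessions.remove(sessionId);
            if (sessions.isEmpty()) {
                watchers.remove(channel);
            }
        }
    }

    // Sessions that should receive a firehose message for this channel.
    // An empty result means the server can drop the message.
    Set<String> recipients(String channel) {
        Set<String> sessions = watchers.get(channel);
        return sessions == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(sessions);
    }

    public static void main(String[] args) {
        ChannelFanOut fanOut = new ChannelFanOut();
        fanOut.watch("session-1", "#general");
        fanOut.watch("session-2", "#general");
        System.out.println(fanOut.recipients("#general").size());
        System.out.println(fanOut.recipients("#random").isEmpty());
    }
}
```

Switching channels is cheap on the server side in this model, but every server still pays for receiving messages that no local session wants.
----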
2020-01-06 04:22:16 UTC - Devin Fee: because they can’t revise the topics they’re subscribed to without destroying their current `subscription` and creating a new `subscription`, they really do have to subscribe to all topics from the get-go.
----
2020-01-06 04:23:19 UTC - Devin Fee: also, these subscriptions are ephemeral. if kubernetes kills off a server, or creates a new one, we don’t need to persist those pulsar-subscriptions indefinitely. (maybe this is where the idea of the reader interface comes in?)
----
2020-01-06 04:25:28 UTC - Devin Fee: this appears to be the same problem as the one in computer networking – “broadcast / multicast / unicast” <https://www.esds.co.in/blog/wp-content/uploads/2016/04/Difference-between-unicast-broadcast-and-multicast-diagram.png>
----
2020-01-06 04:29:13 UTC - Devin Fee: so pulsar seems to support `unicast` fine (one subscription), and `broadcast` fine (more-than-one subscription… and perhaps this mysterious “reader interface”), but `multicast` is a “domain-level” problem that gets pushed to devs
----
2020-01-06 04:31:15 UTC - David Kjerrumgaard: The reader interface just allows you to start consuming messages from a topic at a previous point in time, so you can review historical data, i.e. data that was delivered and acknowledged by all active consumers at the time it was created.
----
2020-01-06 04:32:06 UTC - Devin Fee: does the reader interface also update you with the latest messages in a topic?
----
2020-01-06 04:32:27 UTC - Devin Fee: i.e. is it like an *ephemeral* subscription, or is there something like that with pulsar?
----
2020-01-06 04:44:31 UTC - David Kjerrumgaard: No, it allows you to control where you start reading from, and you can do a `while (reader.hasMessageAvailable()) { reader.readNext(); }`
----
2020-01-06 04:47:35 UTC - David Kjerrumgaard: The consumer on the other hand does `while (consumer.isConnected()) { Message m = consumer.receive(); }`
----
2020-01-06 04:48:09 UTC - David Kjerrumgaard: so it blocks until a new message arrives, but doesn't read previous messages on the topic. It starts from the most recent message
----
2020-01-06 04:49:17 UTC - David Kjerrumgaard: You _can_ use the `seek` method on a consumer to position yourself before the most recent message if you desire.
----
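To make the position-based reading loop above concrete, here is a toy in-memory model. It is not the Pulsar API: `InMemoryTopic` and `InMemoryReader` are hypothetical classes that only borrow the method names `hasMessageAvailable` and `readNext` from the snippet above. The reader's position lives only in the reader object, which is one way to think about the "ephemeral" quality asked about earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a topic: an append-only list of messages.
class InMemoryTopic {
    final List<String> log = new ArrayList<>();

    void publish(String message) {
        log.add(message);
    }
}

// Toy reader: the caller chooses where reading starts, and the position
// is kept only in this object (nothing is stored on the topic side).
class InMemoryReader {
    private final InMemoryTopic topic;
    private int position;

    InMemoryReader(InMemoryTopic topic, int startPosition) {
        this.topic = topic;
        this.position = startPosition;
    }

    boolean hasMessageAvailable() {
        return position < topic.log.size();
    }

    String readNext() {
        return topic.log.get(position++);
    }
}

class ReaderDemo {
    public static void main(String[] args) {
        InMemoryTopic topic = new InMemoryTopic();
        topic.publish("m1");
        topic.publish("m2");

        // Start from the beginning to replay history.
        InMemoryReader reader = new InMemoryReader(topic, 0);
        while (reader.hasMessageAvailable()) {
            System.out.println(reader.readNext());
        }

        // In this model, a message published later becomes available too.
        topic.publish("m3");
        System.out.println(reader.hasMessageAvailable());
    }
}
```

Discarding the reader object discards its position, so nothing has to be cleaned up when a server goes away.
----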
2020-01-06 04:51:32 UTC - Devin Fee: alright, thanks for your help.
----
2020-01-06 04:51:42 UTC - Devin Fee: i’ve got to spend some time thinking about these constraints
----
2020-01-06 06:34:02 UTC - Tilden: Hi All, does anyone know how to back up and restore Apache Pulsar ZooKeeper and BookKeeper? Any documentation reference?
----