Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/10/04 09:11:03 UTC

Slack digest for #general - 2019-10-04

2019-10-03 12:02:10 UTC - Julien Lechalupé: Hello, i would like to know if Pulsar is able to support several millions of topics (3M to 4M in my use case) ? I found this link: <https://github.com/apache/pulsar/wiki/PIP-8:-Pulsar-beyond-1M-topics>
Just wanted to know if this method is still up to date ?
----
2019-10-03 14:12:19 UTC - Raman Gupta: Would the best way to upgrade from a standalone Pulsar to a multi-node cluster be to configure synchronous geo-replication, wait for everything to be caught up, switch over the clients to the new cluster, and then finally shut down the standalone cluster?
----
2019-10-03 16:38:55 UTC - BigSam: @Ali Ahmed Thanks for your reply. The first try with the Java client on Android failed; it seems the Java client is not pure Java. It depends on some native libraries that are not compatible with Android. Will give it a second try and look into the details.
----
2019-10-03 16:43:34 UTC - Joshua Dunham: @Joshua Dunham has joined the channel
----
2019-10-03 16:44:50 UTC - Joshua Dunham: Hi Everyone
----
2019-10-03 16:45:28 UTC - Jon Bock: Welcome @Joshua Dunham!
----
2019-10-03 16:45:52 UTC - Joshua Dunham: I have a question on the schema registry component -- The docs say more info for integrations coming soon but I'm wondering if these include a hook into Apache Atlas.
----
2019-10-03 16:46:21 UTC - Raman Gupta: It's probably the native epoll stuff for Netty
----
2019-10-03 16:47:18 UTC - Ali Ahmed: @BigSam if you have a stacktrace please share it, I am not aware of any required native dependencies. There should be a fallback
----
2019-10-03 16:48:31 UTC - Joshua Dunham: I have a standalone cluster running and it's been great so far. One concern is that many existing functions that the Apache foundation has 'solved' are being re-done in Pulsar. (Pulsar functions == Whisk, Pulsar Schema == Atlas).
----
2019-10-03 16:49:10 UTC - Matteo Merli: @BigSam there are a few native libraries, in addition to Netty, though these are all used in a way that falls back to a pure Java implementation
----
2019-10-03 16:51:51 UTC - Matteo Merli: The reason for that was to have a self-contained and well-integrated implementation.

For example, OpenWhisk would require a number of other supporting systems (from CouchBases on.. )

For schema registry, we wanted it to be an integral component of Pulsar, to enforce the schema at the broker level, and not introduce any other additional system dependencies.
----
2019-10-03 16:52:39 UTC - Matteo Merli: Barring that, we’re always open to work on integration / interoperability with these external systems
----
2019-10-03 17:19:40 UTC - Luke Lu: IMO, the broker should focus on pushing opaque messages efficiently and correctly (e.g. by taking advantage of an end-to-end message checksum/digest). We already had to deal with embedded websocket proxy causing production issues. Schema resolution could be better done in the client or a separate proxy.
----
2019-10-03 17:29:14 UTC - Matteo Merli: The broker is not doing ser/deser though, it will just validate the schema definition when a producer session is established. There’s no perf penalty in using the schema
----
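A toy sketch of the point made above: the schema is checked once when the producer session is established, and published messages then pass through as opaque bytes. `ToyBroker`, `ToyProducer`, and the equality-based check are hypothetical simplifications for illustration only; they are not Pulsar's implementation, and a real broker applies a configurable compatibility strategy rather than string equality.

```python
class ToyBroker:
    """Hypothetical stand-in for a broker; not Pulsar's real implementation."""

    def __init__(self):
        self.topic_schemas = {}  # topic -> schema definition (a plain string here)
        self.schema_checks = 0   # number of schema validations performed
        self.messages = []       # (topic, opaque payload bytes)

    def connect_producer(self, topic, schema):
        # Validation happens once, when the producer session is established.
        self.schema_checks += 1
        existing = self.topic_schemas.setdefault(topic, schema)
        if existing != schema:  # simplification: exact match instead of a compatibility strategy
            raise ValueError(f"incompatible schema for topic {topic!r}")
        return ToyProducer(self, topic)


class ToyProducer:
    def __init__(self, broker, topic):
        self._broker = broker
        self._topic = topic

    def send(self, payload: bytes):
        # The broker stores opaque bytes; nothing is (de)serialized per message.
        self._broker.messages.append((self._topic, payload))


broker = ToyBroker()
producer = broker.connect_producer("orders", '{"type": "string"}')
for body in (b"a", b"b", b"c"):
    producer.send(body)
```

In this sketch the number of schema checks depends on how many producers connect, not on how many messages they send.
----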
2019-10-03 17:35:49 UTC - Devin G. Bost: What causes a `no space left on device` error in a function?
----
2019-10-03 17:38:55 UTC - Luke Lu: My point is that the more features you put in the broker, the more bugs that can potentially affect the rest of the broker via unanticipated error paths. e.g. the pulsar client in the embedded websocket proxy has resource leaks on consumer subscription failures: <https://github.com/apache/pulsar/issues/5200>
----
2019-10-03 17:43:25 UTC - Joshua Dunham: I agree that bringing the logic / machinery to the data makes sense for performance. Atlas (for instance) is a very full featured registry that works with similar overlapping technologies already (avro for me).
----
2019-10-03 17:44:18 UTC - Joshua Dunham: From my Q above, Functions vs Schema Reg, schema reg is a good candidate to control externally and just sync changes back and forth. Performance should not be impacted that much.
----
2019-10-03 17:44:55 UTC - Joshua Dunham: Whisk is a different story though, I would not argue it's made to be able to keep pace with the volume of messages that Pulsar can ingest.
----
2019-10-03 17:45:30 UTC - Joshua Dunham: Having a connector for both would still be beneficial for the folks that could use the extra functionality both probive.
----
2019-10-03 17:45:33 UTC - Joshua Dunham: provide*
----
2019-10-03 17:46:35 UTC - Joshua Dunham: Both can integrate w/ Kafka currently and I remember there was a github issue about making an endpoint which spoke Kafka (and more) to aid adoption time.
----
2019-10-03 17:46:37 UTC - Joshua Dunham: Is this a thing?
----
2019-10-03 17:47:05 UTC - Joshua Dunham: Snapping in Pulsar to these systems would be game changing (for me at least).
----
2019-10-03 18:05:18 UTC - Poule: Talking about Whisk, I'd love to have an apigateway in Pulsar <https://github.com/apache/pulsar/issues/4249>
heart_eyes : Poule, Andrey Popelo
----
2019-10-03 18:27:01 UTC - Chris Bartholomew: @Raman Gupta I am not sure this is the best way, but if you used geo-replication for this, you would also need to use replicated subscriptions to synchronize the subscription state between the standalone and multi-node cluster. I would use async replication since it is easier to configure and probably good enough since you are coordinating the switchover.
----
2019-10-03 18:33:56 UTC - Anubhav Jain: @Anubhav Jain has joined the channel
----
2019-10-03 19:01:10 UTC - Raman Gupta: Is it ok for multiple `Consumer` instances in one process to have the same consumer name? I want the name to reflect the Kubernetes pod name, but I have multiple consumers in each container.
----
2019-10-03 19:02:01 UTC - Raman Gupta: Thanks @Chris Bartholomew. Do you have suggestions for better approaches?
----
2019-10-03 19:10:17 UTC - Jerry Peng: @Raman Gupta yes
ok_hand : Raman Gupta
----
2019-10-03 19:12:39 UTC - Chris Bartholomew: I think it might be easier to set up a single node cluster using a copy of the files from the standalone cluster and then expand from a single node to multi-node. This would require an outage on the standalone cluster while you are transferring its files to the single-node cluster.
+1 : Raman Gupta
----
2019-10-03 19:39:45 UTC - Addison Higham: am I recalling correctly that when using storage offloading, any segments moved to s3 won't be cleaned up even after retention passes?
----
2019-10-03 19:41:31 UTC - Addison Higham: along with that, from what I can tell, there isn't an option to enable offloading for newly created namespaces by default. Seems like it should be straightforward to add
----
2019-10-03 20:19:56 UTC - V.V.S: @V.V.S has joined the channel
----
2019-10-03 20:22:17 UTC - V.V.S: Hi all, just a small query. "Pulsar also, however, supports non-persistent topics, which are topics on which messages are never persisted to disk and live only in memory. When using non-persistent delivery, killing a Pulsar broker or disconnecting a subscriber to a topic means that all in-transit messages are lost on that (non-persistent) topic, meaning that clients may see message loss." Regarding this statement, is there a way I can transfer the data from one broker to another before I gracefully shut one down?
----
2019-10-03 20:22:51 UTC - gangadhar.chinnireddy: @gangadhar.chinnireddy has joined the channel
----
2019-10-03 20:28:06 UTC - Matteo Merli: Not currently. That would be technically challenging to achieve, and it still wouldn’t be able to cover broker failures
----
2019-10-03 20:29:23 UTC - Oleg Kozlov: @Oleg Kozlov has joined the channel
----
2019-10-03 20:44:45 UTC - GC: @GC has joined the channel
----
2019-10-03 21:07:49 UTC - Oleg Kozlov: Hello all, sorry for double-posting with the dev-websocket channel... Quick question - is it possible to set deliverAt or deliverAfterSeconds configuration properties on messages from websocket producer ?
----
2019-10-03 21:08:29 UTC - Oleg Kozlov: basically - can I produce delayed / scheduled messages via WebSocket API?
----
2019-10-03 21:11:25 UTC - Matteo Merli: We haven’t exposed these settings yet outside the Java API
----
2019-10-03 21:12:00 UTC - Oleg Kozlov: are there plans to do that? And also, are they available via protobuf ?
----
2019-10-03 21:12:38 UTC - Matteo Merli: Yes, it’s a simple additional property that has to be set in the message protobuf metadata.
----
2019-10-03 21:14:02 UTC - Oleg Kozlov: got it.. basically, we have an Erlang app and are looking at Pulsar as a replacement for our current message broker, so the only two options for connecting Erlang -> Pulsar are: 1) the WebSocket API, 2) implementing a client using protobuf
----
2019-10-03 21:14:05 UTC - Oleg Kozlov: is that correct?
----
2019-10-03 21:14:39 UTC - Matteo Merli: 3. wrap c++ client lib from erlang
----
2019-10-03 21:15:38 UTC - Oleg Kozlov: hm, ok, that's interesting, we'll look into that
----
2019-10-03 21:16:05 UTC - Oleg Kozlov: but so far websockets seems to be the easiest option.. would it be possible to add support for exposing deliverAt via WebSockets?
----
2019-10-03 21:16:27 UTC - Matteo Merli: yes, it’s very easy to add it
----
2019-10-03 21:17:04 UTC - Oleg Kozlov: seems like the change would be in org.apache.pulsar.websocket.ProducerHandler?
----
2019-10-03 21:17:30 UTC - Matteo Merli: correct
----
2019-10-03 21:18:24 UTC - Matteo Merli: and the docs are at: `site2/docs/client-libraries-websocket.md`
----
2019-10-03 21:19:37 UTC - Oleg Kozlov: got it, thank you :slightly_smiling_face:
----
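A minimal sketch of what the delayed-delivery thread above is asking for, assuming the WebSocket producer frame were extended with a `deliverAt` field holding an absolute timestamp in milliseconds since the epoch. The `payload` (base64) and `properties` fields follow the documented WebSocket producer message format; the `deliverAt` field name and the helper `build_delayed_frame` are hypothetical here, since the thread states the setting was not yet exposed outside the Java API.

```python
import base64
import json
import time
from typing import Optional


def build_delayed_frame(payload: bytes, delay_seconds: float,
                        now_ms: Optional[int] = None) -> str:
    """Build a JSON producer frame with a hypothetical deliverAt field."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # "Deliver after N seconds" becomes an absolute "deliver at" timestamp.
    deliver_at_ms = now_ms + int(delay_seconds * 1000)
    frame = {
        "payload": base64.b64encode(payload).decode("ascii"),
        "properties": {},
        "deliverAt": deliver_at_ms,  # hypothetical field name, see lead-in
    }
    return json.dumps(frame)


# A fixed clock value makes the example deterministic.
example = json.loads(build_delayed_frame(b"hello", 30, now_ms=1_000))
```

The resulting string would be sent as a text frame on the producer WebSocket connection; whether the broker honors the extra field depends on the `ProducerHandler` change discussed above.
----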
2019-10-03 22:11:05 UTC - Luke Lu: It appears that much of the data plane work (esp. managedledger stuff) currently in the Pulsar broker can be delegated to DistributedLog: <https://bookkeeper.apache.org/distributedlog/> Can I assume that Pulsar will eventually adopt the distributed log core api and essentially become a read/write proxy for distributed log?
----
2019-10-04 02:14:07 UTC - Ali Ahmed: @Luke Lu no dlog is a legacy api
----
2019-10-04 02:24:05 UTC - Luke Lu: So the current dlog api is deprecated? Will ManagedLedger (appears already in bookkeeper package) and friends be absorbed into bookkeeper?
----
2019-10-04 02:30:09 UTC - Ali Ahmed: dlog api can be considered deprecated, ManagedLedger will stay as is.
ok_hand : Luke Lu
----
2019-10-04 04:17:55 UTC - Matteo Merli: I wouldn’t say that. Managed ledger and DLog are 2 libraries that were created for the same purpose and have a very big overlap in functionality, though there are a few differences. The differences are not big, but still require careful thinking to be able to synthesize them into a single API that could support systems using the 2 libraries.

Some time back, we had thought of merging the 2 libraries into 1 which would have a superset of the features. The main challenges for that are:
1. Time. It would be quite a huge task to complete
2. Ensure metadata compatibility and path for live migrations
3. Opportunity cost. We decided, for now, to use that time to build features/improvements/etc.. that are more directly useful to users.
----
2019-10-04 04:42:04 UTC - Luke Lu: Thanks for the pragmatic and historical perspectives! Makes sense.
----
2019-10-04 04:46:16 UTC - Luke Lu: It’s a pity that much of the logic is duplicated…
----
2019-10-04 04:51:59 UTC - Matteo Merli: Yes, the reason is that the 2 libs were created in parallel as closed source at Yahoo and Twitter
----