Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/08/16 09:11:03 UTC

Slack digest for #general - 2019-08-16

2019-08-15 11:53:14 UTC - Jan-Pieter George: Great news!
Would be happy to give this a test-drive if that'd be helpful.
+1 : dba
----
2019-08-15 12:59:36 UTC - MichaelM: @MichaelM has joined the channel
----
2019-08-15 13:03:20 UTC - Aaron: Not in the server logs, no. I don't understand how there could be missing key-value pairs if the JSON is generated from the same POJO every time
----
2019-08-15 13:11:22 UTC - Ming Fang: Are there any advantages to running multiple Pulsar components on the same host, besides making it easier to start?
For example, if Broker and Bookie are on the same host then do they know and take advantage of that?
I do see a potential problem with running Functions Worker with Broker since a bad function can take resources away from the Broker.
----
2019-08-15 13:25:33 UTC - Alexandre DUVAL: @Sijie Guo poke
----
2019-08-15 15:36:03 UTC - Addison Higham: @Ming Fang no real advantage. In the case of publishes, you are communicating with multiple bookies and waiting for them all to confirm, so there really isn't any advantage to be gained there. Tailing reads (i.e. a subscription that is caught up to the "edge" of the topic) in most cases won't hit a bookie at all, as the broker buffers the messages. For catch-up reads, where the broker no longer has the message and needs to fetch it from a bookie, you could *maybe* squeeze out a small latency advantage, but it's likely better to have them on separate hosts and get better isolation
----
2019-08-15 15:38:52 UTC - Ming Fang: @Addison Higham Thanks for the detailed response. It was very helpful.
----
2019-08-15 15:39:36 UTC - Addison Higham: np! this blog post is super helpful in getting a more detailed idea of how Pulsar works: <https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works>, also, the Streamlio blog posts contain a lot of great technical details
100 : Ming Fang
----
2019-08-15 15:45:00 UTC - Raman Gupta: Getting tired of endless weird errors from Kafka. Researching alternatives, and Pulsar looks promising. Obviously this is a biased community, and this is a very generic/open-ended question, but can anyone speak to experiences migrating from Kafka+Avro+Confluent Schema Registry+streams to Pulsar?
----
2019-08-15 15:49:01 UTC - Ming Fang: @Addison Higham Thanks for the link. It’s an excellent article indeed.  Btw I’m building my Pulsar stack using Terraform and Kubernetes here <https://github.com/mingfang/terraform-provider-k8s/tree/master/modules/pulsar>.  It does require my own plugin and fork of Terraform but can be a good reference for anyone.
----
2019-08-15 15:56:51 UTC - Addison Higham: interesting, where is your fork of terraform? curious what you had to change there. Yeah, so we are starting on a terraform provider for pulsar resources themselves, like managing namespaces/topics/etc. Just waiting to get the last approval; we will be doing that development in the open and will post here
----
2019-08-15 15:58:03 UTC - Ming Fang: It’s a simple but critical change here <https://github.com/mingfang/terraform/commit/a451ae6ab50108d350ac7a17e3f499c58c5615d2>
----
2019-08-15 15:59:29 UTC - Ming Fang: Basically Terraform never passes the "actual" config to the plugin, but rather an interpreted version. And in the case of Kubernetes, its interpretation can be wrong, so my plugin needed the original config to be passed in.
----
2019-08-15 16:02:02 UTC - Ming Fang: The use case is when you remove something from the config. TF does not have a concept of removing things, so it will merge the previous state (the stuff you want to remove) with your config (from which you tried to remove something). The plugin ends up not able to tell that something was removed.
----
2019-08-15 16:03:19 UTC - Addison Higham: :thinking_face: seems like they will accept it upstream? Your k8s provider looks really nice! I can probably make a custom provider for k8s fly, but a fork of TF will be tougher for me to get people to use :stuck_out_tongue:
----
2019-08-15 16:05:38 UTC - Ming Fang: I submitted PR <https://github.com/hashicorp/terraform/pull/21218> but it’s still open.
----
2019-08-15 16:34:28 UTC - Raman Gupta: This is what I have documented so far in terms of advantages/disadvantages for us: <https://docs.google.com/document/d/11lw2cFABwZvqHi-l20Zm2fe1BsQ2F6D5MzxFwbBuN5Y/edit?usp=sharing>, comments/corrections are welcome!
----
2019-08-15 17:31:48 UTC - Ali Ahmed: @Ming Fang there are advantages to running all components in one instance even for production. Think edge or IoT deployments: you can have lots of small instances serving and processing data at the edge while replicating data to a datacenter asynchronously.
----
2019-08-15 17:33:46 UTC - Ali Ahmed: @Raman Gupta some general comments on your doc:
Pulsar supports Protobuf for schemas as well (sketch below).
Tiered storage is also available for Hadoop and for any object store that is supported by jclouds or has an S3 API, e.g. MinIO.
----
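For reference, a minimal sketch of a producer using a Protobuf schema with the Pulsar Java client; `MyEvent` stands in for any protobuf-generated class, and the topic name is illustrative:

```
import org.apache.pulsar.client.api.*;

PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")
        .build();

// MyEvent is an illustrative protobuf-generated class; Schema.PROTOBUF
// derives the Pulsar schema from its descriptor.
Producer<MyEvent> producer = client
        .newProducer(Schema.PROTOBUF(MyEvent.class))
        .topic("persistent://public/default/events")
        .create();

producer.send(MyEvent.newBuilder().setId(42).build());
```
----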
2019-08-15 17:34:43 UTC - Ali Ahmed: pulsar functions have a state store built in; it's fully replicated on BookKeeper
----
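A minimal sketch of what that looks like in the Java Functions SDK, the classic word-count example; counters are persisted to BookKeeper, so they survive restarts and are visible to every instance of the function:

```
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class WordCountFunction implements Function<String, Void> {
    @Override
    public Void process(String input, Context context) {
        // incrCounter writes to the BookKeeper-backed state store
        for (String word : input.split("\\s+")) {
            context.incrCounter(word, 1);
        }
        return null;
    }
}
```
----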
2019-08-15 17:35:38 UTC - Ali Ahmed: depends on the JSON serializer.
----
2019-08-15 17:36:06 UTC - Aaron: I'm using JSONSchema.of
----
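For context, a minimal sketch of that usage; `SensorReading` is an illustrative POJO and `client` an existing PulsarClient. As Ali notes, serializer configuration determines whether null fields are emitted, which may be what looks like "missing" key-value pairs:

```
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.impl.schema.JSONSchema;

// SensorReading is an illustrative POJO; the JSON schema is derived
// from its fields. Null fields may or may not appear in the output
// depending on the serializer settings.
Producer<SensorReading> producer = client
        .newProducer(JSONSchema.of(SensorReading.class))
        .topic("persistent://public/default/readings")
        .create();

producer.send(new SensorReading("sensor-1", 21.5));
```
----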
2019-08-15 17:36:28 UTC - Ali Ahmed: you can embed pulsar functions in another JVM, it just needs to be documented better
----
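A rough sketch of embedding a function via the LocalRunner (from the pulsar-functions-local-runner artifact); the exact builder API may vary by version, so treat this as an outline rather than a recipe. `ExclamationFunction` is any function class (a minimal one is sketched further down):

```
import java.util.Collections;
import org.apache.pulsar.common.functions.FunctionConfig;
import org.apache.pulsar.functions.LocalRunner;

FunctionConfig fc = new FunctionConfig();
fc.setName("my-embedded-function");
fc.setClassName(ExclamationFunction.class.getName());
fc.setRuntime(FunctionConfig.Runtime.JAVA);
fc.setInputs(Collections.singleton("persistent://public/default/in"));
fc.setOutput("persistent://public/default/out");

// Runs the function inside this JVM instead of on a functions worker.
LocalRunner runner = LocalRunner.builder().functionConfig(fc).build();
runner.start(false);  // false = don't block the calling thread
```
----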
2019-08-15 17:37:45 UTC - Guillaume Braibant: A thing I haven't seen in your advantages list is that Pulsar does not need a third party (like Kafka + RabbitMQ) to act as a message queue, thanks to shared subscriptions.

It is the main reason why we chose to use Pulsar and not Kafka for a PoC where requests were sent to a distribution layer (a topic in Pulsar) to be processed by one node among an indefinite number of nodes.

In your disadvantages list, you mention Kafka Streams and the query stores. If I remember correctly, query stores are local to each Kafka Streams instance. I know you can store some state in your bookies with the Pulsar Functions SDK, but I don't know whether that state is available to all your function instances or not.
----
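A minimal sketch of that work-queue pattern with the Java client: with a Shared subscription, each message goes to exactly one of the connected consumers, so any number of worker nodes can attach to the same subscription (topic and subscription names are illustrative):

```
import org.apache.pulsar.client.api.*;

Consumer<byte[]> worker = client.newConsumer()
        .topic("persistent://my-tenant/my-ns/requests")
        .subscriptionName("request-workers")       // shared by all workers
        .subscriptionType(SubscriptionType.Shared) // queue semantics
        .subscribe();

Message<byte[]> msg = worker.receive();
// ... process the request ...
worker.acknowledge(msg);  // unacked messages are redelivered to another worker
```
----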
2019-08-15 17:39:01 UTC - Ali Ahmed: it's available to all function instances, since it's written to BookKeeper.
slightly_smiling_face : Guillaume Braibant
----
2019-08-15 17:40:44 UTC - Guillaume Braibant: And another advantage is that writing a Pulsar function requires less boilerplate code than a Kafka Streams application, and provides metrics and logging out of the box
----
2019-08-15 17:42:16 UTC - Ali Ahmed: functions were written so there's no learning curve for Java developers.
----
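As a concrete illustration of the low boilerplate: a complete Pulsar function can be a plain java.util.function.Function, with no SDK types at all:

```
import java.util.function.Function;

// The whole function: Pulsar handles consuming the input topic,
// invoking apply(), and publishing the return value to the output topic.
public class ExclamationFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input + "!";
    }
}
```
----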
2019-08-15 17:44:19 UTC - Raman Gupta: Great feedback guys, thanks. Good to know I can embed functions in existing JVMs, which makes managing them a bit easier. Love that function state is replicated and available to all instances. For really super-simple functions, deploying them into the brokers is a nice option to have. It looks like Jclouds supports Azure Blob for tiered storage, so that's great too.
----
2019-08-15 17:44:47 UTC - Raman Gupta: Great point about supporting the message queue use case.
----
2019-08-15 17:46:06 UTC - Raman Gupta: I thought I saw something in the Pulsar docs about supporting the message request/reply case, but now can't seem to find it. Or was that NATS I'm thinking about?
----
2019-08-15 17:47:59 UTC - Ali Ahmed: @Raman Gupta there are no current plans to support a request/response model. It can be done, but it hasn't been requested
----
2019-08-15 17:48:56 UTC - Raman Gupta: @Ali Ahmed No worries, we could implement it easily ourselves, if we needed it. We don't use it now with Kafka.
----
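For the record, a hand-rolled request/response sketch (a common pattern, not a built-in Pulsar feature): the requester attaches a reply topic and correlation id as message properties, and the responder publishes the result back:

```
import java.util.UUID;

// Assumes an existing Producer<byte[]> `producer` for the request topic.
String correlationId = UUID.randomUUID().toString();
producer.newMessage()
        .property("reply-to", "persistent://public/default/replies")
        .property("correlation-id", correlationId)
        .value("do-something".getBytes())
        .send();

// The responder reads those two properties, does the work, and publishes
// the result to the reply-to topic carrying the same correlation-id; the
// requester matches replies via msg.getProperty("correlation-id").
```
----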
2019-08-15 17:50:28 UTC - Ali Ahmed: there is middleware that emulates request/response on top of Kafka; I don't know how well any of it works.
----
2019-08-15 17:51:16 UTC - Raman Gupta: Is the max message size PIP in 2.4? It seems to be but the PIP still shows as PENDING. We do currently send some messages up to 5 MB.
----
2019-08-15 17:57:28 UTC - Ali Ahmed: 5 MB is the largest message size (it depends on configs), but sending messages over 1 MB is not really recommended for pub/sub systems.
----
2019-08-15 18:01:47 UTC - Raman Gupta: Yeah, it's likely we'll move this large data into a blob storage system instead and just link to it. Though it would be super-nice if we could avoid that work via the PIP for chunking.
----
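The blob-reference approach Raman describes is sometimes called the claim-check pattern; a rough sketch, where `blobStore.put` is a hypothetical stand-in for whatever object-storage client is in use:

```
// Store the large payload externally and publish only a reference.
String blobUrl = blobStore.put(largePayload);  // hypothetical helper

producer.newMessage()
        .property("content-location", blobUrl)  // consumers fetch from here
        .value(new byte[0])                     // keep the message body tiny
        .send();
```
----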
2019-08-15 18:06:16 UTC - Aaron: @Ali Ahmed Is there another serializer I should use?
----
2019-08-15 19:43:04 UTC - Ali Ahmed: potentially you could even use BK as an underlying blob storage in the future
----
2019-08-15 19:47:17 UTC - Raman Gupta: Interesting idea. It doesn't seem like the API is ideal for that though.
----
2019-08-15 19:49:31 UTC - Ali Ahmed: it has been tried and used in production <https://github.com/diennea/blobit>
+1 : Raman Gupta
----
2019-08-15 20:28:09 UTC - Tarek Shaar: My Java consumer seems to not get messages beyond a certain number. I have a producer sending batches of messages (roughly 200 messages every minute), and the consumer subscribes using a regexp pattern (<persistent://tenant/namespace/.*>). The subscriber keeps receiving messages, then it stops at message number 500. I have tried to stop and start many times and the same thing happens every time. Is there a limit or a setting that I need to change?
----
2019-08-15 20:35:22 UTC - Chris Bartholomew: Is the subscriber acknowledging the messages as it receives them ? What subscription type are you using?
----
2019-08-15 20:40:05 UTC - Chris Bartholomew: There are several broker settings around the maximum number of messages that can be unacknowledged: maxUnackedMessagesPerConsumer, maxUnackedMessagesPerSubscription, maxUnackedMessagesPerBroker. Perhaps you are hitting one of these limits.
----
2019-08-15 21:04:32 UTC - Tarek Shaar: I'm acking as soon as I get the message. I am using an Exclusive sub. When sending to individual subscriptions (for example from producer 1 on topic1 to consumer 1 on topic1), all the messages are delivered, even if I create 4000 producers and 4000 consumers. The case that chokes is when I create 4000 producers and send all of the messages to one sub that's subscribed to a topic pattern matching all of the produced topics.
----
2019-08-15 21:21:17 UTC - Chris Bartholomew: In case it is not acking fast enough, you can try disabling all those unackedMessage settings by setting them to 0 in broker.conf. If this "fixes" the problem, you know that your case is hitting the maxUnacked logic in the broker. If you are hitting a limit on the consumer side, you can try adjusting receiverQueueSize or maxTotalReceiverQueueSizeAcrossPartitions when creating the consumer.
----
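Putting Chris's suggestions together, a sketch of the consumer-side knobs (the broker-side maxUnacked* settings live in broker.conf, not in client code; the values here are illustrative):

```
import org.apache.pulsar.client.api.*;

Consumer<byte[]> consumer = client.newConsumer()
        .topicsPattern("persistent://tenant/namespace/.*")
        .subscriptionName("my-sub")
        .receiverQueueSize(1000)                          // per-consumer buffer
        .maxTotalReceiverQueueSizeAcrossPartitions(50000) // cap across partitions
        .subscribe();

while (true) {
    Message<byte[]> msg = consumer.receive();
    // ... process ...
    consumer.acknowledge(msg);  // ack promptly to stay under the unacked limits
}
```
----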
2019-08-15 22:38:00 UTC - Tarek Shaar: Thanks Chris will take a look
----
2019-08-16 01:44:35 UTC - Poule: when I create a function using a wheel file, how can I tell it to install the dependencies? All I have now is `install_requires=[blblaballab]` set in my setup.py
----
2019-08-16 01:54:22 UTC - Poule: ..there is `install_usercode_dependencies=None,` -- how can I set it to True when creating the function? I tried putting it in the yaml file with no luck
----
2019-08-16 02:10:47 UTC - Ali Ahmed: @Poule <https://github.com/apache/pulsar/blob/master/site2/docs/functions-quickstart.md#package-python-dependencies>
----
2019-08-16 02:13:54 UTC - Poule: @Ali Ahmed wheels not yet supported?
----
2019-08-16 02:14:18 UTC - Poule: they are in `<https://github.com/apache/pulsar/blob/master/pulsar-functions/instance/src/main/python/python_instance_main.py>`
----
2019-08-16 02:16:38 UTC - Poule: line 100
----
2019-08-16 02:17:53 UTC - Ali Ahmed: I haven’t tried that option
----
2019-08-16 04:36:21 UTC - Raman Gupta: I'm trying to understand the point of `redeliverUnacknowledgedMessages`. Wouldn't Pulsar automatically redeliver these? Why and when should this be called?
----
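As I understand it, the broker redelivers unacked messages on reconnect or after the subscription's ack timeout (if one is configured); `redeliverUnacknowledgedMessages()` asks for immediate redelivery instead of waiting that timeout out. A sketch:

```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://public/default/jobs")
        .subscriptionName("workers")
        .ackTimeout(30, TimeUnit.SECONDS)  // automatic redelivery after 30s
        .subscribe();

Message<byte[]> msg = consumer.receive();
try {
    // ... process ...
    consumer.acknowledge(msg);
} catch (Exception e) {
    // Fail fast: request redelivery now rather than waiting out the timeout.
    consumer.redeliverUnacknowledgedMessages();
}
```
----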
2019-08-16 04:54:51 UTC - Vinay Aggarwal: Thanks a lot, it worked :slightly_smiling_face:
----
2019-08-16 05:40:38 UTC - jinfeng105: @jinfeng105 has joined the channel
----
2019-08-16 06:23:13 UTC - Poule: when trying to delete a subscription I get `Failed: Subscription has active connected consumers`
----
2019-08-16 06:23:24 UTC - Poule: how can I view/delete those consumers?
----
2019-08-16 06:23:36 UTC - Poule: so I can delete the subscription
----
2019-08-16 07:52:42 UTC - Sijie Guo: you can run `pulsar-admin topics stats` to get the stats of a topic, which include the connected consumers.
----
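The same information is available programmatically via the Java admin client; each subscription's stats list its connected consumers (service URL and topic name are illustrative):

```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TopicStats;

PulsarAdmin admin = PulsarAdmin.builder()
        .serviceHttpUrl("http://localhost:8080")
        .build();

TopicStats stats = admin.topics().getStats("persistent://public/default/my-topic");
stats.subscriptions.forEach((name, sub) ->
        System.out.println(name + ": " + sub.consumers.size() + " connected consumers"));
```
----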
2019-08-16 08:06:05 UTC - tasguocheng: @tasguocheng has joined the channel
----
2019-08-16 08:17:51 UTC - Alexandre DUVAL: :confused:
----
2019-08-16 08:32:40 UTC - Federico Ponzi: @msk docker just makes stuff easier to run (usually). If you know a bit of docker, you can use the Dockerfile [0] as guidance for running the dashboard outside docker (e.g. to see which packages to install and how to run the app)
[0]: <https://github.com/apache/pulsar/blob/master/dashboard/Dockerfile>
----
2019-08-16 08:59:26 UTC - Poule: ok, I thought functions were bound to tenant+namespace
----
2019-08-16 08:59:49 UTC - Poule: looks like an old deleted func. was still a connected consumer
----
2019-08-16 09:00:21 UTC - Poule: even after the tenant were deleted
----