Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/05/15 09:11:03 UTC

Slack digest for #general - 2019-05-15

2019-05-14 11:23:14 UTC - tuteng: @Jerry Peng <https://github.com/apache/pulsar/issues/4248> Can you spare some time to help look at this issue?
----
2019-05-14 13:29:44 UTC - Nathan Linebarger: Hi everyone, I was hoping for some help (searched all the docs available so far). Is there any way to subscribe to messages with a certain key? E.g., some sort of server-side message filtering? I haven't found any way yet. I need it b/c I'll have many consumers only interested in a subset of keys, and don't want to max out my NICs by consuming the entire topic and doing client-side filtering
----
2019-05-14 14:07:38 UTC - Devin G. Bost: @Sanjeev Kulkarni What's the motivation/intuition? It seems like we would just need to write a lot of functionality that is already supported by the admin library.
----
2019-05-14 14:11:53 UTC - Devin G. Bost: The REST interface is not as well documented, so it will be harder for us to get others to support it.
----
2019-05-14 14:17:41 UTC - Devin G. Bost: I'm not sure that I understand the value of the Java Admin API at all if we can't use it to write Pulsar functions to control governance. It seems to me that its greatest value proposition is its integration with Pulsar.
----
2019-05-14 14:19:06 UTC - Devin G. Bost: @Nathan Linebarger I wonder if you're referring to the message-property feature that they're currently working on. I think they're planning on releasing it with 2.3.2.
----
2019-05-14 14:35:01 UTC - Devin G. Bost: I was able to locate documentation on the REST API here: <http://pulsar.apache.org/docs/latest/reference/RestApi/>
but that documentation says nothing about what objects/data we expect to get back from any of those calls.
----
2019-05-14 14:37:40 UTC - Devin G. Bost: It also would have been nice to know weeks ago that this use case of the Java Admin API would not be supported/recommended before we wasted our time on it.
----
2019-05-14 14:38:07 UTC - Matteo Merli: No, currently there’s no direct way to filter a message on server side based on key or properties. 

A possible workaround is to publish the filtered streams into different topics and have consumers pick it up from there. 
----
2019-05-14 14:46:06 UTC - Brian Doran: @Sijie Guo Is this something that will be looked at in the short term? Thanks
----
2019-05-14 14:56:27 UTC - Nathan Linebarger: Thanks @Matteo Merli, good idea, maybe each such consumer can deploy a specialized Function that will filter to a destination topic. Two other work-arounds I thought of: (1) Have many partitions (say, 1M), and only subscribe to the partitions that have a key you're interested in (assuming the key/partition mapping is predetermined and not round-robin), and (2) Have a Function that for each message, outputs (key, sequence_id) to Topic B. You can then subscribe to Topic B (a much smaller topic) to determine which exact sequence IDs you'd be interested in and poll them somehow. But your first suggestion of filtering to a separate topic looks best.
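
A rough sketch of work-around (1) above: map each key of interest to a partition, then subscribe only to those partitions. The hash below (`zlib.crc32`) is a stand-in chosen for illustration and is NOT Pulsar's built-in key hashing, so the producer would have to route with the same function for this to hold. `partition_for` and `partition_topics` are hypothetical helper names; the `-partition-N` suffix is how Pulsar names the internal topics of a partitioned topic.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition index with a stable hash.

    Stand-in hash for illustration only; it does not match Pulsar's
    default key hashing, so producers must use the same function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_topics(base_topic, keys, num_partitions):
    """Return the partition topic names that can contain any of `keys`."""
    indexes = sorted({partition_for(k, num_partitions) for k in keys})
    return ["%s-partition-%d" % (base_topic, i) for i in indexes]
```

A consumer would then subscribe to the returned topic names instead of the whole partitioned topic. Other keys that hash to the same partitions still arrive, so some client-side filtering remains.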
----
2019-05-14 14:56:44 UTC - Nathan Linebarger: Thanks @Devin G. Bost, do you know of any Issue # associated with it or any way I can look at the feature?
----
2019-05-14 15:03:58 UTC - Devin G. Bost: @Nathan Linebarger Here's what I was thinking of: <https://github.com/apache/pulsar/issues/4042>
----
2019-05-14 15:04:39 UTC - Richard Sherman: @Richard Sherman has joined the channel
----
2019-05-14 15:13:04 UTC - Nathan Linebarger: It looks like the issue described is a little different @Devin G. Bost, but thank you for linking. Maybe I will add an issue for server-side message filtering for subscribers
----
2019-05-14 15:23:27 UTC - Devin G. Bost: @Nathan Linebarger
If the key is in a message property, can't you just create a filter/router function?
----
2019-05-14 15:24:24 UTC - Devin G. Bost: The function would route the messages according to the key.
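
A minimal sketch of such a router function, using the Pulsar Functions Python SDK. It assumes the key travels in a message property named "key", and the per-key output topic prefix is made up for illustration; neither comes from this thread. The `try`/`except` around the import is only there so the sketch can be exercised without Pulsar installed.

```python
try:
    from pulsar import Function  # Pulsar Functions Python SDK base class
except ImportError:  # lets the sketch run without Pulsar installed
    Function = object


class KeyRouter(Function):
    """Republish each message to a per-key topic (hypothetical naming scheme)."""

    # Assumed prefix; the per-key topic layout is invented for this sketch.
    OUTPUT_PREFIX = "persistent://public/default/by-key-"

    def process(self, input, context):
        # Assumes the key is carried in a message property named "key".
        key = context.get_message_properties().get("key")
        if key is None:
            return None  # no key: drop the message instead of routing it
        context.publish(self.OUTPUT_PREFIX + key, input)
        return None  # nothing is sent to the function's default output topic
```

Consumers interested in one key would subscribe to that key's topic only, so the filtering cost is paid once in the function rather than on every consumer's network link.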
----
2019-05-14 15:25:25 UTC - Devin G. Bost: Alternatively, you could put the key in the topic name.
----
2019-05-14 15:31:11 UTC - Nathan Linebarger: yes I think that workaround could work (also suggested by merlimat), which is to have a Function which routes to a "filter topic." Exactly what keys should be filtered is constantly changing in my use case, so the Function would need to have a way to dynamically update what keys it should filter (perhaps through using the State API, or maybe even another topic to publish filter criteria changes)
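
One way the "another topic for filter criteria changes" idea could look, as a plain-Python sketch. The control-message format ("+key" to start forwarding a key, "-key" to stop) and the class name `DynamicKeyFilter` are assumptions made for illustration, not an existing Pulsar feature.

```python
class DynamicKeyFilter:
    """Track which keys should currently be forwarded to the filter topic."""

    def __init__(self):
        self.wanted = set()

    def apply_control(self, command):
        """Apply one control message: '+key' adds a key, '-key' removes it."""
        op, key = command[:1], command[1:]
        if op == "+":
            self.wanted.add(key)
        elif op == "-":
            self.wanted.discard(key)
        else:
            raise ValueError("unknown control command: %r" % command)

    def should_forward(self, key):
        """Return True if messages with this key should be forwarded."""
        return key in self.wanted
```

A function wired this way would call `apply_control` for messages arriving on the control topic and `should_forward` for data messages. The set here lives in memory only, so it is lost on restart and not shared between instances; keeping it in the State API, as suggested above, would be the durable variant.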
----
2019-05-14 15:34:32 UTC - Nathan Linebarger: Separate topic for each key could work, I know that Pulsar can scale to 1M+ topics, I wonder if this solution would scale arbitrarily (e.g., 1B topics, with adequate hardware)
----
2019-05-14 15:43:00 UTC - Devin G. Bost: If we go the REST path, then we'd need to create something almost identical to the Java Admin API, which seems like duplicate work.
----
2019-05-14 15:58:01 UTC - Jerry Peng: Sure u can take a look
----
2019-05-14 15:58:36 UTC - Jerry Peng: Sure I can take a look
----
2019-05-14 16:05:38 UTC - Jerry Peng: To get pulsar client or pulsar admin client to work inside a pulsar function is harder than I first imagined. In the Java function instance library we shade all of the 3rd party dependencies but not pulsar dependencies, which is normal practice for something like this. I will need some time to think and discuss with other people about how, if possible, to support something like this
----
2019-05-14 16:11:03 UTC - Sanjeev Kulkarni: @Devin G. Bost The current structure of pulsar-admin is too complicated. All pulsar admin calls are REST-based, so ideally pulsar-admin should have been a thin wrapper around a REST library. However it uses far too many ‘internal’ data structures that become a shading nightmare since these very structures are used by other internal parts of pulsar. There is a long pending item of restructuring the client, but it just hasn't been done yet
----
2019-05-14 16:11:37 UTC - Devin G. Bost: That's helpful to know. Thanks for the explanation.
----
2019-05-14 16:12:30 UTC - Devin G. Bost: Where on your priority list / roadmap is addressing this technical debt?
----
2019-05-14 16:15:10 UTC - Sanjeev Kulkarni: there have been multiple thoughts around it. The current thinking is to completely deprecate the pulsar-admin and move to a thin go based admin tool.
----
2019-05-14 16:16:22 UTC - Devin G. Bost: I'm in favor of completely deprecating the pulsar-admin.
----
2019-05-14 16:17:19 UTC - Devin G. Bost: We're interested in helping with this initiative because we need to build something that will work for our CI/CD automation.
----
2019-05-14 16:17:44 UTC - Sanjeev Kulkarni: that sounds good.
----
2019-05-14 16:23:17 UTC - Jerry Peng: @Devin G. Bost if you are adventurous what you can also do is shade your own version of pulsar-client-admin.  Similar to what we do here: <https://github.com/apache/pulsar/blob/master/pulsar-client-admin-shaded/pom.xml>
You can relocate all the classes in the pulsar-client-admin JAR to be under something like com.overstock.org.apache.pulsar… and include that as a dependency to your function and use that custom shaded version of the pulsar-client-admin
----
2019-05-14 16:28:04 UTC - Devin G. Bost: I appreciate the suggestion, but if there's interest in just deprecating the existing library and creating a new tool in Go, I'd rather just get a jumpstart on that once I have time.
----
2019-05-14 16:29:22 UTC - Jerry Peng: but that doesn’t really solve running pulsar admin in a Java Function?
----
2019-05-14 17:34:36 UTC - Jerry Peng: @Devin G. Bost  can you describe more in detail the CI/CD pipeline you are building for functions with functions? So we can get a better understanding of how to improve functions 
----
2019-05-14 17:55:48 UTC - Devin G. Bost: Yes, I'll see if we can get you some diagrams. We might need to go over them in a call after you have a chance to look them over.
----
2019-05-14 17:58:14 UTC - Devin G. Bost: Regarding your question about running Pulsar admin in a Java Function... We're not attached to Java. If we could use a different language for governance and automation within a Pulsar function, we'd be happy with that. Python would be my first choice, but Go is a useful language as well.
----
2019-05-14 18:02:56 UTC - Devin G. Bost: I also like Scala, but that means working with Java, and I'd rather use .Net core than Java in most cases.
----
2019-05-14 18:15:30 UTC - Thor Sigurjonsson: @Jerry Peng I guess in trying to describe our use case or rather where we're thinking we want to go -- it's a little abstract -- but here goes:

I think ultimately we're trying to take more complex representations of systems of flows and functions and boil it down to deltas that need deployment from one state of the representation to another (a deploy of a subset of functions for example that have changed).

We'd like to turn a "topology" like that into a "data contract" with a component that just does the work of provisioning what is in a message it receives.

What that component is might be less important than getting that data contract right and having transformations of one representation  into another (deploy something this way or that).
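
A small sketch of the "boil it down to deltas" idea, assuming a topology is represented as a mapping from function name to its configuration. The representation and the name `plan_deployment` are hypothetical; the actual data contract is not specified in this thread.

```python
def plan_deployment(current, desired):
    """Return the steps that turn the `current` topology into `desired`.

    Both arguments map a function name to its config (any comparable value).
    """
    return {
        # Present only in the desired state: needs to be created.
        "create": sorted(n for n in desired if n not in current),
        # Present in both but with a different config: needs an update.
        "update": sorted(
            n for n in desired if n in current and desired[n] != current[n]
        ),
        # Present only in the current state: needs to be removed.
        "delete": sorted(n for n in current if n not in desired),
    }
```

Each entry of the resulting plan could then be published as a message for the provisioning component to act on, so unchanged functions are never redeployed.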
----
2019-05-14 18:16:44 UTC - Thor Sigurjonsson: We want to build a component that just takes a message and abstracts away the "devops" work of calling the Right APIs and does it quickly and well.
----
2019-05-14 18:17:14 UTC - Thor Sigurjonsson: We can be agile in building that, but we'd like to settle on a data contract that is general enough. We can re-write our implementation or improve it as needed.
----
2019-05-14 18:18:01 UTC - Thor Sigurjonsson: We started with a wrapper around the `pulsar-admin` tool and sort of have been evolving it. We got a huge speedup of course doing it in a smarter way.
----
2019-05-14 18:18:47 UTC - Thor Sigurjonsson: We'd like to get to a place where we're dealing with message transformations as "DevOps" rather than integrations with components.
----
2019-05-14 18:19:21 UTC - Thor Sigurjonsson: We expect there may be more than just pulsar artifacts involved as we build this out, but we are starting with pulsar functions, namespaces, etc.
----
2019-05-14 18:20:08 UTC - Thor Sigurjonsson: Wish I had an elevator speech on hand but I hope that communicates that part of our architecture and use case...
----
2019-05-14 18:22:03 UTC - Thor Sigurjonsson: @Sanjeev Kulkarni @Matteo Merli I hope this explanation is helpful for you guys too (I think you and Devin had more related conversations earlier).
----
2019-05-14 18:27:32 UTC - Devin G. Bost: Regarding:

> 'We'd like to turn a "topology" like that into a "data contract" with a component that just does the work of provisioning what is in a message it receives.'

The idea is that it's a more declarative, data-driven, stream-based approach to enabling incremental/rolling updates to Pulsar at-scale. The traditional Jenkins process may not work for some teams, so we want to leverage what we love about Pulsar instead of forcing people to use "this" tool or "that" tool for CI/CD. The idea is that as long as they can meet the data contract, we can deploy to Pulsar. That's our objective.
----
2019-05-14 18:28:28 UTC - Devin G. Bost: Using a Function with the Java Admin API was one hypothetical way of doing that, but we're open to other ideas.
----
2019-05-14 18:30:39 UTC - Thor Sigurjonsson: I think for our first deliverable we'll probably just get our java admin api solution rigged up against producer/consumer or `main` based cli tool if that snags out for some reason. Functions would be elegant in a way, but I see the complication when it's also java codebase within pulsar and not containerized/isolated workload...
----
2019-05-14 18:32:53 UTC - Thor Sigurjonsson: I guess to add to what Devin was saying:
"The idea is that as long as they can meet the data contract, we can deploy to Pulsar. That's our objective."
I'd just add that "their" data contract won't be a "full system" or how to safely deploy or prove it in production. That's where my comments about "transforming one representation into another" come in.
----
2019-05-14 20:22:29 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson thanks for the explanation
----
2019-05-14 20:25:18 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson I have been discussing an approach to java functions in which we use separate classloaders to side step this shading issue and in theory will then allow users to use pulsar admin client and pulsar client in a function.  I think that will be a better approach for java functions in general.  However, implementing this approach will take time (maybe a couple of days of work)
----
2019-05-14 20:26:18 UTC - Devin G. Bost: Thanks for the feedback.
----
2019-05-14 20:26:20 UTC - Karthik Ramasamy: If you are in Bay Area - check out the meetup in July on Kafka and Pulsar
----
2019-05-14 20:26:22 UTC - Karthik Ramasamy: <https://www.meetup.com/SF-Big-Analytics/events/261460187/>
+1 : Devin G. Bost, Sijie Guo, David Kjerrumgaard, Nathan Linebarger, Ali Ahmed, Chris Bartholomew, Matteo Merli, Ezequiel Lovelle, Sree Vaddi, Guangzhong Yao
----
2019-05-14 20:29:02 UTC - Jerry Peng: but again you can always use python functions or java functions to issue your own HTTP calls to the pulsar REST API to unblock yourselves for now.
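
A minimal sketch of issuing such an HTTP call from Python with only the standard library. It lists the namespaces of a tenant through the admin REST API (`GET /admin/v2/namespaces/{tenant}`). The address `http://localhost:8080` is an assumed default for a local standalone setup, and the helper names are hypothetical; authentication and TLS are left out.

```python
import json
import urllib.request

# Assumed admin service address; adjust for a real cluster.
ADMIN_URL = "http://localhost:8080"


def namespaces_url(tenant, base=ADMIN_URL):
    """Build the admin REST URL that lists a tenant's namespaces."""
    return "%s/admin/v2/namespaces/%s" % (base, tenant)


def list_namespaces(tenant, base=ADMIN_URL):
    """GET the namespaces of a tenant; the response body is a JSON array."""
    with urllib.request.urlopen(namespaces_url(tenant, base), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because this needs no Pulsar client classes, it avoids the shading conflict discussed above when called from inside a function.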
----
2019-05-14 20:57:15 UTC - Nathan Linebarger: Thank you. I will try to catch it online :+1:
----
2019-05-14 21:00:01 UTC - Matteo Merli: 1B is a very big number :slightly_smiling_face:
----
2019-05-15 02:16:32 UTC - Sree Vaddi: don't miss this, team :slightly_smiling_face: please.
----
2019-05-15 07:56:47 UTC - Eugene: @Eugene has joined the channel
----
2019-05-15 09:03:13 UTC - bhagesharora: Hello Everyone,
I just want to understand one scenario. We are pushing messages from a producer and consuming them through a consumer. After that, I acknowledge all the messages using consumer.acknowledge(msg) in the Python client.
When I then try to read the messages again using the Reader interface (rewinding with a message ID), all the acknowledged messages come back. I want to understand how the messages can still be delivered if they have already been acknowledged. If they all come back, they must be stored somewhere, so what is the storage location and how can we override this behaviour? Is there a pulsar-admin command or a configuration change that needs to be made?
----