Posted to issues@flink.apache.org by "Tzu-Li (Gordon) Tai (Jira)" <ji...@apache.org> on 2020/02/28 03:09:00 UTC

[jira] [Comment Edited] (FLINK-16319) Pubsub-/Broadcast implementation

    [ https://issues.apache.org/jira/browse/FLINK-16319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047172#comment-17047172 ] 

Tzu-Li (Gordon) Tai edited comment on FLINK-16319 at 2/28/20 3:08 AM:
----------------------------------------------------------------------

Hi [~danp11]! Thanks a lot for bringing this up.

Supporting a pubsub pattern in Stateful Functions has definitely been on the radar and has been brought up quite a few times in discussions.
Ideally we would be able to support this as a first-class primitive in the SDK.
I also believe that this will be a valuable addition to the project.

I took a look at your current implementation, and here is my understanding of it (please correct me if I got anything wrong!):
* Function instances can publish / subscribe to new topics dynamically at runtime, by sending {{Subscription}} and {{PublishMessage}} events to the PubSub function.
* The PubSub function then routes those messages to the correct topic partition (hash partitioning using subscribed / target Address, over a fixed number of partitions). The routing logic is currently part of the PubSub function.
* The only state the PubSub function keeps is a mapping from each topic partition to the list of its subscribers (although we might want to use a more efficient state primitive for this, rather than {{PersistedValue}}).
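
To make the second and third points concrete, here is a minimal sketch of how I picture it, assuming subscribers are hash-partitioned by their address over a fixed number of partitions. The names {{TopicPartitions}}, {{partitionFor}} and {{subscribersIn}} are hypothetical, plain strings stand in for function addresses, and the in-memory maps only stand in for persisted state; none of this is taken from the actual module or the SDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: plain strings stand in for function addresses. */
final class TopicPartitions {
    private final int numPartitions;
    // One map per partition: topic -> subscribers that hash into that partition.
    private final List<Map<String, Set<String>>> partitions = new ArrayList<>();

    TopicPartitions(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
    }

    /** Hash-partitions an address; floorMod keeps the index non-negative. */
    int partitionFor(String address) {
        return Math.floorMod(address.hashCode(), numPartitions);
    }

    /** Registers the subscriber in the partition its address hashes to. */
    void subscribe(String topic, String subscriberAddress) {
        partitions.get(partitionFor(subscriberAddress))
                .computeIfAbsent(topic, t -> new LinkedHashSet<>())
                .add(subscriberAddress);
    }

    /** Subscribers of a topic that are held by a single partition. */
    Set<String> subscribersIn(int partition, String topic) {
        return partitions.get(partition).getOrDefault(topic, Collections.emptySet());
    }
}
```

Because the partition count is fixed, a given address always maps to the same partition, so each partition only needs to know about its own share of a topic's subscribers.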

Overall, this looks like it is heading in a good direction.

Before moving forward with anything else / writing or looking at code or PRs, I'd like to discuss the following:
* Instead of having functions send {{Subscription}} and {{PublishMessage}} messages, let's consider integrating that as a first-class messaging primitive provided via {{Context}}, e.g. {{Context#subscribe(TopicID)}} and {{Context#publish(TopicID, Message)}}. The routing to the correct topic partition can probably happen there instead of being handled by the PubSub function. In that case, with the routing responsibility removed, the PubSub function really just becomes a {{TopicPartitionFunction}}.
* What is the current delivery guarantee of published messages / subscription timeliness? Do we need to / want to persist published messages in state? This is definitely out of scope for a first version, as it would require some more state primitives like a {{PersistedQueue}}, and state TTL in Stateful Functions, but it is nonetheless quite interesting to think about beforehand.
* With the current approach, a function *must* first be invoked with some other message before it can subscribe to a topic and receive published messages. Would there be cases where a function only ever wants to be invoked with messages from a subscribed topic (some way to "eagerly" subscribe a function type to a topic)? Also consider the case where we support a {{publish(TopicID, Message)}} method in the {{Router}}: how would that work?
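
As a rough illustration of the first point, the sketch below shows what a context-level primitive could look like. Everything here is an assumption for discussion: {{TopicId}}, {{PubSubContext}} and {{InMemoryPubSub}} are hypothetical names, not existing SDK types, and the in-memory maps stand in for whatever the runtime and the topic partition functions would actually do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical topic identifier; not an existing Stateful Functions type. */
final class TopicId {
    final String name;

    TopicId(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TopicId && ((TopicId) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

/** Hypothetical pub/sub view of a function's context (assumed API). */
interface PubSubContext {
    void subscribe(TopicId topic);

    void publish(TopicId topic, Object message);
}

/** In-memory stand-in for the runtime, only for illustration. */
final class InMemoryPubSub {
    private final Map<TopicId, Set<String>> subscribers = new HashMap<>();
    private final Map<String, List<Object>> inboxes = new HashMap<>();

    /** Returns a context bound to the address of the calling function. */
    PubSubContext contextFor(String selfAddress) {
        return new PubSubContext() {
            @Override
            public void subscribe(TopicId topic) {
                subscribers.computeIfAbsent(topic, t -> new LinkedHashSet<>()).add(selfAddress);
            }

            @Override
            public void publish(TopicId topic, Object message) {
                // Deliver only to functions subscribed at publish time.
                for (String s : subscribers.getOrDefault(topic, Collections.emptySet())) {
                    inboxes.computeIfAbsent(s, a -> new ArrayList<>()).add(message);
                }
            }
        };
    }

    /** Messages delivered so far to the given address. */
    List<Object> inboxOf(String address) {
        return inboxes.getOrDefault(address, Collections.emptyList());
    }
}
```

In this sketch a message published before a function subscribes is simply dropped, which is exactly the kind of delivery-guarantee question raised in the second point.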

Please let me know what you think :)
I'm quite excited about this, and would be happy to work with you on this feature,
but I can't promise to be very responsive on this specific feature right now since, as you can see, a lot of work is going on with the multi-language support in Statefun.
I think we can discuss these details asynchronously on the side and, once we have more capacity after the multi-language work, eventually push towards adding this.


> Pubsub-/Broadcast implementation
> --------------------------------
>
>                 Key: FLINK-16319
>                 URL: https://issues.apache.org/jira/browse/FLINK-16319
>             Project: Flink
>          Issue Type: New Feature
>          Components: Stateful Functions
>            Reporter: Dan Pettersson
>            Priority: Major
>
> Hi everyone,
>  
> I have a use case where the id of each function is brokerId + instrumentId, and the functions receive trades and orders. The instrument has state changes (Open, Halted, Closed, etc.) that almost all the functions are interested in. Some functions only want, for example, the Close message, whereas other functions want all state changes for the specific instrument.
>  
> I've built a statefun pubsub module that exposes two interfaces, Subscriber and Publisher, with these two methods:
>  
> default void subscribe(Context context, Subscription... subscriptions)
>   
> default void publish(Context context, PublishMessage publishMessage)
>  
> Behind the interfaces is a hidden StatefulPubSubFunction that keeps track of which partition the subscriber is located in and which topic it listens to.
>  
> Code is located under [https://github.com/danp11/flink-statefun/tree/master/statefun-pubsub] if anyone is interested.
>  
> This code is a "classic pub sub" pattern, and I think this kind of functionality would be a great addition to Stateful Functions. I created this Jira to see if there is interest in discussing what an optimal pubsub-/broadcast solution would look like in SF. Igal has previously mentioned that Broadcast could be a good fit for this kind of flow.
> Unfortunately, at the moment I don't know the internals of SF and/or Flink well enough to come up with a proposal myself.
>  
> I know you are very busy at the moment (it's impressive how much you have produced in only the last couple of weeks! :-) but if someone has any high-level ideas on where and how a pub sub pattern could be implemented, I'd really appreciate it. In the future I hope we can come up with a proposal together, as I need your help here. If you think that a pubsub-/broadcast solution would make SF better, that is :-)
>  
> Hope to hear your thoughts on this!
>  
> Thanks,
>  
>  /Dan



--
This message was sent by Atlassian Jira
(v8.3.4#803005)