Posted to users@pulsar.apache.org by Yosi Attias <yo...@gmail.com> on 2020/03/08 12:02:17 UTC

Pulsar event routing with tenants

Hi!

*I posted this to Google Groups and then the message somehow disappeared, so
I am sending it again here. Sorry for the duplication.*

I am evaluating Pulsar as our event bus, and it's awesome!

Our services (written in Node.js) need to listen to multiple tenants (or all
tenants; we have 10k tenants, and the number is growing), and the list of
tenants can change dynamically at runtime (changes are not that frequent, at
most 200-300 per day).
Pulsar sounds like an excellent fit for this because I can create a topic per
tenant, like "tenant:XX:events" (XX = tenant id), and use a shared
subscription for consumer groups.
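
A minimal sketch of the topic-per-tenant naming described above, in Python. Pulsar topic names take the form persistent://<tenant>/<namespace>/<topic>, where "tenant" is Pulsar's own concept and not the application tenant from this post. The Pulsar tenant "acme", the namespace "events", and the helper names are placeholders, not anything from the thread. The regular expression only illustrates which names an "all tenants" consumer would have to match; whether a client can subscribe by pattern depends on the client library.

```python
import re
from typing import Optional

# Placeholders, not from the post: the Pulsar-level tenant and namespace.
PULSAR_TENANT = "acme"
NAMESPACE = "events"


def tenant_topic(tenant_id: str) -> str:
    """Build a per-tenant topic name using the post's "tenant:XX:events" scheme."""
    return f"persistent://{PULSAR_TENANT}/{NAMESPACE}/tenant:{tenant_id}:events"


# Pattern that matches every per-tenant topic built by tenant_topic().
ALL_TENANTS = re.compile(
    rf"persistent://{re.escape(PULSAR_TENANT)}/{re.escape(NAMESPACE)}/tenant:[^:/]+:events"
)


def tenant_of(topic: str) -> Optional[str]:
    """Extract the tenant id from a topic name, or None if it is not a tenant topic."""
    match = re.fullmatch(r".*/tenant:([^:/]+):events", topic)
    return match.group(1) if match else None
```

Keeping the name construction in one helper means the consumers that react to the Redis notification and the producers agree on the topic for a given tenant id.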

As I said, the list of tenants to subscribe to changes at runtime; when it
does, all consumers in a group get a message (it is broadcast via Redis
pub/sub).

I am not sure what the best way to implement this is. I see two options:

   - Client-side: a consumer receives a tenant it needs to listen to and
   adds the topic to the shared subscription. This sounds like the right
   solution, but:
      - Since all consumers will add the same topic at the same time, are
      there any issues with this? Or do I need to make sure it happens once,
      so that only one consumer mutates the shared subscription?
      - There are consumers (a small fraction, but important ones) that need
      to listen to all events, which makes the subscription consume all
      topics. Does that make sense in terms of performance, attaching a
      subscription to 10k+ topics?
   - Functions: I thought about creating a function that holds a list of
   application subscriptions (not Pulsar subscriptions), listens to a main
   topic called "events" (or to all tenant topics? I am not sure how to
   implement this with a function), and routes the events to service topics
   based on those subscriptions. For example, a service named "users" will
   have a "users-service" topic, and the function will route all events that
   service is subscribed to into the "users-service" topic. This sounds like
   a good solution as well, but:
      - I am not sure where functions run. If they run in a separate
      container, we will have massive traffic waste. I see there is a thread
      option for running the function; does the function then run inside
      Pulsar, so that I don't have traffic waste?
      - Is this overkill for functions?
      - Storing the application subscriptions: I can save them in our
      database, and I see I can also store them in Pulsar state tables.
      Which is preferred here?
      - Once I want to listen to more topics, should I notify the function
      somehow so that it reloads the list of subscriptions (since I will
      cache it), or do I need to implement some refresh timer?
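
The last question above (notify the function versus a refresh timer) can be sketched in Python as a small cache that supports both. This is only an illustration: the class name, the loader callback, and the 60-second figure in the usage note are hypothetical, and the loader stands in for whichever store (database or state table) ends up holding the application subscriptions.

```python
import time
from typing import Callable, Dict, Optional, Set

Subscriptions = Dict[str, Set[str]]  # service name -> tenant ids it wants


class SubscriptionCache:
    """Caches the subscription table and reloads it once it is older than ttl_seconds."""

    def __init__(self, load: Callable[[], Subscriptions], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load              # hypothetical loader, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the cache can be tested
        self._value: Subscriptions = {}
        self._loaded_at: Optional[float] = None  # None means "never loaded"

    def get(self) -> Subscriptions:
        """Return the cached table, reloading it first if it has expired."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._load()
            self._loaded_at = now
        return self._value

    def invalidate(self) -> None:
        """Force a reload on the next get(), e.g. when a change notification arrives."""
        self._loaded_at = None
```

With 200-300 changes per day, something like `SubscriptionCache(load, ttl_seconds=60)` would reload at most once a minute; calling `invalidate()` from a notification handler covers the case where a change must be picked up sooner.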


Hopefully, this makes sense! If you have any questions or want me to
elaborate, please let me know!

If you would rather I ask somewhere else (like Slack), let me know and I
will ask there instead.

Re: Pulsar event routing with tenants

Posted by Yosi Attias <yo...@gmail.com>.
Hi!

Thanks for the quick reply!

I actually need a shared subscription, so that I can have multiple consumer
instances consuming the same topic.

I think I didn't explain my issue well, so I'll try again. The flow is like
this:

   1. Producer - publishes events from anywhere in the system (a consumer can
   publish events; producers can publish directly to Pulsar or through
   pulsar-proxy) to a topic or topics (this is the question).
   2. Service (multiple consumers that scale out and in) - creates a shared
   subscription that needs to listen to the events of multiple tenants (the
   list of tenants can change dynamically) OR to all events.

Now, I am not sure how to implement the event routing, and I don't want to
have traffic waste. Let me elaborate on that.
Given that all producers together publish events at 30mb/s, I don't want a
service that listens to two tenants (let's say 10% of the traffic) to consume
the full 30mb/s and filter on the client side.
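
The arithmetic behind that concern, as a small Python sketch. The 30 and 10% figures are the ones from the paragraph above; the function name and the `routed` flag are made up for illustration.

```python
def consumed_rate(total_rate: float, share_percent: float, routed: bool) -> float:
    """Rate a service's consumers receive, in the same unit as total_rate."""
    if routed:
        # Only the events for the service's tenants are delivered to it.
        return total_rate * share_percent / 100
    # Client-side filtering: everything is delivered, most of it is discarded.
    return total_rate


total = 30   # total publish rate from the post
share = 10   # percent of the traffic that belongs to the service's tenants

received_filtering = consumed_rate(total, share, routed=False)  # 30
received_routed = consumed_rate(total, share, routed=True)      # 3.0
wasted = received_filtering - received_routed                   # 27.0
```

So 27 of every 30 units delivered to such a service would be thrown away by a client-side filter. Whatever does the routing still has to read the full stream once.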

It looks like my solution will come down to a function that does the routing,
so the implementation will be something like this:

   1. Producer - publishes all events to a topic named "events"
   2. Pulsar function - processes all those events and routes them to
   service topics
   3. Service - creates a shared subscription on its topic

Producer -> topic "events" -> Pulsar function routes to "service-a" topic
-> Service A listens to the "service-a" topic.

Is that something that makes sense?

If so, about function runtimes: does the "thread" runtime run inside the
Pulsar broker, OR does it run in another process dedicated to functions (a
different pod in a k8s deployment)?
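
The three-step flow in this message (producer -> "events" -> routing function -> service topic) could look roughly like this in Python. It is a sketch of the routing decision only, not a deployable Pulsar Function: the event shape (a dict with a "tenant" field), the "*" marker, the subscription table, and the `publish` callback are all assumptions made for illustration. In a real function, `publish` would be whatever the function runtime provides for writing to another topic.

```python
from typing import Callable, Dict, List, Set

ALL = "*"  # hypothetical marker for "this service wants every tenant"

# Hypothetical application subscriptions: service topic -> tenant ids.
SUBSCRIPTIONS: Dict[str, Set[str]] = {
    "service-a": {"1", "2"},
    "users-service": {ALL},
}


def route(event: dict, subscriptions: Dict[str, Set[str]]) -> List[str]:
    """Return the service topics that should receive this event."""
    tenant = event["tenant"]
    return sorted(
        topic for topic, tenants in subscriptions.items()
        if ALL in tenants or tenant in tenants
    )


def process(event: dict, publish: Callable[[str, dict], None]) -> None:
    """Fan one event from "events" out to every matching service topic."""
    for topic in route(event, SUBSCRIPTIONS):
        publish(topic, event)
```

Each service then only needs a shared subscription on its own topic, so a service interested in two tenants never sees the other tenants' events.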




On Mon, 9 Mar 2020 at 9:02 Sijie Guo <gu...@gmail.com> wrote:

> Thank you, Yosi! The mailing list is a great place to ask questions since
> the emails are indexed and searchable.
>
> If, most of the time, a consumer only listens to a "tenant" topic, you can
> use a master topic and a key_shared subscription to distribute your list of
> tenants. Each of the consumers of the master topic will then receive a
> subset of the tenants and can subscribe to those "tenant" topics. So you
> don't need all consumers to subscribe to all topics.

Re: Pulsar event routing with tenants

Posted by Sijie Guo <gu...@gmail.com>.
Thank you, Yosi! The mailing list is a great place to ask questions since
the emails are indexed and searchable.

If, most of the time, a consumer only listens to a "tenant" topic, you can
use a master topic and a key_shared subscription to distribute your list of
tenants. Each of the consumers of the master topic will then receive a
subset of the tenants and can subscribe to those "tenant" topics. So you
don't need all consumers to subscribe to all topics.
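
To illustrate the idea: with a key_shared subscription, messages that carry the same key are delivered to the same consumer, so publishing each tenant id as a keyed message on the master topic splits the tenant list across the consumers. The Python sketch below only simulates that split with a stable hash; it is not how the broker assigns keys, and the consumer names and tenant ids are made up.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def owner(tenant_id: str, consumers: List[str]) -> str:
    """Pick the consumer for a key with a stable hash (a stand-in for the broker's own hashing)."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return consumers[int.from_bytes(digest[:8], "big") % len(consumers)]


def distribute(tenant_ids: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Group tenant ids by the consumer that would receive them."""
    assignment: Dict[str, List[str]] = defaultdict(list)
    for tenant_id in tenant_ids:
        assignment[owner(tenant_id, consumers)].append(tenant_id)
    return dict(assignment)


consumers = ["consumer-0", "consumer-1", "consumer-2"]  # hypothetical names
tenants = [str(i) for i in range(1000)]                 # hypothetical tenant ids
assignment = distribute(tenants, consumers)
```

Every tenant lands on exactly one consumer, and each consumer then subscribes only to the "tenant" topics in its own share.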

Other comments inline.


On Sun, Mar 8, 2020 at 5:02 AM Yosi Attias <yo...@gmail.com> wrote:

> Hi!
>
> *I posted this to Google Groups and then the message somehow disappeared, so
> I am sending it again here. Sorry for the duplication.*
>
> I am evaluating Pulsar as our event bus, and it's awesome!
>
> Our services (written in Node.js) need to listen to multiple tenants (or all
> tenants; we have 10k tenants, and the number is growing), and the list of
> tenants can change dynamically at runtime (changes are not that frequent, at
> most 200-300 per day).
> Pulsar sounds like an excellent fit for this because I can create a topic per
> tenant, like "tenant:XX:events" (XX = tenant id), and use a shared
> subscription for consumer groups.
>
> As I said, the list of tenants to subscribe to changes at runtime; when it
> does, all consumers in a group get a message (it is broadcast via Redis
> pub/sub).
>
> I am not sure what the best way to implement this is. I see two options:
>
>    - Client-side: a consumer receives a tenant it needs to listen to and
>    adds the topic to the shared subscription. This sounds like the right
>    solution, but:
>
>
>       - Since all consumers will add the same topic at the same time, are
>       there any issues with this? Or do I need to make sure it happens once,
>       so that only one consumer mutates the shared subscription?
>
It sounds like you need to use an exclusive subscription for this case.


>
>       - There are consumers (a small fraction, but important ones) that need
>       to listen to all events, which makes the subscription consume all
>       topics. Does that make sense in terms of performance, attaching a
>       subscription to 10k+ topics?
>
>
It is okay to subscribe to 10k+ topics. However, you need to pay attention
to the memory allocated to your client.

But I would recommend thinking of architecting your service in a different
way to avoid this if possible.
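
A rough way to reason about that client memory, sketched in Python. Pulsar clients prefetch messages into a receive queue, so an upper bound is topics x queue size x average message size. All three figures below are assumptions for illustration, not measurements or recommendations; check the actual queue setting and message sizes for your client.

```python
def receive_buffer_bytes(topics: int, queue_size: int, avg_message_bytes: int) -> int:
    """Upper bound on bytes buffered if every topic's receive queue is full."""
    return topics * queue_size * avg_message_bytes


# Assumed figures for illustration only.
TOPICS = 10_000
AVG_MESSAGE_BYTES = 1_024

# Assumed queue sizes of 1000 and 10 messages per topic.
full_queues = receive_buffer_bytes(TOPICS, queue_size=1_000, avg_message_bytes=AVG_MESSAGE_BYTES)
small_queues = receive_buffer_bytes(TOPICS, queue_size=10, avg_message_bytes=AVG_MESSAGE_BYTES)

print(f"{full_queues / 2**30:.1f} GiB vs {small_queues / 2**20:.1f} MiB")
```

Under these assumptions the worst case differs by two orders of magnitude, which is why the per-topic queue size matters once a single client attaches to thousands of topics.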


>
>    - Functions: I thought about creating a function that holds a list of
>    application subscriptions (not Pulsar subscriptions), listens to a main
>    topic called "events" (or to all tenant topics? I am not sure how to
>    implement this with a function), and routes the events to service topics
>    based on those subscriptions. For example, a service named "users" will
>    have a "users-service" topic, and the function will route all events that
>    service is subscribed to into the "users-service" topic. This sounds like
>    a good solution as well, but:
>       - I am not sure where functions run. If they run in a separate
>       container, we will have massive traffic waste. I see there is a thread
>       option for running the function; does the function then run inside
>       Pulsar, so that I don't have traffic waste?
>
>
Functions have different runtimes - thread, process, and Kubernetes. It is
pretty flexible.

> So I don't have traffic waste?

I am not sure what "traffic waste" means. If you are referring to messages
being read and written multiple times, that's true. If your "service" topics
(like users-service) will be used by different subscriptions, I would
recommend going with the function approach.




>
>       - Is this overkill for functions?
>       - Storing the application subscriptions: I can save them in our
>       database, and I see I can also store them in Pulsar state tables.
>       Which is preferred here?
>       - Once I want to listen to more topics, should I notify the function
>       somehow so that it reloads the list of subscriptions (since I will
>       cache it), or do I need to implement some refresh timer?
>
>
> Hopefully, this makes sense! If you have any questions or want me to
> elaborate, please let me know!
>
> If you would rather I ask somewhere else (like Slack), let me know and I
> will ask there instead.
>