Posted to dev@kafka.apache.org by "M. Manna" <ma...@gmail.com> on 2019/02/18 14:04:16 UTC

Kafka Topic Volume and (possibly ACL) question

Hello,

We have a requirement where, based on business requirements, we need to
publish data only for a specific set of clients. For example, an invoice
update shouldn't go to all clients, only to the specific client. But company
remittance info should be published to all clients. Also, in some cases, a
specific client changes some contract info which is published in a P2P
fashion. We have about 8k clients.

What is the ideal way to control this flow?

1) specific topic per client
2) Some form of ACL?

For option 1, we are not 100% sure whether Kafka can handle 8k topics (or
what the resource implications would be). Has anyone solved a similar
business problem? If so, would you mind sharing your solution?

Btw, we are not using a streaming platform; it's simply pub-sub, because we
don't need real-time aggregation of various items. For us, it's key that
the synchronisation occurs and has "exactly-once" semantics.

Thanks,

Re: Kafka Topic Volume and (possibly ACL) question

Posted by Pere Urbón Bayes <pe...@gmail.com>.
Hi,
as Evelyn has already said, I would recommend thinking about your
topic / data modeling. From your email:

> We have a requirement where, based on business requirements, we need to
publish data only for a specific set of clients. For example, an invoice
update shouldn't go to all clients, only the specific client. But a company
remittance info should be published to all clients. Also, in some cases, a
specific client changes some contract info which is published in a P2P
fashion. We have about 8k clients

From this paragraph, I understand you're thinking of having a topic per
client, right? What happens when your client base grows further? My
recommendation here would be to have action-based topics, for example a
topic for each kind of notification you send.


Hope this helps,

-- Pere


Message from Evelyn Bayes <ev...@confluent.io> on Wed, 20 Feb
2019 at 7:53:

> Hi,
>
> I would use ACLs or something similar.
>
> For instance, you might assign the records which are limited to a subset
> of clients to a specific topic with an associated ACL.
>
> I expect you’ll find having 8k extra topics very problematic in a range of
> ways, such as:
>
> * Replication issues;
> * Poor batching;
> * Memory issues due to the additional buffers;
> * And so many more.
>
> Currently the upper limit for partitions is generally considered ~100,000
> and that’s on a LARGE cluster.
> I usually see a lot of issues coming up long before that limit is hit and
> on smaller clusters even more so.
>
> If you had a replication factor of 3 and a single partition per topic,
> that's 8,000 partitions, i.e. 24,000 partition replicas added to your cluster.
>
> The end solution really depends on how the clients get the data.
>
> Do you have a consumer read it, do some preprocessing and send it to them?
> Then you can handle this in the business logic.
>
> Do they have direct consumption rights to the cluster?
> Then you NEED to have ACLs, because there won’t be anything stopping them
> from simply subscribing to another client's topic.
>
> Cheers,
> Eevee.
>
>
>
> > On 19 Feb 2019, at 1:04 am, M. Manna <ma...@gmail.com> wrote:
> >
> > [...]
>
>

-- 
Pere Urbon-Bayes
Software Architect
http://www.purbon.com
https://twitter.com/purbon
https://www.linkedin.com/in/purbon/

Re: Kafka Topic Volume and (possibly ACL) question

Posted by Evelyn Bayes <ev...@confluent.io>.
Hi,

I would use ACLs or something similar.

For instance, you might assign the records which are limited to a subset of clients to a specific topic with an associated ACL.

I expect you’ll find having 8k extra topics very problematic in a range of ways, such as:

* Replication issues;
* Poor batching;
* Memory issues due to the additional buffers;
* And so many more.

Currently the upper limit for partitions is generally considered ~100,000 and that’s on a LARGE cluster.
I usually see a lot of issues coming up long before that limit is hit and on smaller clusters even more so.

If you had a replication factor of 3 and a single partition per topic, that's 8,000 partitions, i.e. 24,000 partition replicas added to your cluster.

The end solution really depends on how the clients get the data.

Do you have a consumer read it, do some preprocessing and send it to them?
Then you can handle this in the business logic.
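For example, the business-logic route could look roughly like this (the entitlement table and the forward() hook are made up for illustration):

```python
from typing import Callable

# Hypothetical entitlement table: which clients may see records from a topic.
ENTITLEMENTS = {
    "invoices": {"client-1"},                 # limited to one client
    "remittance": {"client-1", "client-2"},   # broadcast to everyone
}

def dispatch(topic: str, record: dict,
             forward: Callable[[str, dict], None]) -> int:
    """Forward a record only to the entitled clients; return how many."""
    targets = ENTITLEMENTS.get(topic, set())
    for client in sorted(targets):
        forward(client, record)
    return len(targets)
```

The trusted consumer owns the entitlement check, so clients never touch the cluster directly.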

Do they have direct consumption rights to the cluster?
Then you NEED to have ACLs, because there won’t be anything stopping them from simply subscribing to another client's topic.
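Conceptually, a topic-level Read ACL boils down to a lookup like this (heavily simplified; real Kafka ACLs also support prefixed resources, host restrictions, and deny rules):

```python
# Set of (principal, operation, topic) triples that are allowed.
# Principals and topic names here are illustrative.
ACLS = {
    ("User:client-1", "Read", "client-1-data"),
    ("User:client-2", "Read", "client-2-data"),
}

def is_allowed(principal: str, operation: str, topic: str) -> bool:
    """Broker-side check: no matching allow rule means no access."""
    return (principal, operation, topic) in ACLS
```

Without a matching ACL, client-1 simply cannot subscribe to client-2's topic, which is the guarantee you need for direct consumption.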

Cheers,
Eevee.



> On 19 Feb 2019, at 1:04 am, M. Manna <ma...@gmail.com> wrote:
> 
> [...]