Posted to users@kafka.apache.org by Moritz Petersen <mp...@adobe.com.INVALID> on 2019/05/06 14:17:10 UTC

Best Practice Scaling Consumers

Hi all,

I’m new to Kafka and have a very basic question:

We are building a cloud-scale platform and are evaluating whether we can use Kafka for pub-sub messaging between our services. Most of our services scale dynamically based on load (number of requests, CPU load, etc.). In our current architecture, services are both producers and consumers, since all services listen to some kind of events.

With Kafka, I assume we have two restrictions or issues:

  1.  The number of consumers is restricted to the number of partitions of a topic. Changing the number of partitions is a relatively expensive operation (at least compared to scaling services). Is it necessary to overprovision the number of partitions in order to be prepared for load peaks?
  2.  Adding or removing consumers halts processing of the affected partitions for a short period of time. Is it possible to avoid or significantly minimize this lag?

Are there any additional best practices for implementing Kafka consumers in a cloud-scale environment?

Thanks,
Moritz


Re: Best Practice Scaling Consumers

Posted by na...@gmail.com.
Hi Moritz - I don’t believe the number of Kafka consumers is strictly restricted to the number of partitions. You can run more consumers than partitions, although within a single consumer group the surplus instances will sit idle until a partition frees up for them.

When you create a topic, you indicate its number of partitions; when you then produce records with a key, each key-value pair is allocated to a specific partition on the basis of a hash function on the key.

I believe the purpose of partitioning is to parallelize consumption of Kafka data while keeping everything for a specific key in order within a Kafka topic. It essentially pre-sorts your topic’s data into as many categories as you have partitions.
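
A minimal sketch of that key-to-partition mapping, mirroring the default partitioner’s logic for keyed records (the partition count and key are made up):

    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.common.utils.Utils;

    public class PartitionForKey {
        public static void main(String[] args) {
            int numPartitions = 12; // partition count of the (hypothetical) topic
            byte[] keyBytes = "user-42".getBytes(StandardCharsets.UTF_8);
            // Default partitioner for keyed records: murmur2 hash of the
            // serialized key, mapped onto the partition count
            int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
            System.out.println("key user-42 -> partition " + partition);
        }
    }

All records with the same key therefore land in the same partition, which is what gives you per-key ordering.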

BTW, I haven’t yet figured out how to consume data from just one of the partitions while ignoring the others.
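
That said, I gather the consumer’s assign() call (used instead of subscribe()) is meant for exactly this; a minimal sketch, assuming a made-up topic "events" and partition 0:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class SinglePartitionConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // assign() pins the consumer to partition 0 of "events" only;
                // no group management and no rebalancing are involved
                consumer.assign(Collections.singletonList(new TopicPartition("events", 0)));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records)
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                }
            }
        }
    }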





Re: Best Practice Scaling Consumers

Posted by Kamal Chandraprakash <ka...@gmail.com>.
1. Yes, you may have to overprovision the number of partitions to handle
the load peaks. Refer to this
<https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster>
document to choose the number of partitions (see the topic-sizing sketch
after this list).
2. KIP-429
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol>
has been proposed to reduce the time taken by the consumer rebalance
protocol when a consumer instance is added to or removed from the group
(see the config sketch below).
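
For point 1, a minimal sketch of provisioning a topic with partition headroom and expanding it later via the AdminClient (topic name and counts are made up):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewPartitions;
    import org.apache.kafka.clients.admin.NewTopic;

    public class TopicSizing {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // Create with headroom: 30 partitions even if ~10 consumers are
                // enough today, so the group can scale without repartitioning
                admin.createTopics(Collections.singletonList(
                        new NewTopic("events", 30, (short) 3))).all().get();
                // Later, expand if needed. Partitions can only grow, never shrink,
                // and keyed records produced afterwards may hash to new partitions.
                admin.createPartitions(Collections.singletonMap(
                        "events", NewPartitions.increaseTo(60))).all().get();
            }
        }
    }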
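For point 2, once KIP-429 lands, opting in should amount to a consumer
configuration change; a hedged sketch (the assignor class name comes from the
KIP and may change before it is released; the group id is hypothetical):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-service"); // hypothetical group
    // Incremental (cooperative) rebalancing: consumers keep their current
    // partitions during a rebalance and only the reassigned ones move,
    // instead of the stop-the-world revoke-everything protocol
    props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");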
