Posted to users@kafka.apache.org by "Tauzell, Dave" <Da...@surescripts.com> on 2019/11/18 14:40:46 UTC
Re: [External] Allow parallel processing
I would go with #1:
1. It will be easier to add new "batch producers" since you won't need to worry about re-partitioning
2. You have more control over the parallelism since you can have different numbers of partitions for each topic
3. You can easily split out your consumer into N consumers if one of those producers is producing more data
4. You can more easily monitor each producer if you are monitoring by topic
-Dave
On 11/18/19, 4:41 AM, "pwozniak" <pw...@man.poznan.pl> wrote:
Hi all,
Here is my use case:
I have three message producers that each submit a batch of messages to Kafka
from time to time. Let's assume that one of them has just submitted 1k
messages, the second one submitted some messages after that, and the
third one also submitted some messages.
I would like to make sure that, when the consumer starts working,
messages from all producers are processed (more or less) together.
In other words, messages from the second producer should not have to wait
for all 1k messages to be processed first.
I have two ideas for how to solve it:
1. Prepare three different Kafka topics. Each producer will write to its
dedicated topic, and the consumer will read from all topics. In this case
the consumer will read messages in round-robin fashion (is that true?), so
messages from the second (and third) producer will not have to wait for all
messages from the first producer (submitted earlier) to be processed by the
consumer.
2. Have one topic for all producers. Each producer will submit messages
only to some subset of the partitions of that topic. For example, we will
have 10 partitions in our topic and each producer will write only to two
(or three) of them.
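
To make idea 2 concrete, here is a minimal sketch in plain Python of how each producer could be given its own block of partitions. It does not talk to Kafka at all. The names `partitions_for` and `partition_chooser`, and the choice of three contiguous partitions per producer, are assumptions made for illustration only. A real producer would have to pass the chosen partition number explicitly when sending each record.

```python
from itertools import cycle


def partitions_for(producer_index, per_producer, num_partitions):
    """Return the contiguous block of partition ids reserved for one producer.

    Hypothetical helper: producer 0 gets the first `per_producer` partitions,
    producer 1 the next block, and so on.
    """
    start = producer_index * per_producer
    end = start + per_producer
    if producer_index < 0 or end > num_partitions:
        raise ValueError("not enough partitions for this producer")
    return list(range(start, end))


def partition_chooser(producer_index, per_producer=3, num_partitions=10):
    """Return an endless iterator that rotates over this producer's partitions."""
    return cycle(partitions_for(producer_index, per_producer, num_partitions))


if __name__ == "__main__":
    # Assumed layout: 10 partitions, 3 producers, 3 partitions each
    # (one partition stays unused).
    for index in range(3):
        print(index, partitions_for(index, 3, 10))
```

Adding a fourth producer under this layout would require more partitions or a new split, which is the re-partitioning concern with a single shared topic.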
And the questions are:
1. Which solution is the best?
2. Maybe there is another (even better) solution that you can recommend?
Regards,
Pawel
Re: [External] Allow parallel processing
Posted by Eric Azama <ea...@gmail.com>.
I second Dave's suggestion.
With regard to the consumers round-robining between topics, they usually
round-robin in batches. So you'll probably see a consumer work on a large
batch of records from TopicA before moving on to TopicB. Depending on the
behavior of the producers, this might look the same as all of the records
in TopicA getting processed before TopicB.
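
The effect of round-robining in batches can be illustrated with a toy model in plain Python. This is a simulation under stated assumptions, not Kafka's actual fetch logic: `consume`, `first_position`, and the `max_batch` parameter are hypothetical names, and `max_batch` simply stands in for whatever limits how many records the consumer takes from one topic before moving on. In a real consumer, settings such as `max.poll.records` influence how many records a single poll returns, but whether that produces exactly this interleaving should be verified against a real cluster.

```python
from collections import deque


def consume(topics, max_batch):
    """Drain per-topic queues, taking up to max_batch records from each topic in turn.

    Returns the records in the order a single consumer would process them
    under this simplified model.
    """
    queues = {name: deque(records) for name, records in topics.items()}
    order = []
    while any(queues.values()):  # an empty deque is falsy
        for queue in queues.values():
            for _ in range(min(max_batch, len(queue))):
                order.append(queue.popleft())
    return order


def first_position(order, prefix):
    """Index of the first processed record whose id starts with prefix."""
    return next(i for i, record in enumerate(order) if record.startswith(prefix))


if __name__ == "__main__":
    topics = {
        "TopicA": [f"A{i}" for i in range(1000)],  # the 1k-message batch
        "TopicB": [f"B{i}" for i in range(10)],    # submitted afterwards
    }
    for max_batch in (1000, 100):
        order = consume(topics, max_batch)
        print(max_batch, first_position(order, "B"))
```

In this model, a batch limit as large as the backlog in TopicA makes TopicB wait for all of TopicA, while a smaller limit lets TopicB's records start after one batch.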