Posted to users@kafka.apache.org by "Tauzell, Dave" <Da...@surescripts.com> on 2019/11/18 14:40:46 UTC

Re: [External] Allow parallel processing

I would go with #1:

1. It will be easier to add new "batch producers" since you won't need to worry about re-partitioning
2. You have more control over the parallelism since you can have different numbers of partitions for each topic
3. You can easily split out your consumer into N consumers if one of those producers is producing more data
4. You can more easily monitor each producer if you are monitoring by topic
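Point 4 can be illustrated with a small sketch. It assumes one topic per producer (the names producer-a/b/c are hypothetical). Per-partition consumer lag is the log-end offset minus the group's committed offset, so summing lag per topic gives each producer's backlog directly. The offsets below are made up, and `lag_by_topic` is a toy helper, not part of any Kafka client. In practice the numbers would come from `kafka-consumer-groups.sh --describe` or from the Java consumer's `endOffsets()` and `committed()` calls.

```python
from collections import defaultdict


def lag_by_topic(end_offsets, committed):
    """Sum consumer lag per topic.

    Both arguments map (topic, partition) -> offset. A partition with no
    committed offset is treated as unconsumed from offset 0, which assumes
    the log still starts at offset 0 (a simplification for this sketch).
    """
    totals = defaultdict(int)
    for (topic, partition), end in end_offsets.items():
        totals[topic] += end - committed.get((topic, partition), 0)
    return dict(totals)


# Hypothetical layout: producer-a's topic has 2 partitions, the others 1.
end_offsets = {
    ("producer-a", 0): 600, ("producer-a", 1): 400,
    ("producer-b", 0): 50,
    ("producer-c", 0): 30,
}
committed = {("producer-a", 0): 100, ("producer-b", 0): 50}

print(lag_by_topic(end_offsets, committed))
# {'producer-a': 900, 'producer-b': 0, 'producer-c': 30}
```

With a single shared topic (option 2), the same view would need an extra mapping from partitions back to producers.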

-Dave

On 11/18/19, 4:41 AM, "pwozniak" <pw...@man.poznan.pl> wrote:

    Hi all,

    Here is my use case:

    I have three message producers that submit batches of messages to Kafka
    from time to time. Let's assume that one of them has just submitted 1k
    messages, the second one submitted some number of messages after that, and
    the third one also submitted some messages.

    I would like to make sure that, when the consumer starts working,
    messages from all producers are processed (more or less) together.
    In other words, messages from the second producer should not have to
    wait for all of those 1k messages to be processed first.


    I have two ideas how to solve it:
    1. Prepare three different Kafka topics. Each producer will write to its
    dedicated topic, and the consumer will read from all topics. In this case
    the consumer will read messages in round-robin fashion (is that true?), so
    messages from the second (and third) producer will not have to wait for all
    messages from the first producer (submitted earlier) to be processed by the
    consumer.
    2. Have one topic for all producers. Each producer will submit messages
    only to some subset of the topic's partitions. For example, the topic
    will have 10 partitions and each producer will write to only two (or
    three) of them.


    And the questions are:
    1. Which solution is better?
    2. Is there another (even better) solution that you can recommend?

    Regards,
    Pawel



This e-mail and any files transmitted with it are confidential, may contain sensitive information, and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error, please notify the sender by reply e-mail immediately and destroy all copies of the e-mail and any attachments.
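For comparison with Dave's point 1, Pawel's option 2 (one topic, each producer confined to its own subset of partitions) could be sketched as below. The 10-partition layout, the producer names, and both helper functions are hypothetical. CRC32 is used only as a stable stand-in hash; Kafka's default partitioner uses murmur2. In the Java client, the chosen partition could be passed explicitly through the `ProducerRecord(topic, partition, key, value)` constructor or computed in a custom `Partitioner`.

```python
import zlib

# Hypothetical layout for option 2: one 10-partition topic, with each
# producer confined to its own disjoint subset of partitions.
NUM_PARTITIONS = 10
PARTITIONS_BY_PRODUCER = {
    "producer-1": [0, 1, 2],
    "producer-2": [3, 4, 5],
    "producer-3": [6, 7, 8, 9],
}


def check_layout(layout, num_partitions):
    """Raise if a partition is claimed twice or lies outside the topic."""
    seen = set()
    for producer, parts in layout.items():
        for p in parts:
            if not 0 <= p < num_partitions:
                raise ValueError(f"{producer}: partition {p} out of range")
            if p in seen:
                raise ValueError(f"{producer}: partition {p} already assigned")
            seen.add(p)


def choose_partition(producer_id, key):
    """Map a record key to one partition inside the producer's subset.

    CRC32 is stable across runs, so the same key from the same producer
    always lands in the same partition.
    """
    subset = PARTITIONS_BY_PRODUCER[producer_id]
    return subset[zlib.crc32(key) % len(subset)]


check_layout(PARTITIONS_BY_PRODUCER, NUM_PARTITIONS)
print(choose_partition("producer-2", b"order-42"))  # one of 3, 4 or 5
```

Adding a fourth producer means carving a new subset out of the existing partitions (or adding partitions and updating every producer's map). That is the re-partitioning cost Dave's first point avoids by giving each producer its own topic.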

Re: [External] Allow parallel processing

Posted by Eric Azama <ea...@gmail.com>.
I second Dave's suggestion.

With regard to the consumers round-robining between topics, they usually
round-robin in batches. So you'll probably see a consumer work on a large
batch of records from TopicA before moving on to TopicB. Depending on the
behavior of the producers, this might look the same as all of the records
in TopicA getting processed before TopicB.
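The effect of batch size can be shown with a toy simulation. It assumes a consumer that drains up to `batch_size` records from one topic before rotating to the next. This is not the Kafka client's real fetch logic: it ignores partitions, fetch byte limits and timing. In the Java consumer, `max.poll.records` (default 500) caps how many records a single `poll()` returns, so lowering it is one knob worth trying. Which partitions a given poll draws from still depends on the client's fetching.

```python
from collections import deque


def simulate(backlogs, batch_size):
    """Toy consumer: take up to batch_size records from each topic in turn.

    backlogs maps topic -> number of waiting records. Returns record ids
    in the order they would be processed.
    """
    queues = {topic: deque(f"{topic}-{i}" for i in range(n))
              for topic, n in backlogs.items()}
    order = []
    while any(queues.values()):  # loop until every topic's queue is empty
        for q in queues.values():
            for _ in range(min(batch_size, len(q))):
                order.append(q.popleft())
    return order


backlog = {"TopicA": 1000, "TopicB": 5}
for size in (1, 500, 1000):
    # Position at which the first TopicB record gets processed.
    print(size, simulate(backlog, size).index("TopicB-0"))
# 1 1
# 500 500
# 1000 1000
```

With record-by-record round-robin, TopicB is served almost immediately. With batches as large as TopicA's backlog, the result is indistinguishable from processing TopicA first.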
