You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Pushkar Deole <pd...@gmail.com> on 2020/10/27 07:04:06 UTC

multi-threaded consumer configuration like stream threads?

Hi,

Is there any configuration in kafka consumer to specify multiple threads
the way it is there in kafka streams?
Essentially, can we have a consumer with multiple threads where the threads
would divide partitions of topic among them?

Re: multi-threaded consumer configuration like stream threads?

Posted by Pushkar Deole <pd...@gmail.com>.
I think we can handle the failures selectively e.g. if there are issues
with downstream database server then all the events will fail to process so
it will be worth to keep retrying. Else if there is issue only while
processing a particular event, then we can keep retry timeout and after
that timeout it will take another event for processing .

On Tue, Nov 24, 2020 at 5:35 AM Haruki Okada <oc...@gmail.com> wrote:

> I see.
>
> Then I think the appropriate approach depends on your delivery latency
> requirements.
> Just retrying until success is simpler but it could block subsequent
> messages to get processed. (also depends on thread pool size though)
>
> Then another concern when using dead letter topic would be retrying
> backoff.
> If you don't use backoff, the production for dead letter topic could burst
> when downstream db experiences transient problems but on the other hand
> injecting backoff-delay would require consideration about how to not block
> subsequent messages.
>
> (FYI, Decaton provides retry-queueing with backoff out-of-the box. :)
> https://github.com/line/decaton/blob/master/docs/retry-queueing.adoc)
>
> 2020年11月24日(火) 2:38 Pushkar Deole <pd...@gmail.com>:
>
> > Thanks Haruki... right now the max of such types of events that we would
> > have is 100 since we would be supporting those many customers (accounts)
> > for now, for which we are considering a simple approach of a single
> > consumer and a thread pool with around 10 threads. So the question was
> > regarding how to manage failed events, should those be retried until
> > successful or sent to a dead letter queue/topic from where they will be
> > processed again until successful.
> >
> >
> > On Mon, Nov 23, 2020 at 10:16 PM Haruki Okada <oc...@gmail.com>
> wrote:
> >
> > > Hi Pushkar.
> > >
> > > Just for your information, https://github.com/line/decaton is a Kafka
> > > consumer framework that supports parallel processing per single
> > partition.
> > >
> > > It manages committable (i.e. the offset that all preceding offsets have
> > > been processed) offset internally so that preserves at-least-once
> > semantics
> > > even when processing in parallel.
> > >
> > >
> > > 2020年11月24日(火) 1:16 Pushkar Deole <pd...@gmail.com>:
> > >
> > > > Thanks Liam!
> > > > We don't have a requirement to maintain order of processing for
> events
> > > even
> > > > within a partition. Essentially, these are events for various
> accounts
> > > > (customers) that we want to support and do necessary database
> > > provisioning
> > > > for those in our database. So they can be processed in parallel.
> > > >
> > > > I think the 2nd option would suit our requirement to have a single
> > > consumer
> > > > and a bound thread pool for processors. However, the issue we may
> face
> > is
> > > > to commit the offsets only after processing an event since we don't
> > want
> > > > the consumer to auto commit offsets before the provisioning done for
> > the
> > > > customer. How can that be achieved with model #2  ?
> > > >
> > > > On Tue, Oct 27, 2020 at 2:50 PM Liam Clarke-Hutchinson <
> > > > liam.clarke@adscale.co.nz> wrote:
> > > >
> > > > > Hi Pushkar,
> > > > >
> > > > > No. You'd need to combine a consumer with a thread pool or similar
> as
> > > you
> > > > > prefer. As the docs say (from
> > > > >
> > > > >
> > > >
> > >
> >
> https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> > > > > )
> > > > >
> > > > > We have intentionally avoided implementing a particular threading
> > model
> > > > for
> > > > > > processing. This leaves several options for implementing
> > > multi-threaded
> > > > > > processing of records.
> > > > > > 1. One Consumer Per Thread
> > > > > > A simple option is to give each thread its own consumer instance.
> > > Here
> > > > > are
> > > > > > the pros and cons of this approach:
> > > > > >
> > > > > >    - *PRO*: It is the easiest to implement
> > > > > >
> > > > > >
> > > > > >    - *PRO*: It is often the fastest as no inter-thread
> > co-ordination
> > > is
> > > > > >    needed
> > > > > >
> > > > > >
> > > > > >    - *PRO*: It makes in-order processing on a per-partition basis
> > > very
> > > > > >    easy to implement (each thread just processes messages in the
> > > order
> > > > it
> > > > > >    receives them).
> > > > > >
> > > > > >
> > > > > >    - *CON*: More consumers means more TCP connections to the
> > cluster
> > > > (one
> > > > > >    per thread). In general Kafka handles connections very
> > efficiently
> > > > so
> > > > > this
> > > > > >    is generally a small cost.
> > > > > >
> > > > > >
> > > > > >    - *CON*: Multiple consumers means more requests being sent to
> > the
> > > > > >    server and slightly less batching of data which can cause some
> > > drop
> > > > > in I/O
> > > > > >    throughput.
> > > > > >
> > > > > >
> > > > > >    - *CON*: The number of total threads across all processes will
> > be
> > > > > >    limited by the total number of partitions.
> > > > > >
> > > > > > 2. Decouple Consumption and Processing
> > > > > > Another alternative is to have one or more consumer threads that
> do
> > > all
> > > > > > data consumption and hands off ConsumerRecords
> > > > > > <
> > > > >
> > > >
> > >
> >
> https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html
> > > > >
> > > > > instances
> > > > > > to a blocking queue consumed by a pool of processor threads that
> > > > actually
> > > > > > handle the record processing. This option likewise has pros and
> > cons:
> > > > > >
> > > > > >    - *PRO*: This option allows independently scaling the number
> of
> > > > > >    consumers and processors. This makes it possible to have a
> > single
> > > > > consumer
> > > > > >    that feeds many processor threads, avoiding any limitation on
> > > > > partitions.
> > > > > >
> > > > > >
> > > > > >    - *CON*: Guaranteeing order across the processors requires
> > > > particular
> > > > > >    care as the threads will execute independently an earlier
> chunk
> > of
> > > > > data may
> > > > > >    actually be processed after a later chunk of data just due to
> > the
> > > > > luck of
> > > > > >    thread execution timing. For processing that has no ordering
> > > > > requirements
> > > > > >    this is not a problem.
> > > > > >
> > > > > >
> > > > > >    - *CON*: Manually committing the position becomes harder as it
> > > > > >    requires that all threads co-ordinate to ensure that
> processing
> > is
> > > > > complete
> > > > > >    for that partition.
> > > > > >
> > > > > > There are many possible variations on this approach. For example
> > each
> > > > > > processor thread can have its own queue, and the consumer threads
> > can
> > > > > hash
> > > > > > into these queues using the TopicPartition to ensure in-order
> > > > consumption
> > > > > > and simplify commit.
> > > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Liam Clarke-Hutchinson
> > > > >
> > > > > On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <
> pdeole2015@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Is there any configuration in kafka consumer to specify multiple
> > > > threads
> > > > > > the way it is there in kafka streams?
> > > > > > Essentially, can we have a consumer with multiple threads where
> the
> > > > > threads
> > > > > > would divide partitions of topic among them?
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > ========================
> > > Okada Haruki
> > > ocadaruma@gmail.com
> > > ========================
> > >
> >
>
>
> --
> ========================
> Okada Haruki
> ocadaruma@gmail.com
> ========================
>

Re: multi-threaded consumer configuration like stream threads?

Posted by Haruki Okada <oc...@gmail.com>.
I see.

Then I think the appropriate approach depends on your delivery latency
requirements.
Just retrying until success is simpler but it could block subsequent
messages to get processed. (also depends on thread pool size though)

Then another concern when using dead letter topic would be retrying backoff.
If you don't use backoff, the production for dead letter topic could burst
when downstream db experiences transient problems but on the other hand
injecting backoff-delay would require consideration about how to not block
subsequent messages.

(FYI, Decaton provides retry-queueing with backoff out-of-the box. :)
https://github.com/line/decaton/blob/master/docs/retry-queueing.adoc)

2020年11月24日(火) 2:38 Pushkar Deole <pd...@gmail.com>:

> Thanks Haruki... right now the max of such types of events that we would
> have is 100 since we would be supporting those many customers (accounts)
> for now, for which we are considering a simple approach of a single
> consumer and a thread pool with around 10 threads. So the question was
> regarding how to manage failed events, should those be retried until
> successful or sent to a dead letter queue/topic from where they will be
> processed again until successful.
>
>
> On Mon, Nov 23, 2020 at 10:16 PM Haruki Okada <oc...@gmail.com> wrote:
>
> > Hi Pushkar.
> >
> > Just for your information, https://github.com/line/decaton is a Kafka
> > consumer framework that supports parallel processing per single
> partition.
> >
> > It manages committable (i.e. the offset that all preceding offsets have
> > been processed) offset internally so that preserves at-least-once
> semantics
> > even when processing in parallel.
> >
> >
> > 2020年11月24日(火) 1:16 Pushkar Deole <pd...@gmail.com>:
> >
> > > Thanks Liam!
> > > We don't have a requirement to maintain order of processing for events
> > even
> > > within a partition. Essentially, these are events for various accounts
> > > (customers) that we want to support and do necessary database
> > provisioning
> > > for those in our database. So they can be processed in parallel.
> > >
> > > I think the 2nd option would suit our requirement to have a single
> > consumer
> > > and a bound thread pool for processors. However, the issue we may face
> is
> > > to commit the offsets only after processing an event since we don't
> want
> > > the consumer to auto commit offsets before the provisioning done for
> the
> > > customer. How can that be achieved with model #2  ?
> > >
> > > On Tue, Oct 27, 2020 at 2:50 PM Liam Clarke-Hutchinson <
> > > liam.clarke@adscale.co.nz> wrote:
> > >
> > > > Hi Pushkar,
> > > >
> > > > No. You'd need to combine a consumer with a thread pool or similar as
> > you
> > > > prefer. As the docs say (from
> > > >
> > > >
> > >
> >
> https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> > > > )
> > > >
> > > > We have intentionally avoided implementing a particular threading
> model
> > > for
> > > > > processing. This leaves several options for implementing
> > multi-threaded
> > > > > processing of records.
> > > > > 1. One Consumer Per Thread
> > > > > A simple option is to give each thread its own consumer instance.
> > Here
> > > > are
> > > > > the pros and cons of this approach:
> > > > >
> > > > >    - *PRO*: It is the easiest to implement
> > > > >
> > > > >
> > > > >    - *PRO*: It is often the fastest as no inter-thread
> co-ordination
> > is
> > > > >    needed
> > > > >
> > > > >
> > > > >    - *PRO*: It makes in-order processing on a per-partition basis
> > very
> > > > >    easy to implement (each thread just processes messages in the
> > order
> > > it
> > > > >    receives them).
> > > > >
> > > > >
> > > > >    - *CON*: More consumers means more TCP connections to the
> cluster
> > > (one
> > > > >    per thread). In general Kafka handles connections very
> efficiently
> > > so
> > > > this
> > > > >    is generally a small cost.
> > > > >
> > > > >
> > > > >    - *CON*: Multiple consumers means more requests being sent to
> the
> > > > >    server and slightly less batching of data which can cause some
> > drop
> > > > in I/O
> > > > >    throughput.
> > > > >
> > > > >
> > > > >    - *CON*: The number of total threads across all processes will
> be
> > > > >    limited by the total number of partitions.
> > > > >
> > > > > 2. Decouple Consumption and Processing
> > > > > Another alternative is to have one or more consumer threads that do
> > all
> > > > > data consumption and hands off ConsumerRecords
> > > > > <
> > > >
> > >
> >
> https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html
> > > >
> > > > instances
> > > > > to a blocking queue consumed by a pool of processor threads that
> > > actually
> > > > > handle the record processing. This option likewise has pros and
> cons:
> > > > >
> > > > >    - *PRO*: This option allows independently scaling the number of
> > > > >    consumers and processors. This makes it possible to have a
> single
> > > > consumer
> > > > >    that feeds many processor threads, avoiding any limitation on
> > > > partitions.
> > > > >
> > > > >
> > > > >    - *CON*: Guaranteeing order across the processors requires
> > > particular
> > > > >    care as the threads will execute independently an earlier chunk
> of
> > > > data may
> > > > >    actually be processed after a later chunk of data just due to
> the
> > > > luck of
> > > > >    thread execution timing. For processing that has no ordering
> > > > requirements
> > > > >    this is not a problem.
> > > > >
> > > > >
> > > > >    - *CON*: Manually committing the position becomes harder as it
> > > > >    requires that all threads co-ordinate to ensure that processing
> is
> > > > complete
> > > > >    for that partition.
> > > > >
> > > > > There are many possible variations on this approach. For example
> each
> > > > > processor thread can have its own queue, and the consumer threads
> can
> > > > hash
> > > > > into these queues using the TopicPartition to ensure in-order
> > > consumption
> > > > > and simplify commit.
> > > >
> > > >
> > > > Cheers,
> > > >
> > > > Liam Clarke-Hutchinson
> > > >
> > > > On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <pd...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Is there any configuration in kafka consumer to specify multiple
> > > threads
> > > > > the way it is there in kafka streams?
> > > > > Essentially, can we have a consumer with multiple threads where the
> > > > threads
> > > > > would divide partitions of topic among them?
> > > > >
> > > >
> > >
> >
> >
> > --
> > ========================
> > Okada Haruki
> > ocadaruma@gmail.com
> > ========================
> >
>


-- 
========================
Okada Haruki
ocadaruma@gmail.com
========================

Re: multi-threaded consumer configuration like stream threads?

Posted by Pushkar Deole <pd...@gmail.com>.
Thanks Haruki... right now the max of such types of events that we would
have is 100 since we would be supporting those many customers (accounts)
for now, for which we are considering a simple approach of a single
consumer and a thread pool with around 10 threads. So the question was
regarding how to manage failed events, should those be retried until
successful or sent to a dead letter queue/topic from where they will be
processed again until successful.


On Mon, Nov 23, 2020 at 10:16 PM Haruki Okada <oc...@gmail.com> wrote:

> Hi Pushkar.
>
> Just for your information, https://github.com/line/decaton is a Kafka
> consumer framework that supports parallel processing per single partition.
>
> It manages committable (i.e. the offset that all preceding offsets have
> been processed) offset internally so that preserves at-least-once semantics
> even when processing in parallel.
>
>
> 2020年11月24日(火) 1:16 Pushkar Deole <pd...@gmail.com>:
>
> > Thanks Liam!
> > We don't have a requirement to maintain order of processing for events
> even
> > within a partition. Essentially, these are events for various accounts
> > (customers) that we want to support and do necessary database
> provisioning
> > for those in our database. So they can be processed in parallel.
> >
> > I think the 2nd option would suit our requirement to have a single
> consumer
> > and a bound thread pool for processors. However, the issue we may face is
> > to commit the offsets only after processing an event since we don't want
> > the consumer to auto commit offsets before the provisioning done for the
> > customer. How can that be achieved with model #2  ?
> >
> > On Tue, Oct 27, 2020 at 2:50 PM Liam Clarke-Hutchinson <
> > liam.clarke@adscale.co.nz> wrote:
> >
> > > Hi Pushkar,
> > >
> > > No. You'd need to combine a consumer with a thread pool or similar as
> you
> > > prefer. As the docs say (from
> > >
> > >
> >
> https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> > > )
> > >
> > > We have intentionally avoided implementing a particular threading model
> > for
> > > > processing. This leaves several options for implementing
> multi-threaded
> > > > processing of records.
> > > > 1. One Consumer Per Thread
> > > > A simple option is to give each thread its own consumer instance.
> Here
> > > are
> > > > the pros and cons of this approach:
> > > >
> > > >    - *PRO*: It is the easiest to implement
> > > >
> > > >
> > > >    - *PRO*: It is often the fastest as no inter-thread co-ordination
> is
> > > >    needed
> > > >
> > > >
> > > >    - *PRO*: It makes in-order processing on a per-partition basis
> very
> > > >    easy to implement (each thread just processes messages in the
> order
> > it
> > > >    receives them).
> > > >
> > > >
> > > >    - *CON*: More consumers means more TCP connections to the cluster
> > (one
> > > >    per thread). In general Kafka handles connections very efficiently
> > so
> > > this
> > > >    is generally a small cost.
> > > >
> > > >
> > > >    - *CON*: Multiple consumers means more requests being sent to the
> > > >    server and slightly less batching of data which can cause some
> drop
> > > in I/O
> > > >    throughput.
> > > >
> > > >
> > > >    - *CON*: The number of total threads across all processes will be
> > > >    limited by the total number of partitions.
> > > >
> > > > 2. Decouple Consumption and Processing
> > > > Another alternative is to have one or more consumer threads that do
> all
> > > > data consumption and hands off ConsumerRecords
> > > > <
> > >
> >
> https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html
> > >
> > > instances
> > > > to a blocking queue consumed by a pool of processor threads that
> > actually
> > > > handle the record processing. This option likewise has pros and cons:
> > > >
> > > >    - *PRO*: This option allows independently scaling the number of
> > > >    consumers and processors. This makes it possible to have a single
> > > consumer
> > > >    that feeds many processor threads, avoiding any limitation on
> > > partitions.
> > > >
> > > >
> > > >    - *CON*: Guaranteeing order across the processors requires
> > particular
> > > >    care as the threads will execute independently an earlier chunk of
> > > data may
> > > >    actually be processed after a later chunk of data just due to the
> > > luck of
> > > >    thread execution timing. For processing that has no ordering
> > > requirements
> > > >    this is not a problem.
> > > >
> > > >
> > > >    - *CON*: Manually committing the position becomes harder as it
> > > >    requires that all threads co-ordinate to ensure that processing is
> > > complete
> > > >    for that partition.
> > > >
> > > > There are many possible variations on this approach. For example each
> > > > processor thread can have its own queue, and the consumer threads can
> > > hash
> > > > into these queues using the TopicPartition to ensure in-order
> > consumption
> > > > and simplify commit.
> > >
> > >
> > > Cheers,
> > >
> > > Liam Clarke-Hutchinson
> > >
> > > On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <pd...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Is there any configuration in kafka consumer to specify multiple
> > threads
> > > > the way it is there in kafka streams?
> > > > Essentially, can we have a consumer with multiple threads where the
> > > threads
> > > > would divide partitions of topic among them?
> > > >
> > >
> >
>
>
> --
> ========================
> Okada Haruki
> ocadaruma@gmail.com
> ========================
>

Re: multi-threaded consumer configuration like stream threads?

Posted by Haruki Okada <oc...@gmail.com>.
Hi Pushkar.

Just for your information, https://github.com/line/decaton is a Kafka
consumer framework that supports parallel processing per single partition.

It manages committable (i.e. the offset that all preceding offsets have
been processed) offset internally so that preserves at-least-once semantics
even when processing in parallel.


2020年11月24日(火) 1:16 Pushkar Deole <pd...@gmail.com>:

> Thanks Liam!
> We don't have a requirement to maintain order of processing for events even
> within a partition. Essentially, these are events for various accounts
> (customers) that we want to support and do necessary database provisioning
> for those in our database. So they can be processed in parallel.
>
> I think the 2nd option would suit our requirement to have a single consumer
> and a bound thread pool for processors. However, the issue we may face is
> to commit the offsets only after processing an event since we don't want
> the consumer to auto commit offsets before the provisioning done for the
> customer. How can that be achieved with model #2  ?
>
> On Tue, Oct 27, 2020 at 2:50 PM Liam Clarke-Hutchinson <
> liam.clarke@adscale.co.nz> wrote:
>
> > Hi Pushkar,
> >
> > No. You'd need to combine a consumer with a thread pool or similar as you
> > prefer. As the docs say (from
> >
> >
> https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> > )
> >
> > We have intentionally avoided implementing a particular threading model
> for
> > > processing. This leaves several options for implementing multi-threaded
> > > processing of records.
> > > 1. One Consumer Per Thread
> > > A simple option is to give each thread its own consumer instance. Here
> > are
> > > the pros and cons of this approach:
> > >
> > >    - *PRO*: It is the easiest to implement
> > >
> > >
> > >    - *PRO*: It is often the fastest as no inter-thread co-ordination is
> > >    needed
> > >
> > >
> > >    - *PRO*: It makes in-order processing on a per-partition basis very
> > >    easy to implement (each thread just processes messages in the order
> it
> > >    receives them).
> > >
> > >
> > >    - *CON*: More consumers means more TCP connections to the cluster
> (one
> > >    per thread). In general Kafka handles connections very efficiently
> so
> > this
> > >    is generally a small cost.
> > >
> > >
> > >    - *CON*: Multiple consumers means more requests being sent to the
> > >    server and slightly less batching of data which can cause some drop
> > in I/O
> > >    throughput.
> > >
> > >
> > >    - *CON*: The number of total threads across all processes will be
> > >    limited by the total number of partitions.
> > >
> > > 2. Decouple Consumption and Processing
> > > Another alternative is to have one or more consumer threads that do all
> > > data consumption and hands off ConsumerRecords
> > > <
> >
> https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html
> >
> > instances
> > > to a blocking queue consumed by a pool of processor threads that
> actually
> > > handle the record processing. This option likewise has pros and cons:
> > >
> > >    - *PRO*: This option allows independently scaling the number of
> > >    consumers and processors. This makes it possible to have a single
> > consumer
> > >    that feeds many processor threads, avoiding any limitation on
> > partitions.
> > >
> > >
> > >    - *CON*: Guaranteeing order across the processors requires
> particular
> > >    care as the threads will execute independently an earlier chunk of
> > data may
> > >    actually be processed after a later chunk of data just due to the
> > luck of
> > >    thread execution timing. For processing that has no ordering
> > requirements
> > >    this is not a problem.
> > >
> > >
> > >    - *CON*: Manually committing the position becomes harder as it
> > >    requires that all threads co-ordinate to ensure that processing is
> > complete
> > >    for that partition.
> > >
> > > There are many possible variations on this approach. For example each
> > > processor thread can have its own queue, and the consumer threads can
> > hash
> > > into these queues using the TopicPartition to ensure in-order
> consumption
> > > and simplify commit.
> >
> >
> > Cheers,
> >
> > Liam Clarke-Hutchinson
> >
> > On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <pd...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Is there any configuration in kafka consumer to specify multiple
> threads
> > > the way it is there in kafka streams?
> > > Essentially, can we have a consumer with multiple threads where the
> > threads
> > > would divide partitions of topic among them?
> > >
> >
>


-- 
========================
Okada Haruki
ocadaruma@gmail.com
========================

Re: multi-threaded consumer configuration like stream threads?

Posted by Pushkar Deole <pd...@gmail.com>.
Thanks Liam!
We don't have a requirement to maintain order of processing for events even
within a partition. Essentially, these are events for various accounts
(customers) that we want to support and do necessary database provisioning
for those in our database. So they can be processed in parallel.

I think the 2nd option would suit our requirement to have a single consumer
and a bound thread pool for processors. However, the issue we may face is
to commit the offsets only after processing an event since we don't want
the consumer to auto commit offsets before the provisioning done for the
customer. How can that be achieved with model #2  ?

On Tue, Oct 27, 2020 at 2:50 PM Liam Clarke-Hutchinson <
liam.clarke@adscale.co.nz> wrote:

> Hi Pushkar,
>
> No. You'd need to combine a consumer with a thread pool or similar as you
> prefer. As the docs say (from
>
> https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> )
>
> We have intentionally avoided implementing a particular threading model for
> > processing. This leaves several options for implementing multi-threaded
> > processing of records.
> > 1. One Consumer Per Thread
> > A simple option is to give each thread its own consumer instance. Here
> are
> > the pros and cons of this approach:
> >
> >    - *PRO*: It is the easiest to implement
> >
> >
> >    - *PRO*: It is often the fastest as no inter-thread co-ordination is
> >    needed
> >
> >
> >    - *PRO*: It makes in-order processing on a per-partition basis very
> >    easy to implement (each thread just processes messages in the order it
> >    receives them).
> >
> >
> >    - *CON*: More consumers means more TCP connections to the cluster (one
> >    per thread). In general Kafka handles connections very efficiently so
> this
> >    is generally a small cost.
> >
> >
> >    - *CON*: Multiple consumers means more requests being sent to the
> >    server and slightly less batching of data which can cause some drop
> in I/O
> >    throughput.
> >
> >
> >    - *CON*: The number of total threads across all processes will be
> >    limited by the total number of partitions.
> >
> > 2. Decouple Consumption and Processing
> > Another alternative is to have one or more consumer threads that do all
> > data consumption and hands off ConsumerRecords
> > <
> https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html>
> instances
> > to a blocking queue consumed by a pool of processor threads that actually
> > handle the record processing. This option likewise has pros and cons:
> >
> >    - *PRO*: This option allows independently scaling the number of
> >    consumers and processors. This makes it possible to have a single
> consumer
> >    that feeds many processor threads, avoiding any limitation on
> partitions.
> >
> >
> >    - *CON*: Guaranteeing order across the processors requires particular
> >    care as the threads will execute independently an earlier chunk of
> data may
> >    actually be processed after a later chunk of data just due to the
> luck of
> >    thread execution timing. For processing that has no ordering
> requirements
> >    this is not a problem.
> >
> >
> >    - *CON*: Manually committing the position becomes harder as it
> >    requires that all threads co-ordinate to ensure that processing is
> complete
> >    for that partition.
> >
> > There are many possible variations on this approach. For example each
> > processor thread can have its own queue, and the consumer threads can
> hash
> > into these queues using the TopicPartition to ensure in-order consumption
> > and simplify commit.
>
>
> Cheers,
>
> Liam Clarke-Hutchinson
>
> On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <pd...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Is there any configuration in kafka consumer to specify multiple threads
> > the way it is there in kafka streams?
> > Essentially, can we have a consumer with multiple threads where the
> threads
> > would divide partitions of topic among them?
> >
>

Re: multi-threaded consumer configuration like stream threads?

Posted by Liam Clarke-Hutchinson <li...@adscale.co.nz>.
Hi Pushkar,

No. You'd need to combine a consumer with a thread pool or similar as you
prefer. As the docs say (from
https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
)

We have intentionally avoided implementing a particular threading model for
> processing. This leaves several options for implementing multi-threaded
> processing of records.
> 1. One Consumer Per Thread
> A simple option is to give each thread its own consumer instance. Here are
> the pros and cons of this approach:
>
>    - *PRO*: It is the easiest to implement
>
>
>    - *PRO*: It is often the fastest as no inter-thread co-ordination is
>    needed
>
>
>    - *PRO*: It makes in-order processing on a per-partition basis very
>    easy to implement (each thread just processes messages in the order it
>    receives them).
>
>
>    - *CON*: More consumers means more TCP connections to the cluster (one
>    per thread). In general Kafka handles connections very efficiently so this
>    is generally a small cost.
>
>
>    - *CON*: Multiple consumers means more requests being sent to the
>    server and slightly less batching of data which can cause some drop in I/O
>    throughput.
>
>
>    - *CON*: The number of total threads across all processes will be
>    limited by the total number of partitions.
>
> 2. Decouple Consumption and Processing
> Another alternative is to have one or more consumer threads that do all
> data consumption and hands off ConsumerRecords
> <https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html> instances
> to a blocking queue consumed by a pool of processor threads that actually
> handle the record processing. This option likewise has pros and cons:
>
>    - *PRO*: This option allows independently scaling the number of
>    consumers and processors. This makes it possible to have a single consumer
>    that feeds many processor threads, avoiding any limitation on partitions.
>
>
>    - *CON*: Guaranteeing order across the processors requires particular
>    care as the threads will execute independently an earlier chunk of data may
>    actually be processed after a later chunk of data just due to the luck of
>    thread execution timing. For processing that has no ordering requirements
>    this is not a problem.
>
>
>    - *CON*: Manually committing the position becomes harder as it
>    requires that all threads co-ordinate to ensure that processing is complete
>    for that partition.
>
> There are many possible variations on this approach. For example each
> processor thread can have its own queue, and the consumer threads can hash
> into these queues using the TopicPartition to ensure in-order consumption
> and simplify commit.


Cheers,

Liam Clarke-Hutchinson

On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <pd...@gmail.com> wrote:

> Hi,
>
> Is there any configuration in kafka consumer to specify multiple threads
> the way it is there in kafka streams?
> Essentially, can we have a consumer with multiple threads where the threads
> would divide partitions of topic among them?
>