Posted to users@kafka.apache.org by Arvid Sundbom <ar...@gmail.com> on 2023/04/24 13:00:34 UTC

Producer Application Utilizing Multiple Threads

Hi!
I have a question about running a Kafka producer application that uses
multiple threads.
In the documentation for Kafka producers (
https://kafka.apache.org/23/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html)
it says that "...sharing a single producer instance across threads will
generally be faster than having multiple instances.".

I am mainly wondering if there is any data to verify this claim?
The reason I ask is that I have carried out rather extensive performance
tests, with varying levels of computational load per message produced,
using between 1 and 24 threads, and it seems that there is no situation
in which a single, shared producer achieves higher performance than
giving each executing thread its own producer instance.

Kind regards,
Arvid

Re: Producer Application Utilizing Multiple Threads

Posted by Santhosh Kumar <gr...@gmail.com>.
Hi Arvid,

Yes, there is data to support the claim that sharing a single producer
instance across threads in Apache Kafka is generally faster than having
multiple instances.

The reason for this is that a single producer instance can take advantage
of batch processing, which allows it to send multiple messages to Kafka in
a single network request. This reduces the network overhead and can result
in higher throughput.

On the other hand, if multiple producer instances are used, each instance
will have its own network connection to Kafka, resulting in higher network
overhead. Additionally, each instance will have to manage its own queue of
messages to send, which can increase the overall processing overhead.
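
A rough way to see the batching argument above: at low volume a batch is
usually closed by the linger.ms timer rather than by filling up, so
splitting the same traffic over k producers leaves each batch about 1/k
the size and multiplies the request count by k. The sketch below only
does that arithmetic. The model (uniform arrivals, a single partition,
batches that never reach batch.size) and the names BatchEstimate,
recordsPerBatch and requestsPerSecond are assumptions for illustration,
not Kafka APIs.

```java
// Back-of-the-envelope model, NOT a Kafka API: assumes uniform arrivals,
// one partition, and batches that are always closed by the linger timer.
final class BatchEstimate {
    private BatchEstimate() {}

    // Expected records in one batch when total traffic is split evenly
    // across `producers` instances and each batch waits `lingerMs`.
    static double recordsPerBatch(double recordsPerSecond, long lingerMs, int producers) {
        return recordsPerSecond * lingerMs / (1000.0 * producers);
    }

    // Expected produce requests per second across all instances:
    // every instance sends one batch per linger interval.
    static double requestsPerSecond(long lingerMs, int producers) {
        return producers * 1000.0 / lingerMs;
    }
}
```

Under this model, 1000 records/s with linger.ms=10 gives 10 records per
batch for one producer and 2.5 per batch for four. The model ignores any
contention between threads sharing one producer, so it says nothing
about the high-load cases measured in the original question.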

Several benchmark tests have been conducted to compare the performance of
single versus multiple producer instances. For example, one test conducted
by Confluent found that using a single producer instance was up to 50%
faster than using multiple instances.

Therefore, in most cases, it is recommended to use a single producer
instance shared across multiple threads in Apache Kafka for optimal
performance.

Thank you
Santhosh Gopal

RE: Producer Application Utilizing Multiple Threads

Posted by Margaret Figura <ma...@infovista.com.INVALID>.
Hi Arvid,

Just as another user of Kafka, I suspect the claim has to do with the producer's ability to perform batching and compression across produced data, so it should be more efficient in some cases, especially with smaller volumes of data and larger linger.ms settings.
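
For reference, the batching and compression behaviour mentioned above is
controlled by a handful of producer settings. A minimal sketch follows,
using only java.util.Properties. The configuration keys are standard
producer config names, but the values, the class name
BatchingConfigSketch and the method batchingProps are illustrative
assumptions and not tuning advice.

```java
import java.util.Properties;

final class BatchingConfigSketch {
    private BatchingConfigSketch() {}

    // Builds producer properties that favour larger batches.
    // All values are placeholders chosen for illustration only.
    static Properties batchingProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Wait up to 20 ms for more records before sending a batch.
        props.setProperty("linger.ms", "20");
        // Upper bound, in bytes, for a single per-partition batch.
        props.setProperty("batch.size", "65536");
        // Compression is applied per batch, so fuller batches tend to compress better.
        props.setProperty("compression.type", "lz4");
        return props;
    }
}
```

The resulting Properties object can be passed to the KafkaProducer
constructor; with several producer instances, each one batches and
compresses independently.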

However, like you, I've found that a single KafkaProducer instance can sometimes become a bottleneck. For instance, threads can be blocked in certain cases when they are producing to the same partition. In our application, we found a producer per thread wasn't necessary, but adding a small pool of producers (e.g. 4 producers shared by 32 threads) helped to improve performance. It probably varies quite a bit depending on your use case and configuration.
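
One possible shape for such a small pool is sketched below. It is a
generic round-robin holder and not the implementation used in the
application described above; the class name ProducerPool and its methods
are hypothetical. The type parameter P stands in for a producer type so
that the sketch compiles without the Kafka client on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: hands out a fixed set of instances in round-robin order.
final class ProducerPool<P> {
    private final List<P> producers;
    private final AtomicInteger next = new AtomicInteger();

    ProducerPool(int size, Supplier<P> factory) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<P> created = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            created.add(factory.get());
        }
        this.producers = Collections.unmodifiableList(created);
    }

    // Safe to call from many threads; floorMod keeps the index valid
    // even after the counter overflows to a negative value.
    P pick() {
        int index = Math.floorMod(next.getAndIncrement(), producers.size());
        return producers.get(index);
    }

    int size() {
        return producers.size();
    }
}
```

With the Kafka client available, this could be used as
new ProducerPool<>(4, () -> new KafkaProducer<String, String>(props)),
with worker threads calling pool.pick().send(record). Picking a producer
per call spreads one thread's records over several instances, so if
ordering per key matters it is probably safer to assign each thread a
fixed producer from the pool instead.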

I do wonder if the Kafka team would consider it a bug in the cases where a single producer isn't faster, or if they'd consider it a doc bug where the recommendation should be a bit more nuanced.

Hope that helps,
Meg
