
Posted to users@kafka.apache.org by Sunny Kim <su...@ziprecruiter.com> on 2017/08/28 21:59:32 UTC

Strange Kafka throughput issues

Hello,

First here's what I have:

- 5-node Kafka cluster (m4.2xlarge: 8 vCPU, 32 GB RAM)
- Kafka version: 0.11.0
- each broker has 4 dedicated EBS volumes
- a single topic: 40 partitions, replication factor 3

I'm using kafka-producer-perf-test to benchmark it.

---------------------
Test 1:
- record-size: 1000 bytes
- producer setting:  batch.size=40000,  linger.ms=40

Throughput: 53K records/sec (51 MB/sec)
----------------------
Test 2:
- same as Test 1 but also added:  compression.type=lz4

Throughput: 316K records/sec (300 MB/sec)
----------------------

So far so good, and pretty impressive! But the problem starts when I increase
the record size to 48K bytes. Of course, I realize throughput will
decrease as record size grows, but what I'm seeing appears to be
beyond normal expected behavior. I seem to be hitting some
resource or blocking issue, but I can't figure out what. Here's the
benchmark result.


Test 3:
- record-size: 48000 bytes
- producer settings: batch.size=40000, linger.ms=40, compression.type=lz4

Throughput: 1160 records/sec (53 MB/sec)
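As a quick sanity check, the records/sec and MB/sec figures from the three tests can be reconciled with a few lines of arithmetic. This is a minimal sketch assuming the tool's "MB" means 2^20 bytes (the reported numbers line up with that) and ignoring protocol overhead and compression.

```python
MIB = 1024 * 1024  # assumption: the perf tool's "MB" is 2**20 bytes


def mib_per_sec(records_per_sec: float, record_size_bytes: int) -> float:
    """Uncompressed payload throughput in MiB/s for a record rate and size."""
    return records_per_sec * record_size_bytes / MIB


# (label, records/sec, record size in bytes) as reported in the tests above
TESTS = [
    ("Test 1", 53_000, 1_000),
    ("Test 2", 316_000, 1_000),
    ("Test 3", 1_160, 48_000),
]

if __name__ == "__main__":
    for label, rps, size in TESTS:
        print(f"{label}: {rps} rec/s x {size} B = {mib_per_sec(rps, size):.1f} MiB/s")
```

The arithmetic shows that Test 3 moves roughly the same number of payload bytes per second as the uncompressed Test 1, even though its record rate is about 45 times lower.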

I've systematically tried many different combinations of the above
parameters (e.g. increasing batch.size gradually from 40000 to 4000000,
linger.ms from 40 to 4000, and combinations of these). In every test,
throughput stayed more or less the same, between 1160 and 1350
records/sec.
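A parameter sweep like the one described above can be scripted so every combination runs the same way. The sketch below only builds the command lines; the topic name, bootstrap address, record count, and the intermediate sweep values are hypothetical, while the flags used (--topic, --num-records, --record-size, --throughput, --producer-props) are standard kafka-producer-perf-test options.

```python
import itertools
import shlex


def perf_test_command(topic, bootstrap, num_records, record_size,
                      batch_size, linger_ms, compression="lz4"):
    """Build one kafka-producer-perf-test.sh invocation as an argv list."""
    producer_props = [
        f"bootstrap.servers={bootstrap}",
        f"batch.size={batch_size}",
        f"linger.ms={linger_ms}",
        f"compression.type={compression}",
    ]
    return [
        "kafka-producer-perf-test.sh",
        "--topic", topic,
        "--num-records", str(num_records),
        "--record-size", str(record_size),
        "--throughput", "-1",  # -1 disables the tool's own throttling
        "--producer-props", *producer_props,
    ]


# Hypothetical sweep points spanning the ranges described above.
BATCH_SIZES = [40_000, 400_000, 4_000_000]
LINGER_MS = [40, 400, 4_000]


def sweep(topic="perf-test", bootstrap="broker1:9092", num_records=100_000):
    """One command per (batch.size, linger.ms) pair, all with 48000-byte records."""
    return [
        perf_test_command(topic, bootstrap, num_records, 48_000, b, l)
        for b, l in itertools.product(BATCH_SIZES, LINGER_MS)
    ]


if __name__ == "__main__":
    for argv in sweep():
        print(shlex.join(argv))
```

Generating argv lists rather than shell strings keeps quoting out of the way if the commands are later run with `subprocess.run`.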

Now, if I run 2 producers in parallel, each producer still manages
about 1160 records/sec. Even with 3 producers in parallel, it's the
same.

So it doesn't look like a Kafka cluster issue. Something appears to be
blocking on the producer side, and it doesn't seem to be CPU or
memory. Could the kafka-producer-perf-test script itself be the
problem?

If anyone has a suggestion to help me investigate this further, I would
very much appreciate it.   Thank you!

Btw, was the "--threads" option removed from the tool? Why?

regards,
Sunny

Re: Strange Kafka throughput issues

Posted by Sunny Kim <su...@ziprecruiter.com>.

Does anyone have thoughts on this? Is anyone producing messages of this size,
and could you share your config? Or perhaps sending 48K records at a rate of
400 MB/sec is simply an inappropriate use case?
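To put the 400 MB/sec target in perspective, the arithmetic below estimates how many 48K records per second it requires and how many producers running at the observed ~1160 records/sec each would be needed. This is a rough sketch assuming "MB" means 2^20 bytes and that the per-producer rate stays flat as producers are added, which the thread only observed up to 3 producers.

```python
import math

MIB = 1024 * 1024  # assumption: "MB" means 2**20 bytes, as in the perf tool's output


def records_needed(target_mib_per_sec, record_size_bytes):
    """Records per second required to reach a target payload throughput."""
    return target_mib_per_sec * MIB / record_size_bytes


def producers_needed(target_mib_per_sec, record_size_bytes, per_producer_rps):
    """Producers required if each one is capped at per_producer_rps records/sec."""
    return math.ceil(records_needed(target_mib_per_sec, record_size_bytes) / per_producer_rps)


if __name__ == "__main__":
    print(f"{records_needed(400, 48_000):.0f} records/sec needed")
    print(producers_needed(400, 48_000, 1_160), "producers at ~1160 records/sec each")
```

Under these assumptions the target works out to roughly 8,700 records/sec, or 8 producers.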


On Mon, Aug 28, 2017 at 2:59 PM, Sunny Kim <su...@ziprecruiter.com> wrote:


Re: Strange Kafka throughput issues

Posted by "Tauzell, Dave" <Da...@surescripts.com>.

Have you tried increasing max.in.flight.requests.per.connection? I wonder whether that would have an effect similar to running multiple producers.
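The intuition behind this suggestion can be sketched with a simple pipelining bound (Little's law): if at most `in_flight` requests of `request_bytes` each can be outstanding on a connection, and each takes `rtt_ms` to be acknowledged, throughput per connection cannot exceed in_flight × request_bytes / RTT. This is a simplified model, not a description of how the Kafka producer actually schedules requests; the request size and round-trip time are hypothetical inputs you would need to measure. (The producer default for max.in.flight.requests.per.connection is 5.)

```python
def max_throughput_mib(in_flight, request_bytes, rtt_ms, connections=1):
    """Upper bound in MiB/s for a pipelined sender (Little's law).

    Each connection can have at most `in_flight` unacknowledged requests of
    `request_bytes` each, and every request takes `rtt_ms` to be acknowledged.
    """
    bytes_per_sec = connections * in_flight * request_bytes / (rtt_ms / 1000.0)
    return bytes_per_sec / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical inputs: 48000-byte requests with a 20 ms round trip.
    for in_flight in (1, 5, 10):
        print(in_flight, f"{max_throughput_mib(in_flight, 48_000, 20):.1f} MiB/s")
```

In this model, doubling the in-flight limit raises the ceiling exactly as doubling the number of connections does, which is the parallel with running several producers.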

Dave

________________________________________
From: Sunny Kim <su...@ziprecruiter.com>
Sent: Wednesday, August 30, 2017 4:55:02 PM
To: users@kafka.apache.org
Subject: Re: Strange Kafka throughput issues


Re: Strange Kafka throughput issues

Posted by Sunny Kim <su...@ziprecruiter.com>.

Thanks Boris,

It's already using SSD. Yes, I understand that about EBS. But I don't
think that explains the gap between the throughput with 1K messages (300K
msgs/sec @ 300 MB/sec) and with 48K messages (1160 msgs/sec @ 53 MB/sec),
or the fact that changing parameters like batch.size, linger.ms, and
compression.type doesn't seem to affect throughput at all. Something
else appears to be blocking the producer. As noted in the original
email, if I run multiple producer instances in parallel, each one
sends at a similar rate of about 1160 msgs/sec and the Kafka cluster
handles them fine. So it doesn't appear to be a Kafka cluster issue.



On Wed, Aug 30, 2017 at 2:16 PM, Boris Sorochkin <sb...@gmail.com> wrote:


Re: Strange Kafka throughput issues

Posted by Boris Sorochkin <sb...@gmail.com>.

Can you please run 2 additional tests:
1. Use SSD-based instances and put the Kafka storage on SSD.
2. Run a storage benchmark with the same block size.
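The second test could be approximated with a rough sketch like the one below, which writes fixed-size blocks sequentially and reports MiB/s. It is only an illustration under simple assumptions (one sequential writer, a single fsync at the end, much as Kafka relies on the page cache); a dedicated tool such as fio or dd gives more reliable numbers. The `BENCH_DIR` environment variable is hypothetical and should point at a directory on the volume backing the Kafka log directory.

```python
import os
import tempfile
import time


def sequential_write_mib_per_sec(path, block_size=48_000, num_blocks=2_000, fsync=True):
    """Write num_blocks blocks of block_size bytes sequentially and return MiB/s."""
    block = os.urandom(block_size)  # random bytes, so compression cannot skew the result
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(num_blocks):
            f.write(block)
        f.flush()
        if fsync:
            os.fsync(f.fileno())  # count the time to reach the device, not just the page cache
    elapsed = time.perf_counter() - start
    return block_size * num_blocks / elapsed / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical: set BENCH_DIR to a directory on the EBS volume under test.
    target_dir = os.environ.get("BENCH_DIR", tempfile.gettempdir())
    path = os.path.join(target_dir, "kafka_write_bench.tmp")
    try:
        print(f"{sequential_write_mib_per_sec(path):.1f} MiB/s")
    finally:
        if os.path.exists(path):
            os.remove(path)
```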

What throughput do you get?
What EBS volume type do you use?
Remember, EBS traffic shares the instance's network, so you'd get at
most half of the nominal throughput with EBS.
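For a sense of the load each broker sees during these benchmarks, the sketch below estimates per-broker disk writes and incoming replication traffic from the producer-side rate. It assumes partition leadership is spread evenly over the 5 brokers, uses the uncompressed producer-side rate (with lz4 the bytes actually written may be smaller), and ignores index and protocol overhead.

```python
def per_broker_write_mib(ingress_mib_per_sec, replication_factor=3, brokers=5):
    """Average log bytes written per broker: every replica writes each byte once."""
    return ingress_mib_per_sec * replication_factor / brokers


def per_broker_replication_in_mib(ingress_mib_per_sec, replication_factor=3, brokers=5):
    """Average follower fetch traffic arriving at each broker (all replicas but the leader)."""
    return ingress_mib_per_sec * (replication_factor - 1) / brokers


if __name__ == "__main__":
    for ingress in (53, 300):  # producer-side rates from Test 3 and Test 2
        print(f"ingress {ingress} MiB/s -> "
              f"{per_broker_write_mib(ingress):.1f} MiB/s written, "
              f"{per_broker_replication_in_mib(ingress):.1f} MiB/s replication in, per broker")
```

At Test 3's 53 MB/sec this gives about 31.8 MB/sec of log writes and 21.2 MB/sec of incoming replication traffic per broker.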

On Tue, Aug 29, 2017 at 12:59 AM, Sunny Kim <su...@ziprecruiter.com> wrote:
