
Posted to users@kafka.apache.org by JIEFU GONG <jg...@berkeley.edu> on 2015/07/13 19:08:56 UTC

kafka benchmark tests

Hi all,

I was wondering if any of you guys have done benchmarks on Kafka
performance before, and if they or their details (# nodes in cluster, #
records / size(s) of messages, etc.) could be shared.

For comparison purposes, I am trying to benchmark Kafka against some
similar services such as Kinesis or Scribe. Additionally, I was wondering
if anyone could shed some insight on Jay Kreps' benchmarks that he has
openly published here:
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Specifically, I am unsure of why between his tests of 3x synchronous
replication and 3x async replication he changed the batch.size, as well as
why he is seemingly publishing to incorrect topics:

Configs:
https://gist.github.com/jkreps/c7ddb4041ef62a900e6c

Any help is greatly appreciated!



-- 

Jiefu Gong
University of California, Berkeley | Class of 2017
B.A Computer Science | College of Letters and Sciences

jgong@berkeley.edu <el...@berkeley.edu> | (925) 400-3427

Re: kafka benchmark tests

Posted by Yuheng Du <yu...@gmail.com>.

Hi Geoffrey,

Thank you for the detailed explanation. It is really helpful.

I am thinking of going with the second option. Since I have bare-metal access
to all the nodes in the cluster, it's probably better to run real slave
machines instead of virtual machines (correct me if I am wrong).

Each of my nodes has 256 GB of RAM and 2 TB of disk space. How large will each
slave virtual machine be, and how much memory will it take?

Thank you!

best,
Yuheng


On Wed, Jul 15, 2015 at 4:19 PM, Geoffrey Anderson <ge...@confluent.io>
wrote:

> Hi Yuheng,
>
> Yes, you should be able to run on either mac or linux.
>
> The test cluster consists of a test-driver machine and some number of slave
> machines. Right now, there are roughly two ways to set up the slave
> machines:
>
> 1) Slave machines are virtual machines *on* the test-driver machine.
> 2) Slave machines are external to the test-driver machine.
>
> 1 is the simplest to set up, but yes it does require installation of the
> virtual machines on the test-driver machine.
>
> The installation of these machines is outlined in the quickstart I
> mentioned (here is a better link for the test README:
> https://github.com/confluentinc/kafka/tree/KAFKA-2276/tests).
>
> The tool we're using to bring up the slave virtual machines is called
> vagrant, so the "vagrant" steps in the quickstart are really telling you
> how to install the virtual machines.
>
> Hope that helps!
>
> Cheers,
> Geoff
>
>
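
The ProducerPerformance command quoted in this thread packs several settings
into one line. As a minimal sketch, assuming the four positional arguments are
topic, number of records, record size in bytes, and target throughput in
records/sec (the reading used in the thread), and that everything after them
is a key=value producer property, the command can be unpacked as below. The
PerfArgs class and parse_perf_args helper are hypothetical names used only for
illustration; they are not part of Kafka.

```python
from dataclasses import dataclass


@dataclass
class PerfArgs:
    topic: str
    num_records: int
    record_size: int        # bytes per record (assumed meaning)
    target_throughput: int  # records/sec; -1 is read in the thread as "unthrottled"
    props: dict             # remaining key=value producer properties, kept as strings


def parse_perf_args(argv):
    """Split the arguments into four positionals plus key=value properties."""
    topic, num_records, record_size, throughput, *rest = argv
    props = dict(item.split("=", 1) for item in rest)
    return PerfArgs(topic, int(num_records), int(record_size), int(throughput), props)


# Arguments copied from the command quoted in the thread.
args = parse_perf_args(
    "test7 50000000 100 -1 acks=1 "
    "bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 "
    "buffer.memory=67108864 batch.size=8196".split()
)

# Total payload the run would send, in MiB (1 MiB = 1024 * 1024 bytes).
total_mib = args.num_records * args.record_size / (1024 * 1024)
print(args.topic, args.props["acks"], round(total_mib, 2))
```

Under these assumptions the run sends 50,000,000 records of 100 bytes each,
about 4768 MiB of payload in total.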

Re: kafka benchmark tests

Posted by Geoffrey Anderson <ge...@confluent.io>.

Hi Yuheng,

Yes, you should be able to run on either mac or linux.

The test cluster consists of a test-driver machine and some number of slave
machines. Right now, there are roughly two ways to set up the slave
machines:

1) Slave machines are virtual machines *on* the test-driver machine.
2) Slave machines are external to the test-driver machine.

Option 1 is the simplest to set up, but yes, it does require installation of
the virtual machines on the test-driver machine.

The installation of these machines is outlined in the quickstart I
mentioned (here is a better link for the test README:
https://github.com/confluentinc/kafka/tree/KAFKA-2276/tests).

The tool we're using to bring up the slave virtual machines is called
vagrant, so the "vagrant" steps in the quickstart are really telling you
how to install the virtual machines.

Hope that helps!

Cheers,
Geoff




On Wed, Jul 15, 2015 at 12:13 PM, Yuheng Du <yu...@gmail.com>
wrote:

> Hi Geoffrey,
>
> Thank you for your helpful information. Do I have to install the virtual
> machines? I am using Mac as the testdriver machine or I can use a linux
> machine to run testdriver too.
>
> Thanks.
>
> best,
> Yuheng
>
> On Wed, Jul 15, 2015 at 2:55 PM, Geoffrey Anderson <ge...@confluent.io>
> wrote:
>
> > Hi Yuheng,
> >
> > Running these tests requires a tool we've created at Confluent called
> > 'ducktape', which you need to install with the command:
> > pip install ducktape==0.2.0
> >
> > Running the tests locally requires some setup (creation of virtual machines
> > etc.) which is outlined here:
> >
> > https://github.com/apache/kafka/pull/70/files#diff-62f0ff60ede3b78b9c95624e2f61d6c1
> > The instructions in the quickstart show you how to run the tests on cluster
> > of virtual machines (on a single host)
> >
> > Once you have a cluster up and running, you'll be able to run the test
> > you're interested in:
> > cd kafka/tests
> > ducktape kafkatest/tests/benchmark_test.py
> >
> > Definitely keep us posted about which parts are difficult, annoying, or
> > confusing about this process and we'll do our best to help.
> >
> > Thanks,
> > Geoff
> >
> >
> >
> > On Wed, Jul 15, 2015 at 12:49 AM, Yuheng Du <yu...@gmail.com>
> > wrote:
> >
> > > Jiefu,
> > >
> > > Have you tried to run benchmark_test.py? I ran it and it asks me for the
> > > ducktape.services.service
> > >
> > > yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ python benchmark_test.py
> > > Traceback (most recent call last):
> > >   File "benchmark_test.py", line 16, in <module>
> > >     from ducktape.services.service import Service
> > > ImportError: No module named ducktape.services.service
> > >
> > >
> > > Can you help me on getting it to work, Ewen? Thanks.
> > >
> > >
> > > best,
> > >
> > > Yuheng
> > >
> > > On Tue, Jul 14, 2015 at 11:28 PM, Ewen Cheslack-Postava <ewen@confluent.io>
> > > wrote:
> > >
> > > > @Jiefu, yes! The patch is functional, I think it's just waiting on a bit of
> > > > final review after the last round of changes. You can definitely use it for
> > > > your own benchmarking, and we'd love to see patches for any additional
> > > > tests we missed in the first pass!
> > > >
> > > > -Ewen
> > > >
> > > > On Tue, Jul 14, 2015 at 10:53 AM, JIEFU GONG <jg...@berkeley.edu> wrote:
> > > >
> > > > > Yuheng,
> > > > > I would recommend looking here:
> > > > > http://kafka.apache.org/documentation.html#brokerconfigs and scrolling down
> > > > > to get a better understanding of the default settings and what they mean --
> > > > > it'll tell you what different options for acks does.
> > > > >
> > > > > Ewen,
> > > > > Thank you immensely for your thoughts, they shed a lot of insight into the
> > > > > issue. Though it is understandable that your specific results need to be
> > > > > verified, it seems that the KIP-25 patch is functional and I can use it for
> > > > > my own benchmarking purposes? Is that correct? Thanks again!
> > > > >
> > > > > On Tue, Jul 14, 2015 at 8:22 AM, Yuheng Du <yuheng.du.hust@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Also, I guess setting the target throughput to -1 means let it be as high
> > > > > > as possible?
> > > > > >
> > > > > > On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du <yuheng.du.hust@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks. If I set the acks=1 in the producer config options in
> > > > > > > bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> > > > > > > test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > > > > > > buffer.memory=67108864 batch.size=8196?
> > > > > > >
> > > > > > > Does that mean for each message generated at the producer, the producer
> > > > > > > will wait until the broker sends the ack back, then send another message?
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > Yuheng
> > > > > > >
> > > > > > > On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy <kumar@nmsworks.co.in>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Yes, A list of Kafka Server host/port pairs to use for establishing the
> > > > > > >> initial connection to the Kafka cluster
> > > > > > >>
> > > > > > >> https://kafka.apache.org/documentation.html#newproducerconfigs
> > > > > > >>
> > > > > > >> On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du <yuheng.du.hust@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Does anyone know what is bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > > > > > >> > means in the following test command:
> > > > > > >> >
> > > > > > >> > bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> > > > > > >> > test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > > > > > >> > buffer.memory=67108864 batch.size=8196?
> > > > > > >> >
> > > > > > >> > what is bootstrap.servers? Is it the kafka server that I am running a test
> > > > > > >> > at?
> > > > > > >> >
> > > > > > >> > Thanks.
> > > > > > >> >
> > > > > > >> > Yuheng
> > > > > > >> >
> > > > > > >> > On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava <ewen@confluent.io>
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > I implemented (nearly) the same basic set of tests in the system test
> > > > > > >> > > framework we started at Confluent and that is going to move into Kafka --
> > > > > > >> > > see the wip patch for KIP-25 here: https://github.com/apache/kafka/pull/70
> > > > > > >> > > In particular, that test is implemented in benchmark_test.py:
> > > > > > >> > >
> > > > > > >> > > https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8
> > > > > > >> > >
> > > > > > >> > > Hopefully once that's merged people can reuse that benchmark (and add to
> > > > > > >> > > it!) so they can easily run the same benchmarks across different hardware.
> > > > > > >> > > Here are some results from an older version of that test on m3.2xlarge
> > > > > > >> > > instances on EC2 using local ephemeral storage (I think... it's been awhile
> > > > > > >> > > since I ran these numbers and I didn't document methodology that
> > > > > > >> > > carefully):
> > > > > > >> > >
> > > > > > >> > > INFO:_.KafkaBenchmark:=================
> > > > > > >> > > INFO:_.KafkaBenchmark:BENCHMARK RESULTS
> > > > > > >> > > INFO:_.KafkaBenchmark:=================
> > > > > > >> > > INFO:_.KafkaBenchmark:Single producer, no replication: 684097.470208 rec/sec (65.240000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark:Single producer, async 3x replication: 667494.359673 rec/sec (63.660000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark:Single producer, sync 3x replication: 116485.764275 rec/sec (11.110000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark:Three producers, async 3x replication: 1696519.022182 rec/sec (161.790000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark:Message size:
> > > > > > >> > > INFO:_.KafkaBenchmark: 10: 1637825.195625 rec/sec (15.620000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark: 100: 605504.877911 rec/sec (57.750000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark: 1000: 90351.817570 rec/sec (86.170000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark: 10000: 8306.180862 rec/sec (79.210000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark: 100000: 978.403499 rec/sec (93.310000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark:Throughput over long run, data > memory:
> > > > > > >> > > INFO:_.KafkaBenchmark: Time block 0: 684725.151324 rec/sec (65.300000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark:Single consumer: 701031.140000 rec/sec (56.830500 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark:Three consumers: 3304011.014900 rec/sec (267.830800 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark:Producer + consumer:
> > > > > > >> > > INFO:_.KafkaBenchmark: Producer: 624984.375391 rec/sec (59.600000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark: Consumer: 624984.375391 rec/sec (59.600000 MB/s)
> > > > > > >> > > INFO:_.KafkaBenchmark:End-to-end latency: median 2.000000 ms, 99% 4.000000 ms, 99.9% 19.000000 ms
> > > > > > >> > >
> > > > > > >> > > Don't trust these numbers for anything, they were a quick one-off test. I'm
> > > > > > >> > > just pasting the output so you get some idea of what the results might look
> > > > > > >> > > like. Once we merge the KIP-25 patch, Confluent will be running the tests
> > > > > > >> > > regularly and results will be available publicly so we'll be able to keep
> > > > > > >> > > better tabs on performance, albeit for only a specific class of hardware.
> > > > > > >> > >
> > > > > > >> > > For the batch.size question -- I'm not sure the results in the blog post
> > > > > > >> > > actually have different settings, it could be accidental divergence between
> > > > > > >> > > the script and the blog post. The post specifically notes that tuning the
> > > > > > >> > > batch size in the synchronous case might help, but that he didn't do that.
> > > > > > >> > > If you're trying to benchmark the *optimal* throughput, tuning the batch
> > > > > > >> > > size would make sense. Since synchronous replication will have higher
> > > > > > >> > > latency and there's a limit to how many requests can be in flight at once,
> > > > > > >> > > you'll want a larger batch size to compensate for the additional latency.
> > > > > > >> > > However, in practice the increase you see may be negligible. Somebody who
> > > > > > >> > > has spent more time fiddling with tweaking producer performance may have
> > > > > > >> > > more insight.
> > > > > > >> > >
> > > > > > >> > > -Ewen
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > --
> > > > > > >> > > Thanks,
> > > > > > >> > > Ewen
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Jiefu Gong
> > > > > University of California, Berkeley | Class of 2017
> > > > > B.A Computer Science | College of Letters and Sciences
> > > > >
> > > > > jgong@berkeley.edu <el...@berkeley.edu> | (925) 400-3427
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Ewen
> > > >
> > >
> >
>

Re: kafka benchmark tests

Posted by Yuheng Du <yu...@gmail.com>.

Hi Geoffrey,

Thank you for the helpful information. Do I have to install the virtual
machines? I am currently using a Mac as the test-driver machine, but I could
also use a Linux machine to run the test driver.

Thanks.

best,
Yuheng

On Wed, Jul 15, 2015 at 2:55 PM, Geoffrey Anderson <ge...@confluent.io>
wrote:

> Hi Yuheng,
>
> Running these tests requires a tool we've created at Confluent called
> 'ducktape', which you need to install with the command:
> pip install ducktape==0.2.0
>
> Running the tests locally requires some setup (creation of virtual machines,
> etc.), which is outlined here:
>
> https://github.com/apache/kafka/pull/70/files#diff-62f0ff60ede3b78b9c95624e2f61d6c1
> The instructions in the quickstart show you how to run the tests on a cluster
> of virtual machines (on a single host).
>
> Once you have a cluster up and running, you'll be able to run the test
> you're interested in:
> cd kafka/tests
> ducktape kafkatest/tests/benchmark_test.py
>
> Definitely keep us posted about which parts are difficult, annoying, or
> confusing about this process and we'll do our best to help.
>
> Thanks,
> Geoff
>
>
>

Re: kafka benchmark tests

Posted by Geoffrey Anderson <ge...@confluent.io>.

Hi Yuheng,

Running these tests requires a tool we've created at Confluent called
'ducktape', which you need to install with the command:
pip install ducktape==0.2.0

Running the tests locally requires some setup (creation of virtual machines,
etc.), which is outlined here:
https://github.com/apache/kafka/pull/70/files#diff-62f0ff60ede3b78b9c95624e2f61d6c1
The instructions in the quickstart show you how to run the tests on a cluster
of virtual machines (on a single host).

Once you have a cluster up and running, you'll be able to run the test
you're interested in:
cd kafka/tests
ducktape kafkatest/tests/benchmark_test.py

Definitely keep us posted about which parts are difficult, annoying, or
confusing about this process and we'll do our best to help.

Thanks,
Geoff
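
For anyone who wants to script the steps listed above, here is a minimal Python sketch. Only the two commands come from the message; the `kafka_checkout` path and the function names are hypothetical, and the script defaults to a dry run so nothing is installed or executed unless you ask for it.

```python
import subprocess

# Commands quoted from the message above; the pinned version comes from the thread.
INSTALL_CMD = ["pip", "install", "ducktape==0.2.0"]
RUN_CMD = ["ducktape", "kafkatest/tests/benchmark_test.py"]


def plan_benchmark_run(kafka_checkout="kafka"):
    """Return (command, working_directory) pairs in the order they should run.

    `kafka_checkout` is a hypothetical path to a Kafka source tree that
    contains the KIP-25 `tests` directory.
    """
    return [
        (INSTALL_CMD, None),                   # install the test runner
        (RUN_CMD, f"{kafka_checkout}/tests"),  # equivalent of `cd kafka/tests`
    ]


def run_plan(plan, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for command, cwd in plan:
        print(f"[{cwd or '.'}] {' '.join(command)}")
        if not dry_run:
            subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    run_plan(plan_benchmark_run())  # dry run: only prints the two steps
```

Calling `run_plan(plan_benchmark_run(), dry_run=False)` would actually execute the steps, which only makes sense once the test cluster described in the quickstart is up.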



On Wed, Jul 15, 2015 at 12:49 AM, Yuheng Du <yu...@gmail.com>
wrote:

> Jiefu,
>
> Have you tried to run benchmark_test.py? I ran it and it fails because it
> cannot import ducktape.services.service:
>
> yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ python benchmark_test.py
> Traceback (most recent call last):
>   File "benchmark_test.py", line 16, in <module>
>     from ducktape.services.service import Service
> ImportError: No module named ducktape.services.service
>
> Can you help me get it to work, Ewen? Thanks.
>
>
> best,
>
> Yuheng
>

Re: kafka benchmark tests

Posted by Yuheng Du <yu...@gmail.com>.

Jiefu,

Have you tried to run benchmark_test.py? I ran it and it fails because it
cannot import ducktape.services.service:

yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ python benchmark_test.py
Traceback (most recent call last):
  File "benchmark_test.py", line 16, in <module>
    from ducktape.services.service import Service
ImportError: No module named ducktape.services.service

Can you help me get it to work, Ewen? Thanks.


best,

Yuheng
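
The ImportError above suggests that the ducktape package is not installed for the interpreter that ran the script. A small check like the following can confirm that before trying again. This is only a sketch: it uses Python 3 syntax (the traceback format hints that the original run was on Python 2), and the helper name `is_importable` is made up for illustration.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be located by the import system.

    Parent packages are imported as a side effect of the lookup.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example `ducktape`) is itself missing.
        return False


if __name__ == "__main__":
    name = "ducktape.services.service"
    if is_importable(name):
        print(f"{name} is available")
    else:
        print(f"{name} is missing; the test file cannot be imported yet")
```

If the module is reported missing, installing ducktape into the same environment (the thread mentions `pip install ducktape==0.2.0`) is the likely fix.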

On Tue, Jul 14, 2015 at 11:28 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> @Jiefu, yes! The patch is functional, I think it's just waiting on a bit of
> final review after the last round of changes. You can definitely use it for
> your own benchmarking, and we'd love to see patches for any additional
> tests we missed in the first pass!
>
> -Ewen
>
> On Tue, Jul 14, 2015 at 10:53 AM, JIEFU GONG <jg...@berkeley.edu> wrote:
>
> > Yuheng,
> > I would recommend looking here:
> > http://kafka.apache.org/documentation.html#brokerconfigs and scrolling
> > down to get a better understanding of the default settings and what they
> > mean -- it'll tell you what the different options for acks do.
> >
> > Ewen,
> > Thank you immensely for your thoughts, they shed a lot of light on the
> > issue. Though it is understandable that your specific results need to be
> > verified, it seems that the KIP-25 patch is functional and I can use it for
> > my own benchmarking purposes? Is that correct? Thanks again!
> >
> > On Tue, Jul 14, 2015 at 8:22 AM, Yuheng Du <yu...@gmail.com>
> > wrote:
> >
> > > Also, I guess setting the target throughput to -1 means let it be as
> > > high as possible?
> > >
> > > On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du <yu...@gmail.com>
> > > wrote:
> > >
> > > > Thanks. If I set the acks=1 in the producer config options in
> > > > bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> > > > test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > > > buffer.memory=67108864 batch.size=8196?
> > > >
> > > > Does that mean for each message generated at the producer, the producer
> > > > will wait until the broker sends the ack back, then send another message?
> > > >
> > > > Thanks.
> > > >
> > > > Yuheng
> > > >
> > > > On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy <kumar@nmsworks.co.in>
> > > > wrote:
> > > >
> > > >> Yes, it is a list of Kafka server host/port pairs to use for establishing
> > > >> the initial connection to the Kafka cluster.
> > > >>
> > > >> https://kafka.apache.org/documentation.html#newproducerconfigs
> > > >>
> > > >> On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du <yuheng.du.hust@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Does anyone know what bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > > >> > means in the following test command?
> > > >> >
> > > >> > bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> > > >> > test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > > >> > buffer.memory=67108864 batch.size=8196
> > > >> >
> > > >> > What is bootstrap.servers? Is it the Kafka server that I am running the
> > > >> > test against?
> > > >> >
> > > >> > Thanks.
> > > >> >
> > > >> > Yuheng
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava <ewen@confluent.io>
> > > >> > wrote:
> > > >> >
> > > >> > > I implemented (nearly) the same basic set of tests in the system test
> > > >> > > framework we started at Confluent and that is going to move into Kafka --
> > > >> > > see the WIP patch for KIP-25 here: https://github.com/apache/kafka/pull/70
> > > >> > > In particular, that test is implemented in benchmark_test.py:
> > > >> > >
> > > >> > > https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8
> > > >> > >
> > > >> > > Hopefully once that's merged people can reuse that benchmark (and add to
> > > >> > > it!) so they can easily run the same benchmarks across different hardware.
> > > >> > > Here are some results from an older version of that test on m3.2xlarge
> > > >> > > instances on EC2 using local ephemeral storage (I think... it's been a
> > > >> > > while since I ran these numbers and I didn't document methodology that
> > > >> > > carefully):
> > > >> > >
> > > >> > > INFO:_.KafkaBenchmark:=================
> > > >> > > INFO:_.KafkaBenchmark:BENCHMARK RESULTS
> > > >> > > INFO:_.KafkaBenchmark:=================
> > > >> > > INFO:_.KafkaBenchmark:Single producer, no replication: 684097.470208 rec/sec (65.240000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark:Single producer, async 3x replication: 667494.359673 rec/sec (63.660000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark:Single producer, sync 3x replication: 116485.764275 rec/sec (11.110000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark:Three producers, async 3x replication: 1696519.022182 rec/sec (161.790000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark:Message size:
> > > >> > > INFO:_.KafkaBenchmark: 10: 1637825.195625 rec/sec (15.620000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark: 100: 605504.877911 rec/sec (57.750000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark: 1000: 90351.817570 rec/sec (86.170000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark: 10000: 8306.180862 rec/sec (79.210000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark: 100000: 978.403499 rec/sec (93.310000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark:Throughput over long run, data > memory:
> > > >> > > INFO:_.KafkaBenchmark: Time block 0: 684725.151324 rec/sec (65.300000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark:Single consumer: 701031.140000 rec/sec (56.830500 MB/s)
> > > >> > > INFO:_.KafkaBenchmark:Three consumers: 3304011.014900 rec/sec (267.830800 MB/s)
> > > >> > > INFO:_.KafkaBenchmark:Producer + consumer:
> > > >> > > INFO:_.KafkaBenchmark: Producer: 624984.375391 rec/sec (59.600000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark: Consumer: 624984.375391 rec/sec (59.600000 MB/s)
> > > >> > > INFO:_.KafkaBenchmark:End-to-end latency: median 2.000000 ms, 99% 4.000000 ms, 99.9% 19.000000 ms
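
As a reading aid for the producer rows in the log above, the MB/s figures appear to follow from records per second times record size, with one MB taken as 2**20 bytes. The sketch below checks that against a few rows. The MB definition and the 100-byte record size for the first row are assumptions (the latter based on the ProducerPerformance command quoted in this thread), not something the log states.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: the benchmark reports MB as 2**20 bytes


def mb_per_sec(records_per_sec, record_size_bytes):
    """Convert a record rate into MB/s for fixed-size records."""
    return records_per_sec * record_size_bytes / BYTES_PER_MB


# (records/sec, record size in bytes, reported MB/s) copied from the log above.
# The 100-byte size for the first row is an assumption, see the note above.
SAMPLES = [
    (684097.470208, 100, 65.24),
    (1637825.195625, 10, 15.62),
    (90351.817570, 1000, 86.17),
    (978.403499, 100000, 93.31),
]

for rate, size, reported in SAMPLES:
    print(f"{size:>6} B: {mb_per_sec(rate, size):.2f} MB/s (log says {reported:.2f})")
```

The same conversion also shows why the record rate falls as messages grow while the byte rate rises: larger records amortize per-record overhead.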
> > > >> > >
> > > >> > > Don't trust these numbers for anything, they were a quick one-off test. I'm
> > > >> > > just pasting the output so you get some idea of what the results might look
> > > >> > > like. Once we merge the KIP-25 patch, Confluent will be running the tests
> > > >> > > regularly and results will be available publicly so we'll be able to keep
> > > >> > > better tabs on performance, albeit for only a specific class of hardware.
> > > >> > >
> > > >> > > For the batch.size question -- I'm not sure the results in the blog post
> > > >> > > actually have different settings, it could be accidental divergence between
> > > >> > > the script and the blog post. The post specifically notes that tuning the
> > > >> > > batch size in the synchronous case might help, but that he didn't do that.
> > > >> > > If you're trying to benchmark the *optimal* throughput, tuning the batch
> > > >> > > size would make sense. Since synchronous replication will have higher
> > > >> > > latency and there's a limit to how many requests can be in flight at once,
> > > >> > > you'll want a larger batch size to compensate for the additional latency.
> > > >> > > However, in practice the increase you see may be negligible. Somebody who
> > > >> > > has spent more time tweaking producer performance may have more insight.
> > > >> > >
> > > >> > > -Ewen
> > > >> > >
> > > >> > > On Mon, Jul 13, 2015 at 10:08 AM, JIEFU GONG <
> jgong@berkeley.edu>
> > > >> wrote:
> > > >> > >
> > > >> > > > Hi all,
> > > >> > > >
> > > >> > > > I was wondering if any of you guys have done benchmarks on
> Kafka
> > > >> > > > performance before, and if they or their details (# nodes in
> > > >> cluster, #
> > > >> > > > records / size(s) of messages, etc.) could be shared.
> > > >> > > >
> > > >> > > > For comparison purposes, I am trying to benchmark Kafka
> against
> > > some
> > > >> > > > similar services such as Kinesis or Scribe. Additionally, I
> was
> > > >> > wondering
> > > >> > > > if anyone could shed some insight on Jay Kreps' benchmarks
> that
> > he
> > > >> has
> > > >> > > > openly published here:
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> > > >> > > >
> > > >> > > > Specifically, I am unsure of why between his tests of 3x
> > > synchronous
> > > >> > > > replication and 3x async replication he changed the
> batch.size,
> > as
> > > >> well
> > > >> > > as
> > > >> > > > why he is seemingly publishing to incorrect topics:
> > > >> > > >
> > > >> > > > Configs:
> > > >> > > > https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
> > > >> > > >
> > > >> > > > Any help is greatly appreciated!
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > >
> > > >> > > > Jiefu Gong
> > > >> > > > University of California, Berkeley | Class of 2017
> > > >> > > > B.A Computer Science | College of Letters and Sciences
> > > >> > > >
> > > >> > > > jgong@berkeley.edu <el...@berkeley.edu> | (925) 400-3427
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Thanks,
> > > >> > > Ewen
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Jiefu Gong
> > University of California, Berkeley | Class of 2017
> > B.A Computer Science | College of Letters and Sciences
> >
> > jgong@berkeley.edu <el...@berkeley.edu> | (925) 400-3427
> >
>
>
>
> --
> Thanks,
> Ewen
>

Re: kafka benchmark tests

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.

@Jiefu, yes! The patch is functional; I think it's just waiting on a bit of
final review after the last round of changes. You can definitely use it for
your own benchmarking, and we'd love to see patches for any additional
tests we missed in the first pass!

-Ewen

On Tue, Jul 14, 2015 at 10:53 AM, JIEFU GONG <jg...@berkeley.edu> wrote:

> Yuheng,
> I would recommend looking here:
> http://kafka.apache.org/documentation.html#brokerconfigs and scrolling down
> to get a better understanding of the default settings and what they mean --
> it'll tell you what the different options for acks do.
>
> Ewen,
> Thank you immensely for your thoughts, they shed a lot of insight into the
> issue. Though it is understandable that your specific results need to be
> verified, it seems that the KIP-25 patch is functional and I can use it for
> my own benchmarking purposes? Is that correct? Thanks again!
>



-- 
Thanks,
Ewen

Re: kafka benchmark tests

Posted by JIEFU GONG <jg...@berkeley.edu>.

Yuheng,
I would recommend looking here:
http://kafka.apache.org/documentation.html#brokerconfigs and scrolling down
to get a better understanding of the default settings and what they mean --
it'll tell you what the different options for acks do.
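
For reference, here is a small Python sketch of what the acks settings mean,
paraphrased from the producer section of the Kafka documentation. The
dictionary and the describe_acks helper are illustrative names only, not part
of any Kafka API.

```python
# Summary of the producer `acks` setting, paraphrased from the Kafka docs.
# ACKS_MEANING and describe_acks are illustrative names, not Kafka APIs.
ACKS_MEANING = {
    "0": "producer does not wait for any acknowledgement from the broker",
    "1": "leader acknowledges after writing to its own log, without waiting for followers",
    "all": "leader acknowledges only after the full set of in-sync replicas has the record",
}


def describe_acks(value):
    """Return a one-line description; "-1" is treated as an alias for "all"."""
    key = "all" if value in ("-1", "all") else value
    return ACKS_MEANING[key]


for setting in ("0", "1", "-1"):
    print(f"acks={setting}: {describe_acks(setting)}")
```

Roughly speaking, the "sync" and "async" replication cases discussed in this
thread correspond to acks=-1 and acks=1 respectively.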

Ewen,
Thank you immensely for your thoughts; they shed a lot of light on the
issue. Though it is understandable that your specific results need to be
verified, it seems that the KIP-25 patch is functional and I can use it for
my own benchmarking purposes. Is that correct? Thanks again!

On Tue, Jul 14, 2015 at 8:22 AM, Yuheng Du <yu...@gmail.com> wrote:

> Also, I guess setting the target throughput to -1 means let it be as high
> as possible?
>



-- 

Jiefu Gong
University of California, Berkeley | Class of 2017
B.A Computer Science | College of Letters and Sciences

jgong@berkeley.edu <el...@berkeley.edu> | (925) 400-3427

Re: kafka benchmark tests

Posted by Yuheng Du <yu...@gmail.com>.

Also, I guess setting the target throughput to -1 means let it be as high
as possible?
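
To make the positional arguments in the command quoted below concrete, here is
a rough Python sketch that splits them up. It assumes the argument order is
topic, number of records, record size in bytes, target records/sec, followed
by key=value producer properties, and it assumes a non-positive target means
no throttling. Both assumptions should be checked against the
ProducerPerformance source for your Kafka version. The function name is made
up for illustration.

```python
def parse_producer_perf_args(argv):
    """Split a ProducerPerformance-style argument list (assumed layout)."""
    topic, num_records, record_size, target = argv[:4]
    # Everything after the four positional arguments is a key=value property.
    props = dict(arg.split("=", 1) for arg in argv[4:])
    target = int(target)
    return {
        "topic": topic,
        "num_records": int(num_records),
        "record_size": int(record_size),
        "target_records_per_sec": target,
        # Assumption: a target of -1 (or any value <= 0) disables throttling.
        "throttled": target > 0,
        "props": props,
    }


ARGS = ("test7 50000000 100 -1 acks=1 "
        "bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 "
        "buffer.memory=67108864 batch.size=8196").split()

run = parse_producer_perf_args(ARGS)
total_bytes = run["num_records"] * run["record_size"]
print(run)
print(f"payload: {total_bytes} bytes (~{total_bytes / 2**20:.0f} MiB)")
```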

On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du <yu...@gmail.com>
wrote:

> Thanks. If I set the acks=1 in the producer config options in
> bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> test7 50000000 100 -1 acks=1 bootstrap.servers=
> esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196?
>
> Does that mean for each message generated at the producer, the producer
> will wait until the broker sends the ack back, then send another message?
>
> Thanks.
>
> Yuheng
>

Re: kafka benchmark tests

Posted by Yuheng Du <yu...@gmail.com>.

Thanks. Suppose I set acks=1 in the producer config options, as in:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
test7 50000000 100 -1 acks=1
bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
buffer.memory=67108864 batch.size=8196

Does that mean that for each message generated at the producer, the producer
will wait until the broker sends the ack back before sending another message?

Thanks.

Yuheng
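
A rough way to think about this question, in Python. As far as I understand
the new producer, send() is asynchronous: records are grouped into batches and
several requests can be outstanding at once, so acks=1 determines when the
broker answers a request rather than making the producer stop after each
message. The model below is a deliberate simplification (one batch per
request, a single connection, a fixed round-trip time), and the in-flight
count and latencies are made-up inputs, not measurements.

```python
def max_records_per_sec(batch_size_bytes, record_size_bytes, in_flight, round_trip_s):
    """Upper bound on records/sec for one connection under the simplified model."""
    records_per_batch = batch_size_bytes // record_size_bytes
    return in_flight * records_per_batch / round_trip_s


# batch.size=8196 and 100-byte records come from the command above;
# 5 in-flight requests and both round-trip times are assumptions.
fast = max_records_per_sec(8196, 100, in_flight=5, round_trip_s=0.002)
slow = max_records_per_sec(8196, 100, in_flight=5, round_trip_s=0.010)
bigger_batch = max_records_per_sec(5 * 8196, 100, in_flight=5, round_trip_s=0.010)

print(f"2 ms round trip:                 {fast:,.0f} rec/sec")
print(f"10 ms round trip:                {slow:,.0f} rec/sec")
print(f"10 ms round trip, 5x batch.size: {bigger_batch:,.0f} rec/sec")
```

In this model a five-fold increase in latency cuts the bound five-fold, and a
roughly five-times-larger batch wins it back, which is the intuition behind
tuning batch.size for synchronous replication mentioned elsewhere in this
thread.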

On Tue, Jul 14, 2015 at 10:06 AM, Manikumar Reddy <ku...@nmsworks.co.in>
wrote:

> Yes. It is a list of Kafka server host/port pairs to use for establishing
> the initial connection to the Kafka cluster.
>
> https://kafka.apache.org/documentation.html#newproducerconfigs
>
> On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du <yu...@gmail.com>
> wrote:
>
> > Does anyone know what bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > means in the following test command?
> >
> > bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> > test7 50000000 100 -1 acks=1
> > bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> > buffer.memory=67108864 batch.size=8196
> >
> > What is bootstrap.servers? Is it the Kafka server that I am running the
> > test against?
> >
> > Thanks.
> >
> > Yuheng
> >
> >
> >
> >
> > On Tue, Jul 14, 2015 at 12:18 AM, Ewen Cheslack-Postava <ewen@confluent.io>
> > wrote:
> >
> > > I implemented (nearly) the same basic set of tests in the system test
> > > framework we started at Confluent and that is going to move into Kafka --
> > > see the WIP patch for KIP-25 here: https://github.com/apache/kafka/pull/70
> > > In particular, that test is implemented in benchmark_test.py:
> > >
> > > https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8
> > >
> > > Hopefully once that's merged people can reuse that benchmark (and add to
> > > it!) so they can easily run the same benchmarks across different hardware.
> > > Here are some results from an older version of that test on m3.2xlarge
> > > instances on EC2 using local ephemeral storage (I think... it's been a
> > > while since I ran these numbers and I didn't document methodology that
> > > carefully):
> > >
> > > INFO:_.KafkaBenchmark:=================
> > > INFO:_.KafkaBenchmark:BENCHMARK RESULTS
> > > INFO:_.KafkaBenchmark:=================
> > > INFO:_.KafkaBenchmark:Single producer, no replication: 684097.470208 rec/sec (65.240000 MB/s)
> > > INFO:_.KafkaBenchmark:Single producer, async 3x replication: 667494.359673 rec/sec (63.660000 MB/s)
> > > INFO:_.KafkaBenchmark:Single producer, sync 3x replication: 116485.764275 rec/sec (11.110000 MB/s)
> > > INFO:_.KafkaBenchmark:Three producers, async 3x replication: 1696519.022182 rec/sec (161.790000 MB/s)
> > > INFO:_.KafkaBenchmark:Message size:
> > > INFO:_.KafkaBenchmark: 10: 1637825.195625 rec/sec (15.620000 MB/s)
> > > INFO:_.KafkaBenchmark: 100: 605504.877911 rec/sec (57.750000 MB/s)
> > > INFO:_.KafkaBenchmark: 1000: 90351.817570 rec/sec (86.170000 MB/s)
> > > INFO:_.KafkaBenchmark: 10000: 8306.180862 rec/sec (79.210000 MB/s)
> > > INFO:_.KafkaBenchmark: 100000: 978.403499 rec/sec (93.310000 MB/s)
> > > INFO:_.KafkaBenchmark:Throughput over long run, data > memory:
> > > INFO:_.KafkaBenchmark: Time block 0: 684725.151324 rec/sec (65.300000 MB/s)
> > > INFO:_.KafkaBenchmark:Single consumer: 701031.140000 rec/sec (56.830500 MB/s)
> > > INFO:_.KafkaBenchmark:Three consumers: 3304011.014900 rec/sec (267.830800 MB/s)
> > > INFO:_.KafkaBenchmark:Producer + consumer:
> > > INFO:_.KafkaBenchmark: Producer: 624984.375391 rec/sec (59.600000 MB/s)
> > > INFO:_.KafkaBenchmark: Consumer: 624984.375391 rec/sec (59.600000 MB/s)
> > > INFO:_.KafkaBenchmark:End-to-end latency: median 2.000000 ms, 99% 4.000000 ms, 99.9% 19.000000 ms
> > >
> > > Don't trust these numbers for anything; they were a quick one-off test.
> > > I'm just pasting the output so you get some idea of what the results
> > > might look like. Once we merge the KIP-25 patch, Confluent will be
> > > running the tests regularly and results will be available publicly so
> > > we'll be able to keep better tabs on performance, albeit for only a
> > > specific class of hardware.
> > >
> > > For the batch.size question -- I'm not sure the results in the blog post
> > > actually have different settings; it could be accidental divergence
> > > between the script and the blog post. The post specifically notes that
> > > tuning the batch size in the synchronous case might help, but that he
> > > didn't do that. If you're trying to benchmark the *optimal* throughput,
> > > tuning the batch size would make sense. Since synchronous replication
> > > will have higher latency and there's a limit to how many requests can be
> > > in flight at once, you'll want a larger batch size to compensate for the
> > > additional latency. However, in practice the increase you see may be
> > > negligible. Somebody who has spent more time fiddling with producer
> > > performance may have more insight.
> > >
> > > -Ewen
> > >
> > > On Mon, Jul 13, 2015 at 10:08 AM, JIEFU GONG <jg...@berkeley.edu> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I was wondering if any of you guys have done benchmarks on Kafka
> > > > performance before, and if they or their details (# nodes in cluster, #
> > > > records / size(s) of messages, etc.) could be shared.
> > > >
> > > > For comparison purposes, I am trying to benchmark Kafka against some
> > > > similar services such as Kinesis or Scribe. Additionally, I was wondering
> > > > if anyone could shed some insight on Jay Kreps' benchmarks that he has
> > > > openly published here:
> > > >
> > > > https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> > > >
> > > > Specifically, I am unsure of why between his tests of 3x synchronous
> > > > replication and 3x async replication he changed the batch.size, as well as
> > > > why he is seemingly publishing to incorrect topics:
> > > >
> > > > Configs:
> > > > https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
> > > >
> > > > Any help is greatly appreciated!
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Jiefu Gong
> > > > University of California, Berkeley | Class of 2017
> > > > B.A Computer Science | College of Letters and Sciences
> > > >
> > > > jgong@berkeley.edu <el...@berkeley.edu> | (925) 400-3427
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Ewen
> > >
> >
>

Re: kafka benchmark tests

Posted by Manikumar Reddy <ku...@nmsworks.co.in>.

Yes. bootstrap.servers is a list of Kafka server host/port pairs used to
establish the initial connection to the Kafka cluster. See:

https://kafka.apache.org/documentation.html#newproducerconfigs
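
To make the format concrete: the value is a comma-separated list of
host:port entries, and the client only needs to reach one of them to
discover the rest of the cluster. Below is a minimal Python sketch of how
such a string breaks down. parse_bootstrap_servers is a hypothetical helper
written for illustration, not part of any Kafka client, and the second host
name is made up.

```python
def parse_bootstrap_servers(value):
    """Split a bootstrap.servers string into (host, port) pairs.

    Hypothetical helper for illustration only; real Kafka clients do
    their own parsing and validation.
    """
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas or whitespace
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError("expected host:port, got %r" % entry)
        pairs.append((host, int(port)))
    return pairs


# First entry is the broker from the command in this thread; the second
# host name is invented to show a multi-broker list.
servers = parse_bootstrap_servers(
    "esv4-hcl198.grid.linkedin.com:9092, broker2.example.com:9092"
)
for host, port in servers:
    print(host, port)
```

So in the test command, bootstrap.servers should point at one or more
brokers of the cluster you want to benchmark.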

On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du <yu...@gmail.com> wrote:

> Does anyone know what bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
> means in the following test command?
>
> bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance \
>     test7 50000000 100 -1 acks=1 \
>     bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 \
>     buffer.memory=67108864 batch.size=8196
>
> What is bootstrap.servers? Is it the Kafka server that I am running the test
> against?
>
> Thanks.
>
> Yuheng

Re: kafka benchmark tests

Posted by Yuheng Du <yu...@gmail.com>.

Does anyone know what bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
means in the following test command?

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance \
    test7 50000000 100 -1 acks=1 \
    bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 \
    buffer.memory=67108864 batch.size=8196

What is bootstrap.servers? Is it the Kafka server that I am running the test
against?

Thanks.

Yuheng





Re: kafka benchmark tests

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.

I implemented (nearly) the same basic set of tests in the system test
framework we started at Confluent, which is going to move into Kafka --
see the WIP patch for KIP-25 here: https://github.com/apache/kafka/pull/70
In particular, that test is implemented in benchmark_test.py:
https://github.com/apache/kafka/pull/70/files#diff-ca984778cf9943407645eb6784f19dc8

Hopefully once that's merged people can reuse that benchmark (and add to
it!) so they can easily run the same benchmarks across different hardware.
Here are some results from an older version of that test on m3.2xlarge
instances on EC2 using local ephemeral storage (I think... it's been a while
since I ran these numbers and I didn't document the methodology that
carefully):

INFO:_.KafkaBenchmark:=================
INFO:_.KafkaBenchmark:BENCHMARK RESULTS
INFO:_.KafkaBenchmark:=================
INFO:_.KafkaBenchmark:Single producer, no replication: 684097.470208 rec/sec (65.240000 MB/s)
INFO:_.KafkaBenchmark:Single producer, async 3x replication: 667494.359673 rec/sec (63.660000 MB/s)
INFO:_.KafkaBenchmark:Single producer, sync 3x replication: 116485.764275 rec/sec (11.110000 MB/s)
INFO:_.KafkaBenchmark:Three producers, async 3x replication: 1696519.022182 rec/sec (161.790000 MB/s)
INFO:_.KafkaBenchmark:Message size:
INFO:_.KafkaBenchmark: 10: 1637825.195625 rec/sec (15.620000 MB/s)
INFO:_.KafkaBenchmark: 100: 605504.877911 rec/sec (57.750000 MB/s)
INFO:_.KafkaBenchmark: 1000: 90351.817570 rec/sec (86.170000 MB/s)
INFO:_.KafkaBenchmark: 10000: 8306.180862 rec/sec (79.210000 MB/s)
INFO:_.KafkaBenchmark: 100000: 978.403499 rec/sec (93.310000 MB/s)
INFO:_.KafkaBenchmark:Throughput over long run, data > memory:
INFO:_.KafkaBenchmark: Time block 0: 684725.151324 rec/sec (65.300000 MB/s)
INFO:_.KafkaBenchmark:Single consumer: 701031.140000 rec/sec (56.830500 MB/s)
INFO:_.KafkaBenchmark:Three consumers: 3304011.014900 rec/sec (267.830800 MB/s)
INFO:_.KafkaBenchmark:Producer + consumer:
INFO:_.KafkaBenchmark: Producer: 624984.375391 rec/sec (59.600000 MB/s)
INFO:_.KafkaBenchmark: Consumer: 624984.375391 rec/sec (59.600000 MB/s)
INFO:_.KafkaBenchmark:End-to-end latency: median 2.000000 ms, 99% 4.000000 ms, 99.9% 19.000000 ms
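
As a reading aid for the producer lines above: the MB/s figure is
consistent with records per second times the record size, divided by 2^20.
The sketch below assumes 100-byte records for the unlabeled producer tests.
That is an inference (it matches the record size argument of the
ProducerPerformance command quoted elsewhere in this thread), not something
the output states. The other rows use the sizes printed in the "Message
size" block. The consumer lines do not match the same 100-byte calculation,
so they are left out.

```python
MIB = 1024 * 1024  # the printed MB/s values line up with 2**20 bytes per MB


def mb_per_sec(records_per_sec, record_size_bytes):
    """Convert a record rate into MB/s for a fixed record size."""
    return records_per_sec * record_size_bytes / MIB


# (label, rec/sec as printed, record size in bytes, MB/s as printed)
# The 100-byte size in the first row is an assumption, see the note above.
reported = [
    ("single producer, no replication", 684097.470208, 100, 65.24),
    ("message size 10", 1637825.195625, 10, 15.62),
    ("message size 1000", 90351.817570, 1000, 86.17),
    ("message size 100000", 978.403499, 100000, 93.31),
]

for label, rate, size, printed in reported:
    computed = mb_per_sec(rate, size)
    print("%-32s computed %.2f MB/s, printed %.2f MB/s" % (label, computed, printed))
```

This also shows why the record rate drops sharply as messages grow while
the byte rate rises: larger records amortize the per-record overhead.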

Don't trust these numbers for anything; they were a quick one-off test. I'm
just pasting the output so you get some idea of what the results might look
like. Once we merge the KIP-25 patch, Confluent will be running the tests
regularly and results will be available publicly so we'll be able to keep
better tabs on performance, albeit for only a specific class of hardware.

For the batch.size question -- I'm not sure the results in the blog post
actually have different settings; it could be accidental divergence between
the script and the blog post. The post specifically notes that tuning the
batch size in the synchronous case might help, but that he didn't do that.
If you're trying to benchmark the *optimal* throughput, tuning the batch
size would make sense. Since synchronous replication will have higher
latency and there's a limit to how many requests can be in flight at once,
you'll want a larger batch size to compensate for the additional latency.
However, in practice the increase you see may be negligible. Somebody who
has spent more time fiddling with tweaking producer performance may have
more insight.
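
A rough way to see the latency argument: if a producer may have at most N
requests in flight to a broker, each carrying about one batch of B bytes,
and a request takes L seconds to be acknowledged, then throughput to that
broker is capped near N * B / L. The sketch below is a back-of-the-envelope
model only. The in-flight limit and both latencies are hypothetical values
chosen for illustration, not measurements from these tests, and the model
ignores requests that carry batches for several partitions, compression,
and linger time.

```python
def max_throughput_bytes_per_sec(in_flight_requests, batch_size_bytes, ack_latency_sec):
    """Upper bound on bytes/sec when each request carries one batch."""
    return in_flight_requests * batch_size_bytes / ack_latency_sec


IN_FLIGHT = 5          # hypothetical limit on in-flight requests
BATCH_BYTES = 8196     # batch.size from the command quoted in this thread
ASYNC_LATENCY = 0.002  # hypothetical ack latency without waiting for replicas
SYNC_LATENCY = 0.010   # hypothetical ack latency with synchronous replication

async_cap = max_throughput_bytes_per_sec(IN_FLIGHT, BATCH_BYTES, ASYNC_LATENCY)
sync_cap = max_throughput_bytes_per_sec(IN_FLIGHT, BATCH_BYTES, SYNC_LATENCY)

# Batch size that would bring the synchronous cap back up to the async cap.
needed_batch = BATCH_BYTES * SYNC_LATENCY / ASYNC_LATENCY

print("async cap: %.0f bytes/sec" % async_cap)
print("sync cap:  %.0f bytes/sec" % sync_cap)
print("batch size needed to match async cap: %.0f bytes" % needed_batch)
```

In this model the cap scales linearly with the batch size, so a latency
that is k times higher needs a batch roughly k times larger to keep the
same ceiling. Whether the real producer is anywhere near that ceiling is a
separate question, which is why the gain may be negligible in practice.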

-Ewen

On Mon, Jul 13, 2015 at 10:08 AM, JIEFU GONG <jg...@berkeley.edu> wrote:

> Hi all,
>
> I was wondering if any of you guys have done benchmarks on Kafka
> performance before, and if they or their details (# nodes in cluster, #
> records / size(s) of messages, etc.) could be shared.
>
> For comparison purposes, I am trying to benchmark Kafka against some
> similar services such as Kinesis or Scribe. Additionally, I was wondering
> if anyone could shed some insight on Jay Kreps' benchmarks that he has
> openly published here:
>
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
>
> Specifically, I am unsure of why between his tests of 3x synchronous
> replication and 3x async replication he changed the batch.size, as well as
> why he is seemingly publishing to incorrect topics:
>
> Configs:
> https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
>
> Any help is greatly appreciated!
>
>
>
> --
>
> Jiefu Gong
> University of California, Berkeley | Class of 2017
> B.A Computer Science | College of Letters and Sciences
>
> jgong@berkeley.edu <el...@berkeley.edu> | (925) 400-3427
>

-- 
Thanks,
Ewen