Posted to dev@kafka.apache.org by Tom Crayford <tc...@heroku.com> on 2016/05/17 15:15:57 UTC

Perf producer/consumers for compacted topics

Hi there,

As noted in the 0.10.0.0-RC4 release thread, we (Heroku Kafka) have been
doing extensive benchmarking of Kafka. In our case this is to help give
customers a good idea of the performance of our various configurations. For
this we orchestrate Kafka's `kafka-producer-perf-test.sh` and
`kafka-consumer-perf-test.sh` across multiple machines, which was relatively
easy to do and very successful (recently leading to a doc change and a good
lesson about 0.10).

However, we're finding one thing missing from the current producer/consumer
perf tests, which is that there's no good perf testing on compacted topics.
Some folk will undoubtedly use compacted topics, so it would be extremely
helpful (I think) for the community to have benchmarks that test
performance on compacted topics. We're interested in working on this and
contributing it upstream, but are pretty unsure what such a test should
look like. One straw proposal is to adapt the existing producer/consumer
perf tests to work on a compacted topic, likely with additional flags on
the producer that let you choose how wide a key range to emit, whether it
should emit deletes (and how often), and so on. Is there anything
more we could or should do there?
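
A minimal sketch of the workload such flags might describe, in Python rather
than in the perf tool itself. The function name and the `key_range`,
`delete_ratio`, `value_size` and `seed` parameters are hypothetical, not
existing producer options. It relies only on the fact that a record with a
null value acts as a delete (tombstone) on a compacted topic:

```python
import random
from typing import Iterator, Optional, Tuple

# (key, value); a value of None stands for a delete (tombstone).
Record = Tuple[bytes, Optional[bytes]]


def compacted_workload(num_records: int, key_range: int, delete_ratio: float,
                       value_size: int = 100, seed: int = 0) -> Iterator[Record]:
    """Yield records for a compacted-topic benchmark (hypothetical helper).

    key_range    -- number of distinct keys to draw from
    delete_ratio -- fraction of records emitted as tombstones, 0.0 to 1.0
    """
    if key_range <= 0:
        raise ValueError("key_range must be positive")
    if not 0.0 <= delete_ratio <= 1.0:
        raise ValueError("delete_ratio must be between 0.0 and 1.0")
    rng = random.Random(seed)      # seeded so runs are repeatable
    payload = b"x" * value_size    # fixed-size dummy payload
    for _ in range(num_records):
        key = str(rng.randrange(key_range)).encode("ascii")
        value = None if rng.random() < delete_ratio else payload
        yield key, value
```

A narrow key range with a high delete ratio should leave the cleaner the most
to discard; a key range close to the record count leaves it almost nothing.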

We're happy to write the code here and want to continue contributing back;
I'd just love a hand thinking about what perf tests for compacted topics
should look like.

Thanks

Tom Crayford
Heroku Kafka

Re: Perf producer/consumers for compacted topics

Posted by Tom Crayford <tc...@heroku.com>.
Hi,

I'm interested in benchmarking the impact of compaction on producers and
consumers, and on long-term cluster stability. That's not *quite* the impact
of it on the server side, but it certainly plays into it. For example, I'd
like to be able to answer questions like these: in configuration X, if we
write N messages into a compacted topic with a certain key range, a certain
number of deletes, etc., and *then* replay that into a consumer that does
nothing, how long does that consumer take? What happens if we're continually
running that compaction process, along with restarting consumers every hour
or two, *and* producing a lot of messages? What happens to perf on compacted
topics with different disk configurations (e.g. magnetic vs SSD, RAID vs
JBOD)?
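
To interpret the replay timing, it helps to know how many records a fresh
consumer should see at best. The sketch below is an idealized model, not the
broker's cleaner: it assumes the whole log has been cleaned and that
tombstones have already been removed, and it ignores the active segment and
the cleaner's dirty-ratio and delete-retention settings. The function name is
hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (key, value); a value of None stands for a delete (tombstone).
Record = Tuple[bytes, Optional[bytes]]


def fully_compacted(log: Iterable[Record]) -> List[Record]:
    """Return what remains of `log` in the idealized fully compacted case."""
    records = list(log)
    # Remember the offset of the most recent record for every key.
    last_offset: Dict[bytes, int] = {}
    for offset, (key, _) in enumerate(records):
        last_offset[key] = offset
    # Keep only the latest record per key, in offset order, minus tombstones.
    return [(key, value)
            for offset, (key, value) in enumerate(records)
            if last_offset[key] == offset and value is not None]
```

Comparing the length of this result with the number of records a consumer
actually reads gives a rough measure of how far behind the cleaner was during
the run.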

I'd certainly welcome some topic/partition-specific compaction metrics, and
would be willing to contribute there.

Thanks

Tom

On Wed, May 18, 2016 at 1:32 PM, Manikumar Reddy <ma...@gmail.com>
wrote:

> Hi,
>
> There is a kafka.tools.TestLogCleaning tool, which is used to stress test
> the compaction feature. It validates the correctness of the compaction
> process and could be improved for perf testing.
>
> I think you want to benchmark the server-side compaction process. Currently
> we have only a few compaction-related metrics. We may need to add a few more
> topic-specific metrics for better analysis.
>
> log compaction related JMX metrics:
> kafka.log:type=LogCleaner,name=cleaner-recopy-percent
> kafka.log:type=LogCleaner,name=max-buffer-utilization-percent
> kafka.log:type=LogCleaner,name=max-clean-time-secs
> kafka.log:type=LogCleanerManager,name=max-dirty-percent
>
> Manikumar

Re: Perf producer/consumers for compacted topics

Posted by Manikumar Reddy <ma...@gmail.com>.
Hi,

There is a kafka.tools.TestLogCleaning tool, which is used to stress test
the compaction feature. It validates the correctness of the compaction
process and could be improved for perf testing.

I think you want to benchmark the server-side compaction process. Currently
we have only a few compaction-related metrics. We may need to add a few more
topic-specific metrics for better analysis.

log compaction related JMX metrics:
kafka.log:type=LogCleaner,name=cleaner-recopy-percent
kafka.log:type=LogCleaner,name=max-buffer-utilization-percent
kafka.log:type=LogCleaner,name=max-clean-time-secs
kafka.log:type=LogCleanerManager,name=max-dirty-percent
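
For a benchmark harness that records these alongside producer/consumer
numbers, the names above split into a JMX domain and key properties. The
helper below is a small sketch (the function name is hypothetical) that only
handles simple, unquoted names like the four listed; it does not read any
values from a broker.

```python
from typing import Dict, Tuple

# The object names listed above, copied verbatim.
CLEANER_METRICS = [
    "kafka.log:type=LogCleaner,name=cleaner-recopy-percent",
    "kafka.log:type=LogCleaner,name=max-buffer-utilization-percent",
    "kafka.log:type=LogCleaner,name=max-clean-time-secs",
    "kafka.log:type=LogCleanerManager,name=max-dirty-percent",
]


def parse_object_name(object_name: str) -> Tuple[str, Dict[str, str]]:
    """Split 'domain:k1=v1,k2=v2' into (domain, {k1: v1, k2: v2})."""
    domain, _, props = object_name.partition(":")
    properties = dict(pair.split("=", 1) for pair in props.split(","))
    return domain, properties
```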

Manikumar