Posted to users@kafka.apache.org by Giselle van Dongen <Gi...@UGent.be> on 2017/03/23 10:09:01 UTC

Benchmarking streaming frameworks

Dear users of Streaming Technologies,

As a PhD student in big data analytics, I am currently in the process of
compiling a list of benchmarks (to test multiple streaming frameworks) in
order to create an expanded benchmarking suite. The benchmark suite is being
developed as a part of my current work at Ghent University.

The included frameworks at this time are, in no particular order, Spark,
Flink, Kafka (Streams), Storm (Trident) and Drizzle. Any pointers to
previous work or relevant benchmarks would be appreciated.

Best regards,
Giselle van Dongen

Re: Benchmarking streaming frameworks

Posted by David Garcia <da...@spiceworks.com>.
I don’t think “benchmarking” frameworks WRT Kafka is particularly informative. The various frameworks available are better compared WRT their features and processing limitations. For example, Akka Streams for Kafka provides a more intuitive way to express asynchronous operations. If you were to benchmark each framework with a simple poll-transformation-publish workload, I think you would find very little difference between them (assuming that they were all configured appropriately: the minimum consumer bytes setting, for instance). I think each framework would be better evaluated according to its features. Just my thoughts.

-David
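
For readers who want to see what a "simple poll-transformation-publish workload" could look like as a measurable unit, here is a minimal sketch in Python. It is not taken from any of the frameworks named in this thread: the function run_poll_transform_publish and the in-memory source and sink are hypothetical stand-ins for a real consumer and producer, and no broker is involved. It only illustrates timing the loop and reporting records per second.

```python
import time
from collections import deque


def run_poll_transform_publish(poll, transform, publish, max_records):
    """Time a poll -> transform -> publish loop.

    Returns (records_processed, records_per_second). The max_records
    limit is checked between batches, so it is a soft cap.
    """
    processed = 0
    start = time.perf_counter()
    while processed < max_records:
        batch = poll()
        if not batch:  # source exhausted
            break
        for record in batch:
            publish(transform(record))
            processed += 1
    elapsed = time.perf_counter() - start
    rate = processed / elapsed if elapsed > 0 else float("inf")
    return processed, rate


# Hypothetical in-memory stand-ins for an input topic and an output topic.
source = deque(f"event-{i}" for i in range(10_000))
sink = []


def poll(batch_size=500):
    # Hand back up to batch_size records, like one consumer poll.
    return [source.popleft() for _ in range(min(batch_size, len(source)))]


count, rate = run_poll_transform_publish(
    poll, str.upper, sink.append, max_records=10_000
)
print(f"{count} records at {rate:,.0f} records/s")
```

Swapping the three callables for a real client would keep the measurement loop unchanged, which fits the point above: with a workload this simple, what differs between runs is mostly client configuration rather than the framework.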

Re: Benchmarking streaming frameworks

Posted by Eno Thereska <en...@gmail.com>.
Hi Giselle,

Great idea! In Kafka Streams we have a few micro-benchmarks we run nightly. They are at: https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/perf/SimpleBenchmark.java

It's mostly simple stuff (aggregations, joins) and we are continuously updating them and adding more.
The nightly results are kept publicly at http://testing.confluent.io/confluent-kafka-system-test-results/, e.g., see report on 2017-03-21: http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-03-21--001.1490119830--apache--trunk--05690f0/report.html (search for "simple_benchmark_test").

Your feedback and input are always appreciated.

Thanks,
Eno