Posted to user@flink.apache.org by le...@tutanota.com on 2016/05/09 09:06:14 UTC
Key factors for Flink's performance
Hello Flink team,
I am currently playing around with Storm and Flink in the context of a smart
home. The primary functional requirement is to quickly react to certain
properties in stream tuples.
I was looking at some benchmarks from the two systems, and generally Flink
has the upper hand, in both throughput and latency. I do not really
understand how Flink achieves better latency than Storm, which is driven by
one-at-a-time tuples.
From what I understood in the documentation, Flink performs micro batching
when transferring data across the network to downstream operators located on
other nodes. Perhaps this achieves a better average latency.
Surely the bigger factor, however, is that Flink can completely bypass
internal operator queues with operator chaining, which Storm cannot do.
Kind regards
Leon
Re: Key factors for Flink's performance
Posted by Stephan Ewen <se...@apache.org>.
Hi Leon!
I agree with Aljoscha that the term "microbatches" is confusing in that
context. Flink's network layer is "buffer oriented" rather than "record
oriented". Buffering is a best-effort attempt to gather some elements in
cases where they arrive fast enough that this would not add much latency
anyway.
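A minimal sketch of the buffer-oriented idea described above, assuming a buffer that ships either when it is full or when its oldest record has waited past a timeout. The class `OutputBufferSketch` and its methods are hypothetical names for illustration and are not Flink classes. In the DataStream API the corresponding knob is, to the best of our knowledge, `StreamExecutionEnvironment.setBufferTimeout(long)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names are Flink classes.
final class OutputBufferSketch {
    private final int capacity;        // records per buffer before a forced flush
    private final long timeoutMillis;  // longest the oldest buffered record may wait
    private final List<String> pending = new ArrayList<>();
    private final List<List<String>> shipped = new ArrayList<>();
    private long oldestArrivalMillis;

    OutputBufferSketch(int capacity, long timeoutMillis) {
        this.capacity = capacity;
        this.timeoutMillis = timeoutMillis;
    }

    // Buffer one record; ship immediately if the buffer is now full.
    void emit(String record, long nowMillis) {
        if (pending.isEmpty()) {
            oldestArrivalMillis = nowMillis;
        }
        pending.add(record);
        if (pending.size() >= capacity) {
            flush();
        }
    }

    // Called periodically: ship a partial buffer once its oldest record
    // has waited at least timeoutMillis.
    void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - oldestArrivalMillis >= timeoutMillis) {
            flush();
        }
    }

    // Buffers that have been handed to the (imaginary) network, in order.
    List<List<String>> shipped() {
        return shipped;
    }

    private void flush() {
        shipped.add(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        OutputBufferSketch buffer = new OutputBufferSketch(3, 10);
        buffer.emit("door-open", 0);
        buffer.emit("temp-high", 1);
        buffer.tick(10); // timeout reached, so the partial buffer ships
        System.out.println(buffer.shipped());
    }
}
```

With a capacity of 1 every record ships on its own, which matches the "reduced to 1" case mentioned elsewhere in this thread. Fast producers fill the buffer before the timeout fires, so buffering adds little waiting time in that case.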
Concerning the latency: Chaining has a positive effect on latency. Some of
the benchmarks also show that Flink needs to communicate less with external
systems (like Redis), which is another source of reduced latency.
For very simple programs that have no external communication and no
chaining, I would not expect Flink and Storm to differ much in latency.
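To illustrate why chaining helps, here is a simplified single-threaded sketch that contrasts handing records between two operators through a queue with calling the second operator directly on the first operator's result. `ChainingSketch`, `runWithQueue`, and `runChained` are hypothetical names and do not reflect Flink's or Storm's internals; in a real system the queue handoff typically also crosses threads, which this sketch leaves out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical illustration only; not how Flink or Storm are implemented.
final class ChainingSketch {

    // Unchained: the first operator writes each result to a queue,
    // and the second operator reads it back out.
    static List<Integer> runWithQueue(List<Integer> input,
                                      Function<Integer, Integer> first,
                                      Function<Integer, Integer> second) {
        Queue<Integer> handoff = new ArrayDeque<>();
        for (Integer record : input) {
            handoff.add(first.apply(record));
        }
        List<Integer> output = new ArrayList<>();
        while (!handoff.isEmpty()) {
            output.add(second.apply(handoff.poll()));
        }
        return output;
    }

    // Chained: both operators are fused into one function, so each record
    // passes from the first to the second by a plain method call.
    static List<Integer> runChained(List<Integer> input,
                                    Function<Integer, Integer> first,
                                    Function<Integer, Integer> second) {
        Function<Integer, Integer> chained = first.andThen(second);
        List<Integer> output = new ArrayList<>();
        for (Integer record : input) {
            output.add(chained.apply(record));
        }
        return output;
    }

    public static void main(String[] args) {
        List<Integer> readings = Arrays.asList(1, 2, 3);
        Function<Integer, Integer> scale = x -> x * 2;
        Function<Integer, Integer> offset = x -> x + 1;
        System.out.println(runWithQueue(readings, scale, offset));
        System.out.println(runChained(readings, scale, offset));
    }
}
```

Both variants compute the same result; the chained one simply skips the intermediate queue, which is the kind of per-record overhead that chaining avoids.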
Greetings,
Stephan
On Wed, May 11, 2016 at 9:24 AM, Aljoscha Krettek <al...@apache.org>
wrote:
> Hi,
> Latency for Flink and Storm is pretty similar. The only reason I could
> see for Flink having the slight upper hand there is the fact that Storm
> tracks the progress of every tuple throughout the topology and requires
> ACKs that have to go back to the sinks.
>
> As for throughput you are right that Flink sends elements in batches. The
> size of these batches can be controlled, even be reduced to 1, which yields
> the best latency. These batches are not visible anywhere in the model,
> so calling them micro batches is problematic, since that term already
> refers to a very different concept in Spark Streaming.
>
> Cheers,
> Aljoscha
Re: Key factors for Flink's performance
Posted by Aljoscha Krettek <al...@apache.org>.
Hi,
Latency for Flink and Storm is pretty similar. The only reason I could see
for Flink having the slight upper hand there is the fact that Storm tracks
the progress of every tuple throughout the topology and requires ACKs that
have to go back to the sinks.
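For readers unfamiliar with the per-tuple tracking mentioned above, the following is a much simplified sketch of XOR-based acknowledgement bookkeeping, the general technique Storm's acker is documented to use. `XorAckSketch` and its method names are hypothetical and this is not Storm's code; it only shows that every tuple costs at least two extra bookkeeping updates (one when emitted, one when acked).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Storm's actual acker implementation.
final class XorAckSketch {
    // For each root tuple: XOR of the ids of all tuples still outstanding.
    private final Map<Long, Long> ackVals = new HashMap<>();

    // Record that a tuple belonging to the given root was emitted.
    void emitted(long rootId, long tupleId) {
        ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // Record that a tuple belonging to the given root was fully processed.
    // XOR-ing the same id a second time cancels it out.
    void acked(long rootId, long tupleId) {
        ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // The tuple tree is complete once every emitted id has been acked,
    // which leaves the running XOR at zero.
    boolean isComplete(long rootId) {
        return ackVals.containsKey(rootId) && ackVals.get(rootId) == 0L;
    }

    public static void main(String[] args) {
        XorAckSketch acker = new XorAckSketch();
        acker.emitted(1L, 0xAL);
        acker.emitted(1L, 0xBL);
        acker.acked(1L, 0xAL);
        System.out.println(acker.isComplete(1L)); // one tuple still outstanding
        acker.acked(1L, 0xBL);
        System.out.println(acker.isComplete(1L)); // all tuples acked
    }
}
```

The sketch assumes tuple ids are distinct and nonzero; a real implementation uses random 64-bit ids so that an accidental zero is very unlikely.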
As for throughput, you are right that Flink sends elements in batches. The
size of these batches can be controlled, and even reduced to 1, which yields
the best latency. These batches are not visible anywhere in the model, so
calling them micro batches is problematic, since that term already refers to
a very different concept in Spark Streaming.
Cheers,
Aljoscha