You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Adrienne Kole <ad...@gmail.com> on 2017/05/07 21:23:45 UTC

AllWindowed vs Windowed with 1 key

Hi,

I am doing simple aggregation with a keyed and global windows in flink.
When I compare the keyed window aggregation with 1 key and global window
(which has parallelism 1) I would expect that both of them would have
similar performance.

However, keyed stream with 1 key performs with 2x more throughput than
global window.
My configuration is 8 node cluster, 16 core in each node, parallelism = 128.

AFAIK, Flink doesn't manage skew by default and uses hash function to
assign keys to partitions. So if I have 1 key only, it should go to only
one partition always, which is semantically similar to global windows in
flink.

What can be the reason behind this difference in performance?

Thanks,
Adrienne

Re: AllWindowed vs Windowed with 1 key

Posted by Stefan Richter <s....@data-artisans.com>.
That is interesting, because already in Flink 1.1.x, windowAll() is implemented as input.keyBy(new DummyKeySelector()).window(). Are you using event time or processing or event time, and most important, do the execution graphs in the web frontend look different in both variants?

> Am 08.05.2017 um 16:51 schrieb Adrienne Kole <ad...@gmail.com>:
> 
> Hi,
> 
> Thanks for the reply. So I have 2 cases:
> 
> 1. timeWindowAll (length, slide).reduce (...) (with parallelism = 1)
> 2. groupby(someField).timeWindow(length, slide). reduce(...)
> 
> Lets say case-1 global window, case-2 partitioned window. If I have only one key (for case-2) and I set parallelism=1  for case-1, I would expect that both cases have similar performance both in terms of latency and throughput. However, partitioned windows outperform global ones by orders of magnitude in terms of throughput. 
> I am using Flink 1.1.3.
> 
> 
> Thanks,
> Adrienne
> 
> 
>  
> 
> On Mon, May 8, 2017 at 3:55 PM, Stefan Richter <s.richter@data-artisans.com <ma...@data-artisans.com>> wrote:
> Hi,
> 
> to answer this question, we would first need to know what you mean by „global windows“: using „windowAll()“ or „GlobalWindows“? Also, the answer might depend on the Flink version that you are using.
> 
> Best,
> Stefan
> 
> > Am 07.05.2017 um 23:23 schrieb Adrienne Kole <adriennekole1@gmail.com <ma...@gmail.com>>:
> >
> > Hi,
> >
> > I am doing simple aggregation with a keyed and global windows in flink.
> > When I compare the keyed window aggregation with 1 key and global window (which has parallelism 1) I would expect that both of them would have similar performance.
> >
> > However, keyed stream with 1 key performs with 2x more throughput than global window.
> > My configuration is 8 node cluster, 16 core in each node, parallelism = 128.
> >
> > AFAIK, Flink doesn't manage skew by default and uses hash function to assign keys to partitions. So if I have 1 key only, it should go to only one partition always, which is semantically similar to global windows in flink.
> >
> > What can be the reason behind this difference in performance?
> >
> > Thanks,
> > Adrienne
> 
> 


Re: AllWindowed vs Windowed with 1 key

Posted by Adrienne Kole <ad...@gmail.com>.
Hi,

Thanks for the reply. So I have 2 cases:

1. timeWindowAll (length, slide).reduce (...) (with parallelism = 1)
2. groupby(someField).timeWindow(length, slide). reduce(...)

Lets say case-1 global window, case-2 partitioned window. If I have only
one key (for case-2) and I set parallelism=1  for case-1, I would expect
that both cases have similar performance both in terms of latency and
throughput. However, partitioned windows outperform global ones by orders
of magnitude in terms of throughput.
I am using Flink 1.1.3.


Thanks,
Adrienne




On Mon, May 8, 2017 at 3:55 PM, Stefan Richter <s....@data-artisans.com>
wrote:

> Hi,
>
> to answer this question, we would first need to know what you mean by
> „global windows“: using „windowAll()“ or „GlobalWindows“? Also, the answer
> might depend on the Flink version that you are using.
>
> Best,
> Stefan
>
> > Am 07.05.2017 um 23:23 schrieb Adrienne Kole <ad...@gmail.com>:
> >
> > Hi,
> >
> > I am doing simple aggregation with a keyed and global windows in flink.
> > When I compare the keyed window aggregation with 1 key and global window
> (which has parallelism 1) I would expect that both of them would have
> similar performance.
> >
> > However, keyed stream with 1 key performs with 2x more throughput than
> global window.
> > My configuration is 8 node cluster, 16 core in each node, parallelism =
> 128.
> >
> > AFAIK, Flink doesn't manage skew by default and uses hash function to
> assign keys to partitions. So if I have 1 key only, it should go to only
> one partition always, which is semantically similar to global windows in
> flink.
> >
> > What can be the reason behind this difference in performance?
> >
> > Thanks,
> > Adrienne
>
>

Re: AllWindowed vs Windowed with 1 key

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

to answer this question, we would first need to know what you mean by „global windows“: using „windowAll()“ or „GlobalWindows“? Also, the answer might depend on the Flink version that you are using.

Best,
Stefan

> Am 07.05.2017 um 23:23 schrieb Adrienne Kole <ad...@gmail.com>:
> 
> Hi,
> 
> I am doing simple aggregation with a keyed and global windows in flink. 
> When I compare the keyed window aggregation with 1 key and global window (which has parallelism 1) I would expect that both of them would have similar performance. 
> 
> However, keyed stream with 1 key performs with 2x more throughput than global window. 
> My configuration is 8 node cluster, 16 core in each node, parallelism = 128.
> 
> AFAIK, Flink doesn't manage skew by default and uses hash function to assign keys to partitions. So if I have 1 key only, it should go to only one partition always, which is semantically similar to global windows in flink.
> 
> What can be the reason behind this difference in performance?
> 
> Thanks,
> Adrienne