You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Sonex <al...@gmail.com> on 2017/03/20 14:29:09 UTC

load balancing of keys to operators

I am using a simple streaming job where I use keyBy on the stream to process
events per key. The keys may vary in number (few keys to thousands). I have
noticed a behavior of Flink and I need clarification on that. When we use
keyBy on the stream, flink assigns keys to parallel operators so each
operator can handle events per key independently. Once a key is assigned to
an operator, can the key change the operator on which it is assigned? From
what I`ve seen the answer is no.

For example, let`s assume that keys 1 and 2 are assigned to operator A and
keys 3 and 4 are assigned to operator B. If there is a burst of data for key
1 at some later time point, but keys 2,3 and 4 have only few data will key 2
be assigned to operator B to balance the load? If not is there a way to do
that? And again if not, why flink does not do that? 



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/load-balancing-of-keys-to-operators-tp12303.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: load balancing of keys to operators

Posted by Timo Walther <tw...@apache.org>.
I think it very depends on your use case, maybe you can use combiner 
first to reduce the amount of records per key. Maybe you can explain 
your application a bit more (which window, type of aggregations).

It often helps e.g. to introduce an artifical key und merge the result 
of multiple windows in a downstream operator.


Am 20/03/17 um 18:25 schrieb Sonex:
> Thanx for your response.
>
> When using time windows, doesn`t flink know the load per window? I have
> observed this behavior in windows as well.
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/load-balancing-of-keys-to-operators-tp12303p12308.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.



Re: load balancing of keys to operators

Posted by Sonex <al...@gmail.com>.
Thanx for your response.

When using time windows, doesn`t flink know the load per window? I have
observed this behavior in windows as well.



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/load-balancing-of-keys-to-operators-tp12303p12308.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: load balancing of keys to operators

Posted by Timo Walther <tw...@apache.org>.
Hi,

using keyBy Flink ensures that every set of records with same key is 
send to the same operator, otherwise it would not be possible to process 
them as a whole. It depends on your use case if it is also ok that 
another operator processes parts of this set of records. You can 
implement you own partition strategy to split your data more evenly 
(https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html#physical-partitioning). 
But this depends on your knowledge of key spaces and load, Flink can not 
know this in advance.

I hope that helps.

Regards,
Timo


Am 20/03/17 um 15:29 schrieb Sonex:
> I am using a simple streaming job where I use keyBy on the stream to process
> events per key. The keys may vary in number (few keys to thousands). I have
> noticed a behavior of Flink and I need clarification on that. When we use
> keyBy on the stream, flink assigns keys to parallel operators so each
> operator can handle events per key independently. Once a key is assigned to
> an operator, can the key change the operator on which it is assigned? From
> what I`ve seen the answer is no.
>
> For example, let`s assume that keys 1 and 2 are assigned to operator A and
> keys 3 and 4 are assigned to operator B. If there is a burst of data for key
> 1 at some later time point, but keys 2,3 and 4 have only few data will key 2
> be assigned to operator B to balance the load? If not is there a way to do
> that? And again if not, why flink does not do that?
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/load-balancing-of-keys-to-operators-tp12303.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.