You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by antonio saldivar <an...@gmail.com> on 2018/08/09 21:14:38 UTC

Flink Rebalance

Hello

Does anyone know why when I add "rebalance()" to my .map steps is adding a
lot of latency rather than not having rebalance.


I have kafka partitions in my topic 44 and 44 flink task manager

execution plan looks like this when I add rebalance but it is adding a lot
of latency

kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance ->
kafka-sink

Thank you
regards

Re: Flink Rebalance

Posted by antonio saldivar <an...@gmail.com>.
Hi Fabian

Thank you, yes there are just map functions, i will do it that way with
methods to get it faster

On Fri, Aug 10, 2018, 5:58 AM Fabian Hueske <fh...@gmail.com> wrote:

> Hi,
>
> Elias and Paul have good points.
> I think the performance degradation is mostly to the lack of function
> chaining in the rebalance case.
>
> If all steps are just map functions, they can be chained in the
> no-rebalance case.
> That means, records are passed via function calls.
> If you add rebalancing, records will be passed between map functions via
> serialization, network transfer, and deserialization.
> This is of course much more expensive than calling a method.
>
> Best, Fabian
>
> 2018-08-10 4:25 GMT+02:00 Paul Lam <pa...@gmail.com>:
>
>> Hi Antonio,
>>
>> AFAIK, there are two reasons for this:
>>
>> 1. Rebalancing itself brings latency because it takes time to
>> redistribute the elements.
>> 2. Rebalancing also messes up the order in the Kafka topic partitions,
>> and often makes a event-time window wait longer to trigger in case you’re
>> using event time characteristic.
>>
>> Best Regards,
>> Paul Lam
>>
>>
>>
>> 在 2018年8月10日,05:49,antonio saldivar <an...@gmail.com> 写道:
>>
>> Hello
>>
>> Sending ~450 elements per second ( the values are in milliseconds start
>> to end)
>> I went from:
>> with Rebalance
>> *+------------+*
>> *| **AVGWINDOW ** |*
>> *+------------+*
>> *| *32131.0853  * |*
>> *+------------+*
>>
>> to this without rebalance
>>
>> *+------------+*
>> *| **AVGWINDOW ** |*
>> *+------------+*
>> *| *70.2077   * |*
>> *+------------+*
>>
>> El jue., 9 ago. 2018 a las 17:42, Elias Levy (<
>> fearsome.lucidity@gmail.com>) escribió:
>>
>>> What do you consider a lot of latency?  The rebalance will require
>>> serializing / deserializing the data as it gets distributed.  Depending on
>>> the complexity of your records and the efficiency of your serializers, that
>>> could have a significant impact on your performance.
>>>
>>> On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <an...@gmail.com>
>>> wrote:
>>>
>>>> Hello
>>>>
>>>> Does anyone know why when I add "rebalance()" to my .map steps is
>>>> adding a lot of latency rather than not having rebalance.
>>>>
>>>>
>>>> I have kafka partitions in my topic 44 and 44 flink task manager
>>>>
>>>> execution plan looks like this when I add rebalance but it is adding a
>>>> lot of latency
>>>>
>>>> kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance ->
>>>> kafka-sink
>>>>
>>>> Thank you
>>>> regards
>>>>
>>>>
>>
>

Re: Flink Rebalance

Posted by Fabian Hueske <fh...@gmail.com>.
Hi,

Elias and Paul have good points.
I think the performance degradation is mostly to the lack of function
chaining in the rebalance case.

If all steps are just map functions, they can be chained in the
no-rebalance case.
That means, records are passed via function calls.
If you add rebalancing, records will be passed between map functions via
serialization, network transfer, and deserialization.
This is of course much more expensive than calling a method.

Best, Fabian

2018-08-10 4:25 GMT+02:00 Paul Lam <pa...@gmail.com>:

> Hi Antonio,
>
> AFAIK, there are two reasons for this:
>
> 1. Rebalancing itself brings latency because it takes time to redistribute
> the elements.
> 2. Rebalancing also messes up the order in the Kafka topic partitions, and
> often makes a event-time window wait longer to trigger in case you’re using
> event time characteristic.
>
> Best Regards,
> Paul Lam
>
>
>
> 在 2018年8月10日,05:49,antonio saldivar <an...@gmail.com> 写道:
>
> Hello
>
> Sending ~450 elements per second ( the values are in milliseconds start to
> end)
> I went from:
> with Rebalance
> *+------------+*
> *| **AVGWINDOW ** |*
> *+------------+*
> *| *32131.0853  * |*
> *+------------+*
>
> to this without rebalance
>
> *+------------+*
> *| **AVGWINDOW ** |*
> *+------------+*
> *| *70.2077   * |*
> *+------------+*
>
> El jue., 9 ago. 2018 a las 17:42, Elias Levy (<fearsome.lucidity@gmail.com
> >) escribió:
>
>> What do you consider a lot of latency?  The rebalance will require
>> serializing / deserializing the data as it gets distributed.  Depending on
>> the complexity of your records and the efficiency of your serializers, that
>> could have a significant impact on your performance.
>>
>> On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <an...@gmail.com>
>> wrote:
>>
>>> Hello
>>>
>>> Does anyone know why when I add "rebalance()" to my .map steps is adding
>>> a lot of latency rather than not having rebalance.
>>>
>>>
>>> I have kafka partitions in my topic 44 and 44 flink task manager
>>>
>>> execution plan looks like this when I add rebalance but it is adding a
>>> lot of latency
>>>
>>> kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance ->
>>> kafka-sink
>>>
>>> Thank you
>>> regards
>>>
>>>
>

Re: Flink Rebalance

Posted by Paul Lam <pa...@gmail.com>.
Hi Antonio, 

AFAIK, there are two reasons for this: 

1. Rebalancing itself brings latency because it takes time to redistribute the elements. 
2. Rebalancing also messes up the order in the Kafka topic partitions, and often makes a event-time window wait longer to trigger in case you’re using event time characteristic. 

Best Regards,
Paul Lam


> 在 2018年8月10日,05:49,antonio saldivar <an...@gmail.com> 写道:
> 
> Hello
> 
> Sending ~450 elements per second ( the values are in milliseconds start to end)
> I went from:
> with Rebalance
> +------------+
> | AVGWINDOW  |
> +------------+
> | 32131.0853   |
> +------------+
> 
> to this without rebalance
> 
> +------------+
> | AVGWINDOW  |
> +------------+
> | 70.2077    |
> +------------+
> 
> El jue., 9 ago. 2018 a las 17:42, Elias Levy (<fearsome.lucidity@gmail.com <ma...@gmail.com>>) escribió:
> What do you consider a lot of latency?  The rebalance will require serializing / deserializing the data as it gets distributed.  Depending on the complexity of your records and the efficiency of your serializers, that could have a significant impact on your performance.
> 
> On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <ansale10@gmail.com <ma...@gmail.com>> wrote:
> Hello
> 
> Does anyone know why when I add "rebalance()" to my .map steps is adding a lot of latency rather than not having rebalance.
> 
> 
> I have kafka partitions in my topic 44 and 44 flink task manager
> 
> execution plan looks like this when I add rebalance but it is adding a lot of latency
> 
> kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance -> kafka-sink
> 
> Thank you 
> regards
> 


Re: Flink Rebalance

Posted by antonio saldivar <an...@gmail.com>.
Hello

Sending ~450 elements per second ( the values are in milliseconds start to
end)
I went from:
with Rebalance

*+------------+*

*| **AVGWINDOW ** |*

*+------------+*

*| *32131.0853  * |*

*+------------+*

to this without rebalance

*+------------+*

*| **AVGWINDOW ** |*

*+------------+*

*| *70.2077   * |*

*+------------+*

El jue., 9 ago. 2018 a las 17:42, Elias Levy (<fe...@gmail.com>)
escribió:

> What do you consider a lot of latency?  The rebalance will require
> serializing / deserializing the data as it gets distributed.  Depending on
> the complexity of your records and the efficiency of your serializers, that
> could have a significant impact on your performance.
>
> On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <an...@gmail.com>
> wrote:
>
>> Hello
>>
>> Does anyone know why when I add "rebalance()" to my .map steps is adding
>> a lot of latency rather than not having rebalance.
>>
>>
>> I have kafka partitions in my topic 44 and 44 flink task manager
>>
>> execution plan looks like this when I add rebalance but it is adding a
>> lot of latency
>>
>> kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance ->
>> kafka-sink
>>
>> Thank you
>> regards
>>
>>

Re: Flink Rebalance

Posted by Elias Levy <fe...@gmail.com>.
What do you consider a lot of latency?  The rebalance will require
serializing / deserializing the data as it gets distributed.  Depending on
the complexity of your records and the efficiency of your serializers, that
could have a significant impact on your performance.

On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <an...@gmail.com> wrote:

> Hello
>
> Does anyone know why when I add "rebalance()" to my .map steps is adding a
> lot of latency rather than not having rebalance.
>
>
> I have kafka partitions in my topic 44 and 44 flink task manager
>
> execution plan looks like this when I add rebalance but it is adding a lot
> of latency
>
> kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance ->
> kafka-sink
>
> Thank you
> regards
>
>