Posted to user@flink.apache.org by Wei Hou via user <us...@flink.apache.org> on 2023/04/25 22:17:02 UTC

Can I setup standby taskmanagers while using reactive mode?

Hi Flink community,

We are trying to use Flink’s reactive mode with the Kubernetes HPA for autoscaling. However, because reactive mode always uses all available resources, it causes a problem when we need standby task managers for fast failure recovery: the job always uses these extra standby task managers as active task managers to process data.

Do you have any suggestions on this? Should we avoid using Flink reactive mode together with standby task managers?

Best,
Wei



Re: Can I setup standby taskmanagers while using reactive mode?

Posted by Gyula Fóra <gy...@gmail.com>.
You could also check out the Autoscaler logic in the Flink Kubernetes
Operator (
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/
)
On the current main and in the upcoming 1.5.0 release the mechanism is
pretty nice and solid :)

It works with the native integration so you can also set standby TMs with a
simple config.

Cheers,
Gyula

On Fri, Apr 28, 2023 at 7:31 AM Wei Hou <we...@airbnb.com> wrote:

> Thank you for all your responses! I think Gyula is right: simply using MAX -
> some_offset is not ideal, as it can make the standby TM useless.
> It is difficult for the scheduler to determine whether a pod has been lost
> or scaled down when we enable autoscaling, which affects its decision to
> utilize standby TMs. We probably need to monitor the HPA events in order to
> get this information.
> I will wait to see if there is a solution for this problem in the future.
>
>
> On Wed, Apr 26, 2023 at 7:20 AM Gyula Fóra <gy...@gmail.com> wrote:
>
>> I think the behaviour is going to get a little weird because this would
>> actually defeat the purpose of the standby TM.
>> MAX - some_offset will decrease once you lose a TM, so in this case we
>> would scale down to again have a spare (which we never actually use).
>>
>> Gyula
>>
>> On Wed, Apr 26, 2023 at 4:02 PM Chesnay Schepler <ch...@apache.org>
>> wrote:
>>
>>> Reactive mode doesn't support standby taskmanagers. As you said, it
>>> always uses all available resources in the cluster.
>>>
>>> I can see it being useful, though, to not always scale to MAX but to (MAX -
>>> some_offset).
>>>
>>> I'd suggest filing a ticket.
>>>
>>> On 26/04/2023 00:17, Wei Hou via user wrote:
>>> > Hi Flink community,
>>> >
>>> > We are trying to use Flink’s reactive mode with the Kubernetes HPA for
>>> > autoscaling. However, because reactive mode always uses all available
>>> > resources, it causes a problem when we need standby task managers for
>>> > fast failure recovery: the job always uses these extra standby task
>>> > managers as active task managers to process data.
>>> >
>>> > Do you have any suggestions on this? Should we avoid using Flink
>>> > reactive mode together with standby task managers?
>>> >
>>> > Best,
>>> > Wei
>>> >
>>> >
>>>
>>>

Re: Can I setup standby taskmanagers while using reactive mode?

Posted by Wei Hou via user <us...@flink.apache.org>.
Thank you for all your responses! I think Gyula is right: simply using MAX -
some_offset is not ideal, as it can make the standby TM useless.
It is difficult for the scheduler to determine whether a pod has been lost
or scaled down when we enable autoscaling, which affects its decision to
utilize standby TMs. We probably need to monitor the HPA events in order to
get this information.
I will wait to see if there is a solution for this problem in the future.
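
The ambiguity described above can be illustrated with a minimal sketch. It is not Flink or Kubernetes code: the function name, the enum, and the idea of comparing the observed TM count with the replica count the HPA currently asks for are all assumptions made for illustration. It only shows why the scheduler needs the autoscaler's intent to tell a lost pod from a deliberate scale-down.

```python
from enum import Enum


class TmDrop(Enum):
    """Hypothetical classification of a change in the TM count."""
    NONE = "none"
    FAILURE = "failure"
    SCALE_DOWN = "scale_down"


def classify_tm_drop(previous_tms: int, current_tms: int,
                     hpa_desired_replicas: int) -> TmDrop:
    """Guess why the number of registered TMs dropped.

    Assumption: hpa_desired_replicas is the replica count the HPA
    currently requests, obtained by some monitoring of HPA events.
    """
    if current_tms >= previous_tms:
        return TmDrop.NONE
    if current_tms < hpa_desired_replicas:
        # Fewer TMs than the autoscaler wants: a pod was probably lost,
        # so this is the case where a standby TM should take over.
        return TmDrop.FAILURE
    # The autoscaler itself asked for fewer TMs: leave the standby idle.
    return TmDrop.SCALE_DOWN


print(classify_tm_drop(5, 4, 5))  # HPA still wants 5 TMs
print(classify_tm_drop(5, 4, 4))  # HPA lowered its target to 4
```

Without the third argument the two calls are indistinguishable, which is the problem described above.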



Re: Can I setup standby taskmanagers while using reactive mode?

Posted by Gyula Fóra <gy...@gmail.com>.
I think the behaviour is going to get a little weird because this would
actually defeat the purpose of the standby TM.
MAX - some_offset will decrease once you lose a TM, so in this case we would
scale down to again have a spare (which we never actually use).

Gyula
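
A small worked example may help with the argument above. This is a sketch of the hypothetical (MAX - some_offset) policy discussed in this thread, not an existing Flink feature; the function names and the numbers are assumptions chosen for illustration.

```python
def active_tms_with_offset(registered_tms: int, offset: int) -> int:
    """TMs the job would use under a hypothetical MAX - offset policy."""
    return max(registered_tms - offset, 0)


def simulate_tm_loss(registered_tms: int, offset: int) -> tuple[int, int]:
    """Return the active TM count before and after losing one TM."""
    before = active_tms_with_offset(registered_tms, offset)
    # MAX itself shrinks by one when a TM is lost, so the target shrinks too.
    after = active_tms_with_offset(registered_tms - 1, offset)
    return before, after


before, after = simulate_tm_loss(registered_tms=5, offset=1)
print(before, after)  # 4 3
```

With 5 registered TMs and an offset of 1, the job runs on 4 TMs. After one TM is lost, the policy targets 3 active TMs and again keeps one idle, so the spare never replaces the lost TM.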


Re: Can I setup standby taskmanagers while using reactive mode?

Posted by Chesnay Schepler <ch...@apache.org>.
Reactive mode doesn't support standby taskmanagers. As you said, it
always uses all available resources in the cluster.

I can see it being useful, though, to not always scale to MAX but to (MAX -
some_offset).

I'd suggest filing a ticket.
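
To make the behaviour concrete, here is a minimal sketch of how the parallelism is chosen. It is a simplified model rather than Flink code: the function name, the slots_per_tm parameter, and the numbers are assumptions for illustration, and offset_tms models the suggested (MAX - some_offset) variant, which does not exist today.

```python
def reactive_parallelism(task_managers: int, slots_per_tm: int,
                         max_parallelism: int, offset_tms: int = 0) -> int:
    """Simplified model: use every available slot, capped by max parallelism.

    offset_tms is hypothetical and stands for the proposed MAX - some_offset.
    """
    usable_tms = max(task_managers - offset_tms, 0)
    return min(usable_tms * slots_per_tm, max_parallelism)


# Four TMs with two slots each.
print(reactive_parallelism(4, 2, max_parallelism=128))  # 8
# Adding a "standby" TM: reactive mode simply absorbs it.
print(reactive_parallelism(5, 2, max_parallelism=128))  # 10
# The proposed offset would keep one TM out of the job.
print(reactive_parallelism(5, 2, max_parallelism=128, offset_tms=1))  # 8
```

The second call shows the problem from the original question: the extra TM is put to work immediately instead of staying in reserve.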
