Posted to user@flink.apache.org by Denis Cosmin NUTIU <dn...@bitdefender.com> on 2021/08/26 09:16:59 UTC

Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Hello,

I've developed a Flink job and I'm trying to deploy it on a Kubernetes
cluster using Flink's native Kubernetes integration.

Setting kubernetes.taskmanager.cpu=0.5 and
kubernetes.jobmanager.cpu=0.5 sets the requests and limits to 500m,
which is correct, but I'd like to set the requests and limits to
different values, something like:

resources:
  requests:
    memory: "1048Mi"
    cpu: "100m"
  limits:
    memory: "2096Mi"
    cpu: "1000m"
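
For reference, a minimal sketch of how a fractional CPU setting maps to the
millicore notation used above, relying only on the Kubernetes convention that
1 CPU equals 1000m. The class and method names are hypothetical and are not
Flink code.

```java
/** Hypothetical helper (not Flink code): converts a fractional CPU setting to Kubernetes millicores. */
final class CpuQuantitySketch {

    /** 1 Kubernetes CPU equals 1000 millicores, so 0.5 becomes 500. */
    static long toMillicores(double cpuCores) {
        return Math.round(cpuCores * 1000.0);
    }

    /** Formats the value in the "500m" notation used in pod specs. */
    static String toQuantity(double cpuCores) {
        return toMillicores(cpuCores) + "m";
    }

    public static void main(String[] args) {
        System.out.println(toQuantity(0.5)); // the kubernetes.taskmanager.cpu value from the post
        System.out.println(toQuantity(0.1)); // the desired CPU request
        System.out.println(toQuantity(1.0)); // the desired CPU limit
    }
}
```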

I've tried using pod templates from Flink 1.13 and manually patching
the Kubernetes deployment file. The jobmanager gets spawned with the
correct resource requests and limits, but the taskmanagers get spawned
with the defaults:

Limits:
  cpu:     1
  memory:  1728Mi
Requests:
  cpu:     1
  memory:  1728Mi

Is there any way I could set the requests/limits for the CPU/Memory to
different values when deploying Flink in Kubernetes? If not, would it
make sense to request this as a feature?

Thanks in advance! 

Denis




Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Posted by 971066723 <97...@qq.com>.
  

Hi everyone,

I have another idea for the Kubernetes resource settings. The approach
described by Yang Wang in FLINK-15648 increases the CPU limit by a certain
percentage to give jobs more computational headroom. Should we also consider
the alternative of shrinking the request, so that more jobs can be started and
cluster resource utilization improves? For some low-traffic tasks we could, in
extreme cases, even set the CPU request to 0. Both limit enlargement and
request shrinkage may be required.
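
The request-shrinkage idea can be sketched as follows. This is only an
illustration: the "request factor" and all names in it are hypothetical and do
not correspond to an existing Flink option. It assumes the request is the
configured value multiplied by a factor between 0 and 1.

```java
/** Hypothetical sketch (not Flink code): shrink a resource request by a factor in [0, 1]. */
final class RequestFactorSketch {

    /** Returns request = configured * requestFactor, rejecting factors outside [0, 1]. */
    static long requestMillicores(long configuredMillicores, double requestFactor) {
        if (requestFactor < 0.0 || requestFactor > 1.0) {
            throw new IllegalArgumentException("request factor must be in [0, 1], got " + requestFactor);
        }
        return Math.round(configuredMillicores * requestFactor);
    }

    public static void main(String[] args) {
        long configured = 500; // e.g. a configured CPU of 0.5, in millicores
        // Factor 1.0 keeps request == configured value; 0.0 is the extreme case from the mail.
        for (double factor : new double[] {1.0, 0.2, 0.0}) {
            System.out.println("factor=" + factor + " request=" + requestMillicores(configured, factor) + "m");
        }
    }
}
```

A factor of 0 removes any CPU reservation for the pod, so the scheduler can
pack more jobs onto a node, at the cost of no guaranteed CPU share.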

  

Best,

Lz



Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Posted by Denis Cosmin NUTIU <dn...@bitdefender.com>.
Hi Yang,

I have limited Flink internals knowledge, but I can try to implement FLINK-15648 and open up a PR on GitHub or send the patch via email. How does that sound?
I'll sign the ICLA and switch to my personal address.

Sincerely,
Denis

On Wed, 2021-09-01 at 13:48 +0800, Yang Wang wrote:
Great. If no one wants to work on ticket FLINK-15648, I will try to get this done in the next major release cycle (1.15).

Best,
Yang

Denis Cosmin NUTIU <dn...@bitdefender.com> wrote on Tue, Aug 31, 2021 at 4:59 PM:
Hi everyone,

Thanks for getting back to me!

>  I think it would be nice if the task manager pods get their values from the configuration file only if the pod templates don’t specify any resources. That was the goal of supporting pod templates, right? Allowing more custom scenarios without letting the configuration options get bloated.

I think that's correct. Currently, Flink overrides the resource settings: "The memory and cpu resources(including requests and limits) will be overwritten by Flink configuration options. All other resources(e.g. ephemeral-storage) will be retained." [1] After reading the comments on FLINK-15648 [2], I'm not sure that it can be done in a clean manner with pod templates.

> I think it is a good improvement to support different resource requests and limits. And it is very useful especially for the CPU resource since it heavily depends on the upstream workloads.

I agree with you! I have limited knowledge of Flink internals, but the kubernetes.jobmanager.limit-factor and kubernetes.taskmanager.limit-factor options seem to be the right way to do it.

[1] Native Kubernetes | Apache Flink<https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template>
[2] [FLINK-15648] Support to configure limit for CPU and memory requirement - ASF JIRA (apache.org)<https://issues.apache.org/jira/browse/FLINK-15648>

________________________________
From: Yang Wang <da...@gmail.com>
Sent: Tuesday, August 31, 2021 6:04 AM
To: Alexis Sarda-Espinosa <al...@microfocus.com>
Cc: Denis Cosmin NUTIU <dn...@bitdefender.com>; matthias@ververica.com; user@flink.apache.org
Subject: Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Hi all,

I think it is a good improvement to support different resource requests and limits. It is especially useful for the CPU resource, since it depends heavily on the upstream workloads.

Actually, we (Alibaba) have introduced some internal config options to support this feature. WDYT?

// The prefix of Kubernetes resource limit factor. It should not be less than 1. The resource
// could be cpu, memory, ephemeral-storage and all other types supported by Kubernetes.
public static final String KUBERNETES_JOBMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
        "kubernetes.jobmanager.limit-factor.";
public static final String KUBERNETES_TASKMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
        "kubernetes.taskmanager.limit-factor.";

BTW, we already have an old ticket for this feature[1].


[1]. https://issues.apache.org/jira/browse/FLINK-15648

Best,
Yang

Alexis Sarda-Espinosa <al...@microfocus.com> wrote on Thu, Aug 26, 2021 at 10:04 PM:

I think it would be nice if the task manager pods get their values from the configuration file only if the pod templates don’t specify any resources. That was the goal of supporting pod templates, right? Allowing more custom scenarios without letting the configuration options get bloated.



Regards,

Alexis.



From: Denis Cosmin NUTIU <dn...@bitdefender.com>
Sent: Thursday, August 26, 2021 15:55
To: matthias@ververica.com
Cc: user@flink.apache.org; danrtsey.wy@gmail.com
Subject: Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests



Hi Matthias,



Thanks for getting back to me and for your time!



We have some Flink jobs deployed on Kubernetes and running kubectl top pod gives the following result:



NAME                  CPU(cores)   MEMORY(bytes)
aa-78c8cb77d4-zlmpg   8m           1410Mi
aa-taskmanager-2-2    32m          1066Mi
bb-5f7b65f95c-jwb7t   7m           1445Mi
bb-taskmanager-2-2    32m          1016Mi
cc-54d967b55d-b567x   11m          514Mi
cc-taskmanager-4-1    11m          496Mi
dd-6fbc6b8666-krhlx   10m          535Mi
dd-taskmanager-2-2    12m          522Mi
xx-6845cf7986-p45lq   53m          526Mi
xx-taskmanager-5-2    11m          507Mi



During low workloads the jobs consume only about 100m CPU, and during high workloads the CPU consumption increases to 500m-1000m. Being able to specify requests and limits separately would give us more deployment flexibility.
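
To put these figures in perspective, here is a rough arithmetic sketch over
the CPU column of the kubectl top output. It assumes, purely for illustration,
that every listed pod requests either the default 1 CPU (1000m) or the desired
100m; the thread only confirms the 1 CPU default for the taskmanagers. The
class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical arithmetic on the figures quoted in the thread; not Flink code. */
final class CpuUtilizationSketch {

    // CPU(cores) column from the kubectl top output, in millicores.
    static final long[] OBSERVED_MILLICORES = {8, 32, 7, 32, 11, 11, 10, 12, 53, 11};

    static long sum(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    /** Share of the requested CPU that is actually used, in percent. */
    static double utilizationPercent(long usedMillicores, long requestPerPodMillicores, int pods) {
        return 100.0 * usedMillicores / (requestPerPodMillicores * pods);
    }

    public static void main(String[] args) {
        long used = sum(OBSERVED_MILLICORES);
        int pods = OBSERVED_MILLICORES.length;
        // Assumption: all pods share the same request (1000m versus 100m).
        System.out.printf(Locale.ROOT, "used: %dm across %d pods%n", used, pods);
        System.out.printf(Locale.ROOT, "1000m request per pod: %.2f%% of reserved CPU used%n",
                utilizationPercent(used, 1000, pods));
        System.out.printf(Locale.ROOT, "100m request per pod: %.2f%% of reserved CPU used%n",
                utilizationPercent(used, 100, pods));
    }
}
```

Under these assumptions, a smaller request reserves far less idle CPU, while a
separate, higher limit would still allow bursts during high workloads.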



Sincerely,

Denis



On Thu, 2021-08-26 at 14:22 +0200, Matthias Pohl wrote:

Hi Denis,

I did a bit of digging: It looks like there is no way to specify them independently. You can find documentation about pod templates for TaskManager and JobManager [1]. But even there it states that for cpu and memory, the resource specs are overwritten by the Flink configuration. The code also reveals that limit and requests are set using the same value [2].



I'm going to pull Yang Wang into this thread. I'm wondering whether there is a reason for that or whether it makes sense to create a Jira issue introducing more specific configuration parameters for limit and requests.



Best,
Matthias



[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#fields-overwritten-by-flink

[2] https://github.com/apache/flink/blob/f64261c91b195ecdcd99975b51de540db89a3f48/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesUtils.java#L324-L332





Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Posted by Yang Wang <da...@gmail.com>.
Great. If no one wants to work on ticket FLINK-15648, I will try to
get this done in the next major release cycle (1.15).

Best,
Yang


Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Posted by Denis Cosmin NUTIU <dn...@bitdefender.com>.
Hi everyone,

Thanks for getting back to me!

>  I think it would be nice if the task manager pods get their values from the configuration file only if the pod templates don’t specify any resources. That was the goal of supporting pod templates, right? Allowing more custom scenarios without letting the configuration options get bloated.

I think that's correct. Currently, Flink overrides the resource settings: "The memory and cpu resources(including requests and limits) will be overwritten by Flink configuration options. All other resources(e.g. ephemeral-storage) will be retained." [1] After reading the comments on FLINK-15648 [2], I'm not sure that it can be done in a clean manner with pod templates.

> I think it is a good improvement to support different resource requests and limits. And it is very useful especially for the CPU resource since it heavily depends on the upstream workloads.

I agree with you! I have limited knowledge of Flink internals, but the kubernetes.jobmanager.limit-factor and kubernetes.taskmanager.limit-factor options seem to be the right way to do it.

[1] Native Kubernetes | Apache Flink<https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template>
[2] [FLINK-15648] Support to configure limit for CPU and memory requirement - ASF JIRA (apache.org)<https://issues.apache.org/jira/browse/FLINK-15648>


Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Posted by Yang Wang <da...@gmail.com>.
Hi all,

I think it is a good improvement to support different resource requests and
limits. It is especially useful for the CPU resource, since it depends
heavily on the upstream workloads.

Actually, we (Alibaba) have introduced some internal config options to
support this feature. WDYT?

// The prefix of the Kubernetes resource limit factor. It should not be less than 1.
// The resource could be cpu, memory, ephemeral-storage and all other types
// supported by Kubernetes.
public static final String KUBERNETES_JOBMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
        "kubernetes.jobmanager.limit-factor.";
public static final String KUBERNETES_TASKMANAGER_RESOURCE_LIMIT_FACTOR_PREFIX =
        "kubernetes.taskmanager.limit-factor.";


BTW, we already have an old ticket for this feature[1].


[1]. https://issues.apache.org/jira/browse/FLINK-15648
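
A rough illustration of how a limit factor like the ones above could be applied. This is only a sketch, not the internal implementation: it assumes the configured value stays the request and the limit is the request multiplied by the factor. The class and method names are hypothetical, and the option key is simply the TaskManager prefix with the resource name appended.

```java
import java.util.Map;

// Hypothetical sketch: derive a Kubernetes resource limit from a request and a limit factor.
final class LimitFactorSketch {

    static final String TASKMANAGER_LIMIT_FACTOR_PREFIX = "kubernetes.taskmanager.limit-factor.";

    // Reads the factor for one resource (e.g. "cpu"); defaults to 1.0, i.e. limit == request.
    static double limitFactor(Map<String, String> conf, String resource) {
        String raw = conf.get(TASKMANAGER_LIMIT_FACTOR_PREFIX + resource);
        double factor = raw == null ? 1.0 : Double.parseDouble(raw);
        if (factor < 1.0) {
            throw new IllegalArgumentException("limit factor must not be less than 1: " + factor);
        }
        return factor;
    }

    // Request and limit are in millicores; the limit is the request scaled by the factor.
    static long cpuLimitMillis(long requestMillis, double factor) {
        return Math.round(requestMillis * factor);
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("kubernetes.taskmanager.limit-factor.cpu", "2.0");
        long request = 500; // corresponds to kubernetes.taskmanager.cpu=0.5
        long limit = cpuLimitMillis(request, limitFactor(conf, "cpu"));
        System.out.println("request=" + request + "m limit=" + limit + "m");
    }
}
```

With a factor of 1 (the assumed default here) the behavior is unchanged: request and limit stay equal.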

Best,
Yang

Alexis Sarda-Espinosa <al...@microfocus.com> wrote on Thu, Aug 26, 2021 at
10:04 PM:

> I think it would be nice if the task manager pods get their values from
> the configuration file only if the pod templates don’t specify any
> resources. That was the goal of supporting pod templates, right? Allowing
> more custom scenarios without letting the configuration options get bloated.
>
>
>
> Regards,
>
> Alexis.

RE: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Posted by Alexis Sarda-Espinosa <al...@microfocus.com>.
I think it would be nice if the task manager pods get their values from the configuration file only if the pod templates don’t specify any resources. That was the goal of supporting pod templates, right? Allowing more custom scenarios without letting the configuration options get bloated.
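
To make the proposed precedence concrete, here is a minimal sketch, assuming the resources are held as simple string maps. The class and method are hypothetical and are not Flink APIs; they only show "pod template first, configuration as fallback".

```java
import java.util.Map;

// Hypothetical sketch of the proposed precedence: pod template first, Flink configuration as fallback.
final class TemplateFirstSketch {

    // Returns the pod template's value for the key when present, otherwise the configured value.
    static String resolve(Map<String, String> templateResources, String key, String configValue) {
        return templateResources.getOrDefault(key, configValue);
    }

    public static void main(String[] args) {
        // The template sets only a CPU request; memory is left to the configuration.
        Map<String, String> templateRequests = Map.of("cpu", "100m");
        System.out.println(resolve(templateRequests, "cpu", "1000m"));
        System.out.println(resolve(templateRequests, "memory", "1728Mi"));
    }
}
```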

Regards,
Alexis.

From: Denis Cosmin NUTIU <dn...@bitdefender.com>
Sent: Thursday, August 26, 2021 15:55
To: matthias@ververica.com
Cc: user@flink.apache.org; danrtsey.wy@gmail.com
Subject: Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Hi Matthias,

Thanks for getting back to me and for your time!

We have some Flink jobs deployed on Kubernetes and running kubectl top pod gives the following result:

NAME                  CPU(cores)   MEMORY(bytes)
aa-78c8cb77d4-zlmpg   8m           1410Mi
aa-taskmanager-2-2    32m          1066Mi
bb-5f7b65f95c-jwb7t   7m           1445Mi
bb-taskmanager-2-2    32m          1016Mi
cc-54d967b55d-b567x   11m          514Mi
cc-taskmanager-4-1    11m          496Mi
dd-6fbc6b8666-krhlx   10m          535Mi
dd-taskmanager-2-2    12m          522Mi
xx-6845cf7986-p45lq   53m          526Mi
xx-taskmanager-5-2    11m          507Mi

During low workloads the jobs consume only about 100m of CPU, and during high workloads consumption rises to 500m-1000m. Being able to specify requests and limits separately would give us more deployment flexibility.

Sincerely,
Denis


Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Posted by Denis Cosmin NUTIU <dn...@bitdefender.com>.
Hi Matthias,

Thanks for getting back to me and for your time!

We have some Flink jobs deployed on Kubernetes and running kubectl top pod gives the following result:

NAME                  CPU(cores)   MEMORY(bytes)
aa-78c8cb77d4-zlmpg   8m           1410Mi
aa-taskmanager-2-2    32m          1066Mi
bb-5f7b65f95c-jwb7t   7m           1445Mi
bb-taskmanager-2-2    32m          1016Mi
cc-54d967b55d-b567x   11m          514Mi
cc-taskmanager-4-1    11m          496Mi
dd-6fbc6b8666-krhlx   10m          535Mi
dd-taskmanager-2-2    12m          522Mi
xx-6845cf7986-p45lq   53m          526Mi
xx-taskmanager-5-2    11m          507Mi

During low workloads the jobs consume only about 100m of CPU, and during high workloads consumption rises to 500m-1000m. Being able to specify requests and limits separately would give us more deployment flexibility.

Sincerely,
Denis

On Thu, 2021-08-26 at 14:22 +0200, Matthias Pohl wrote:

Hi Denis,
I did a bit of digging: It looks like there is no way to specify them independently. You can find documentation about pod templates for TaskManager and JobManager [1]. But even there it states that for cpu and memory, the resource specs are overwritten by the Flink configuration. The code also reveals that limit and requests are set using the same value [2].

I'm going to pull Yang Wang into this thread. I'm wondering whether there is a reason for that or whether it makes sense to create a Jira issue introducing more specific configuration parameters for limit and requests.

Best,
Matthias

[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#fields-overwritten-by-flink
[2] https://github.com/apache/flink/blob/f64261c91b195ecdcd99975b51de540db89a3f48/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesUtils.java#L324-L332



Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

Posted by Matthias Pohl <ma...@ververica.com>.
Hi Denis,
I did a bit of digging: it looks like there is currently no way to specify
requests and limits independently. The pod template documentation for the
TaskManager and JobManager [1] states that the CPU and memory resource
specs are overwritten by the Flink configuration, and the code shows that
limits and requests are set from the same value [2].
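
A simplified sketch of that behavior follows. It is not the Flink code in [2], and the class and method names are made up for illustration. It only shows the effect: one configured CPU value and one memory value are written into both requests and limits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch (not the Flink code in [2]): one configured value feeds both requests and limits.
final class SameValueSketch {

    // Builds a Kubernetes-style resources map from a CPU value in cores and memory in MiB.
    static Map<String, Map<String, String>> resources(double cpuCores, int memoryMiB) {
        String cpu = Math.round(cpuCores * 1000) + "m";
        String memory = memoryMiB + "Mi";

        Map<String, String> values = new LinkedHashMap<>();
        values.put("cpu", cpu);
        values.put("memory", memory);

        Map<String, Map<String, String>> resources = new LinkedHashMap<>();
        // The same values go into both sections, so requests and limits can never differ.
        resources.put("requests", new LinkedHashMap<>(values));
        resources.put("limits", new LinkedHashMap<>(values));
        return resources;
    }

    public static void main(String[] args) {
        // kubernetes.taskmanager.cpu=0.5 with 1728 MiB of memory
        System.out.println(resources(0.5, 1728));
    }
}
```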

I'm going to pull Yang Wang into this thread. I'm wondering whether there
is a reason for the current behavior, or whether it makes sense to create a
Jira issue introducing separate configuration parameters for limits and
requests.

Best,
Matthias

[1]
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#fields-overwritten-by-flink
[2]
https://github.com/apache/flink/blob/f64261c91b195ecdcd99975b51de540db89a3f48/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesUtils.java#L324-L332

On Thu, Aug 26, 2021 at 11:17 AM Denis Cosmin NUTIU <dn...@bitdefender.com>
wrote:

> Hello,
>
> I've developed a Flink job and I'm trying to deploy it on a Kubernetes
> cluster using Flink Native.
>
> Setting kubernetes.taskmanager.cpu=0.5 and
> kubernetes.jobmanager.cpu=0.5 sets the requests and limits to 500m,
> which is correct, but I'd like to set the requests and limits to
> different values, something like:
>
> resources:
>   requests:
>     memory: "1048Mi"
>     cpu: "100m"
>   limits:
>     memory: "2096Mi"
>     cpu: "1000m"
>
> I've tried using pod templates from Flink 1.13 and manually patching
> the Kubernetes deployment file, the jobmanager gets spawned with the
> correct resource requests and limits but the taskmanagers get spawned
> with the defaults:
>
> Limits:
>   cpu:     1
>   memory:  1728Mi
> Requests:
>   cpu:     1
>   memory:  1728Mi
>
> Is there any way I could set the requests/limits for the CPU/Memory to
> different values when deploying Flink in Kubernetes? If not, would it
> make sense to request this as a feature?
>
> Thanks in advance!
>
> Denis
>