Posted to issues@flink.apache.org by "Yun Tang (Jira)" <ji...@apache.org> on 2022/07/22 09:29:00 UTC

[jira] [Commented] (FLINK-24819) Higher APIServer cpu load after using SharedIndexInformer replaced naked Kubernetes watch

    [ https://issues.apache.org/jira/browse/FLINK-24819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569899#comment-17569899 ] 

Yun Tang commented on FLINK-24819:
----------------------------------

Is this a real bug? This ticket has been stale for more than half a year.

> Higher APIServer cpu load after using SharedIndexInformer replaced naked Kubernetes watch
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-24819
>                 URL: https://issues.apache.org/jira/browse/FLINK-24819
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.14.0
>            Reporter: Yang Wang
>            Priority: Major
>
> In FLINK-22054, Flink switched to a shared informer for ConfigMaps, replacing the naked K8s watch. Since then, each Flink JVM process (JM/TM) needs only one connection to the APIServer for ConfigMap watching. The change aimed to reduce the network pressure on the K8s APIServer.
>  
> However, in our recent tests, we found that the CPU and memory cost of the APIServer doubled while running the same Flink workloads. After digging into the K8s details, I think the root cause might be that etcd does not have indexes for labels. This means the APIServer needs to pull all the events from etcd for each watch and then filter them internally by the specified labels (e.g. app=xxx,type=flink-native-kubernetes,configmap-type=high-availability). Before FLINK-22054, we started a dedicated connection for each ConfigMap watch, and it seems that the APIServer only needed to pull the events for the specified ConfigMap name.
>  
> Watch URL example (Before):
> https://kubernetes.default:6443/api/v1/namespaces/vvp-workload/configmaps?metadata.name=job-009d4f51-ca02-4793-a49b-a3344538719b-resourcemanager-leader&watch=true
>  
> Watch URL example (After):
> https://kubernetes.default:6443/api/v1/namespaces/vvp-workload/configmaps?labelSelector=app%3Dk8s-ha-app-1-1636077491-23461%2Ctype%3Dflink-native-kubernetes%2Cconfigmap-type%3Dhigh-availability&resourceVersion=1153687321&watch=true
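
For readers comparing the two watch styles quoted above, here is a minimal Python sketch that rebuilds both example URLs from their parts. It is only an illustration, not Flink's or the Kubernetes client's actual code. The helper names (watch_url_by_name, watch_url_by_labels, matches_selector) are hypothetical, and the name-based query string is reproduced exactly as written in the ticket. matches_selector is a toy stand-in for the label filtering that the description suspects the APIServer performs internally.

```python
from urllib.parse import urlencode

# Base path copied from the examples in the ticket.
BASE = "https://kubernetes.default:6443/api/v1/namespaces/vvp-workload/configmaps"


def watch_url_by_name(name):
    """'Before' style: one watch per ConfigMap, keyed by name (as written in the ticket)."""
    return f"{BASE}?{urlencode({'metadata.name': name, 'watch': 'true'})}"


def watch_url_by_labels(labels, resource_version):
    """'After' style: one watch per process, selecting ConfigMaps by labels."""
    selector = ",".join(f"{key}={value}" for key, value in labels.items())
    query = urlencode({
        "labelSelector": selector,          # '=' and ',' get percent-encoded
        "resourceVersion": resource_version,
        "watch": "true",
    })
    return f"{BASE}?{query}"


def matches_selector(object_labels, selector):
    """Toy equality-based label filter: every selector pair must be present."""
    return all(object_labels.get(key) == value for key, value in selector.items())


HA_LABELS = {
    "app": "k8s-ha-app-1-1636077491-23461",
    "type": "flink-native-kubernetes",
    "configmap-type": "high-availability",
}

before_url = watch_url_by_name(
    "job-009d4f51-ca02-4793-a49b-a3344538719b-resourcemanager-leader"
)
after_url = watch_url_by_labels(HA_LABELS, 1153687321)

if __name__ == "__main__":
    print(before_url)
    print(after_url)
```

The name-based watch targets a single object, while the label-based watch can match any ConfigMap in the namespace. That is why, under the hypothesis in the description, the label-based watch would require every ConfigMap event to be checked against the selector.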



--
This message was sent by Atlassian Jira
(v8.20.10#820010)