Posted to issues@flink.apache.org by "Canbin Zheng (Jira)" <ji...@apache.org> on 2020/05/08 08:21:00 UTC

[jira] [Comment Edited] (FLINK-17566) Fix potential K8s resources leak after JobManager finishes in Application mode

    [ https://issues.apache.org/jira/browse/FLINK-17566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102365#comment-17102365 ] 

Canbin Zheng edited comment on FLINK-17566 at 5/8/20, 8:20 AM:
---------------------------------------------------------------

Not exactly. Version 4.5.2 does not delete a Deployment with a single API request, because it introduces the so-called {{Reaper}}, which I think exists for older Kubernetes versions. You can refer to the {{DeploymentOperationsImpl}} class at v4.5.2.

I also found a new PR that removes the Reaper: https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/apps/v1/DeploymentOperationsImpl.java. That is a good change, since it makes deletion a single request to the K8s API Server.

So we should bump the fabric8 version and then check whether the problem still exists.



was (Author: felixzheng):
Not exactly. Version 4.5.2 does not delete a Deployment with a single API request, because it introduces the so-called {{Reaper}}, which I think exists for older Kubernetes versions. You can refer to the {{DeploymentOperationsImpl}} class at v4.5.2.

I also found a new PR that removes the Reaper, which is what we need: https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/apps/v1/DeploymentOperationsImpl.java

So we should bump the fabric8 version and then check whether the problem still exists.


> Fix potential K8s resources leak after JobManager finishes in Application mode
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-17566
>                 URL: https://issues.apache.org/jira/browse/FLINK-17566
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>            Reporter: Canbin Zheng
>            Priority: Major
>
> FLINK-10934 introduces application mode support in the native K8s setup, but as discussed in [https://github.com/apache/flink/pull/12003|https://github.com/apache/flink/pull/12003], there is a high probability that all the K8s resources leak after the JobManager finishes; the only visible effect is that the replica count of the Deployment is scaled down to 0. We need to find the root cause and fix it.
> This may be related to the way the fabric8 SDK deletes a Deployment. It splits the procedure into three steps as follows:
>  # Scale the replicas down to 0
>  # Wait until the scale-down succeeds
>  # Delete the ReplicaSet
>  
>  
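The three steps listed in the issue description can be sketched as a small simulation. This is only an illustration of the suspected failure mode, not the fabric8 implementation: {{FakeApiServer}}, {{multiStepDelete}}, {{singleRequestDelete}} and the {{completedSteps}} parameter are hypothetical names, the final Deployment delete (step 4) is an assumption of the sketch, and so is the server-side cascade used for the single-request case. It shows how a client that stops after step 1 could leave every object in place with the replica count at 0, which is the symptom reported above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DeploymentDeletionSketch {

    /** Hypothetical in-memory stand-in for the K8s API server (not a fabric8 or Kubernetes class). */
    static final class FakeApiServer {
        final List<String> requests = new ArrayList<>();      // every simulated API request
        final Set<String> resources = new LinkedHashSet<>();  // objects that still exist
        int replicas;

        FakeApiServer(int replicas) {
            this.replicas = replicas;
            resources.add("deployment");
            resources.add("replicaset");
        }

        void scaleDeployment(int target) {
            requests.add("scale deployment to " + target);
            replicas = target;
        }

        int readReplicas() {
            requests.add("get deployment");
            return replicas;
        }

        void delete(String resource) {
            requests.add("delete " + resource);
            resources.remove(resource);
        }

        /** Assumption: one delete request, after which the server removes dependents itself. */
        void deleteDeploymentCascading() {
            requests.add("delete deployment (cascading)");
            resources.clear();
        }
    }

    /**
     * Client-side multi-step deletion. completedSteps models a client that
     * stops early, for example because its own process exits.
     */
    static void multiStepDelete(FakeApiServer api, int completedSteps) {
        if (completedSteps >= 1) {
            api.scaleDeployment(0);              // step 1: scale the replicas down to 0
        }
        if (completedSteps >= 2) {
            while (api.readReplicas() != 0) {    // step 2: wait until the scale-down succeeds
                // keep polling; the fake server scales instantly, so one read is enough
            }
        }
        if (completedSteps >= 3) {
            api.delete("replicaset");            // step 3: delete the ReplicaSet
        }
        if (completedSteps >= 4) {
            api.delete("deployment");            // step 4 (assumed): delete the Deployment itself
        }
    }

    /** Deletion as one request; cleanup of dependents is left to the server. */
    static void singleRequestDelete(FakeApiServer api) {
        api.deleteDeploymentCascading();
    }

    public static void main(String[] args) {
        FakeApiServer interrupted = new FakeApiServer(1);
        multiStepDelete(interrupted, 1);         // client stops right after step 1
        System.out.println("multi-step, interrupted: requests=" + interrupted.requests
                + " remaining=" + interrupted.resources + " replicas=" + interrupted.replicas);

        FakeApiServer single = new FakeApiServer(1);
        singleRequestDelete(single);
        System.out.println("single request: requests=" + single.requests
                + " remaining=" + single.resources);
    }
}
```

In the interrupted run the Deployment and ReplicaSet both remain while the replica count is already 0. The single-request variant has no intermediate state in which the client can stop halfway, which is the motivation for trying a fabric8 version without the Reaper.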



--
This message was sent by Atlassian Jira
(v8.3.4#803005)