Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2021/10/25 05:44:52 UTC

[GitHub] [flink] Aitozi opened a new pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Aitozi opened a new pull request #17554:
URL: https://github.com/apache/flink/pull/17554


   ## What is the purpose of the change
   
   Extend the try-catch so that it covers the whole of `deploySessionCluster` and `deployApplicationCluster`, ensuring that the Kubernetes objects are cleaned up when submission fails.
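
As a rough illustration of the idea (not the code in this PR), the sketch below wraps an entire deployment step in one try block and runs a cleanup action if anything inside it throws. The names `DeployWithCleanup`, `deployWithCleanup`, and `DeploymentException` are hypothetical stand-ins chosen for this example; they are not Flink APIs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class DeployWithCleanup {

    /** Hypothetical stand-in for a deployment exception type; not a Flink class. */
    static class DeploymentException extends Exception {
        DeploymentException(Throwable cause) {
            super(cause);
        }
    }

    /**
     * Runs the whole deployment inside one try block. If any step throws,
     * the cleanup action runs before the failure is rethrown to the caller.
     */
    static <T> T deployWithCleanup(Callable<T> deployment, Runnable cleanup)
            throws DeploymentException {
        try {
            return deployment.call();
        } catch (Exception e) {
            try {
                cleanup.run();
            } catch (RuntimeException cleanupFailure) {
                // Keep the original failure as the primary error and attach
                // the cleanup failure to it instead of losing either one.
                e.addSuppressed(cleanupFailure);
            }
            throw new DeploymentException(e);
        }
    }

    public static void main(String[] args) {
        AtomicInteger cleanups = new AtomicInteger();
        try {
            // The deployment fails, so the cleanup action is expected to run once.
            deployWithCleanup(
                    () -> {
                        throw new IllegalStateException("submit failed");
                    },
                    cleanups::incrementAndGet);
        } catch (DeploymentException e) {
            System.out.println(
                    "cleanups run: " + cleanups.get() + ", cause: " + e.getCause().getMessage());
        }
    }
}
```

Passing the deployment as a callable keeps the cleanup logic in one place, so session and application deployments could share it; whether that matches the final shape of this change is an assumption.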
   
   
   ## Verifying this change
   
   This change is already covered by existing tests.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>











[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793",
       "triggerID" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26909",
       "triggerID" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ae6c6bc42146b7a72215b4ceda8108fdc21b41f6",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26972",
       "triggerID" : "ae6c6bc42146b7a72215b4ceda8108fdc21b41f6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26909) 
   * ae6c6bc42146b7a72215b4ceda8108fdc21b41f6 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26972) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404) 
   * fceb0c59da025114818f57ef73d73b1d030e8509 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>








[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404) 
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>








[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793",
       "triggerID" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26909",
       "triggerID" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a6445f822c2261c2fe578845ebd93c333f8c689 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793) 
   * cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26909) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] cc13ny commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
cc13ny commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r755719881



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,36 +244,51 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {

Review comment:
       @Aitozi I think this is the key part of the change, apart from the code re-organization. But I don't understand the PR title, for the following reasons:
   
   - Is failing to get the cluster client the **same as** failing to start the k8s cluster?
   - Which code kills the cluster, **OR** does a failed deployment itself count as killing it?
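
One possible reading of the hunk above, sketched with hypothetical names (`PartialDeployment`, `createJobManager`, `getClusterClient`, and an in-memory `resources` set standing in for the Kubernetes API server): the two failures are different events, but when both steps sit in the same try block they lead to the same cleanup. The case where object creation succeeds and only the client lookup fails is the one that would otherwise leave objects behind. This is an illustration only, not the author's answer.

```java
import java.util.HashSet;
import java.util.Set;

public class PartialDeployment {

    /** Hypothetical in-memory stand-in for objects stored in the Kubernetes API server. */
    static final Set<String> resources = new HashSet<>();

    /** Step 1: create the cluster objects. This step succeeds in the sketch. */
    static String createJobManager(String clusterId) {
        resources.add(clusterId + "-deployment");
        resources.add(clusterId + "-rest-service");
        return clusterId;
    }

    /** Step 2: obtain a client. It always fails here to mimic an unreachable endpoint. */
    static String getClusterClient(String clusterId) {
        throw new IllegalStateException("rest endpoint of " + clusterId + " not available");
    }

    /** Both steps share one try block, so a failure in either one triggers the same cleanup. */
    static String deploy(String clusterId) {
        try {
            createJobManager(clusterId);
            return getClusterClient(clusterId);
        } catch (RuntimeException e) {
            // Remove everything that step 1 created before reporting the failure.
            resources.removeIf(name -> name.startsWith(clusterId + "-"));
            throw e;
        }
    }

    public static void main(String[] args) {
        try {
            deploy("demo");
        } catch (IllegalStateException e) {
            System.out.println("deploy failed: " + e.getMessage());
        }
        System.out.println("leftover resources: " + resources);
    }
}
```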







[GitHub] [flink] Aitozi closed pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
Aitozi closed pull request #17554:
URL: https://github.com/apache/flink/pull/17554


   





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793",
       "triggerID" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26909",
       "triggerID" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26909) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] wangyang0918 commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
wangyang0918 commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r752813610



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -205,17 +211,23 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
             Preconditions.checkArgument(pipelineJars.size() == 1, "Should only have one jar");
         }
 
-        final ClusterClientProvider<String> clusterClientProvider =
-                deployClusterInternal(
-                        KubernetesApplicationClusterEntrypoint.class.getName(),
-                        clusterSpecification,
-                        false);
+        ClusterClientProvider<String> clusterClientProvider;
+        try {
+            clusterClientProvider =
+                    deployClusterInternal(
+                            KubernetesApplicationClusterEntrypoint.class.getName(),
+                            clusterSpecification,
+                            false);
 
-        try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {
-            LOG.info(
-                    "Create flink application cluster {} successfully, JobManager Web Interface: {}",
-                    clusterId,
-                    clusterClient.getWebInterfaceURL());
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {
+                LOG.info(
+                        "Create flink application cluster {} successfully, JobManager Web Interface: {}",
+                        clusterId,
+                        clusterClient.getWebInterfaceURL());
+            }
+        } catch (Exception e) {

Review comment:
       I am curious whether we could wrap the `try...catch {// clean up resources}` in a separate method, like the following. WDYT?
   
   ```
       private <T> ClusterClientProvider<T> safelyDeployCluster(
               SupplierWithException<ClusterClientProvider<T>, Exception> supplier)
               throws ClusterDeploymentException {
           try {
               return supplier.get();
           } catch (Exception e) {
               try {
                   LOG.warn(
                           "Failed to create the Kubernetes cluster \"{}\", try to clean up the residual resources.",
                           clusterId);
                   client.stopAndCleanupCluster(clusterId);
               } catch (Exception ex) {
                   // Log the cleanup failure (ex); the original failure (e) is rethrown below.
                   LOG.warn(
                           "Failed to stop and clean up the Kubernetes cluster \"{}\".", clusterId, ex);
               }
               throw new ClusterDeploymentException(e);
           }
       }
   ```

##########
File path: flink-kubernetes/src/test/java/org/apache/flink/kubernetes/KubernetesClusterDescriptorTest.java
##########
@@ -131,16 +131,14 @@ public void testKillCluster() throws Exception {
     }
 
     @Test
-    public void testDeployApplicationCluster() {
+    public void testDeployApplicationCluster() throws ClusterDeploymentException {
         flinkConfig.set(
                 PipelineOptions.JARS, Collections.singletonList("local:///path/of/user.jar"));
         flinkConfig.set(DeploymentOptions.TARGET, KubernetesDeploymentTarget.APPLICATION.getName());
-        try {
-            descriptor.deployApplicationCluster(clusterSpecification, appConfig);
-        } catch (Exception ignored) {
-        }
 
-        mockExpectedServiceFromServerSide(loadBalancerSvc);
+        mockFirstEmptyFollowByExpectedServiceFromServerSide(new Service(), loadBalancerSvc);

Review comment:
       I like this change. Great.

##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,39 +268,35 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private void killClusterSilently(Throwable throwable) {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+            LOG.warn(
+                    "Failed to create the Kubernetes cluster \"{}\", try to clean up the residual resources.",
+                    clusterId,
+                    throwable);

Review comment:
       Do we need to log the throwable here? The stack trace may end up duplicated, since we also throw a new `ClusterDeploymentException`.
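
A minimal sketch of why the extra stack trace may be redundant, assuming the original failure is passed as the cause of the rethrown exception. `CauseChainSketch` and `DeploymentException` are hypothetical stand-ins, not Flink classes; the real `ClusterDeploymentException` is not used here.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class CauseChainSketch {
    /** Hypothetical stand-in for the exception thrown after a failed deployment. */
    static class DeploymentException extends Exception {
        DeploymentException(Throwable cause) {
            super(cause);
        }
    }

    /** Wraps the original failure so that it travels with the rethrown exception. */
    static DeploymentException wrap(Throwable original) {
        return new DeploymentException(original);
    }

    /** Renders the full stack trace, including any "Caused by:" sections. */
    static String stackTraceOf(Throwable t) {
        StringWriter out = new StringWriter();
        t.printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }
}
```

Whoever eventually prints the wrapped exception already sees the original failure in its "Caused by:" section, so a message-only warning at the cleanup site is probably enough.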







[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404) 
   * fceb0c59da025114818f57ef73d73b1d030e8509 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779) 
   * f51e8519e195627c7fe098b3f5332bae905c2a0f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] Aitozi commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
Aitozi commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r753626315



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -205,17 +211,23 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
             Preconditions.checkArgument(pipelineJars.size() == 1, "Should only have one jar");
         }
 
-        final ClusterClientProvider<String> clusterClientProvider =
-                deployClusterInternal(
-                        KubernetesApplicationClusterEntrypoint.class.getName(),
-                        clusterSpecification,
-                        false);
+        ClusterClientProvider<String> clusterClientProvider;
+        try {
+            clusterClientProvider =
+                    deployClusterInternal(
+                            KubernetesApplicationClusterEntrypoint.class.getName(),
+                            clusterSpecification,
+                            false);
 
-        try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {
-            LOG.info(
-                    "Create flink application cluster {} successfully, JobManager Web Interface: {}",
-                    clusterId,
-                    clusterClient.getWebInterfaceURL());
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {
+                LOG.info(
+                        "Create flink application cluster {} successfully, JobManager Web Interface: {}",
+                        clusterId,
+                        clusterClient.getWebInterfaceURL());
+            }
+        } catch (Exception e) {

Review comment:
       Good suggestion 👍










[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779) 
   * f51e8519e195627c7fe098b3f5332bae905c2a0f Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793",
       "triggerID" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a6445f822c2261c2fe578845ebd93c333f8c689 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] Aitozi commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
Aitozi commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r753626375



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,39 +268,35 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private void killClusterSilently(Throwable throwable) {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+            LOG.warn(
+                    "Failed to create the Kubernetes cluster \"{}\", try to clean up the residual resources.",
+                    clusterId,
+                    throwable);

Review comment:
       removed







[GitHub] [flink] cc13ny commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
cc13ny commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r755719881



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,36 +244,51 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {

Review comment:
       I think this is the key part of the change, apart from the code reorganization. However, I don't understand the PR title, for the following reasons:
   
   - Is failing to get the cluster client the same as failing to start the k8s cluster?
   - Which code kills the cluster? Does a failed deployment by itself imply that the cluster is killed?
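
To make the second question concrete, here is a rough sketch of the pattern the PR title seems to describe: any failure inside the deployment step triggers a cleanup of the cluster resources before the error is rethrown. All names (`SafeDeploySketch`, `ThrowingSupplier`, `DeploymentException`, `stopAndCleanup`) are hypothetical stand-ins, not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.List;

class SafeDeploySketch {
    /** Hypothetical supplier that is allowed to throw a checked exception. */
    interface ThrowingSupplier<T> {
        T get() throws Exception;
    }

    /** Hypothetical stand-in for the exception reported to the caller. */
    static class DeploymentException extends Exception {
        DeploymentException(Throwable cause) {
            super(cause);
        }
    }

    /** Records cleanup calls so the behaviour can be observed in this sketch. */
    static final List<String> cleanedUp = new ArrayList<>();

    /** Stand-in for a call that deletes the residual cluster resources. */
    static void stopAndCleanup(String clusterId) {
        cleanedUp.add(clusterId);
    }

    /** Runs the deployment; on any failure, cleans up first and then rethrows. */
    static <T> T deploySafely(String clusterId, ThrowingSupplier<T> deployment)
            throws DeploymentException {
        try {
            return deployment.get();
        } catch (Exception e) {
            try {
                stopAndCleanup(clusterId);
            } catch (Exception cleanupFailure) {
                // Keep the cleanup failure without hiding the original error.
                e.addSuppressed(cleanupFailure);
            }
            throw new DeploymentException(e);
        }
    }
}
```

In this reading, "kill" simply means the cleanup call in the catch block, and a failure to obtain the cluster client counts as a deployment failure only if that call sits inside the supplier.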










[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f51e8519e195627c7fe098b3f5332bae905c2a0f Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779) 
   * f51e8519e195627c7fe098b3f5332bae905c2a0f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] Aitozi commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
Aitozi commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r754923775



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,39 +247,50 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {
+                LOG.info(
+                        "Create flink cluster {} successfully, JobManager Web Interface: {}",
+                        clusterId,
+                        clusterClient.getWebInterfaceURL());
+            }
+            return clusterClientProvider;
         } catch (Exception e) {
             try {
-                LOG.warn(

Review comment:
       fixed







[GitHub] [flink] cc13ny commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
cc13ny commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r756310567



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,36 +244,51 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {

Review comment:
       > Because the Flink cluster might be running normally.
   
   I think the same. The question comes down to whether cleaning up the K8s resources is the best strategy, or whether it is possible to retry and re-create a client against the existing resources. Cleaning up is definitely the simplest way. It depends on how many failure situations, and which ones, we would like to handle (e.g. by re-creating the client).
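   
   A minimal sketch of the second option, purely for illustration: retry the client creation a bounded number of times and clean up only once every attempt has failed. `RetryThenCleanUpSketch`, `createClient` and `cleanUp` are hypothetical names and not part of the Flink code base, and the bounded retry count is an assumption rather than anything proposed in this PR.

```java
import java.util.concurrent.Callable;

final class RetryThenCleanUpSketch {

    /**
     * Calls createClient up to maxAttempts times. If every attempt fails,
     * runs cleanUp once and rethrows the last failure wrapped in an
     * IllegalStateException.
     */
    static <T> T createClientOrCleanUp(
            Callable<T> createClient, int maxAttempts, Runnable cleanUp) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // The existing resources are left untouched between attempts.
                return createClient.call();
            } catch (Exception e) {
                lastFailure = e;
            }
        }
        // All attempts failed: fall back to the simple strategy and clean up.
        cleanUp.run();
        throw new IllegalStateException(
                "Client creation failed after " + maxAttempts + " attempts", lastFailure);
    }
}
```

   Whether retrying is worth the extra complexity depends on which failures are expected to be transient; the sketch treats every exception the same way.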







[GitHub] [flink] flinkbot commented on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550557


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 77e1b9059149c986fc2e35516ce33e3309b61cad (Mon Oct 25 05:47:52 UTC 2021)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
    * **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-24624).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work.
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404) 
   * fceb0c59da025114818f57ef73d73b1d030e8509 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404) 
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] Aitozi commented on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
Aitozi commented on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-973793867


   @wangyang0918 Thanks for your comments, I will address them this weekend.





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779) 
   * f51e8519e195627c7fe098b3f5332bae905c2a0f Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] Aitozi commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
Aitozi commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r755729487



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,36 +244,51 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {

Review comment:
       1. I think it means a failure to get the cluster client during the cluster deployment phase.
   2. Failing to deploy means killing the cluster.
   
   As I mentioned in the [issue](https://issues.apache.org/jira/browse/FLINK-24624?focusedCommentId=17434081&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17434081), this PR only solves the problem in case 1. In session/application mode, it seems we cannot fully ensure that no resources are left behind: because the deployment process is asynchronous, the remaining cases may need to be handled by the client.
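   
   A minimal, self-contained sketch of the pattern for case 1, for illustration only: one try block covers both the deployment and the retrieval of the client, and any exception triggers a best-effort clean-up before it is rethrown. `DeployWithCleanUpSketch`, `deployAndGetClient` and `CleanUp` are hypothetical stand-ins and not the actual Flink classes shown in the diff above.

```java
import java.util.concurrent.Callable;

final class DeployWithCleanUpSketch {

    /** Hypothetical stand-in for the call that deletes the cluster's Kubernetes objects. */
    interface CleanUp {
        void run() throws Exception;
    }

    /**
     * Runs the deployment step. On any exception, attempts the clean-up and
     * rethrows the original failure wrapped in an IllegalStateException.
     */
    static <T> T deployOrCleanUp(Callable<T> deployAndGetClient, CleanUp cleanUp) {
        try {
            return deployAndGetClient.call();
        } catch (Exception deployFailure) {
            try {
                cleanUp.run();
            } catch (Exception cleanUpFailure) {
                // Keep the deployment failure as the cause; attach the clean-up failure to it.
                deployFailure.addSuppressed(cleanUpFailure);
            }
            throw new IllegalStateException("Could not deploy the cluster", deployFailure);
        }
    }
}
```

   This only covers failures that surface synchronously inside the call. A failure that happens later in the asynchronous deployment never reaches the catch block, which is the limitation described above.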







[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404) 
   * fceb0c59da025114818f57ef73d73b1d030e8509 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] Aitozi commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
Aitozi commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r754923390



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,39 +247,50 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {
+                LOG.info(
+                        "Create flink cluster {} successfully, JobManager Web Interface: {}",
+                        clusterId,
+                        clusterClient.getWebInterfaceURL());
+            }
+            return clusterClientProvider;
         } catch (Exception e) {
             try {
-                LOG.warn(
-                        "Failed to create the Kubernetes cluster \"{}\", try to clean up the residual resources.",
-                        clusterId);
                 client.stopAndCleanupCluster(clusterId);
-            } catch (Exception e1) {
-                LOG.info(
+            } catch (Exception ex) {
+                LOG.warn(
                         "Failed to stop and clean up the Kubernetes cluster \"{}\".",
                         clusterId,
-                        e1);
+                        ex);
             }
-            throw new ClusterDeploymentException(

Review comment:
       fixed







[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793",
       "triggerID" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a6445f822c2261c2fe578845ebd93c333f8c689 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793) 
   * cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] wangyang0918 commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
wangyang0918 commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r755797584



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,36 +244,51 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {

Review comment:
       @cc13ny Thanks for your valuable comments.
   
   @Aitozi This discussion makes me reconsider whether we really need to clean up the K8s resources when creating the Flink client fails, because the Flink cluster might be running normally.
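
The concern above can be illustrated with a minimal sketch, assuming deployment is split into two steps: creating the cluster resources, and then resolving the web interface through a client. Only a failure in the first step triggers cleanup; a failure in the second step is logged, because the cluster may still be running normally. The class `DeployScopeSketch` and its parameters are hypothetical stand-ins written in plain Java, not Flink's `KubernetesClusterDescriptor` API.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for illustration; not Flink's KubernetesClusterDescriptor. */
final class DeployScopeSketch {

    /**
     * Creates the cluster resources, then makes a best-effort attempt to resolve the web UI.
     * Only a failure while creating the resources triggers the cleanup callback.
     */
    static String deploy(
            String clusterId,
            Supplier<String> createResources,
            Supplier<String> resolveWebUi,
            Runnable cleanup) {
        final String provider;
        try {
            provider = createResources.get();
        } catch (RuntimeException e) {
            // The resources may be only partially created, so remove them.
            cleanup.run();
            throw new IllegalStateException("Could not create cluster " + clusterId, e);
        }
        try {
            System.out.println("Cluster " + clusterId + " web UI: " + resolveWebUi.get());
        } catch (RuntimeException e) {
            // The cluster itself may be healthy, so it is left untouched here.
            System.out.println(
                    "Cluster " + clusterId + " was created, but the web UI could not be resolved: "
                            + e.getMessage());
        }
        return provider;
    }

    public static void main(String[] args) {
        String provider =
                deploy(
                        "demo",
                        () -> "provider-for-demo",
                        () -> {
                            throw new IllegalStateException("timeout");
                        },
                        () -> System.out.println("cleanup"));
        System.out.println(provider);
    }
}
```

With this split, a client-side timeout leaves the cluster in place, at the cost of possibly leaving resources behind when the cluster really is broken.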







[GitHub] [flink] wangyang0918 commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
wangyang0918 commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r753895825



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,39 +247,50 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {
+                LOG.info(
+                        "Create flink cluster {} successfully, JobManager Web Interface: {}",
+                        clusterId,
+                        clusterClient.getWebInterfaceURL());
+            }
+            return clusterClientProvider;
         } catch (Exception e) {
             try {
-                LOG.warn(
-                        "Failed to create the Kubernetes cluster \"{}\", try to clean up the residual resources.",
-                        clusterId);
                 client.stopAndCleanupCluster(clusterId);
-            } catch (Exception e1) {
-                LOG.info(
+            } catch (Exception ex) {
+                LOG.warn(
                         "Failed to stop and clean up the Kubernetes cluster \"{}\".",
                         clusterId,
-                        e1);
+                        ex);
             }
-            throw new ClusterDeploymentException(

Review comment:
       Also here, why did you remove the exception message?

##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -155,19 +156,14 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
     @Override
     public ClusterClientProvider<String> deploySessionCluster(
             ClusterSpecification clusterSpecification) throws ClusterDeploymentException {
-        final ClusterClientProvider<String> clusterClientProvider =
-                deployClusterInternal(
-                        KubernetesSessionClusterEntrypoint.class.getName(),
-                        clusterSpecification,
-                        false);
-
-        try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {
-            LOG.info(
-                    "Create flink session cluster {} successfully, JobManager Web Interface: {}",
-                    clusterId,
-                    clusterClient.getWebInterfaceURL());
-        }
-        return clusterClientProvider;
+        final SupplierWithException<ClusterClientProvider<String>, Exception> supplier =

Review comment:
       Do we really need such a local variable?
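
A minimal sketch of the alternative hinted at here, assuming the lambda is passed straight to the wrapper instead of being stored in a local variable first. `ThrowingSupplier`, `safely`, and `deploySession` are hypothetical names in plain Java; the PR itself uses Flink's `SupplierWithException` and `safelyDeployCluster`.

```java
/** Hypothetical illustration of passing the supplier inline; not the PR's code. */
final class InlineSupplierSketch {

    /** Stand-in for a supplier that may throw a checked exception. */
    @FunctionalInterface
    interface ThrowingSupplier<T> {
        T get() throws Exception;
    }

    /** Runs the supplier and wraps any failure in an unchecked exception. */
    static <T> T safely(ThrowingSupplier<T> supplier) {
        try {
            return supplier.get();
        } catch (Exception e) {
            throw new IllegalStateException("Deployment failed", e);
        }
    }

    /** The lambda is passed directly, so no intermediate local variable is needed. */
    static String deploySession(String clusterId) {
        return safely(() -> "session:" + clusterId);
    }

    public static void main(String[] args) {
        System.out.println(deploySession("demo"));
    }
}
```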

##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,39 +247,50 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {
+                LOG.info(
+                        "Create flink cluster {} successfully, JobManager Web Interface: {}",
+                        clusterId,
+                        clusterClient.getWebInterfaceURL());
+            }
+            return clusterClientProvider;
         } catch (Exception e) {
             try {
-                LOG.warn(

Review comment:
       I am not sure why you removed this log.
   
   Maybe I did not make myself clear in the last review comment. What I meant is that we do not need to log the exception here, not that the whole log line should be removed.
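
A minimal sketch of the logging pattern requested above, assuming the goal is to keep the warning but not attach the original exception to it, since that exception is rethrown as the cause of the wrapping exception. It uses `java.util.logging` and a hypothetical `CleanupLoggingSketch` class with `Runnable` callbacks rather than Flink's SLF4J logger and Kubernetes client.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration of the logging pattern; not Flink's actual code. */
final class CleanupLoggingSketch {

    private static final Logger LOG = Logger.getLogger(CleanupLoggingSketch.class.getName());

    static void deployOrCleanUp(String clusterId, Runnable deploy, Runnable cleanup) {
        try {
            deploy.run();
        } catch (RuntimeException e) {
            // Keep the warning, but do not log the exception here:
            // it is propagated below as the cause of the wrapping exception.
            LOG.warning(
                    "Failed to create the cluster \""
                            + clusterId
                            + "\", trying to clean up the residual resources.");
            try {
                cleanup.run();
            } catch (RuntimeException cleanupFailure) {
                // The cleanup failure is not rethrown, so it is logged with its stack trace.
                LOG.log(
                        Level.WARNING,
                        "Failed to stop and clean up the cluster \"" + clusterId + "\".",
                        cleanupFailure);
            }
            throw new IllegalStateException("Could not create cluster \"" + clusterId + "\".", e);
        }
    }

    public static void main(String[] args) {
        deployOrCleanUp("demo", () -> System.out.println("deployed"), () -> {});
    }
}
```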







[GitHub] [flink] Aitozi commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
Aitozi commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r757741776



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,36 +244,51 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {

Review comment:
       Thanks for your comments, @cc13ny and @wangyang0918. Closing this PR.







[GitHub] [flink] Aitozi commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
Aitozi commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r756152018



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,36 +244,51 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {

Review comment:
       I would prefer to handle the exception from creating the Flink client during the `deployCluster` phase.
   I am also OK with keeping the current shape; the impact is small, since only some resource objects would be left behind.







[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404) 
   * fceb0c59da025114818f57ef73d73b1d030e8509 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot commented on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779) 
   * f51e8519e195627c7fe098b3f5332bae905c2a0f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] wangyang0918 commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
wangyang0918 commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r756606603



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,36 +244,51 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {

Review comment:
       Re-creating the client probably does not make sense for FLINK-24624, since it will always fail due to permission issues.
   
   After careful consideration, I lean towards having more discussion and keeping the current behavior. I still appreciate @Aitozi's work on this PR.
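
For readers following the thread, the wrap-and-clean-up idea in the diff above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SafeDeploySketch`, `ThrowingSupplier`, `DeploymentException` and the `cleanup` callback are hypothetical stand-ins, not the Flink or Kubernetes client API, and the sketch does not reproduce the PR's actual `safelyDeployCluster` body. It also shows the concern raised in the comment: if the cleanup step itself fails (for example for the same permission reason), the best the wrapper can do is attach that failure to the original one.

```java
import java.util.ArrayList;
import java.util.List;

public class SafeDeploySketch {

    /** Hypothetical stand-in for a supplier that may throw a checked exception. */
    @FunctionalInterface
    interface ThrowingSupplier<T> {
        T get() throws Exception;
    }

    /** Hypothetical stand-in for a deployment exception type. */
    static class DeploymentException extends Exception {
        DeploymentException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the deployment; if it throws, runs the cleanup callback and rethrows.
     * A failure of the cleanup itself is attached as a suppressed exception.
     */
    static <T> T safelyDeploy(ThrowingSupplier<T> deployment, Runnable cleanup)
            throws DeploymentException {
        try {
            return deployment.get();
        } catch (Exception deployFailure) {
            try {
                // Assumption: this would delete whatever resources were created.
                cleanup.run();
            } catch (RuntimeException cleanupFailure) {
                deployFailure.addSuppressed(cleanupFailure);
            }
            throw new DeploymentException("Could not deploy cluster", deployFailure);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> events = new ArrayList<>();

        // Success path: the cleanup callback is never invoked.
        String provider = safelyDeploy(() -> "provider", () -> events.add("cleanup"));
        System.out.println(provider + " " + events);

        // Failure path: cleanup runs once, then the wrapped exception propagates.
        try {
            safelyDeploy(
                    () -> {
                        throw new IllegalStateException("forbidden");
                    },
                    () -> events.add("cleanup"));
        } catch (DeploymentException e) {
            System.out.println(e.getCause().getMessage() + " " + events);
        }
    }
}
```

Whether cleanup should be attempted at all when the failure is a permission error is exactly the open question in the review, so the sketch should be read as a description of the mechanism rather than a recommendation.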







[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793",
       "triggerID" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26909",
       "triggerID" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ae6c6bc42146b7a72215b4ceda8108fdc21b41f6",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26972",
       "triggerID" : "ae6c6bc42146b7a72215b4ceda8108fdc21b41f6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ae6c6bc42146b7a72215b4ceda8108fdc21b41f6 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26972) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793",
       "triggerID" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f51e8519e195627c7fe098b3f5332bae905c2a0f Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780) 
   * 8a6445f822c2261c2fe578845ebd93c333f8c689 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] Aitozi commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
Aitozi commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r754914762



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -155,19 +156,14 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
     @Override
     public ClusterClientProvider<String> deploySessionCluster(
             ClusterSpecification clusterSpecification) throws ClusterDeploymentException {
-        final ClusterClientProvider<String> clusterClientProvider =
-                deployClusterInternal(
-                        KubernetesSessionClusterEntrypoint.class.getName(),
-                        clusterSpecification,
-                        false);
-
-        try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {
-            LOG.info(
-                    "Create flink session cluster {} successfully, JobManager Web Interface: {}",
-                    clusterId,
-                    clusterClient.getWebInterfaceURL());
-        }
-        return clusterClientProvider;
+        final SupplierWithException<ClusterClientProvider<String>, Exception> supplier =

Review comment:
       removed







[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fceb0c59da025114818f57ef73d73b1d030e8509 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407) 
   * 720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779) 
   * f51e8519e195627c7fe098b3f5332bae905c2a0f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] Aitozi commented on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
Aitozi commented on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-974585072


   @wangyang0918 I have addressed the comments. Please take a look when you are free.





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 77e1b9059149c986fc2e35516ce33e3309b61cad Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26793",
       "triggerID" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26909",
       "triggerID" : "cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ae6c6bc42146b7a72215b4ceda8108fdc21b41f6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ae6c6bc42146b7a72215b4ceda8108fdc21b41f6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cf579eec3aa9afc68f2cf9f5b77d5805ec4496b4 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26909) 
   * ae6c6bc42146b7a72215b4ceda8108fdc21b41f6 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17554:
URL: https://github.com/apache/flink/pull/17554#issuecomment-950550174


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25404",
       "triggerID" : "77e1b9059149c986fc2e35516ce33e3309b61cad",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25407",
       "triggerID" : "fceb0c59da025114818f57ef73d73b1d030e8509",
       "triggerType" : "PUSH"
     }, {
       "hash" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26779",
       "triggerID" : "720b4c0be3f96f2e6b17fffe264ffd4bc7a7dee5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780",
       "triggerID" : "f51e8519e195627c7fe098b3f5332bae905c2a0f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8a6445f822c2261c2fe578845ebd93c333f8c689",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f51e8519e195627c7fe098b3f5332bae905c2a0f Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26780) 
   * 8a6445f822c2261c2fe578845ebd93c333f8c689 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] cc13ny commented on a change in pull request #17554: [FLINK-24624][Kubernetes]Kill cluster when starting kubernetes session or application cluster failed

Posted by GitBox <gi...@apache.org>.
cc13ny commented on a change in pull request #17554:
URL: https://github.com/apache/flink/pull/17554#discussion_r756310567



##########
File path: flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesClusterDescriptor.java
##########
@@ -256,36 +244,51 @@ private String getWebMonitorAddress(Configuration configuration) throws Exceptio
                     flinkConfig.get(JobManagerOptions.PORT));
         }
 
+        final KubernetesJobManagerParameters kubernetesJobManagerParameters =
+                new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
+
+        final FlinkPod podTemplate =
+                kubernetesJobManagerParameters
+                        .getPodTemplateFilePath()
+                        .map(
+                                file ->
+                                        KubernetesUtils.loadPodFromTemplateFile(
+                                                client, file, Constants.MAIN_CONTAINER_NAME))
+                        .orElse(new FlinkPod.Builder().build());
+        final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
+                KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
+                        podTemplate, kubernetesJobManagerParameters);
+
+        client.createJobManagerComponent(kubernetesJobManagerSpec);
+
+        return createClusterClientProvider(clusterId);
+    }
+
+    private ClusterClientProvider<String> safelyDeployCluster(
+            SupplierWithException<ClusterClientProvider<String>, Exception> supplier)
+            throws ClusterDeploymentException {
         try {
-            final KubernetesJobManagerParameters kubernetesJobManagerParameters =
-                    new KubernetesJobManagerParameters(flinkConfig, clusterSpecification);
-
-            final FlinkPod podTemplate =
-                    kubernetesJobManagerParameters
-                            .getPodTemplateFilePath()
-                            .map(
-                                    file ->
-                                            KubernetesUtils.loadPodFromTemplateFile(
-                                                    client, file, Constants.MAIN_CONTAINER_NAME))
-                            .orElse(new FlinkPod.Builder().build());
-            final KubernetesJobManagerSpecification kubernetesJobManagerSpec =
-                    KubernetesJobManagerFactory.buildKubernetesJobManagerSpecification(
-                            podTemplate, kubernetesJobManagerParameters);
-
-            client.createJobManagerComponent(kubernetesJobManagerSpec);
-
-            return createClusterClientProvider(clusterId);
+
+            ClusterClientProvider<String> clusterClientProvider = supplier.get();
+
+            try (ClusterClient<String> clusterClient = clusterClientProvider.getClusterClient()) {

Review comment:
       > Because the Flink cluster might be running normally.
   
   I think the same. The question is whether cleaning up the K8s resources is the best strategy, or whether it is possible to retry and re-create a client against the existing resources. Cleaning up is definitely the simplest way. It depends on how many, and which, failure situations we would like to handle (e.g. by re-creating the client).
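
The trade-off above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in this PR: `deployOrCleanUp`, `retry`, and the `Cleanup` interface are hypothetical names and do not exist in Flink. The idea is to retry the client creation a few times while the Kubernetes resources stay in place, and to fall back to cleaning up only once the retries are exhausted.

```java
import java.util.concurrent.Callable;

public class DeployWithCleanupSketch {

    /** Hypothetical stand-in for whatever removes the created Kubernetes resources. */
    public interface Cleanup {
        void run() throws Exception;
    }

    /** Runs the deployment; on any failure, runs the clean-up and rethrows with the cause attached. */
    public static <T> T deployOrCleanUp(Callable<T> deployment, Cleanup cleanup) throws Exception {
        try {
            return deployment.call();
        } catch (Exception deployFailure) {
            try {
                cleanup.run();
            } catch (Exception cleanupFailure) {
                // Keep the original failure as the primary one.
                deployFailure.addSuppressed(cleanupFailure);
            }
            throw new Exception("Cluster deployment failed", deployFailure);
        }
    }

    /** Calls the action up to maxAttempts times and rethrows the last failure if all attempts fail. */
    public static <T> T retry(Callable<T> action, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        boolean[] cleanedUp = {false};

        // The simulated client creation fails twice and then succeeds,
        // so the retries absorb the failure and no clean-up is triggered.
        Callable<String> createClient = () -> {
            if (++attempts[0] < 3) {
                throw new IllegalStateException("REST endpoint not ready yet");
            }
            return "client";
        };

        String client = deployOrCleanUp(
                () -> DeployWithCleanupSketch.<String>retry(createClient, 3),
                () -> cleanedUp[0] = true);

        System.out.println(client + ", attempts=" + attempts[0] + ", cleanedUp=" + cleanedUp[0]);
    }
}
```

With a retry budget of 1 this degenerates to the "always clean up" strategy, so how many attempts to allow is exactly the open question about which failure situations should be handled.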






