Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2021/01/17 23:59:04 UTC

[GitHub] [flink] HuangZhenQiu opened a new pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

HuangZhenQiu opened a new pull request #14678:
URL: https://github.com/apache/flink/pull/14678


   ## What is the purpose of the change
   
   Add a pluggable failure listener SPI so that Flink users can customize failure-handling logic through the plugin framework. For example, a listener can emit separate metrics for different types of error (application or platform).
   
   ## Brief change log
   
     - Add FailureListener as an SPI, along with DefaultFailureListener.
     - Add FailureListenerFactory for loading listeners from job resources or the plugin manager.
     - Add test cases for loading through both SPI and the plugin framework.
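   
   As a rough illustration of the use case (not the code in this PR), a listener that counts failures by type might look like the sketch below. The `FailureListener` interface shown here, its `onFailure` signature, the `FailureKind` enum, and the classification rule are all hypothetical stand-ins; the actual SPI added by this change may differ.
   
   ```java
   import java.io.IOException;
   import java.util.EnumMap;
   import java.util.Map;
   
   public class FailureListenerSketch {
       /** Hypothetical failure categories, mirroring "application or platform". */
       enum FailureKind { APPLICATION, PLATFORM }
   
       /** Hypothetical listener contract; the real SPI in this PR may differ. */
       interface FailureListener {
           void onFailure(Throwable cause);
       }
   
       /** Counts failures per kind, standing in for a real metrics reporter. */
       static final class CountingFailureListener implements FailureListener {
           private final Map<FailureKind, Integer> counts = new EnumMap<>(FailureKind.class);
   
           @Override
           public void onFailure(Throwable cause) {
               counts.merge(classify(cause), 1, Integer::sum);
           }
   
           int count(FailureKind kind) {
               return counts.getOrDefault(kind, 0);
           }
   
           // Illustrative rule only: treat I/O and out-of-memory errors as platform failures.
           static FailureKind classify(Throwable cause) {
               if (cause instanceof IOException || cause instanceof OutOfMemoryError) {
                   return FailureKind.PLATFORM;
               }
               return FailureKind.APPLICATION;
           }
       }
   
       public static void main(String[] args) {
           CountingFailureListener listener = new CountingFailureListener();
           listener.onFailure(new IOException("disk unavailable"));
           listener.onFailure(new IllegalStateException("bad record"));
           System.out.println(listener.count(FailureKind.PLATFORM) + " platform, "
                   + listener.count(FailureKind.APPLICATION) + " application");
       }
   }
   ```
   
   In a real deployment the counters would presumably be replaced by calls into a metrics system, and the classification rule would be chosen by the platform team.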
   
   
   ## Verifying this change
   This change added tests and can be verified as follows:
   
     - Added integration tests in flink-tests that verify FailureListenerFactory loads customized failure listeners through the plugin framework.
     - Added a unit test for FailureListenerFactory.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes)
     - If yes, how is the feature documented? (not documented)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 25908aedd847a453981533a3bd87117bcdc7ac78 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12204) 
   * 66efa5e224d899b11d89cf6419b376be0187a7bb UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * 296a46dab704e3a3553bb7a43128a3f4774bfb77 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12162) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 399c2ce9c9f9637a20ddbfd684a9a4e82b41265e Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12884) 
   * a3a28907e4723571c04853236d8c3712195e5821 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12984) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 34720c9f7ea37afb5d7f3d2a824b78fee916b755 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13120) 
   * 7e811ff57a72897e49425a53b5d956f5a1f32ea9 UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 1a6bd7aa98b649bd119479247d34ea029b8ba636 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12400) 
   



[GitHub] [flink] rmetzger commented on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-766752536


   I reported the unrelated test failure in Jira.





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 748434f298f7ba97a6f2b9e77638ef60ccad9d82 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12197) 
   * 25908aedd847a453981533a3bd87117bcdc7ac78 UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 11651466eb13c5c05d2cd0eed0ac08b9bf617185 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12704) 
   * 12652d9c7829d2d26d41f61185cbd7ab8f4ad505 UNKNOWN
   



[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r567364321



##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       Revised the intro as you suggested.







[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 1a6bd7aa98b649bd119479247d34ea029b8ba636 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12400) 
   * 11651466eb13c5c05d2cd0eed0ac08b9bf617185 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rmetzger commented on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-771681310


   I noticed one issue during my manual review: When using a logger in the factory (loaded via the plugin mechanism), log4j will create a new log file. This is most likely caused by the separate classloader, which makes log4j believe it hasn't been initialized yet.
   To resolve this issue, I added the following to the Flink conf: `plugin.classloader.parent-first-patterns.additional: org.slf4j`.
   
   @HuangZhenQiu did you observe a similar behavior in your tests? If we both ran into this, we should mention this in the docs for the users.
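
The effect of that setting can be pictured with a small sketch. It assumes that parent-first patterns are matched as plain class-name prefixes; the class `ParentFirstSketch` and the method `isParentFirst` are hypothetical names used only for illustration and are not Flink's actual class loader code. A class whose name matches a pattern would be resolved by the parent class loader, so the plugin would share the logging setup that is already initialized.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified illustration of parent-first pattern matching; not Flink's real class loader. */
final class ParentFirstSketch {

    /** Returns true if the class name starts with any of the configured prefixes. */
    static boolean isParentFirst(String className, List<String> parentFirstPatterns) {
        for (String pattern : parentFirstPatterns) {
            if (className.startsWith(pattern)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumption: the only pattern is the one added in the comment above.
        List<String> patterns = Arrays.asList("org.slf4j");

        // Logger classes would be delegated to the parent class loader.
        System.out.println(isParentFirst("org.slf4j.LoggerFactory", patterns)); // prints "true"

        // A hypothetical plugin class would still be loaded by the plugin class loader.
        System.out.println(isParentFirst("com.example.MyFailureListener", patterns)); // prints "false"
    }
}
```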


----------------------------------------------------------------



[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r559765292



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import org.apache.flink.shaded.guava18.com.google.common.collect.Iterators;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.ServiceLoader;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {
+    private PluginManager pluginManager;
+
+    public FailureListenerFactory(Configuration configuration) {
+        this.pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);
+    }
+
+    public List<FailureListener> createFailureListener(JobManagerJobMetricGroup metricGroup) {

Review comment:
       Good suggestion. Changed accordingly.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import org.apache.flink.shaded.guava18.com.google.common.collect.Iterators;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.ServiceLoader;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {
+    private PluginManager pluginManager;
+
+    public FailureListenerFactory(Configuration configuration) {
+        this.pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);
+    }
+
+    public List<FailureListener> createFailureListener(JobManagerJobMetricGroup metricGroup) {
+        List<FailureListener> failureListeners = new ArrayList<>();
+
+        ServiceLoader<FailureListener> serviceLoader = ServiceLoader.load(FailureListener.class);
+        Iterator<FailureListener> fromServiceLoader = serviceLoader.iterator();
+        Iterator<FailureListener> fromPluginManager = pluginManager.load(FailureListener.class);

Review comment:
       Agree.
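
The hunk above obtains one iterator from `ServiceLoader` and one from the plugin manager; the rest of the method is not shown in this message. As a minimal sketch of one way to collect two such iterators into a single list using only the JDK, with `ListenerMergeSketch` and `merge` as hypothetical names and plain strings standing in for `FailureListener` instances:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Illustrative helper; not part of the pull request. */
final class ListenerMergeSketch {

    /** Drains both iterators, in order, into a new list. */
    static <T> List<T> merge(Iterator<? extends T> first, Iterator<? extends T> second) {
        List<T> result = new ArrayList<>();
        first.forEachRemaining(result::add);
        second.forEachRemaining(result::add);
        return result;
    }

    public static void main(String[] args) {
        // Strings stand in for listeners found by the two loading mechanisms.
        Iterator<String> fromServiceLoader = Arrays.asList("default-listener").iterator();
        Iterator<String> fromPluginManager = Arrays.asList("plugin-a", "plugin-b").iterator();

        List<String> all = merge(fromServiceLoader, fromPluginManager);
        System.out.println(all); // prints "[default-listener, plugin-a, plugin-b]"
    }
}
```

Draining the iterators eagerly means any loading error would surface when the list is built rather than later, when a failure is being reported.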




----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 25908aedd847a453981533a3bd87117bcdc7ac78 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12204) 
   * 66efa5e224d899b11d89cf6419b376be0187a7bb Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12352) 
   * e32796821c485c583a2b10b55d3f8b780971d357 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12354) 
   


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * cabe06381bf59e3b94936d879d5f2aa5c34c878e Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15113) 
   * af1d7a046e857c17831c4866b0dc83830b5e1ee0 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16047) 
   


-- 



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * e32796821c485c583a2b10b55d3f8b780971d357 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12354) 
   


----------------------------------------------------------------



[GitHub] [flink] HuangZhenQiu commented on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-764968449


   @rmetzger @zentol 
   Thanks for these comments. Please review it again after the build is green.


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 12652d9c7829d2d26d41f61185cbd7ab8f4ad505 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12719) 
   


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * f51da4ba54bd567ae6a58e16c2e9e3882a76831c Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13538) 
   


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * a4c68763883ef4f3c411fa606d0907ef334a6e76 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12363) 
   * 1a6bd7aa98b649bd119479247d34ea029b8ba636 UNKNOWN
   



[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r569628583



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+

Review comment:
       It looks like most of the utils in Flink are classes rather than interfaces. Shall we keep this one as a class?
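
For context, the convention this comment refers to is usually a final class with a private constructor and only static methods. The following is a minimal sketch of that pattern; `ListenerUtilsSketch` and `withoutNulls` are hypothetical names chosen for illustration and are not taken from the PR's `FailureListenerUtils`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical utility holder; not the PR's FailureListenerUtils. */
final class ListenerUtilsSketch {

    // Private constructor: the class only groups static helpers and is never instantiated.
    private ListenerUtilsSketch() {}

    /** Returns a new mutable list containing the non-null elements of the input, in order. */
    static <T> List<T> withoutNulls(List<T> input) {
        List<T> result = new ArrayList<>();
        for (T element : input) {
            if (element != null) {
                result.add(element);
            }
        }
        return result;
    }
}
```

Compared with an interface that carries static methods, a final class with a private constructor cannot be implemented or instantiated by accident, which may be why it is the more common choice.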







[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * cabe06381bf59e3b94936d879d5f2aa5c34c878e Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15113) 
   * af1d7a046e857c17831c4866b0dc83830b5e1ee0 UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 12652d9c7829d2d26d41f61185cbd7ab8f4ad505 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12719) 
   * 399c2ce9c9f9637a20ddbfd684a9a4e82b41265e UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 7e811ff57a72897e49425a53b5d956f5a1f32ea9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13539) 
   * f850630134ff2243b56a82206e9ec402a93acfb4 UNKNOWN
   



[GitHub] [flink] rmetzger commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r560109636



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initialize the listener with JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);
+
+    /**
+     * Method to handle a failure in the listener.
+     *
+     * @param cause the failure cause
+     * @param globalFailure whether the failure is a global failure
+     */
+    void onFailure(final Throwable cause, boolean globalFailure);

Review comment:
       Good point, thanks for the explanation.
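
To illustrate how the `onFailure(cause, globalFailure)` callback in the diff above might be used, here is a minimal sketch of a listener that counts global and non-global failures separately. It is an assumption-laden illustration: `SketchFailureListener` is a stand-in for the PR's interface with the Flink-specific `init(JobManagerJobMetricGroup)` method left out, and `CountingFailureListener` is a hypothetical implementation that does not appear in the PR.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Stand-in for the interface in the diff; the metric-group init method is omitted. */
interface SketchFailureListener {
    void onFailure(Throwable cause, boolean globalFailure);
}

/** Hypothetical listener that counts global and non-global failures separately. */
final class CountingFailureListener implements SketchFailureListener {
    private final AtomicLong globalFailures = new AtomicLong();
    private final AtomicLong localFailures = new AtomicLong();

    @Override
    public void onFailure(Throwable cause, boolean globalFailure) {
        // The cause is ignored here; a real listener could classify it, e.g. by exception type.
        if (globalFailure) {
            globalFailures.incrementAndGet();
        } else {
            localFailures.incrementAndGet();
        }
    }

    long globalCount() {
        return globalFailures.get();
    }

    long localCount() {
        return localFailures.get();
    }
}
```

In the PR itself, such counts would presumably be exposed through the metric group handed to `init`; the plain counters here only keep the sketch free of Flink dependencies.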







[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r559766248



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricNames.java
##########
@@ -50,6 +50,7 @@ private MetricNames() {}
     public static final String NUM_REGISTERED_TASK_MANAGERS = "numRegisteredTaskManagers";
 
     public static final String NUM_RESTARTS = "numRestarts";
+    public static final String NUM_JOB_FAILURE = "numJobFailure";

Review comment:
       Added a comment to make it clearer for users.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java
##########
@@ -172,6 +176,14 @@
                         .createInstance(new DefaultExecutionSlotAllocationContext());
 
         this.verticesWaitingForRestart = new HashSet<>();
+
+        List<FailureListener> listeners =
+                failureListenerFactory.createFailureListener(jobManagerJobMetricGroup);
+
+        for (FailureListener listener : listeners) {
+            executionFailureHandler.registerFailureListener(listener);
+        }

Review comment:
       Agree.
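
The registration loop in the diff above can be sketched without Flink as follows. This is a simplified illustration under stated assumptions: `RegisteredListenerSketch` and `FailureHandlerSketch` are hypothetical stand-ins for `FailureListener` and `ExecutionFailureHandler`, and `notifyListeners` is an invented helper that only shows listeners being called in registration order.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the listener interface in the diff (metric-group init omitted). */
interface RegisteredListenerSketch {
    void onFailure(Throwable cause, boolean globalFailure);
}

/** Hypothetical handler that keeps listeners in registration order. */
final class FailureHandlerSketch {
    private final List<RegisteredListenerSketch> failureListeners = new ArrayList<>();

    /** Mirrors the registerFailureListener call used in the scheduler constructor. */
    void registerFailureListener(RegisteredListenerSketch listener) {
        failureListeners.add(listener);
    }

    /** Forwards one failure to every registered listener and returns how many were notified. */
    int notifyListeners(Throwable cause, boolean globalFailure) {
        for (RegisteredListenerSketch listener : failureListeners) {
            listener.onFailure(cause, globalFailure);
        }
        return failureListeners.size();
    }
}
```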

##########
File path: flink-runtime/src/main/resources/META-INF/services/org.apache.flink.runtime.executiongraph.FailureListener
##########
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+org.apache.flink.runtime.executiongraph.DefaultFailureListener

Review comment:
       Removed.
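
For readers unfamiliar with the mechanism behind the `META-INF/services` file shown above (which this comment says was removed from the PR), the following sketch shows how `java.util.ServiceLoader` discovers implementations registered that way. All type names are hypothetical, and the fallback to a no-op listener when nothing is registered is an assumption made for the example, not behaviour confirmed by the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical service interface; implementations would be listed in META-INF/services. */
interface DiscoverableListener {
    String name();
}

/** Hypothetical default used when no implementation is registered. */
final class NoOpListener implements DiscoverableListener {
    @Override
    public String name() {
        return "no-op";
    }
}

final class ListenerLoaderSketch {
    private ListenerLoaderSketch() {}

    /** Collects every implementation visible to the class loader, or a single no-op fallback. */
    static List<DiscoverableListener> loadListeners(ClassLoader classLoader) {
        List<DiscoverableListener> listeners = new ArrayList<>();
        for (DiscoverableListener listener :
                ServiceLoader.load(DiscoverableListener.class, classLoader)) {
            listeners.add(listener);
        }
        if (listeners.isEmpty()) {
            // Assumption for this sketch: fall back to a default when nothing is registered.
            listeners.add(new NoOpListener());
        }
        return listeners;
    }
}
```

Without a matching provider-configuration file on the classpath, `ServiceLoader` yields no implementations, so the sketch returns only the fallback.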

##########
File path: flink-tests/src/test/java/org/apache/flink/test/plugin/FailureListenerFactoryTest.java
##########
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.test.plugin;
+
+import org.apache.flink.configuration.ConfigConstants;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.DefaultPluginManager;
+import org.apache.flink.core.plugin.DirectoryBasedPluginFinder;
+import org.apache.flink.core.plugin.PluginDescriptor;
+import org.apache.flink.core.plugin.PluginFinder;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.testutils.CommonTestUtils;
+import org.apache.flink.runtime.executiongraph.FailureListener;
+import org.apache.flink.runtime.executiongraph.FailureListenerFactory;
+import org.apache.flink.runtime.metrics.groups.UnregisteredMetricGroups;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.flink.shaded.guava18.com.google.common.collect.ImmutableMap;
+import org.apache.flink.shaded.guava18.com.google.common.collect.Lists;
+
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+
+/** Test for {@link org.apache.flink.runtime.executiongraph.FailureListenerFactory}. */
+public class FailureListenerFactoryTest extends PluginTestBase {

Review comment:
       Agree. Changed accordingly.







[GitHub] [flink] HuangZhenQiu commented on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-764968449


   @rmetzger @zentol 
   Thanks for these comments. Please review it again after the build is green.


----------------------------------------------------------------



[GitHub] [flink] rmetzger commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r559979096



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/ExecutionFailureHandler.java
##########
@@ -98,11 +103,26 @@ public FailureHandlingResult getGlobalFailureHandlingResult(final Throwable caus
                 true);
     }
 
+    /** @param failureListener the failure listener to be registered */
+    public void registerFailureListener(FailureListener failureListener) {
+        failureListeners.add(failureListener);
+    }
+
     private FailureHandlingResult handleFailure(
             final Throwable cause,
             final Set<ExecutionVertexID> verticesToRestart,
             final boolean globalFailure) {
 
+        try {
+            for (FailureListener listener : failureListeners) {
+                listener.onFailure(cause, globalFailure);
+            }
+        } catch (Throwable e) {
+            return FailureHandlingResult.unrecoverable(
+                    new JobException("The failure in failure listener is not recoverable", e),

Review comment:
       ```suggestion
                       new JobException("Unexpected exception in FailureListener", e),
   ```
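For illustration, here is a minimal, self-contained sketch of the notification loop from the diff above, using the message wording from the suggestion. `Result` is a hypothetical stand-in for `FailureHandlingResult`, a plain `RuntimeException` stands in for `JobException`, and the listener interface is reduced to `onFailure`; none of these are the real Flink types.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerNotificationSketch {

    /** Simplified stand-in for the FailureListener interface in this PR (init omitted). */
    interface FailureListener {
        void onFailure(Throwable cause, boolean globalFailure);
    }

    /** Hypothetical stand-in for FailureHandlingResult; only carries an optional error. */
    static final class Result {
        final Throwable error; // null means every listener returned normally

        Result(Throwable error) {
            this.error = error;
        }

        boolean isUnrecoverable() {
            return error != null;
        }
    }

    private final List<FailureListener> failureListeners = new ArrayList<>();

    void registerFailureListener(FailureListener failureListener) {
        failureListeners.add(failureListener);
    }

    Result handleFailure(Throwable cause, boolean globalFailure) {
        try {
            // Notify listeners in registration order.
            for (FailureListener listener : failureListeners) {
                listener.onFailure(cause, globalFailure);
            }
        } catch (Throwable e) {
            // The first throwing listener aborts the loop; later listeners are skipped.
            return new Result(
                    new RuntimeException("Unexpected exception in FailureListener", e));
        }
        return new Result(null);
    }

    public static void main(String[] args) {
        ListenerNotificationSketch handler = new ListenerNotificationSketch();
        handler.registerFailureListener(
                (cause, global) -> {
                    throw new IllegalStateException("listener bug");
                });
        Result result = handler.handleFailure(new Exception("task failed"), false);
        System.out.println(result.error.getMessage());
    }
}
```

Because the whole loop sits inside one `try`, a throwing listener also prevents the remaining listeners from being notified. Whether that is the desired behaviour is a design choice the sketch does not settle.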




----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * f850630134ff2243b56a82206e9ec402a93acfb4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13615) 
   * f51da4ba54bd567ae6a58e16c2e9e3882a76831c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------



[GitHub] [flink] HuangZhenQiu commented on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-784358861


   @flinkbot run azure


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * e32796821c485c583a2b10b55d3f8b780971d357 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12354) 
   * a4c68763883ef4f3c411fa606d0907ef334a6e76 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12363) 
   


----------------------------------------------------------------



[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r560573167



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initialize the listener with JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);
+
+    /**
+     * Method to handle a failure in the listener.
+     *
+     * @param cause the failure cause
+     * @param globalFailure whether the failure is a global failure
+     */
+    void onFailure(final Throwable cause, boolean globalFailure);

Review comment:
       +1




----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 9e445296314a2bfde4c85b61f1a3d9ee892c97ba Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13675) 
   


----------------------------------------------------------------



[GitHub] [flink] zentol commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r560072956



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initialize the listener with JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);
+
+    /**
+     * Method to handle a failure in the listener.
+     *
+     * @param cause the failure cause
+     * @param globalFailure whether the failure is a global failure
+     */
+    void onFailure(final Throwable cause, boolean globalFailure);

Review comment:
       The purpose of MetricGroups is not to make metadata readily accessible to other components.
   Why not just pass the jobID/jobName/whatever metadata you linked separately into the constructor/init? 
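A rough sketch of what that alternative could look like: the job metadata is handed to `init` explicitly, so the listener never has to read it from the metric group. The parameter list, the `String` types and the `MetricGroup` stand-in are assumptions for illustration only, not the signature proposed in this PR.

```java
public class ListenerInitSketch {

    /** Hypothetical stand-in for a metric group; it would only be used to register metrics. */
    interface MetricGroup {}

    /** Variant of the listener whose init receives the job metadata as separate arguments. */
    interface FailureListener {
        void init(String jobId, String jobName, MetricGroup metricGroup);

        void onFailure(Throwable cause, boolean globalFailure);
    }

    /** Example listener that builds its label from the metadata it was given. */
    static final class LabellingListener implements FailureListener {
        private String label = "uninitialized";
        String lastReport;

        @Override
        public void init(String jobId, String jobName, MetricGroup metricGroup) {
            // Metadata comes in directly; the metric group is not queried for it.
            this.label = jobName + "(" + jobId + ")";
        }

        @Override
        public void onFailure(Throwable cause, boolean globalFailure) {
            lastReport =
                    label
                            + ": "
                            + (globalFailure ? "global" : "local")
                            + " failure, "
                            + cause.getMessage();
        }
    }

    public static void main(String[] args) {
        LabellingListener listener = new LabellingListener();
        listener.init("job-1", "wordcount", new MetricGroup() {});
        listener.onFailure(new RuntimeException("task lost"), false);
        System.out.println(listener.lastReport);
    }
}
```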




----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * 296a46dab704e3a3553bb7a43128a3f4774bfb77 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12162) 
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 748434f298f7ba97a6f2b9e77638ef60ccad9d82 UNKNOWN
   


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 748434f298f7ba97a6f2b9e77638ef60ccad9d82 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12197) 
   * 25908aedd847a453981533a3bd87117bcdc7ac78 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12204) 
   


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * 296a46dab704e3a3553bb7a43128a3f4774bfb77 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12162) 
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 748434f298f7ba97a6f2b9e77638ef60ccad9d82 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12197) 
   


----------------------------------------------------------------



[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r559765178



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import org.apache.flink.shaded.guava18.com.google.common.collect.Iterators;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.ServiceLoader;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {
+    private PluginManager pluginManager;
+
+    public FailureListenerFactory(Configuration configuration) {
+        this.pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);

Review comment:
       Yes, I think there are several places that use PluginManager in this way. There is no global plugin manager instance per Flink process, but I think we can make it a singleton in another PR.




----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * e32796821c485c583a2b10b55d3f8b780971d357 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12354) 
   * a4c68763883ef4f3c411fa606d0907ef334a6e76 UNKNOWN
   


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721






----------------------------------------------------------------



[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r562223668



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {

Review comment:
       Agree. 




----------------------------------------------------------------



[GitHub] [flink] rmetzger commented on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-771681310


   I noticed one issue during my manual review: When using a logger in the factory (loaded via the plugin mechanism), log4j will create a new log file. This is most likely caused by the separate classloader, which makes log4j believe it hasn't been initialized yet.
   To resolve this issue, I added the following to the Flink conf: `plugin.classloader.parent-first-patterns.additional: org.slf4j`.
   
   @HuangZhenQiu did you observe similar behavior in your tests? If we both ran into this, we should mention it in the docs for users.


----------------------------------------------------------------



[GitHub] [flink] rmetzger commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r568567754



##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,50 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+Flink provides the pluggable failure listener interface for users to register multiple instances, which are called each 

Review comment:
       ```suggestion
   Flink provides a pluggable failure listener interface for users to register multiple instances, which are called each 
   ```

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,50 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+Flink provides the pluggable failure listener interface for users to register multiple instances, which are called each 
+time an exception reported at runtime. The default failure listener is only to record the failure count and emit the metric

Review comment:
       ```suggestion
   time an exception is reported at runtime. The default failure listener only records the failure count and emits the metric
   ```

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,50 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+Flink provides the pluggable failure listener interface for users to register multiple instances, which are called each 
+time an exception reported at runtime. The default failure listener is only to record the failure count and emit the metric
+"numJobFailure" for the job. The purpose of these listeners is to build metrics based on the exceptions, make call to external

Review comment:
       ```suggestion
   "numJobFailure" for the job. The purpose of these listeners is to build metrics based on the exceptions, make calls to external
   ```

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/JobMasterSchedulerTest.java
##########
@@ -103,7 +105,8 @@ public SchedulerNG createInstance(
                 ExecutionDeploymentTracker executionDeploymentTracker,
                 long initializationTimestamp,
                 ComponentMainThreadExecutor mainThreadExecutor,
-                JobStatusListener jobStatusListener) {
+                JobStatusListener jobStatusListener,
+                List<FailureListener> failureListenerFactory) {

Review comment:
       list != factory: this parameter is a `List<FailureListener>`, so the name `failureListenerFactory` is misleading.

##########
File path: flink-tests/src/test/java/org/apache/flink/test/plugin/jar/failurelistener/TestFailureListener.java
##########
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.test.plugin.jar.failurelistener;
+
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.metrics.MetricGroup;
+
+/** Implementation of {@link FailureListener} for plugin loading test. */
+public class TestFailureListener implements FailureListener {
+
+    @Override
+    public void init(JobID jobID, String jobName, MetricGroup metricGroup) {}

Review comment:
       I don't think the init() method is really needed. All this information can be passed into the factory.
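
A rough sketch of what dropping `init()` could look like, assuming the factory receives the job context directly. The interfaces below are simplified stand-ins, not the PR's actual SPI: the `onFailure(Throwable)` callback, the `create(...)` signature, and the `AtomicInteger` used in place of a `MetricGroup` are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: the factory receives what init() used to receive. */
public class FactoryWithoutInitSketch {

    /** Stand-in for the listener SPI; the callback signature is an assumption. */
    interface FailureListener {
        void onFailure(Throwable cause);
    }

    /** Stand-in for the factory SPI; parameters mirror the init() arguments. */
    interface FailureListenerFactory {
        FailureListener create(String jobId, String jobName, AtomicInteger failureCounter);
    }

    /** Listener with final state, fully configured at construction time. */
    static final class CountingListener implements FailureListener {
        private final String jobName;
        private final AtomicInteger failureCounter;

        CountingListener(String jobName, AtomicInteger failureCounter) {
            this.jobName = jobName;
            this.failureCounter = failureCounter;
        }

        @Override
        public void onFailure(Throwable cause) {
            int count = failureCounter.incrementAndGet();
            System.out.println(jobName + " failure #" + count + ": " + cause.getMessage());
        }
    }

    /** Factory that hands the job context straight to the constructor. */
    static final class CountingListenerFactory implements FailureListenerFactory {
        @Override
        public FailureListener create(String jobId, String jobName, AtomicInteger failureCounter) {
            return new CountingListener(jobName, failureCounter);
        }
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(); // stand-in for a metric counter
        FailureListener listener =
                new CountingListenerFactory().create("job-1", "demo-job", counter);
        listener.onFailure(new RuntimeException("boom"));
    }
}
```

With this shape a listener can never be observed in a half-initialized state, and its fields can be `final`.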

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,50 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+Flink provides the pluggable failure listener interface for users to register multiple instances, which are called each 
+time an exception reported at runtime. The default failure listener is only to record the failure count and emit the metric
+"numJobFailure" for the job. The purpose of these listeners is to build metrics based on the exceptions, make call to external
+systems or classify the exceptions otherwise. For example, it can distinguish whether it is a flink runtime error or an 

Review comment:
       ```suggestion
   systems or classify the exceptions otherwise. For example, it can be used to distinguish whether it is a Flink runtime error or an 
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+
+    public static List<FailureListener> getFailureListerners(

Review comment:
       Doesn't it make sense to return a `Set<FailureListener>` here already? Otherwise, we might pass duplicate listeners that will be deduplicated later anyway.
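
One way this could look, as a minimal sketch: drain whatever iterator the discovery mechanism yields into an insertion-ordered set. `toListenerSet` and the `FailureListener` stand-in are hypothetical names, and the set only removes duplicates according to `equals`/`hashCode`, which for plain objects means the same instance.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: collect discovered listeners into a Set up front. */
public class ListenerSetSketch {

    /** Stand-in for the listener SPI; the callback signature is an assumption. */
    interface FailureListener {
        void onFailure(Throwable cause);
    }

    /** Drains the iterator into an unmodifiable, insertion-ordered set. */
    static Set<FailureListener> toListenerSet(Iterator<FailureListener> discovered) {
        Set<FailureListener> listeners = new LinkedHashSet<>();
        while (discovered.hasNext()) {
            listeners.add(discovered.next()); // a repeated instance is silently skipped
        }
        return Collections.unmodifiableSet(listeners);
    }

    public static void main(String[] args) {
        FailureListener first = cause -> System.out.println("first: " + cause.getMessage());
        FailureListener second = cause -> System.out.println("second: " + cause.getMessage());

        // The same instance shows up twice, e.g. found through two loading paths.
        Set<FailureListener> listeners =
                toListenerSet(Arrays.asList(first, second, first).iterator());
        System.out.println(listeners.size());
    }
}
```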

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java
##########
@@ -124,7 +125,8 @@
             final ExecutionDeploymentTracker executionDeploymentTracker,
             long initializationTimestamp,
             final ComponentMainThreadExecutor mainThreadExecutor,
-            final JobStatusListener jobStatusListener)
+            final JobStatusListener jobStatusListener,
+            final List<FailureListener> failureListeners)

Review comment:
       ```suggestion
               final Set<FailureListener> failureListeners)
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+
+    public static List<FailureListener> getFailureListerners(
+            Configuration configuration, JobManagerJobMetricGroup metricGroup) {

Review comment:
       I think it is better to pass the MetricGroup here, instead of the JobManagerJobMetricGroup.

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,50 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+Flink provides the pluggable failure listener interface for users to register multiple instances, which are called each 
+time an exception reported at runtime. The default failure listener is only to record the failure count and emit the metric
+"numJobFailure" for the job. The purpose of these listeners is to build metrics based on the exceptions, make call to external
+systems or classify the exceptions otherwise. For example, it can distinguish whether it is a flink runtime error or an 
+application user logic error. With the accurate metrics, you may have better idea about the platform level metrics, 

Review comment:
       ```suggestion
   application user logic error. With accurate metrics, you may have a better idea about platform level metrics, 
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+

Review comment:
       ```suggestion
   public enum FailureListenerUtils {
       ;
   ```
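
For readers unfamiliar with the idiom in the suggestion above: an enum that declares no constants cannot be instantiated, so it can serve as a holder for static helpers. A small sketch follows; `ListenerNames` and `normalize` are made-up names for illustration, not part of the PR.

```java
import java.util.Locale;

/** Hypothetical sketch of the "enum with no constants" utility idiom. */
public class EnumUtilitySketch {

    /** Utility holder: the lone semicolon declares an empty constant list. */
    enum ListenerNames {
        ;

        /** Trims and lower-cases a name; purely illustrative helper. */
        static String normalize(String rawName) {
            return rawName.trim().toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        System.out.println(ListenerNames.normalize("  DefaultFailureListener "));
        // No constants exist, so no instance of the enum can ever be created.
        System.out.println(ListenerNames.values().length);
    }
}
```

Compared with a `final` class plus a private constructor, the enum form needs no extra boilerplate to block instantiation.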

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java
##########
@@ -333,10 +340,15 @@ public void onUnknownDeploymentsOf(
     }
 
     private SchedulerNG createScheduler(
+            Configuration configuration,
             ExecutionDeploymentTracker executionDeploymentTracker,
             JobManagerJobMetricGroup jobManagerJobMetricGroup,
             JobStatusListener jobStatusListener)
             throws Exception {
+
+        List<FailureListener> failureListeners =
+                FailureListenerUtils.getFailureListerners(configuration, jobManagerJobMetricGroup);

Review comment:
       I think it would be better to initialize the failure listeners in the JobMaster constructor and pass the set into this method. In the JobMaster constructor, you have access to all the required fields (job ID, job name, config, etc.).
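
A simplified sketch of that structure, with stand-in classes instead of the real `JobMaster` and scheduler. `JobMasterLike`, `SchedulerLike`, and the `report` list are hypothetical; the only point is that the constructor owns listener creation and `createScheduler` just receives the finished set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: build listeners in the constructor, pass the set on. */
public class ConstructorInitSketch {

    /** Stand-in for the listener SPI; the callback signature is an assumption. */
    interface FailureListener {
        void onFailure(Throwable cause);
    }

    /** Stand-in for the scheduler; it only consumes ready-made listeners. */
    static final class SchedulerLike {
        private final Set<FailureListener> failureListeners;

        SchedulerLike(Set<FailureListener> failureListeners) {
            this.failureListeners = failureListeners;
        }

        void handleFailure(Throwable cause) {
            for (FailureListener listener : failureListeners) {
                listener.onFailure(cause);
            }
        }
    }

    /** Stand-in for the job master; job context is available in the constructor. */
    static final class JobMasterLike {
        private final SchedulerLike scheduler;

        JobMasterLike(String jobId, String jobName, List<String> report) {
            Set<FailureListener> listeners = new LinkedHashSet<>();
            listeners.add(
                    cause -> report.add(jobName + " (" + jobId + "): " + cause.getMessage()));
            this.scheduler = createScheduler(Collections.unmodifiableSet(listeners));
        }

        /** Needs no configuration argument; it receives the finished set. */
        private static SchedulerLike createScheduler(Set<FailureListener> failureListeners) {
            return new SchedulerLike(failureListeners);
        }

        SchedulerLike scheduler() {
            return scheduler;
        }
    }

    public static void main(String[] args) {
        List<String> report = new ArrayList<>();
        JobMasterLike jobMaster = new JobMasterLike("job-1", "demo-job", report);
        jobMaster.scheduler().handleFailure(new RuntimeException("boom"));
        System.out.println(report);
    }
}
```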

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java
##########
@@ -333,10 +340,15 @@ public void onUnknownDeploymentsOf(
     }
 
     private SchedulerNG createScheduler(
+            Configuration configuration,
             ExecutionDeploymentTracker executionDeploymentTracker,
             JobManagerJobMetricGroup jobManagerJobMetricGroup,
             JobStatusListener jobStatusListener)
             throws Exception {
+
+        List<FailureListener> failureListeners =

Review comment:
       ```suggestion
           Set<FailureListener> failureListeners =
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+
+    public static List<FailureListener> getFailureListerners(

Review comment:
       ```suggestion
       public static List<FailureListener> getFailureListeners(
   ```




----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * af1d7a046e857c17831c4866b0dc83830b5e1ee0 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16047) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r562223668



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {

Review comment:
       Agree. 







[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 7e811ff57a72897e49425a53b5d956f5a1f32ea9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13539) 
   * f850630134ff2243b56a82206e9ec402a93acfb4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13615) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * af1d7a046e857c17831c4866b0dc83830b5e1ee0 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=16047) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] rmetzger commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r568567754



##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,50 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+Flink provides the pluggable failure listener interface for users to register multiple instances, which are called each 

Review comment:
       ```suggestion
   Flink provides a pluggable failure listener interface for users to register multiple instances, which are called each 
   ```

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,50 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+Flink provides the pluggable failure listener interface for users to register multiple instances, which are called each 
+time an exception reported at runtime. The default failure listener is only to record the failure count and emit the metric

Review comment:
       ```suggestion
   time an exception is reported at runtime. The default failure listener only records the failure count and emits the metric
   ```

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,50 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+Flink provides the pluggable failure listener interface for users to register multiple instances, which are called each 
+time an exception reported at runtime. The default failure listener is only to record the failure count and emit the metric
+"numJobFailure" for the job. The purpose of these listeners is to build metrics based on the exceptions, make call to external

Review comment:
       ```suggestion
   "numJobFailure" for the job. The purpose of these listeners is to build metrics based on the exceptions, make calls to external
   ```

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/JobMasterSchedulerTest.java
##########
@@ -103,7 +105,8 @@ public SchedulerNG createInstance(
                 ExecutionDeploymentTracker executionDeploymentTracker,
                 long initializationTimestamp,
                 ComponentMainThreadExecutor mainThreadExecutor,
-                JobStatusListener jobStatusListener) {
+                JobStatusListener jobStatusListener,
+                List<FailureListener> failureListenerFactory) {

Review comment:
       list != factory: the parameter is a `List<FailureListener>`, so the name `failureListenerFactory` is misleading.

##########
File path: flink-tests/src/test/java/org/apache/flink/test/plugin/jar/failurelistener/TestFailureListener.java
##########
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.test.plugin.jar.failurelistener;
+
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.metrics.MetricGroup;
+
+/** Implementation of {@link FailureListener} for plugin loading test. */
+public class TestFailureListener implements FailureListener {
+
+    @Override
+    public void init(JobID jobID, String jobName, MetricGroup metricGroup) {}

Review comment:
       I don't think the init() method is really needed. All this information can be passed into the factory.
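
A minimal sketch of what passing this information into the factory could look like, so that a listener is fully initialized by its constructor and needs no `init()` method. All names here (`SketchContext`, `SketchListener`, `SketchListenerFactory`, `CountingListener`) are hypothetical and are not taken from the PR or from Flink; the real `FailureListener` interface may have a different `onFailure` signature.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: none of these types exist in Flink or in this PR. */
final class FactoryInitSketch {

    /** Hypothetical holder for the values that init() received. */
    static final class SketchContext {
        final String jobId;
        final String jobName;

        SketchContext(String jobId, String jobName) {
            this.jobId = jobId;
            this.jobName = jobName;
        }
    }

    /** Hypothetical listener interface without an init() method. */
    interface SketchListener {
        void onFailure(Throwable cause);
    }

    /** Hypothetical factory: receives the job context and returns a ready-to-use listener. */
    interface SketchListenerFactory {
        SketchListener create(SketchContext context);
    }

    /** Example listener that gets everything it needs through its constructor. */
    static final class CountingListener implements SketchListener {
        private final SketchContext context;
        private final AtomicInteger failures = new AtomicInteger();

        CountingListener(SketchContext context) {
            this.context = context;
        }

        @Override
        public void onFailure(Throwable cause) {
            failures.incrementAndGet();
        }

        int failureCount() {
            return failures.get();
        }

        String jobName() {
            return context.jobName;
        }
    }

    /** Creates a listener through the factory and reports two failures to it. */
    static int demo() {
        SketchListenerFactory factory = CountingListener::new;
        CountingListener listener =
                (CountingListener) factory.create(new SketchContext("job-1", "example"));
        listener.onFailure(new RuntimeException("first"));
        listener.onFailure(new IllegalStateException("second"));
        return listener.failureCount();
    }
}
```

With this shape, a listener can never be observed in a half-initialized state, which is the main argument for dropping a separate `init()` call.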

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,50 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+Flink provides the pluggable failure listener interface for users to register multiple instances, which are called each 
+time an exception reported at runtime. The default failure listener is only to record the failure count and emit the metric
+"numJobFailure" for the job. The purpose of these listeners is to build metrics based on the exceptions, make call to external
+systems or classify the exceptions otherwise. For example, it can distinguish whether it is a flink runtime error or an 

Review comment:
       ```suggestion
   systems or classify the exceptions otherwise. For example, it can be used to distinguish whether it is a Flink runtime error or an 
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+
+    public static List<FailureListener> getFailureListerners(

Review comment:
       Wouldn't it make sense to return a `Set<FailureListener>` here already? Otherwise, we might pass duplicate listeners that will be deduplicated later anyway.
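
To illustrate the point about returning a set, here is a minimal sketch, not the PR's actual code. `FailureListenerSketch` and `FailureListenerLoadingSketch` are hypothetical stand-ins for the PR's `FailureListener` and `FailureListenerUtils`, whose real signatures may differ. Deduplication relies on `equals`/`hashCode`, which defaults to object identity unless the listeners override it.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the PR's FailureListener interface. */
interface FailureListenerSketch {
    void onFailure(Throwable cause);
}

/** Hypothetical loader that returns a Set, so duplicates are dropped at the source. */
final class FailureListenerLoadingSketch {
    private FailureListenerLoadingSketch() {}

    static Set<FailureListenerSketch> collect(List<FailureListenerSketch> discovered) {
        // LinkedHashSet keeps discovery order while dropping repeated instances.
        return new LinkedHashSet<>(discovered);
    }

    public static void main(String[] args) {
        FailureListenerSketch first = cause -> {};
        FailureListenerSketch second = cause -> {};
        // The same instance is discovered twice, e.g. via SPI and via a plugin.
        Set<FailureListenerSketch> listeners = collect(Arrays.asList(first, second, first));
        System.out.println("distinct listeners: " + listeners.size());
    }
}
```

With this shape, callers such as the scheduler can take a `Set` directly and never need a second deduplication pass.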

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java
##########
@@ -124,7 +125,8 @@
             final ExecutionDeploymentTracker executionDeploymentTracker,
             long initializationTimestamp,
             final ComponentMainThreadExecutor mainThreadExecutor,
-            final JobStatusListener jobStatusListener)
+            final JobStatusListener jobStatusListener,
+            final List<FailureListener> failureListeners)

Review comment:
       ```suggestion
               final Set<FailureListener> failureListeners)
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+
+    public static List<FailureListener> getFailureListerners(
+            Configuration configuration, JobManagerJobMetricGroup metricGroup) {

Review comment:
       I think it is better to pass the MetricGroup here, instead of the JobManagerJobMetricGroup.

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,50 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+Flink provides the pluggable failure listener interface for users to register multiple instances, which are called each 
+time an exception reported at runtime. The default failure listener is only to record the failure count and emit the metric
+"numJobFailure" for the job. The purpose of these listeners is to build metrics based on the exceptions, make call to external
+systems or classify the exceptions otherwise. For example, it can distinguish whether it is a flink runtime error or an 
+application user logic error. With the accurate metrics, you may have better idea about the platform level metrics, 

Review comment:
       ```suggestion
   application user logic error. With accurate metrics, you may have a better idea about platform level metrics, 
   ```
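
As a rough illustration of the classification use case described in this documentation, the sketch below counts failures per category. Everything in it is an assumption made for the example: the `ClassifyingListenerApi` interface, the `FailureKind` enum, and the package-prefix rule are hypothetical and are not taken from the PR. A real listener would implement the PR's `FailureListener` and would probably report through a metric group rather than an in-memory map.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical listener interface; the PR's real FailureListener may differ. */
interface ClassifyingListenerApi {
    void onFailure(Throwable cause);
}

/** Hypothetical failure categories used only for this illustration. */
enum FailureKind {
    USER,
    PLATFORM
}

/** Counts failures per category, using an assumed package-prefix rule. */
final class ClassifyingFailureListener implements ClassifyingListenerApi {
    private final String platformPackagePrefix;
    private final Map<FailureKind, Integer> counts = new EnumMap<>(FailureKind.class);

    ClassifyingFailureListener(String platformPackagePrefix) {
        this.platformPackagePrefix = platformPackagePrefix;
    }

    @Override
    public void onFailure(Throwable cause) {
        // Assumption: exceptions from the given package count as platform errors,
        // everything else is attributed to user code.
        boolean platform = cause.getClass().getName().startsWith(platformPackagePrefix);
        counts.merge(platform ? FailureKind.PLATFORM : FailureKind.USER, 1, Integer::sum);
    }

    int count(FailureKind kind) {
        return counts.getOrDefault(kind, 0);
    }
}
```

A prefix check is deliberately simplistic; inspecting the cause chain or the stack trace would likely classify wrapped exceptions more reliably.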

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+

Review comment:
       ```suggestion
   public enum FailureListenerUtils {
       ;
   ```
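
For context on this suggestion: an enum with no constants cannot be instantiated, so it can serve as a holder for static helpers without needing a private constructor. A minimal sketch, where `ListenerUtilsSketch` and `distinct` are hypothetical names rather than the PR's code:

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical utility holder: the lone ';' declares an enum with zero constants. */
enum ListenerUtilsSketch {
    ;

    /** Returns the distinct elements of the input in encounter order. */
    static <T> Set<T> distinct(Collection<T> items) {
        return new LinkedHashSet<>(items);
    }
}
```

The more common alternative is a `final` class with a private constructor; both prevent instantiation, so the choice is mostly a matter of project convention.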

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java
##########
@@ -333,10 +340,15 @@ public void onUnknownDeploymentsOf(
     }
 
     private SchedulerNG createScheduler(
+            Configuration configuration,
             ExecutionDeploymentTracker executionDeploymentTracker,
             JobManagerJobMetricGroup jobManagerJobMetricGroup,
             JobStatusListener jobStatusListener)
             throws Exception {
+
+        List<FailureListener> failureListeners =
+                FailureListenerUtils.getFailureListerners(configuration, jobManagerJobMetricGroup);

Review comment:
       I think it would be better to initialize the failure listeners in the JobMaster constructor and pass the set into this method. In the JobMaster constructor, you have access to all the required fields (job ID, job name, config, etc.).

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java
##########
@@ -333,10 +340,15 @@ public void onUnknownDeploymentsOf(
     }
 
     private SchedulerNG createScheduler(
+            Configuration configuration,
             ExecutionDeploymentTracker executionDeploymentTracker,
             JobManagerJobMetricGroup jobManagerJobMetricGroup,
             JobStatusListener jobStatusListener)
             throws Exception {
+
+        List<FailureListener> failureListeners =

Review comment:
       ```suggestion
           Set<FailureListener> failureListeners =
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+
+    public static List<FailureListener> getFailureListerners(

Review comment:
       ```suggestion
       public static List<FailureListener> getFailureListeners(
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 399c2ce9c9f9637a20ddbfd684a9a4e82b41265e Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12884) 
   * a3a28907e4723571c04853236d8c3712195e5821 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r567364347



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+
+    public static List<FailureListener> createFailureListener(

Review comment:
       Updated.







[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r567364234



##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+For each of execution exceptions in a flink job, it will be passed to the job master. The default failure listener is only
+to record the failure count and emit the metrics numJobFailure for the job. If you need an advanced classification on exceptions, 
+you can build a plugin to customize failure listener. For example, it can distinguish whether it is a flink runtime error or an 
+application user logic error. With the accurate metrics, you may have better idea about the platform level metrics, for example 
+failures due to network, platform reliability, etc.
+
+
+# Implement a plugin for your custom failure listener

Review comment:
       Agree. 







[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r559765178



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import org.apache.flink.shaded.guava18.com.google.common.collect.Iterators;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.ServiceLoader;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {
+    private PluginManager pluginManager;
+
+    public FailureListenerFactory(Configuration configuration) {
+        this.pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);

Review comment:
       Yes, I think there are 10 places that use PluginManager in this way. There is no global plugin manager instance per Flink process, but I think we can make it a singleton in another PR.
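
One possible shape for such a follow-up is the initialization-on-demand holder idiom, sketched below. `PluginManagerSketch` is a hypothetical stand-in and not Flink's `PluginManager`. The real one is created from a `Configuration`, which this argument-free idiom does not cover, so treat this only as an outline of the per-process sharing idea.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive-to-build, process-wide plugin manager. */
final class PluginManagerSketch {
    /** Counts constructions so the single-instance behaviour can be observed. */
    static final AtomicInteger CREATED = new AtomicInteger();

    private PluginManagerSketch() {
        CREATED.incrementAndGet();
    }

    /** The JVM initializes Holder lazily and thread-safely on first access. */
    private static final class Holder {
        static final PluginManagerSketch INSTANCE = new PluginManagerSketch();
    }

    static PluginManagerSketch getInstance() {
        return Holder.INSTANCE;
    }
}
```

Every caller of `getInstance()` then shares one instance instead of scanning the plugin folder again.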







[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * 296a46dab704e3a3553bb7a43128a3f4774bfb77 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12162) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * a3a28907e4723571c04853236d8c3712195e5821 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12984) 
   * 5ffc877b63cf33e5a374cbbda55096e429c84da0 UNKNOWN
   





[GitHub] [flink] rmetzger commented on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-766752536


   I reported the unrelated test failure in Jira.





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 7e811ff57a72897e49425a53b5d956f5a1f32ea9 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13539) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r567364389



##########
File path: flink-core/src/main/java/org/apache/flink/core/failurelistener/FailureListener.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.core.failurelistener;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.metrics.MetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+@PublicEvolving
+public interface FailureListener {
+
+    /**
+     * Initialize the FailureListener with MetricGroup.
+     *
+     * @param jobName the name of the job whose failures the listener subscribes to
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(String jobName, MetricGroup metricGroup);

Review comment:
       I initially felt the jobID was not needed in most cases. Added accordingly.
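
A minimal sketch of a listener written against the `init(String jobName, MetricGroup metricGroup)` signature shown in the diff above. To stay self-contained it uses hypothetical stand-ins (`SketchCounter`, `SketchMetricGroup`, `SketchFailureListener`) rather than Flink's real `org.apache.flink.metrics` types, and it assumes the `onFailure(Throwable, boolean)` method from the earlier revision of the interface is kept. The metric names and the classification rule are illustrations only, not part of the PR.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metrics counter; not Flink's Counter.
class SketchCounter {
    private long count;

    void inc() {
        count++;
    }

    long getCount() {
        return count;
    }
}

// Hypothetical stand-in for a metric group; not Flink's MetricGroup.
class SketchMetricGroup {
    private final Map<String, SketchCounter> counters = new HashMap<>();

    // Returns the counter registered under the name, creating it on first use.
    SketchCounter counter(String name) {
        return counters.computeIfAbsent(name, k -> new SketchCounter());
    }
}

// Mirrors the shape of the interface in the diff, using the stand-in types.
interface SketchFailureListener {
    void init(String jobName, SketchMetricGroup metricGroup);

    void onFailure(Throwable cause, boolean globalFailure);
}

// Counts failures separately as "application" or "platform" errors.
class ClassifyingFailureListener implements SketchFailureListener {
    private SketchCounter applicationFailures;
    private SketchCounter platformFailures;

    @Override
    public void init(String jobName, SketchMetricGroup metricGroup) {
        applicationFailures = metricGroup.counter(jobName + ".numApplicationFailures");
        platformFailures = metricGroup.counter(jobName + ".numPlatformFailures");
    }

    @Override
    public void onFailure(Throwable cause, boolean globalFailure) {
        // Illustrative rule only: treat I/O problems as platform errors.
        if (cause instanceof IOException) {
            platformFailures.inc();
        } else {
            applicationFailures.inc();
        }
    }
}
```

A real listener would presumably classify on exception types specific to the deployment; the point here is only that `init` registers the metrics once and `onFailure` updates them per failure.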




----------------------------------------------------------------



[GitHub] [flink] rmetzger commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r559978485



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initialize the listener with JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);
+
+    /**
+     * Method to handle a failure in the listener.
+     *
+     * @param cause the failure cause
+     * @param globalFailure whether the failure is a global failure
+     */
+    void onFailure(final Throwable cause, boolean globalFailure);

Review comment:
       The `JobManagerJobMetricGroup` contains the JobId. Would it make sense to introduce a FailureListenerMetricGroup extends MetricGroup that exposes the JobId and JobName, and is not considered an internal API?
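
One way the suggestion above could look, as a rough sketch only. `BaseMetricGroup` is a hypothetical stand-in for Flink's `MetricGroup`, reduced to a single method so the example compiles on its own; the accessor names `getJobId()` and `getJobName()` are assumptions, and the job ID is a plain `String` here rather than Flink's `JobID`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Flink's MetricGroup, reduced to one method.
interface BaseMetricGroup {
    void addCounter(String name);
}

// Sketch of the suggested public-facing group: it adds the job identity on
// top of the generic metric group without exposing runtime-internal classes.
interface FailureListenerMetricGroup extends BaseMetricGroup {
    String getJobId(); // assumed accessor name; a String, not Flink's JobID

    String getJobName(); // assumed accessor name
}

// Simple in-memory implementation, e.g. for unit tests of a listener.
class InMemoryFailureListenerMetricGroup implements FailureListenerMetricGroup {
    private final String jobId;
    private final String jobName;
    private final List<String> counterNames = new ArrayList<>();

    InMemoryFailureListenerMetricGroup(String jobId, String jobName) {
        this.jobId = jobId;
        this.jobName = jobName;
    }

    @Override
    public void addCounter(String name) {
        counterNames.add(name);
    }

    @Override
    public String getJobId() {
        return jobId;
    }

    @Override
    public String getJobName() {
        return jobName;
    }

    // Names of all counters registered so far, in registration order.
    List<String> registeredCounterNames() {
        return new ArrayList<>(counterNames);
    }
}
```

With a type like this, `init` could take a single argument and a listener would not need to depend on `JobManagerJobMetricGroup`.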




----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 9e445296314a2bfde4c85b61f1a3d9ee892c97ba Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13675) 
   * cabe06381bf59e3b94936d879d5f2aa5c34c878e UNKNOWN
   


-- 



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 399c2ce9c9f9637a20ddbfd684a9a4e82b41265e Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12884) 
   


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * a4c68763883ef4f3c411fa606d0907ef334a6e76 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12363) 
   * 1a6bd7aa98b649bd119479247d34ea029b8ba636 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12400) 
   


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 34720c9f7ea37afb5d7f3d2a824b78fee916b755 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13120) 
   


----------------------------------------------------------------



[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r560573025



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initialize the listener with JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);

Review comment:
       Sounds good. I will also move the FailureListener and FailureListenerFactory into the flink-core module.




----------------------------------------------------------------



[GitHub] [flink] rmetzger commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r563667477



##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+For each of execution exceptions in a flink job, it will be passed to the job master. The default failure listener is only
+to record the failure count and emit the metrics numJobFailure for the job. If you need an advanced classification on exceptions, 

Review comment:
       ```suggestion
   to record the failure count and emit the metric "numJobFailure" for the job. If you need an advanced classification on exceptions, 
   ```

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+For each of execution exceptions in a flink job, it will be passed to the job master. The default failure listener is only
+to record the failure count and emit the metrics numJobFailure for the job. If you need an advanced classification on exceptions, 
+you can build a plugin to customize failure listener. For example, it can distinguish whether it is a flink runtime error or an 

Review comment:
       ```suggestion
   you can build a plugin to customize the failure listener. For example, it can distinguish whether it is a flink runtime error or an 
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+
+    public static List<FailureListener> createFailureListener(

Review comment:
       ```suggestion
       public static List<FailureListener> getFailureListeners(
   ```

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+For each of execution exceptions in a flink job, it will be passed to the job master. The default failure listener is only

Review comment:
       ```suggestion 
   Each execution exception in a Flink job will be passed to the JobManager. The default failure listener is only
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/ExecutionFailureHandler.java
##########
@@ -98,11 +103,25 @@ public FailureHandlingResult getGlobalFailureHandlingResult(final Throwable caus
                 true);
     }
 
+    /** @param failureListener the failure listener to be registered */
+    public void registerFailureListener(FailureListener failureListener) {
+        failureListeners.add(failureListener);
+    }
+
     private FailureHandlingResult handleFailure(
             final Throwable cause,
             final Set<ExecutionVertexID> verticesToRestart,
             final boolean globalFailure) {
 
+        try {
+            for (FailureListener listener : failureListeners) {
+                listener.onFailure(cause, globalFailure);
+            }
+        } catch (Throwable e) {
+            return FailureHandlingResult.unrecoverable(
+                    new JobException("Unexpected excepton in FailureListener", e), false);

Review comment:
       ```suggestion
                       new JobException("Unexpected exception in FailureListener", e), false);
   ```
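
A minimal sketch of the notification loop in the hunk above, for readers who want to see its behavior in isolation. `SketchFailureListener` and `SketchFailureHandler` are hypothetical stand-ins, not Flink classes, and a plain `Exception` replaces `JobException` and `FailureHandlingResult.unrecoverable(...)`, which are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for the PR's FailureListener; not Flink's actual interface. */
interface SketchFailureListener {
    void onFailure(Throwable cause, boolean globalFailure);
}

/** Hypothetical, simplified stand-in for the handler shown in the hunk above. */
class SketchFailureHandler {
    private final List<SketchFailureListener> failureListeners = new ArrayList<>();

    void registerFailureListener(SketchFailureListener listener) {
        failureListeners.add(listener);
    }

    /**
     * Notifies listeners in registration order. Returns an empty Optional when every
     * listener returned normally, otherwise an exception wrapping the first listener
     * error (the hunk turns this case into an unrecoverable result).
     */
    Optional<Exception> notifyListeners(Throwable cause, boolean globalFailure) {
        try {
            for (SketchFailureListener listener : failureListeners) {
                listener.onFailure(cause, globalFailure);
            }
        } catch (Throwable t) {
            return Optional.of(new Exception("Unexpected exception in FailureListener", t));
        }
        return Optional.empty();
    }
}
```

Because the `try` wraps the whole loop, the first listener that throws also prevents later listeners from being notified. Whether that is intended is a design question for the PR, not something this sketch settles.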

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       This documentation page is hard to read, in my opinion. It should first describe, at a high level, that a user can register multiple exception listeners, which are called each time an exception is reported at runtime.
   The purpose of these listeners is to build metrics based on the exceptions, make calls to external systems, or otherwise classify the exceptions. ...
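
To make the "classify the exceptions" use case concrete, here is a rough sketch of a listener that counts failures per category. Everything in it is assumed for illustration: `ClassifyingListenerSketch`, the `Category` enum, and in particular the package-prefix rule in `classify`, which is a placeholder and not how Flink or the PR distinguishes platform errors from application errors. Plain counters stand in for Flink metrics.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical listener that counts failures per category; not part of Flink. */
class ClassifyingListenerSketch {
    enum Category { PLATFORM, APPLICATION }

    private final Map<Category, LongAdder> counts = new ConcurrentHashMap<>();

    // Assumed rule, for illustration only: exceptions defined under org.apache.flink
    // count as platform errors, everything else as application errors.
    static Category classify(Throwable cause) {
        return cause.getClass().getName().startsWith("org.apache.flink.")
                ? Category.PLATFORM
                : Category.APPLICATION;
    }

    // Same shape as the onFailure callback discussed in this thread.
    public void onFailure(Throwable cause, boolean globalFailure) {
        counts.computeIfAbsent(classify(cause), k -> new LongAdder()).increment();
    }

    long count(Category category) {
        LongAdder adder = counts.get(category);
        return adder == null ? 0L : adder.sum();
    }
}
```

A real plugin would register counters on the metric group it receives at initialization instead of keeping its own map.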

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/DefaultFailureListener.java
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.runtime.metrics.MetricNames;
+
+/**
+ * Default implementation {@link org.apache.flink.core.failurelistener.FailureListener} that record

Review comment:
       ```suggestion
    * Default implementation {@link org.apache.flink.core.failurelistener.FailureListener} that records
   ```

##########
File path: flink-core/src/main/java/org/apache/flink/core/failurelistener/FailureListener.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.core.failurelistener;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.metrics.MetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+@PublicEvolving
+public interface FailureListener {
+
+    /**
+     * Initialize the FailureListener with MetricGroup.
+     *
+     * @param jobName the name job whose failure will be subscribed by the listener
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(String jobName, MetricGroup metricGroup);

Review comment:
       Why not also pass the JobId?

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+For each of execution exceptions in a flink job, it will be passed to the job master. The default failure listener is only
+to record the failure count and emit the metrics numJobFailure for the job. If you need an advanced classification on exceptions, 
+you can build a plugin to customize failure listener. For example, it can distinguish whether it is a flink runtime error or an 
+application user logic error. With the accurate metrics, you may have better idea about the platform level metrics, for example 
+failures due to network, platform reliability, etc.
+
+
+# Implement a plugin for your custom failure listener

Review comment:
       Isn't this heading meant to be a level lower than the one above? Shouldn't this be a "###" heading?







[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 25908aedd847a453981533a3bd87117bcdc7ac78 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12204) 
   * 66efa5e224d899b11d89cf6419b376be0187a7bb Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12352) 
   * e32796821c485c583a2b10b55d3f8b780971d357 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 5ffc877b63cf33e5a374cbbda55096e429c84da0 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13068) 
   * 34720c9f7ea37afb5d7f3d2a824b78fee916b755 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zentol commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r559859082



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initialize the listener with JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);

Review comment:
       `JobManagerJobMetricGroup` is an internal API that should not be exposed to user-code. Use `MetricGroup` instead.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initialize the listener with JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);
+
+    /**
+     * Method to handle a failure in the listener.
+     *
+     * @param cause the failure cause
+     * @param globalFailure whether the failure is a global failure
+     */
+    void onFailure(final Throwable cause, boolean globalFailure);

Review comment:
       Shouldn't this also receive some information about the job? Given that the listener is configured per cluster, how would an implementation distinguish errors between different jobs within a session cluster?
   
   I think it would be good to outline the use case that you're trying to accomplish; as it stands, outside of counting exceptions, there's little you can do with the current interface.
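
One possible answer to the question above, sketched under assumptions: each job gets its own listener instance, which is told the job name once at initialization, so failures reported later can be attributed to that job. `JobAwareListenerSketch` and `PerJobRecorder` are hypothetical names, and the single-argument `init` is a simplification that omits the metric group.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener contract carrying job identity; not Flink's actual interface. */
interface JobAwareListenerSketch {
    void init(String jobName);

    void onFailure(Throwable cause, boolean globalFailure);
}

/** Records one line per failure, prefixed with the job the instance was created for. */
class PerJobRecorder implements JobAwareListenerSketch {
    private String jobName = "<uninitialized>";
    private final List<String> records = new ArrayList<>();

    @Override
    public void init(String jobName) {
        this.jobName = jobName;
    }

    @Override
    public void onFailure(Throwable cause, boolean globalFailure) {
        String scope = globalFailure ? "global" : "local";
        records.add(jobName + ": " + cause.getClass().getSimpleName() + " (" + scope + ")");
    }

    List<String> records() {
        return records;
    }
}
```

In a session cluster this would mean one instance per submitted job. The alternative is a single shared instance that receives the job identity on every `onFailure` call; the sketch does not argue for either.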

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {

Review comment:
       Seems like a misnomer to call this a factory. You could replace this entire class with a static method that returns a collection of listeners.
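
The suggestion above could look roughly like the following: a stateless utility with one static method that returns the discovered listeners. This sketch uses the JDK's `java.util.ServiceLoader` in place of Flink's `PluginManager`, whose API is not reproduced here, and `PluggableListenerSketch` and `ListenerLoadingSketch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical service interface standing in for FailureListener. */
interface PluggableListenerSketch {
    void onFailure(Throwable cause, boolean globalFailure);
}

/** Stateless utility: no fields to keep, so neither a factory nor a stored reference is needed. */
final class ListenerLoadingSketch {
    private ListenerLoadingSketch() {}

    /** Returns every implementation of the service visible to the given class loader. */
    static <T> List<T> loadAll(Class<T> service, ClassLoader loader) {
        List<T> result = new ArrayList<>();
        for (T implementation : ServiceLoader.load(service, loader)) {
            result.add(implementation);
        }
        return result;
    }
}
```

With no provider registered under `META-INF/services`, `loadAll` returns an empty list. A caller would then decide whether to fall back to a default listener.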

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initialize the listener with JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);
+
+    /**
+     * Method to handle a failure in the listener.
+     *
+     * @param cause the failure cause
+     * @param globalFailure whether the failure is a global failure

Review comment:
       This needs better documentation or additional guidance on what constitutes a global failure.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {
+    private final PluginManager pluginManager;
+    private List<FailureListener> failureListeners = new ArrayList<>();

Review comment:
       Why is this not `final`?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {
+    private final PluginManager pluginManager;

Review comment:
       What are we keeping this reference for?
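
       The two comments on `FailureListenerFactory` (the non-`final` list and the retained `PluginManager` reference) could plausibly be addressed together. The following is only a minimal sketch, not the PR's code: `LoadedListeners` is a hypothetical name, and the `Iterator` argument stands in for whatever the plugin lookup returns. The class is generic so that it makes no assumption about the listener method signature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical holder: consumes the discovered listeners once, at construction time.
final class LoadedListeners<L> {

    // Assigned exactly once in the constructor, so the field can be final.
    private final List<L> listeners;

    // `discovered` is an assumption: it represents the result of the plugin lookup.
    // The loader itself is not stored, because nothing needs it after construction.
    LoadedListeners(Iterator<L> discovered) {
        List<L> loaded = new ArrayList<>();
        discovered.forEachRemaining(loaded::add);
        this.listeners = Collections.unmodifiableList(loaded);
    }

    // Callers get a read-only view and cannot add or remove listeners.
    List<L> getListeners() {
        return listeners;
    }
}
```

       With this shape the list is fixed after construction, and the only state the class keeps is the listeners themselves.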

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {

Review comment:
       I would recommend having a factory implement the Plugin interface. This allows implementations to be much more flexible structurally.
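
       A rough sketch of what this recommendation might look like follows. Everything in it is an assumption made for illustration: `Plugin` is a local stand-in for `org.apache.flink.core.plugin.Plugin`, and `FailureListenerPluginFactory`, `createFailureListener`, `onFailure`, and the counting classes are hypothetical names rather than anything taken from the PR.

```java
// Local stand-in for org.apache.flink.core.plugin.Plugin so the sketch compiles alone.
interface Plugin {}

// Hypothetical listener contract; note that it no longer extends Plugin.
interface FailureListener {
    void onFailure(Throwable cause, boolean globalFailure);
}

// The factory is the type discovered through the plugin mechanism.
interface FailureListenerPluginFactory extends Plugin {
    FailureListener createFailureListener(String label);
}

// Example listener: an ordinary class with its own constructor arguments.
final class CountingFailureListener implements FailureListener {
    private final String label;
    private int globalFailures;
    private int taskFailures;

    CountingFailureListener(String label) {
        this.label = label;
    }

    @Override
    public void onFailure(Throwable cause, boolean globalFailure) {
        if (globalFailure) {
            globalFailures++;
        } else {
            taskFailures++;
        }
    }

    int getGlobalFailures() {
        return globalFailures;
    }

    int getTaskFailures() {
        return taskFailures;
    }

    String summary() {
        return label + ": global=" + globalFailures + ", task=" + taskFailures;
    }
}

// Example plugin: only this class would need to be registered for discovery.
final class CountingFailureListenerFactory implements FailureListenerPluginFactory {
    @Override
    public FailureListener createFailureListener(String label) {
        return new CountingFailureListener(label);
    }
}

final class FactoryPluginDemo {
    public static void main(String[] args) {
        FailureListenerPluginFactory factory = new CountingFailureListenerFactory();
        CountingFailureListener listener =
                (CountingFailureListener) factory.createFailureListener("demo-job");
        listener.onFailure(new RuntimeException("task failed"), false);
        listener.onFailure(new IllegalStateException("job-wide problem"), true);
        System.out.println(listener.summary());
    }
}
```

       The structural gain is that the listener can take constructor arguments and be created per job, while the discovered factory stays a stateless entry point.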




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * f51da4ba54bd567ae6a58e16c2e9e3882a76831c Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13538) Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13658) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   



[GitHub] [flink] flinkbot commented on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * 296a46dab704e3a3553bb7a43128a3f4774bfb77 UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 11651466eb13c5c05d2cd0eed0ac08b9bf617185 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12704) 
   * 12652d9c7829d2d26d41f61185cbd7ab8f4ad505 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12719) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 748434f298f7ba97a6f2b9e77638ef60ccad9d82 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12197) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 11651466eb13c5c05d2cd0eed0ac08b9bf617185 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12704) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 25908aedd847a453981533a3bd87117bcdc7ac78 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12204) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761904406


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit af1d7a046e857c17831c4866b0dc83830b5e1ee0 (Fri May 28 08:58:22 UTC 2021)
   
   **Warnings:**
    * **1 pom.xml files were touched**: Check for build and licensing issues.
    * No documentation files were touched! Remember to keep the Flink docs up to date!
    * **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-20833).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work.
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 9e445296314a2bfde4c85b61f1a3d9ee892c97ba Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13675) 
   * cabe06381bf59e3b94936d879d5f2aa5c34c878e Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15113) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * cabe06381bf59e3b94936d879d5f2aa5c34c878e Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15113) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 25908aedd847a453981533a3bd87117bcdc7ac78 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12204) 
   * 66efa5e224d899b11d89cf6419b376be0187a7bb Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12352) 
   



[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r560572644



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {

Review comment:
       Agreed. I think all SPIs in Flink are factory-based. I will change this accordingly.







[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * 296a46dab704e3a3553bb7a43128a3f4774bfb77 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12162) 
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   



[GitHub] [flink] HuangZhenQiu commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r559765455



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/ExecutionFailureHandler.java
##########
@@ -98,11 +103,22 @@ public FailureHandlingResult getGlobalFailureHandlingResult(final Throwable caus
                 true);
     }
 
+    /** @param failureListener the failure listener to be registered */
+    public void registerFailureListener(FailureListener failureListener) {
+        if (!failureListeners.contains(failureListener)) {
+            failureListeners.add(failureListener);

Review comment:
       Yes, no need to filter.







[GitHub] [flink] rmetzger commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r559354094



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/ExecutionFailureHandler.java
##########
@@ -98,11 +103,22 @@ public FailureHandlingResult getGlobalFailureHandlingResult(final Throwable caus
                 true);
     }
 
+    /** @param failureListener the failure listener to be registered */
+    public void registerFailureListener(FailureListener failureListener) {
+        if (!failureListeners.contains(failureListener)) {
+            failureListeners.add(failureListener);

Review comment:
       Doesn't HashSet.add() only add an element if it isn't already present?
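
       For reference, a minimal standalone sketch of this point: `java.util.Set#add` leaves the set unchanged and returns `false` when an equal element is already present, so a preceding `contains` check adds nothing. The class name `SetAddDemo` and the string stand-ins for listeners are illustrative assumptions, not code from this PR; for real `FailureListener` instances, "already present" depends on their `equals`/`hashCode`.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: strings stand in for FailureListener instances.
final class SetAddDemo {

    /** Registers a listener and reports whether it was newly added. */
    static boolean register(Set<String> listeners, String listener) {
        // Set.add performs the membership check itself and returns its outcome.
        return listeners.add(listener);
    }

    public static void main(String[] args) {
        Set<String> listeners = new HashSet<>();
        System.out.println(register(listeners, "metrics-listener")); // true
        System.out.println(register(listeners, "metrics-listener")); // false
        System.out.println(listeners.size()); // 1
    }
}
```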

##########
File path: flink-runtime/src/main/resources/META-INF/services/org.apache.flink.runtime.executiongraph.FailureListener
##########
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+org.apache.flink.runtime.executiongraph.DefaultFailureListener

Review comment:
       See comment above.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/ExecutionFailureHandler.java
##########
@@ -49,6 +51,8 @@
     /** Number of all restarts happened since this job is submitted. */
     private long numberOfRestarts;
 
+    private Set<FailureListener> failureListeners;

Review comment:
       ```suggestion
       private final Set<FailureListener> failureListeners;
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java
##########
@@ -172,6 +176,14 @@
                         .createInstance(new DefaultExecutionSlotAllocationContext());
 
         this.verticesWaitingForRestart = new HashSet<>();
+
+        List<FailureListener> listeners =
+                failureListenerFactory.createFailureListener(jobManagerJobMetricGroup);
+
+        for (FailureListener listener : listeners) {
+            executionFailureHandler.registerFailureListener(listener);
+        }

Review comment:
       Since this loop executes code not controlled by the framework, I would recommend catching Throwables and returning them as an unrecoverable FailureHandlingResult.
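
       One possible shape for this, sketched outside of Flink: the `Listener` interface, the `initAll` helper, and the string metric-group argument are hypothetical stand-ins for `FailureListener`, the registration loop, and `JobManagerJobMetricGroup`. The sketch only shows how a Throwable thrown by user code can be captured and handed back to the caller; how it would be mapped to an unrecoverable `FailureHandlingResult` is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the PR.
final class GuardedListenerRegistration {

    /** Stand-in for the listener SPI: user code that may throw anything. */
    interface Listener {
        void init(String metricGroupName);
    }

    /**
     * Initializes listeners in order and collects the successful ones in {@code registered}.
     * Stops at the first Throwable thrown by user code and returns it instead of propagating it.
     */
    static Optional<Throwable> initAll(
            List<Listener> candidates, String metricGroupName, List<Listener> registered) {
        for (Listener listener : candidates) {
            try {
                listener.init(metricGroupName);
                registered.add(listener);
            } catch (Throwable t) {
                // User code failed: report the cause to the caller rather than crashing the loop.
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Listener> candidates = new ArrayList<>();
        candidates.add(group -> System.out.println("initialized for " + group));
        candidates.add(group -> { throw new IllegalStateException("broken plugin"); });

        List<Listener> registered = new ArrayList<>();
        Optional<Throwable> failure = initAll(candidates, "job-metrics", registered);
        System.out.println(failure.map(Throwable::getMessage).orElse("no failure"));
    }
}
```

       Catching `Throwable` also swallows JVM errors, so a real implementation would probably want to rethrow fatal ones.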

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import org.apache.flink.shaded.guava18.com.google.common.collect.Iterators;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.ServiceLoader;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {
+    private PluginManager pluginManager;
+
+    public FailureListenerFactory(Configuration configuration) {
+        this.pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);
+    }
+
+    public List<FailureListener> createFailureListener(JobManagerJobMetricGroup metricGroup) {

Review comment:
       I wonder whether we could do the discovery and initialization of the implementations in the constructor. The available implementations won't change while the process is running, so why re-initialize the listeners over and over again?
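
       A rough sketch of what discovering once in the constructor could look like. `CachingListenerFactory` and its `Supplier<Iterator<T>>` parameter are hypothetical; the supplier stands in for the `ServiceLoader` / `PluginManager` lookups so the sketch runs without Flink on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch; not the factory from the PR.
final class CachingListenerFactory<T> {

    private final List<T> listeners;

    /** Runs the (potentially expensive) discovery exactly once and caches the result. */
    CachingListenerFactory(Supplier<Iterator<T>> discovery) {
        List<T> found = new ArrayList<>();
        discovery.get().forEachRemaining(found::add);
        this.listeners = Collections.unmodifiableList(found);
    }

    /** Returns the cached listeners without triggering discovery again. */
    List<T> getListeners() {
        return listeners;
    }

    public static void main(String[] args) {
        AtomicInteger discoveryRuns = new AtomicInteger();
        CachingListenerFactory<String> factory =
                new CachingListenerFactory<>(
                        () -> {
                            discoveryRuns.incrementAndGet();
                            return Arrays.asList("listener-a", "listener-b").iterator();
                        });
        factory.getListeners();
        factory.getListeners();
        System.out.println("discovery runs: " + discoveryRuns.get());
    }
}
```

       If listeners still need a per-job metric group, that step could stay in a separate method while discovery remains in the constructor.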

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initialize the listener with JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);
+
+    /**
+     * Method to handle each of failures in the listener.

Review comment:
       ```suggestion
        * Method to handle a failure in the listener.
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import org.apache.flink.shaded.guava18.com.google.common.collect.Iterators;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.ServiceLoader;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {
+    private PluginManager pluginManager;
+
+    public FailureListenerFactory(Configuration configuration) {
+        this.pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);
+    }
+
+    public List<FailureListener> createFailureListener(JobManagerJobMetricGroup metricGroup) {
+        List<FailureListener> failureListeners = new ArrayList<>();
+
+        ServiceLoader<FailureListener> serviceLoader = ServiceLoader.load(FailureListener.class);
+        Iterator<FailureListener> fromServiceLoader = serviceLoader.iterator();
+        Iterator<FailureListener> fromPluginManager = pluginManager.load(FailureListener.class);

Review comment:
       Are you using the service loader just for the default failure listener implementation? It might be nicer to initialize the default implementation(s) by simply adding them in code, instead of providing two loading mechanisms.
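
A minimal sketch of the alternative suggested here, assuming the default listener is constructed directly and only user listeners come from the plugin mechanism. All type names below (`SketchFailureListener`, `CountingFailureListener`, `SketchFailureListeners`) are hypothetical stand-ins, not classes from this PR, and the plugin lookup is reduced to a plain `Iterator` argument:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the FailureListener interface under review. */
interface SketchFailureListener {
    void onFailure(Throwable cause, boolean globalFailure);
}

/** Hypothetical default listener that only counts the failures it sees. */
class CountingFailureListener implements SketchFailureListener {
    private long count;

    @Override
    public void onFailure(Throwable cause, boolean globalFailure) {
        count++;
    }

    long getCount() {
        return count;
    }
}

final class SketchFailureListeners {
    private SketchFailureListeners() {}

    /** Adds the default listener in code, then appends whatever the plugin lookup returned. */
    static List<SketchFailureListener> create(Iterator<SketchFailureListener> fromPlugins) {
        List<SketchFailureListener> listeners = new ArrayList<>();
        listeners.add(new CountingFailureListener());
        fromPlugins.forEachRemaining(listeners::add);
        return listeners;
    }
}
```

With this shape there is a single discovery mechanism for user code, and the default listener is always present regardless of what is on the classpath.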

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import org.apache.flink.shaded.guava18.com.google.common.collect.Iterators;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.ServiceLoader;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {
+    private PluginManager pluginManager;
+
+    public FailureListenerFactory(Configuration configuration) {
+        this.pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);

Review comment:
       I have not worked with the Flink plugin system yet. Are you sure it's the right approach to initialize the PluginManager here? Basically, the question is: is there one global PluginManager instance per Flink process, or a separate PluginManager for each pluggable implementation (metrics reporters, file system implementations, etc.)?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricNames.java
##########
@@ -50,6 +50,7 @@ private MetricNames() {}
     public static final String NUM_REGISTERED_TASK_MANAGERS = "numRegisteredTaskManagers";
 
     public static final String NUM_RESTARTS = "numRestarts";
+    public static final String NUM_JOB_FAILURE = "numJobFailure";

Review comment:
       I'm not sure if our users will be confused if there are two similar metrics: `numRestarts` and `numJobFailure`.
   What's the exact difference between them?
   
   I guess the number of failures can be higher than the number of restarts (local failures, ignored restarts) ... but I fear a user just looking at the available metrics might pick the first one available.

##########
File path: flink-tests/src/test/java/org/apache/flink/test/plugin/FailureListenerFactoryTest.java
##########
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.test.plugin;
+
+import org.apache.flink.configuration.ConfigConstants;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.DefaultPluginManager;
+import org.apache.flink.core.plugin.DirectoryBasedPluginFinder;
+import org.apache.flink.core.plugin.PluginDescriptor;
+import org.apache.flink.core.plugin.PluginFinder;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.testutils.CommonTestUtils;
+import org.apache.flink.runtime.executiongraph.FailureListener;
+import org.apache.flink.runtime.executiongraph.FailureListenerFactory;
+import org.apache.flink.runtime.metrics.groups.UnregisteredMetricGroups;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.flink.shaded.guava18.com.google.common.collect.ImmutableMap;
+import org.apache.flink.shaded.guava18.com.google.common.collect.Lists;
+
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+
+/** Test for {@link org.apache.flink.runtime.executiongraph.FailureListenerFactory}. */
+public class FailureListenerFactoryTest extends PluginTestBase {

Review comment:
       Maybe rename it to `FailureListenerPluginTest`, as your PR is proposing to add two FailureListenerFactoryTests.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListenerFactory.java
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import org.apache.flink.shaded.guava18.com.google.common.collect.Iterators;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.ServiceLoader;
+
+/** Factory class for creating {@link FailureListener} with plugin Manager. */
+public class FailureListenerFactory {
+    private PluginManager pluginManager;

Review comment:
       ```suggestion
       private final PluginManager pluginManager;
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * a4c68763883ef4f3c411fa606d0907ef334a6e76 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12363) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * a3a28907e4723571c04853236d8c3712195e5821 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12984) 
   * 5ffc877b63cf33e5a374cbbda55096e429c84da0 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13068) 
   





[GitHub] [flink] rmetzger commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r563667477



##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+For each of execution exceptions in a flink job, it will be passed to the job master. The default failure listener is only
+to record the failure count and emit the metrics numJobFailure for the job. If you need an advanced classification on exceptions, 

Review comment:
       ```suggestion
   to record the failure count and emit the metric "numJobFailure" for the job. If you need an advanced classification on exceptions, 
   ```

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+For each of execution exceptions in a flink job, it will be passed to the job master. The default failure listener is only
+to record the failure count and emit the metrics numJobFailure for the job. If you need an advanced classification on exceptions, 
+you can build a plugin to customize failure listener. For example, it can distinguish whether it is a flink runtime error or an 

Review comment:
       ```suggestion
   you can build a plugin to customize the failure listener. For example, it can distinguish whether it is a flink runtime error or an 
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/FailureListenerUtils.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.core.failurelistener.FailureListenerFactory;
+import org.apache.flink.core.plugin.PluginManager;
+import org.apache.flink.core.plugin.PluginUtils;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+/** Utils for creating failure listener. */
+public class FailureListenerUtils {
+
+    public static List<FailureListener> createFailureListener(

Review comment:
       ```suggestion
       public static List<FailureListener> getFailureListeners(
   ```

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+For each of execution exceptions in a flink job, it will be passed to the job master. The default failure listener is only

Review comment:
       ```suggestion 
   Each execution exception in a Flink job will be passed to the JobManager. The default failure listener is only
   ```

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/ExecutionFailureHandler.java
##########
@@ -98,11 +103,25 @@ public FailureHandlingResult getGlobalFailureHandlingResult(final Throwable caus
                 true);
     }
 
+    /** @param failureListener the failure listener to be registered */
+    public void registerFailureListener(FailureListener failureListener) {
+        failureListeners.add(failureListener);
+    }
+
     private FailureHandlingResult handleFailure(
             final Throwable cause,
             final Set<ExecutionVertexID> verticesToRestart,
             final boolean globalFailure) {
 
+        try {
+            for (FailureListener listener : failureListeners) {
+                listener.onFailure(cause, globalFailure);
+            }
+        } catch (Throwable e) {
+            return FailureHandlingResult.unrecoverable(
+                    new JobException("Unexpected excepton in FailureListener", e), false);

Review comment:
       ```suggestion
                       new JobException("Unexpected exception in FailureListener", e), false);
   ```
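
For readers following the hunk above: because the `try` block wraps the whole loop, the first listener that throws ends the loop, so later listeners are not notified for that failure. A minimal, self-contained sketch of that control flow follows. `NotifiedListener` and `ListenerNotifier` are hypothetical names, and a plain `RuntimeException` stands in for the `JobException` / `FailureHandlingResult` used in the PR:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the listener interface in the hunk above. */
interface NotifiedListener {
    void onFailure(Throwable cause, boolean globalFailure);
}

final class ListenerNotifier {
    private final List<NotifiedListener> listeners = new ArrayList<>();

    void register(NotifiedListener listener) {
        listeners.add(listener);
    }

    /**
     * Mirrors the shape of the hunk: one try block around the whole loop.
     * Returns null if every listener completed, otherwise the wrapped error.
     */
    RuntimeException notifyListeners(Throwable cause, boolean globalFailure) {
        try {
            for (NotifiedListener listener : listeners) {
                listener.onFailure(cause, globalFailure);
            }
        } catch (Throwable t) {
            // The loop has already been abandoned at this point.
            return new RuntimeException("Unexpected exception in FailureListener", t);
        }
        return null;
    }
}
```

Whether a misbehaving listener should stop the remaining ones (and fail the job) is a design decision for the PR; the sketch only makes the current behavior visible.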

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       This documentation page is hard to read in my opinion. It should first describe at a high level that a user can register multiple exception listeners, which are called each time an exception is reported at runtime.
   The purpose of these listeners is to build metrics based on the exceptions, make calls to external systems, or classify the exceptions otherwise. ...
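
As an illustration of the kind of high-level example the docs could carry, here is a small sketch of a listener that classifies reported exceptions and keeps a count per category. The class name `ClassifyingListener` and the two categories are made up for illustration; a real listener would implement the PR's `FailureListener` interface and register metrics rather than use a local map:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical listener that groups reported exceptions into coarse categories. */
final class ClassifyingListener {
    private final Map<String, Integer> countsByCategory = new HashMap<>();

    /** Called once per reported exception; the categories are illustrative only. */
    void onFailure(Throwable cause) {
        String category = (cause instanceof java.io.IOException) ? "io" : "other";
        countsByCategory.merge(category, 1, Integer::sum);
    }

    /** Number of exceptions seen so far in the given category. */
    int count(String category) {
        return countsByCategory.getOrDefault(category, 0);
    }
}
```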

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/failurelistener/DefaultFailureListener.java
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.failurelistener;
+
+import org.apache.flink.core.failurelistener.FailureListener;
+import org.apache.flink.metrics.Counter;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.runtime.metrics.MetricNames;
+
+/**
+ * Default implementation {@link org.apache.flink.core.failurelistener.FailureListener} that record

Review comment:
       ```suggestion
    * Default implementation {@link org.apache.flink.core.failurelistener.FailureListener} that records
   ```

##########
File path: flink-core/src/main/java/org/apache/flink/core/failurelistener/FailureListener.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.core.failurelistener;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.metrics.MetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+@PublicEvolving
+public interface FailureListener {
+
+    /**
+     * Initialize the FailureListener with MetricGroup.
+     *
+     * @param jobName the name job whose failure will be subscribed by the listener
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(String jobName, MetricGroup metricGroup);

Review comment:
       Why not also pass the JobId?

##########
File path: docs/deployment/advanced/platform.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Customizable Features for Platform Users"
+nav-title: platform
+nav-parent_id: advanced
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+Flink provides a set of customizable features for users to extend from the default behavior through the plugin framework.
+
+## Customize Failure Listener
+For each of execution exceptions in a flink job, it will be passed to the job master. The default failure listener is only
+to record the failure count and emit the metrics numJobFailure for the job. If you need an advanced classification on exceptions, 
+you can build a plugin to customize failure listener. For example, it can distinguish whether it is a flink runtime error or an 
+application user logic error. With the accurate metrics, you may have better idea about the platform level metrics, for example 
+failures due to network, platform reliability, etc.
+
+
+# Implement a plugin for your custom failure listener

Review comment:
       Why is this heading at a different level than the one above? Shouldn't this be a "###" heading?







[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 5ffc877b63cf33e5a374cbbda55096e429c84da0 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13068) 
   * 34720c9f7ea37afb5d7f3d2a824b78fee916b755 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13120) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] HuangZhenQiu commented on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-772707208


   @rmetzger I think this is a common issue when a different classloader is used at runtime. The Log4j issue can serve as an example of using the config option plugin.classloader.parent-first-patterns.additional. I added it to the doc.
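
To illustrate the option mentioned above: the value is a semicolon-separated list of class-name prefixes, and classes matching a prefix are resolved through the parent classloader first. The following is only a simplified sketch of that prefix check, not Flink's actual plugin classloader; the class `ParentFirstSketch` and its methods are hypothetical names used for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of prefix-based parent-first resolution; not Flink code. */
public class ParentFirstSketch {

    /** Splits a semicolon-separated pattern list and drops blank entries. */
    public static List<String> parsePatterns(String configValue) {
        return Arrays.stream(configValue.split(";"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Returns true if the fully qualified class name starts with any pattern. */
    public static boolean isParentFirst(String className, List<String> patterns) {
        for (String prefix : patterns) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed example value for plugin.classloader.parent-first-patterns.additional.
        List<String> patterns = parsePatterns("org.apache.logging.log4j;org.slf4j");
        System.out.println(isParentFirst("org.apache.logging.log4j.Logger", patterns));
        System.out.println(isParentFirst("com.example.MyFailureListener", patterns));
    }
}
```

With such a setting, the logging classes would come from the parent classloader, while the listener class itself would still be loaded from the plugin jar.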





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 5ffc877b63cf33e5a374cbbda55096e429c84da0 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13068) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * f850630134ff2243b56a82206e9ec402a93acfb4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13615) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 9e445296314a2bfde4c85b61f1a3d9ee892c97ba Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13675) 
   





[GitHub] [flink] HuangZhenQiu commented on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
HuangZhenQiu commented on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-770324596


   @rmetzger 
   Thanks for these suggestions. I have updated the PR accordingly. Please review it again at your convenience.





[GitHub] [flink] zentol commented on a change in pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #14678:
URL: https://github.com/apache/flink/pull/14678#discussion_r560072956



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/FailureListener.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.executiongraph;
+
+import org.apache.flink.core.plugin.Plugin;
+import org.apache.flink.runtime.metrics.groups.JobManagerJobMetricGroup;
+
+/** Failure listener to customize the behavior for each type of failures tracked in job manager. */
+public interface FailureListener extends Plugin {
+
+    /**
+     * Initialize the listener with JobManagerJobMetricGroup.
+     *
+     * @param metricGroup metrics group that the listener can add customized metrics definition.
+     */
+    void init(JobManagerJobMetricGroup metricGroup);
+
+    /**
+     * Method to handle a failure in the listener.
+     *
+     * @param cause the failure cause
+     * @param globalFailure whether the failure is a global failure
+     */
+    void onFailure(final Throwable cause, boolean globalFailure);

Review comment:
       The purpose of MetricGroups is not to make metadata readily accessible to other components.
   Why not just pass the jobID/jobName/whatever metadata you like separately into the constructor/init? 
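
One way to read this suggestion is a listener whose `init` receives plain job metadata instead of a metric group. The sketch below is only an illustration under that assumption: `MetadataAwareFailureListener`, `RecordingFailureListener`, and the `jobId`/`jobName` parameters are hypothetical and not part of this PR or of Flink.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: job metadata is passed as plain values, not via a MetricGroup. */
public class FailureListenerSketch {

    /** Assumed variant of the listener interface discussed in the review. */
    public interface MetadataAwareFailureListener {
        void init(String jobId, String jobName);

        void onFailure(Throwable cause, boolean globalFailure);
    }

    /** Toy implementation that records one line per failure. */
    public static class RecordingFailureListener implements MetadataAwareFailureListener {
        private String jobId = "";
        private String jobName = "";
        private final List<String> records = new ArrayList<>();

        @Override
        public void init(String jobId, String jobName) {
            this.jobId = jobId;
            this.jobName = jobName;
        }

        @Override
        public void onFailure(Throwable cause, boolean globalFailure) {
            String scope = globalFailure ? "global" : "local";
            records.add(jobName + "/" + jobId + " " + scope + " " + cause.getClass().getSimpleName());
        }

        public List<String> records() {
            return records;
        }
    }

    public static void main(String[] args) {
        RecordingFailureListener listener = new RecordingFailureListener();
        listener.init("job-1", "wordcount");
        listener.onFailure(new IllegalStateException("boom"), true);
        System.out.println(listener.records());
    }
}
```

In this shape the listener would not need to pull identifiers out of a metric group; whether metrics registration should then be offered through a separate parameter is left open here.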







[GitHub] [flink] flinkbot edited a comment on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761905721


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "296a46dab704e3a3553bb7a43128a3f4774bfb77",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12162",
       "triggerID" : "296a46dab704e3a3553bb7a43128a3f4774bfb77",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cade20e85b29ca63c51383dca04976c1d9801042",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "cade20e85b29ca63c51383dca04976c1d9801042",
       "triggerType" : "PUSH"
     }, {
       "hash" : "748434f298f7ba97a6f2b9e77638ef60ccad9d82",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12197",
       "triggerID" : "748434f298f7ba97a6f2b9e77638ef60ccad9d82",
       "triggerType" : "PUSH"
     }, {
       "hash" : "25908aedd847a453981533a3bd87117bcdc7ac78",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12204",
       "triggerID" : "25908aedd847a453981533a3bd87117bcdc7ac78",
       "triggerType" : "PUSH"
     }, {
       "hash" : "66efa5e224d899b11d89cf6419b376be0187a7bb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12352",
       "triggerID" : "66efa5e224d899b11d89cf6419b376be0187a7bb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e32796821c485c583a2b10b55d3f8b780971d357",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12354",
       "triggerID" : "e32796821c485c583a2b10b55d3f8b780971d357",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a4c68763883ef4f3c411fa606d0907ef334a6e76",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12363",
       "triggerID" : "a4c68763883ef4f3c411fa606d0907ef334a6e76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1a6bd7aa98b649bd119479247d34ea029b8ba636",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12400",
       "triggerID" : "1a6bd7aa98b649bd119479247d34ea029b8ba636",
       "triggerType" : "PUSH"
     }, {
       "hash" : "11651466eb13c5c05d2cd0eed0ac08b9bf617185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12704",
       "triggerID" : "11651466eb13c5c05d2cd0eed0ac08b9bf617185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "12652d9c7829d2d26d41f61185cbd7ab8f4ad505",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12719",
       "triggerID" : "12652d9c7829d2d26d41f61185cbd7ab8f4ad505",
       "triggerType" : "PUSH"
     }, {
       "hash" : "399c2ce9c9f9637a20ddbfd684a9a4e82b41265e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12884",
       "triggerID" : "399c2ce9c9f9637a20ddbfd684a9a4e82b41265e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a3a28907e4723571c04853236d8c3712195e5821",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=12984",
       "triggerID" : "a3a28907e4723571c04853236d8c3712195e5821",
       "triggerType" : "PUSH"
     }, {
       "hash" : "5ffc877b63cf33e5a374cbbda55096e429c84da0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13068",
       "triggerID" : "5ffc877b63cf33e5a374cbbda55096e429c84da0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "34720c9f7ea37afb5d7f3d2a824b78fee916b755",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13120",
       "triggerID" : "34720c9f7ea37afb5d7f3d2a824b78fee916b755",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7e811ff57a72897e49425a53b5d956f5a1f32ea9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13539",
       "triggerID" : "7e811ff57a72897e49425a53b5d956f5a1f32ea9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f850630134ff2243b56a82206e9ec402a93acfb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13615",
       "triggerID" : "f850630134ff2243b56a82206e9ec402a93acfb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51da4ba54bd567ae6a58e16c2e9e3882a76831c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13538",
       "triggerID" : "f51da4ba54bd567ae6a58e16c2e9e3882a76831c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f51da4ba54bd567ae6a58e16c2e9e3882a76831c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13658",
       "triggerID" : "784358861",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "f850630134ff2243b56a82206e9ec402a93acfb4",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=13632",
       "triggerID" : "784358861",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "9e445296314a2bfde4c85b61f1a3d9ee892c97ba",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9e445296314a2bfde4c85b61f1a3d9ee892c97ba",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cade20e85b29ca63c51383dca04976c1d9801042 UNKNOWN
   * 9e445296314a2bfde4c85b61f1a3d9ee892c97ba UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------



[GitHub] [flink] flinkbot commented on pull request #14678: [FLINK-20833][runtime] Add pluggable failure listener in job manager

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #14678:
URL: https://github.com/apache/flink/pull/14678#issuecomment-761904406


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 296a46dab704e3a3553bb7a43128a3f4774bfb77 (Mon Jan 18 00:02:10 UTC 2021)
   
   **Warnings:**
    * **1 pom.xml file was touched**: Check for build and licensing issues.
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.
   
   The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>

