Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/07/31 18:01:19 UTC

[GitHub] [flink] mxm opened a new pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

mxm opened a new pull request #13042:
URL: https://github.com/apache/flink/pull/13042


   ## What is the purpose of the change
   
   This adds the configuration option `taskmanager.graceful-shutdown-on-error`
   which defaults to `true`. If set to `false`, a custom SecurityManager will be
   installed on top of the existing SecurityManager to exit forcefully via 
   `Runtime#halt`.
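
As a rough illustration of the mechanism described above, the sketch below shows one way an exit-trapping SecurityManager could look. The class name `ExitTrapSketch` and its constructor are hypothetical and are not taken from this PR. The sketch assumes that `System.exit` consults `checkExit` on the installed SecurityManager, which holds only on JDKs that still allow installing one; on recent JDKs `SecurityManager` is deprecated for removal.

```java
import java.security.Permission;
import java.util.function.IntConsumer;

/** Hypothetical sketch, not the PR's actual ExitTrappingSecurityManager. */
@SuppressWarnings("removal") // SecurityManager is deprecated for removal on recent JDKs
class ExitTrapSketch extends SecurityManager {
    // Called with the exit status, e.g. status -> Runtime.getRuntime().halt(status).
    private final IntConsumer exitHandler;
    // Previously installed manager, if any; may be null.
    private final SecurityManager existing;

    ExitTrapSketch(IntConsumer exitHandler, SecurityManager existing) {
        this.exitHandler = exitHandler;
        this.existing = existing;
    }

    @Override
    public void checkExit(int status) {
        // Let the existing manager veto the exit first.
        if (existing != null) {
            existing.checkExit(status);
        }
        // Then hand the status to the handler instead of continuing the normal exit path.
        exitHandler.accept(status);
    }

    @Override
    public void checkPermission(Permission perm) {
        // Delegate every other check; allow everything if no manager was installed before.
        if (existing != null) {
            existing.checkPermission(perm);
        }
    }
}
```

Where installing a manager is permitted, it could be registered with `System.setSecurityManager(new ExitTrapSketch(status -> Runtime.getRuntime().halt(status), System.getSecurityManager()))`. Because `Runtime#halt` does not run shutdown hooks, a hanging hook can no longer block the exit.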
   
   ## Brief change log
   
   - If `taskmanager.graceful-shutdown-on-error: false` is set in the config,
   the shutdown will be performed via `Runtime#halt` instead of `System#exit`.
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
     - Added unit tests
     - Added integration test
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? JavaDocs / auto-generated configuration
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] mxm commented on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
mxm commented on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-669865447


   Actually, it might not be me: https://status.npmjs.org/
   
   ![image](https://user-images.githubusercontent.com/837221/89525465-87433500-d7e6-11ea-905a-23d8bd65bc3d.png)
   





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     }, {
       "hash" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210",
       "triggerID" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213",
       "triggerID" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5214",
       "triggerID" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * d1104b36d044153572bd61172b03c7af3cf8c9b8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213) 
   * 8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5214) 
   





[GitHub] [flink] mxm commented on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
mxm commented on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-669864569


   I'm unable to build the documentation. Attaching a log here in case there is something obvious I'm missing: [08-06T11_04_59_915Z-debug.log](https://github.com/apache/flink/files/5034433/2020-08-06T11_04_59_915Z-debug.log)
   
   I used the following command:
   
   ```
   mvn package -Dgenerate-config-docs -pl flink-docs -am -nsu -DskipTests -Dcheckstyle.skip
   ```
   
   which currently yields
   
   ```
   [ERROR] npm ERR! code E404
   [ERROR] npm ERR! 404 Not Found: @antv/g2@3.4.10
   [ERROR]
   [ERROR] npm ERR! A complete log of this run can be found in:
   [ERROR] npm ERR!     /Users/max/.npm/_logs/2020-08-06T11_04_59_915Z-debug.log
   ```
   





[GitHub] [flink] mxm commented on a change in pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
mxm commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r464480958



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java
##########
@@ -470,4 +479,30 @@ static String getTaskManagerResourceID(Configuration config, String rpcAddress,
 					? InetAddress.getLocalHost().getHostName() + "-" + new AbstractID().toString().substring(0, 6)
 					: rpcAddress + ":" + rpcPort + "-" + new AbstractID().toString().substring(0, 6));
 	}
+
+	/**
+	 * If configured, registers a custom SecurityManager which forcefully exits the TaskManager instead of
+	 * shutting it down gracefully via the registered ShutdownHooks.
+	 *
+	 * @param configuration The task manager configuration
+	 */
+	private static void maybeOverrideShutdownLogic(Configuration configuration) {
+		boolean gracefulShutdown = configuration.get(GRACEFUL_SHUTDOWN_ON_ERROR);
+		if (gracefulShutdown) {
+			// No need to setup a SecurityManager to deal with System.exit calls.
+			return;
+		}
+		SecurityManager forcefulShutdownManager = new ExitTrappingSecurityManager(
+			status -> Runtime.getRuntime().halt(status),
+			System.getSecurityManager());
+		try {
+			System.setSecurityManager(forcefulShutdownManager);
+		} catch (Exception e) {
+			throw new IllegalConfigurationException(
+				String.format("Could not register forceful shutdown handler for configuration '%s'. Either allow setting a SecurityManager or set the configuration to its default: '%s'",
+					GRACEFUL_SHUTDOWN_ON_ERROR.key(),
+					GRACEFUL_SHUTDOWN_ON_ERROR.defaultValue()),
+				e);

Review comment:
       That is indeed a limitation of this approach. The default behavior doesn't require installing a security manager, but the forceful shutdown feature needs permission to install one. The situation is similar for `Thread.setDefaultUncaughtExceptionHandler`, which also needs permission from the `SecurityManager`, although that is arguably a less restrictive permission.
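
To make the permission limitation concrete, here is a minimal sketch of wrapping a handler registration so that a denied permission surfaces as a descriptive error. The class `HandlerRegistrationSketch` and the use of `IllegalStateException` are assumptions for illustration; the diff above throws Flink's `IllegalConfigurationException` instead. `UnsupportedOperationException` is caught as well because newer JDKs can reject `System.setSecurityManager` that way.

```java
/** Hypothetical helper, not part of the PR. */
class HandlerRegistrationSketch {

    /**
     * Runs the given registration and converts a denied or unsupported
     * registration into an error that names the responsible option.
     */
    static void register(Runnable registration, String optionKey) {
        try {
            registration.run();
        } catch (SecurityException | UnsupportedOperationException e) {
            throw new IllegalStateException(
                "Could not register forceful shutdown handler for configuration '" + optionKey
                    + "'. Either allow the registration or keep the option at its default.",
                e);
        }
    }
}
```

The same wrapper would apply to either variant discussed here, for example `register(() -> Thread.setDefaultUncaughtExceptionHandler(handler), key)`, where `handler` and `key` are placeholders.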







[GitHub] [flink] flinkbot commented on pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667260335


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 0d7596f938ca8c655fef4c011d67e921d3a4b946 (Fri Jul 31 18:03:56 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     }, {
       "hash" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210",
       "triggerID" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213",
       "triggerID" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * d1104b36d044153572bd61172b03c7af3cf8c9b8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213) 
   * 8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4 UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     }, {
       "hash" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210",
       "triggerID" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 3d666819aa43ea14b87276318fec3fcf196a4a65 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206) 
   * 75f19b47fedf94e26143ea354acf0737d70914ec Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210) 
   * d1104b36d044153572bd61172b03c7af3cf8c9b8 UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     }, {
       "hash" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210",
       "triggerID" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213",
       "triggerID" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5214",
       "triggerID" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1ddd967d98afba6c86d04e1f805d5eaab48197c6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1ddd967d98afba6c86d04e1f805d5eaab48197c6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5214) 
   * 1ddd967d98afba6c86d04e1f805d5eaab48197c6 UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     }, {
       "hash" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210",
       "triggerID" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213",
       "triggerID" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5214",
       "triggerID" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1ddd967d98afba6c86d04e1f805d5eaab48197c6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5243",
       "triggerID" : "1ddd967d98afba6c86d04e1f805d5eaab48197c6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d945f0a46af73e5b231a309ad07479de9c9f1b3e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5246",
       "triggerID" : "d945f0a46af73e5b231a309ad07479de9c9f1b3e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8aacbd45aca3c53de3c1d89d4323f3f1d7cf7a0b",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5253",
       "triggerID" : "8aacbd45aca3c53de3c1d89d4323f3f1d7cf7a0b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 8aacbd45aca3c53de3c1d89d4323f3f1d7cf7a0b Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5253) 
   





[GitHub] [flink] mxm commented on a change in pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
mxm commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r466325255



##########
File path: flink-core/src/main/java/org/apache/flink/configuration/ClusterOptions.java
##########
@@ -78,4 +79,13 @@
 				.text("Enable the slot spread out allocation strategy. This strategy tries to spread out " +
 					"the slots evenly across all available %s.", code("TaskExecutors"))
 				.build());
+
+	@Documentation.Section(Documentation.Sections.EXPERT_CLUSTER)
+	public static final ConfigOption<Boolean> HALT_ON_FATAL_ERROR =
+		key("cluster.processes.halt-on-fatal-error")

Review comment:
       Currently at:
   ```
   [ERROR] npm ERR! code E404
   [ERROR] npm ERR! 404 Not Found: @yarnpkg/lockfile@1.1.0
   ```







[GitHub] [flink] tillrohrmann closed pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
tillrohrmann closed pull request #13042:
URL: https://github.com/apache/flink/pull/13042


   





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     }, {
       "hash" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 3d666819aa43ea14b87276318fec3fcf196a4a65 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206) 
   * 75f19b47fedf94e26143ea354acf0737d70914ec UNKNOWN
   





[GitHub] [flink] flinkbot commented on pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zentol commented on a change in pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r465927435



##########
File path: flink-annotations/src/main/java/org/apache/flink/annotation/docs/Documentation.java
##########
@@ -91,6 +91,7 @@
 		public static final String EXPERT_ZOOKEEPER_HIGH_AVAILABILITY = "expert_high_availability_zk";
 		public static final String EXPERT_SECURITY_SSL = "expert_security_ssl";
 		public static final String EXPERT_ROCKSDB = "expert_rocksdb";
+		public static final String EXPERT_CLUSTER = "expert_cluster";

Review comment:
       You need to include this section in `config.md(.zh)` for it to show up.







[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     }, {
       "hash" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210",
       "triggerID" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213",
       "triggerID" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 75f19b47fedf94e26143ea354acf0737d70914ec Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210) 
   * d1104b36d044153572bd61172b03c7af3cf8c9b8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] tillrohrmann commented on pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-668452463


   Fair points. If setting a custom `SecurityManager` ever becomes a problem, we might have to revisit this.
   
   +1 for making the mechanism available for all Flink processes.
   
   One small comment on where to instantiate the `SecurityManager`: at the moment this happens in the `TaskManagerRunner.startTaskManager` method. The `MiniCluster` calls this method multiple times, though. Hence, I would suggest moving the instantiation of the `SecurityManager` closer to the entrypoints (`ClusterEntrypoint`, `MiniCluster`, `TaskManagerRunner.runTaskManagerSecurely`) and updating `YarnTaskExecutorRunner` to call into the same method.
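
The concern about a start method being called several times in one JVM can be illustrated with a minimal sketch. The `OnceInstaller` class below is hypothetical and not part of Flink or this PR; it only shows one way to guarantee that a process-wide installation step (such as setting a `SecurityManager`) runs at most once, however often the surrounding start code is invoked.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not part of Flink): runs an installation step at most once. */
public class OnceInstaller {

    private final AtomicBoolean installed = new AtomicBoolean(false);

    /**
     * Runs the given action only on the first call.
     *
     * @return true if this call ran the action, false if it had already run
     */
    public boolean installOnce(Runnable installAction) {
        // compareAndSet flips the flag atomically, so concurrent callers cannot both win.
        if (installed.compareAndSet(false, true)) {
            installAction.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        OnceInstaller installer = new OnceInstaller();
        AtomicInteger installs = new AtomicInteger();
        // Simulates a test cluster starting three TaskManagers in the same JVM.
        for (int i = 0; i < 3; i++) {
            installer.installOnce(installs::incrementAndGet);
        }
        System.out.println("installs=" + installs.get());
    }
}
```

Moving the installation to the process entrypoints, as suggested above, avoids the need for such a guard, because each entrypoint runs once per process.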





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 3d666819aa43ea14b87276318fec3fcf196a4a65 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] mxm commented on pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
mxm commented on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-669161031


   PR updated to address the comments. 
   
   - Added more reasoning for why this feature might be used
   - Moved the initialization code
   - I've skipped `MiniCluster` because I'm not sure this feature should be applied in a MiniCluster context.
   - Refactored startup for `YarnTaskExecutorRunner`
   - I've renamed the configuration option (it now defaults to false)
   - Added tests to check that the feature is disabled by default
   
   





[GitHub] [flink] zentol commented on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-670374315


   @mxm For the future, if you are running into issues building the WebUI, then you can skip it using `-Pskip-webui-build`.





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     }, {
       "hash" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210",
       "triggerID" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213",
       "triggerID" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5214",
       "triggerID" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1ddd967d98afba6c86d04e1f805d5eaab48197c6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5243",
       "triggerID" : "1ddd967d98afba6c86d04e1f805d5eaab48197c6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d945f0a46af73e5b231a309ad07479de9c9f1b3e",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5246",
       "triggerID" : "d945f0a46af73e5b231a309ad07479de9c9f1b3e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8aacbd45aca3c53de3c1d89d4323f3f1d7cf7a0b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8aacbd45aca3c53de3c1d89d4323f3f1d7cf7a0b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * d945f0a46af73e5b231a309ad07479de9c9f1b3e Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5246) 
   * 8aacbd45aca3c53de3c1d89d4323f3f1d7cf7a0b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 3d666819aa43ea14b87276318fec3fcf196a4a65 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     }, {
       "hash" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210",
       "triggerID" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213",
       "triggerID" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5214",
       "triggerID" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1ddd967d98afba6c86d04e1f805d5eaab48197c6",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5243",
       "triggerID" : "1ddd967d98afba6c86d04e1f805d5eaab48197c6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d945f0a46af73e5b231a309ad07479de9c9f1b3e",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5246",
       "triggerID" : "d945f0a46af73e5b231a309ad07479de9c9f1b3e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 1ddd967d98afba6c86d04e1f805d5eaab48197c6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5243) 
   * d945f0a46af73e5b231a309ad07479de9c9f1b3e Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5246) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     }, {
       "hash" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210",
       "triggerID" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213",
       "triggerID" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5214",
       "triggerID" : "8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1ddd967d98afba6c86d04e1f805d5eaab48197c6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5243",
       "triggerID" : "1ddd967d98afba6c86d04e1f805d5eaab48197c6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d945f0a46af73e5b231a309ad07479de9c9f1b3e",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5246",
       "triggerID" : "d945f0a46af73e5b231a309ad07479de9c9f1b3e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * d945f0a46af73e5b231a309ad07479de9c9f1b3e Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5246) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] mxm commented on pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
mxm commented on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-668083787


   Good questions @tillrohrmann. I was looking into setting a default exception handler but then realized we have exception handling in numerous places and it was not exactly trivial to figure out if the default handler would end up being called. Also, you mentioned an application can set a default exception handler too. 
   
   In the end, the SecurityManager approach seemed to be the most elegant. It has the drawback of needing the permission to install a SecurityManager. Similarly, `Thread#setDefaultUncaughtExceptionHandler` also needs permission from the SecurityManager, if present. 
   
   There is a bit of magic involved here, but not by default. Overall, I think it is reasonable to allow converting exit calls if this is explicitly switched on. After all, the configuration option is a workaround for a JVM bug, which means that a bit of magic might be appropriate.
   
   We can expand the configuration option to be a global option and not only affect the task managers.
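
As a rough illustration of what "converting exit calls" amounts to, the sketch below selects between a graceful and a forceful exit based on a boolean flag. The class `ExitStrategy` and its method names are hypothetical and not taken from the PR. The only real JDK calls are `System.exit` (which runs shutdown hooks) and `Runtime#halt` (which does not). The PR intercepts `System.exit` through a `SecurityManager`; this sketch deliberately leaves that part out and only shows the choice between the two exit actions.

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch (not the PR's code): picks the action used to terminate the JVM. */
public class ExitStrategy {

    /** Returns the forceful action if the flag is set, otherwise the graceful one. */
    public static IntConsumer choose(
            boolean haltOnFatalError, IntConsumer graceful, IntConsumer forceful) {
        return haltOnFatalError ? forceful : graceful;
    }

    /** Wiring against the real JDK calls; invoking the returned action ends the process. */
    public static IntConsumer forJvm(boolean haltOnFatalError) {
        return choose(
                haltOnFatalError,
                System::exit, // runs registered shutdown hooks first
                status -> Runtime.getRuntime().halt(status)); // skips shutdown hooks
    }

    public static void main(String[] args) {
        // Recording stand-ins are used here so that the demo does not terminate the JVM.
        StringBuilder log = new StringBuilder();
        IntConsumer graceful = status -> log.append("exit(").append(status).append(")");
        IntConsumer forceful = status -> log.append("halt(").append(status).append(")");

        choose(true, graceful, forceful).accept(1);
        System.out.println(log);
    }
}
```

Passing the exit action in as an `IntConsumer` mirrors the shape of the `ExitTrappingSecurityManager` constructor in the diff under review, and it keeps the decision testable without ending the test process.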





[GitHub] [flink] tillrohrmann commented on a change in pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r465791644



##########
File path: flink-yarn/src/main/java/org/apache/flink/yarn/YarnTaskExecutorRunner.java
##########
@@ -90,17 +85,9 @@ private static void runTaskManagerSecurely(String[] args) {
 			LOG.info("Current working Directory: {}", currDir);
 
 			final Configuration configuration = TaskManagerRunner.loadConfiguration(args);
+			setupAndModifyConfiguration(configuration, currDir, ENV);
 
-			final PluginManager pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);
-
-			FileSystem.initialize(configuration, pluginManager);
-
-			setupConfigurationAndInstallSecurityContext(configuration, currDir, ENV);
-
-			SecurityUtils.getInstalledContext().runSecured((Callable<Void>) () -> {
-				TaskManagerRunner.runTaskManager(configuration, pluginManager);
-				return null;
-			});
+			TaskManagerRunner.runTaskManagerSecurely(configuration);

Review comment:
       I think we also have to do this for the `MesosTaskExecutorRunner`.







[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092",
       "triggerID" : "0d7596f938ca8c655fef4c011d67e921d3a4b946",
       "triggerType" : "PUSH"
     }, {
       "hash" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "94ce334e46321af04393fdcf7219f4c8c80ba655",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206",
       "triggerID" : "3d666819aa43ea14b87276318fec3fcf196a4a65",
       "triggerType" : "PUSH"
     }, {
       "hash" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210",
       "triggerID" : "75f19b47fedf94e26143ea354acf0737d70914ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d1104b36d044153572bd61172b03c7af3cf8c9b8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 75f19b47fedf94e26143ea354acf0737d70914ec Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5210) 
   * d1104b36d044153572bd61172b03c7af3cf8c9b8 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zentol commented on a change in pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r465931031



##########
File path: flink-core/src/main/java/org/apache/flink/configuration/ClusterOptions.java
##########
@@ -78,4 +79,13 @@
 				.text("Enable the slot spread out allocation strategy. This strategy tries to spread out " +
 					"the slots evenly across all available %s.", code("TaskExecutors"))
 				.build());
+
+	@Documentation.Section(Documentation.Sections.EXPERT_CLUSTER)
+	public static final ConfigOption<Boolean> HALT_ON_FATAL_ERROR =
+		key("cluster.processes.halt-on-fatal-error")
+			.booleanType()
+			.defaultValue(false)
+			.withDescription("Whether processes should halt on fatal errors instead of performing a graceful shutdown. " +
+				"In some environments (e.g. Java 8 with the G1 garbage collector), a regular graceful shutdown can lead " +
+				"to a JVM deadlock. See: https://issues.apache.org/jira/browse/FLINK-16510");

Review comment:
       ```suggestion
   			.withDescription(Description.builder().text(
   				"Whether processes should halt on fatal errors instead of performing a graceful shutdown. " +
   					"In some environments (e.g. Java 8 with the G1 garbage collector), a regular graceful shutdown can lead " +
   					"to a JVM deadlock. See %s for details.",
   				link("https://issues.apache.org/jira/browse/FLINK-16510", "FLINK-16510"))
   			.build());
   ```







[GitHub] [flink] tillrohrmann commented on a change in pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r464446281



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java
##########
@@ -470,4 +479,30 @@ static String getTaskManagerResourceID(Configuration config, String rpcAddress,
 					? InetAddress.getLocalHost().getHostName() + "-" + new AbstractID().toString().substring(0, 6)
 					: rpcAddress + ":" + rpcPort + "-" + new AbstractID().toString().substring(0, 6));
 	}
+
+	/**
+	 * If configured, registers a custom SecurityManager which forcefully exits the TaskManager instead of
+	 * shutting it down gracefully via the registered ShutdownHooks.
+	 *
+	 * @param configuration The task manager configuration
+	 */
+	private static void maybeOverrideShutdownLogic(Configuration configuration) {
+		boolean gracefulShutdown = configuration.get(GRACEFUL_SHUTDOWN_ON_ERROR);
+		if (gracefulShutdown) {
+			// No need to setup a SecurityManager to deal with System.exit calls.
+			return;
+		}
+		SecurityManager forcefulShutdownManager = new ExitTrappingSecurityManager(
+			status -> Runtime.getRuntime().halt(status),
+			System.getSecurityManager());
+		try {
+			System.setSecurityManager(forcefulShutdownManager);
+		} catch (Exception e) {
+			throw new IllegalConfigurationException(
+				String.format("Could not register forceful shutdown handler for configuration '%s'. Either allow setting a SecurityManager or set the configuration to its default: '%s'",
+					GRACEFUL_SHUTDOWN_ON_ERROR.key(),
+					GRACEFUL_SHUTDOWN_ON_ERROR.defaultValue()),
+				e);

Review comment:
       What if some corporate policies do not allow installing a custom `SecurityManager`?




----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 3d666819aa43ea14b87276318fec3fcf196a4a65 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5206) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------



[GitHub] [flink] mxm commented on a change in pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
mxm commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r465807429



##########
File path: flink-yarn/src/main/java/org/apache/flink/yarn/YarnTaskExecutorRunner.java
##########
@@ -90,17 +85,9 @@ private static void runTaskManagerSecurely(String[] args) {
 			LOG.info("Current working Directory: {}", currDir);
 
 			final Configuration configuration = TaskManagerRunner.loadConfiguration(args);
+			setupAndModifyConfiguration(configuration, currDir, ENV);
 
-			final PluginManager pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);
-
-			FileSystem.initialize(configuration, pluginManager);
-
-			setupConfigurationAndInstallSecurityContext(configuration, currDir, ENV);
-
-			SecurityUtils.getInstalledContext().runSecured((Callable<Void>) () -> {
-				TaskManagerRunner.runTaskManager(configuration, pluginManager);
-				return null;
-			});
+			TaskManagerRunner.runTaskManagerSecurely(configuration);

Review comment:
       Good point!




----------------------------------------------------------------



[GitHub] [flink] mxm commented on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
mxm commented on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-669962254


   Awesome, thanks for your review and for generating the docs!


----------------------------------------------------------------



[GitHub] [flink] mxm commented on a change in pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
mxm commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r464480866



##########
File path: flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java
##########
@@ -79,6 +79,16 @@
 			.defaultValue(false)
 			.withDescription("Whether to kill the TaskManager when the task thread throws an OutOfMemoryError.");
 
+	/**
+	 * Whether the task manager should attempt to shut itself down gracefully on errors.
+	 */
+	@Documentation.Section(Documentation.Sections.ALL_TASK_MANAGER)
+	public static final ConfigOption<Boolean> GRACEFUL_SHUTDOWN_ON_ERROR =
+		key("taskmanager.graceful-shutdown-on-error")

Review comment:
       The scope is currently all task manager processes but it makes sense to extend it to all entry points.




----------------------------------------------------------------



[GitHub] [flink] tweise commented on a change in pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
tweise commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r465360429



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java
##########
@@ -470,4 +479,30 @@ static String getTaskManagerResourceID(Configuration config, String rpcAddress,
 					? InetAddress.getLocalHost().getHostName() + "-" + new AbstractID().toString().substring(0, 6)
 					: rpcAddress + ":" + rpcPort + "-" + new AbstractID().toString().substring(0, 6));
 	}
+
+	/**
+	 * If configured, registers a custom SecurityManager which forcefully exits the TaskManager instead of
+	 * shutting it down gracefully via the registered ShutdownHooks.
+	 *
+	 * @param configuration The task manager configuration
+	 */
+	private static void maybeOverrideShutdownLogic(Configuration configuration) {
+		boolean gracefulShutdown = configuration.get(GRACEFUL_SHUTDOWN_ON_ERROR);
+		if (gracefulShutdown) {
+			// No need to setup a SecurityManager to deal with System.exit calls.
+			return;
+		}
+		SecurityManager forcefulShutdownManager = new ExitTrappingSecurityManager(
+			status -> Runtime.getRuntime().halt(status),
+			System.getSecurityManager());
+		try {
+			System.setSecurityManager(forcefulShutdownManager);
+		} catch (Exception e) {
+			throw new IllegalConfigurationException(
+				String.format("Could not register forceful shutdown handler for configuration '%s'. Either allow setting a SecurityManager or set the configuration to its default: '%s'",
+					GRACEFUL_SHUTDOWN_ON_ERROR.key(),
+					GRACEFUL_SHUTDOWN_ON_ERROR.defaultValue()),
+				e);

Review comment:
       I think that is an acceptable limitation since it is an optional feature. It would be good to document it as part of the configuration option.
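
       To make the limitation concrete, here is a minimal sketch of how an exit-trapping manager of this kind might look. The class name `ExitTrapSketch` and the `IntConsumer` handler are assumptions for illustration only; this is not the PR's `ExitTrappingSecurityManager`, whose internals are not shown in this thread. The demo calls `checkExit` directly instead of installing the manager, because installing it is exactly the step a restrictive environment (or a recent JDK, where `SecurityManager` is deprecated) may refuse.

```java
import java.security.Permission;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/**
 * Hypothetical sketch of an exit-trapping SecurityManager (not the PR's class).
 * SecurityManager is deprecated in recent JDKs, so expect compiler warnings there.
 */
public class ExitTrapSketch extends SecurityManager {

    /** Invoked with the exit status before a System.exit call proceeds. */
    private final IntConsumer exitHandler;

    /** Previously installed manager to delegate to; may be null. */
    private final SecurityManager existing;

    public ExitTrapSketch(IntConsumer exitHandler, SecurityManager existing) {
        this.exitHandler = exitHandler;
        this.existing = existing;
    }

    @Override
    public void checkExit(int status) {
        // Let the existing manager veto first (it throws SecurityException to deny).
        if (existing != null) {
            existing.checkExit(status);
        }
        // A forceful variant would pass: s -> Runtime.getRuntime().halt(s)
        exitHandler.accept(status);
    }

    @Override
    public void checkPermission(Permission perm) {
        // Delegate all other checks; with no existing manager, permit everything.
        if (existing != null) {
            existing.checkPermission(perm);
        }
    }

    public static void main(String[] args) {
        AtomicInteger seen = new AtomicInteger(-1);
        // Record the status instead of halting so the demo JVM keeps running.
        ExitTrapSketch manager = new ExitTrapSketch(seen::set, null);
        manager.checkExit(3); // what System.exit(3) would trigger once installed
        System.out.println("trapped exit status: " + seen.get());
    }
}
```

       Once such a manager is installed via `System.setSecurityManager`, every `System.exit` call passes through `checkExit`, which is where the halt handler takes over. If installation is rejected, the option cannot take effect, which is the limitation that should be documented with the configuration option.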




----------------------------------------------------------------



[GitHub] [flink] mxm commented on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
mxm commented on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-669283889


   Updated and rebased.


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * d945f0a46af73e5b231a309ad07479de9c9f1b3e Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5246) 
   * 8aacbd45aca3c53de3c1d89d4323f3f1d7cf7a0b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5253) 
   


----------------------------------------------------------------



[GitHub] [flink] mxm commented on a change in pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
mxm commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r466301465



##########
File path: flink-core/src/main/java/org/apache/flink/configuration/ClusterOptions.java
##########
@@ -78,4 +79,13 @@
 				.text("Enable the slot spread out allocation strategy. This strategy tries to spread out " +
 					"the slots evenly across all available %s.", code("TaskExecutors"))
 				.build());
+
+	@Documentation.Section(Documentation.Sections.EXPERT_CLUSTER)
+	public static final ConfigOption<Boolean> HALT_ON_FATAL_ERROR =
+		key("cluster.processes.halt-on-fatal-error")

Review comment:
       I'll update the docs once my computer has finished downloading the internet ;)

##########
File path: flink-annotations/src/main/java/org/apache/flink/annotation/docs/Documentation.java
##########
@@ -91,6 +91,7 @@
 		public static final String EXPERT_ZOOKEEPER_HIGH_AVAILABILITY = "expert_high_availability_zk";
 		public static final String EXPERT_SECURITY_SSL = "expert_security_ssl";
 		public static final String EXPERT_ROCKSDB = "expert_rocksdb";
+		public static final String EXPERT_CLUSTER = "expert_cluster";

Review comment:
       Thanks. Will do!




----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 1ddd967d98afba6c86d04e1f805d5eaab48197c6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5243) 
   


----------------------------------------------------------------



[GitHub] [flink] tillrohrmann commented on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-669911370


   Let me check whether I can build the documentation.


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * d1104b36d044153572bd61172b03c7af3cf8c9b8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5213) 
   


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 1ddd967d98afba6c86d04e1f805d5eaab48197c6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5243) 
   * d945f0a46af73e5b231a309ad07479de9c9f1b3e UNKNOWN
   


----------------------------------------------------------------



[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   


----------------------------------------------------------------



[GitHub] [flink] tillrohrmann commented on a change in pull request #13042: [FLINK-16510] Allow overriding of graceful with forceful shutdown

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r464443086



##########
File path: flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java
##########
@@ -79,6 +79,16 @@
 			.defaultValue(false)
 			.withDescription("Whether to kill the TaskManager when the task thread throws an OutOfMemoryError.");
 
+	/**
+	 * Whether the task manager should attempt to shut itself down gracefully on errors.
+	 */
+	@Documentation.Section(Documentation.Sections.ALL_TASK_MANAGER)
+	public static final ConfigOption<Boolean> GRACEFUL_SHUTDOWN_ON_ERROR =
+		key("taskmanager.graceful-shutdown-on-error")

Review comment:
       I think it would be nice to make this behaviour configurable for all Flink processes.

##########
File path: flink-runtime/src/test/java/org/apache/flink/runtime/security/ExitTrappingSecurityManagerTest.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.security;
+
+import org.junit.Test;
+
+import java.security.Permission;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import static org.hamcrest.Matchers.is;
+import static org.junit.Assert.assertThat;
+
+/** Unit tests for {@link ExitTrappingSecurityManager}. */
+public class ExitTrappingSecurityManagerTest {

Review comment:
       `extends TestLogger` is missing.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java
##########
@@ -470,4 +479,30 @@ static String getTaskManagerResourceID(Configuration config, String rpcAddress,
 					? InetAddress.getLocalHost().getHostName() + "-" + new AbstractID().toString().substring(0, 6)
 					: rpcAddress + ":" + rpcPort + "-" + new AbstractID().toString().substring(0, 6));
 	}
+
+	/**
+	 * If configured, registers a custom SecurityManager which forcefully exits the TaskManager instead of
+	 * shutting it down gracefully via the registered ShutdownHooks.
+	 *
+	 * @param configuration The task manager configuration
+	 */
+	private static void maybeOverrideShutdownLogic(Configuration configuration) {
+		boolean gracefulShutdown = configuration.get(GRACEFUL_SHUTDOWN_ON_ERROR);
+		if (gracefulShutdown) {
+			// No need to setup a SecurityManager to deal with System.exit calls.
+			return;
+		}
+		SecurityManager forcefulShutdownManager = new ExitTrappingSecurityManager(
+			status -> Runtime.getRuntime().halt(status),
+			System.getSecurityManager());
+		try {
+			System.setSecurityManager(forcefulShutdownManager);
+		} catch (Exception e) {
+			throw new IllegalConfigurationException(
+				String.format("Could not register forceful shutdown handler for configuration '%s'. Either allow setting a SecurityManager or set the configuration to its default: '%s'",
+					GRACEFUL_SHUTDOWN_ON_ERROR.key(),
+					GRACEFUL_SHUTDOWN_ON_ERROR.defaultValue()),
+				e);

Review comment:
       What if some corporate policy does not allow installing a custom `SecurityManager`?
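
For readers following the thread, here is a minimal sketch of the exit-trapping idea the diff above relies on. The real `ExitTrappingSecurityManager` is not shown in this excerpt, so the class name `ExitTrapSketch`, its fields, and its delegation behavior are assumptions made for illustration only. The sketch intercepts `checkExit`, delegates to a previously installed manager if there is one, and then hands the status to a handler. In the PR the handler is `status -> Runtime.getRuntime().halt(status)`; here the handler only records the status, and `checkExit` is called directly instead of installing the manager, because newer JDKs may reject `System.setSecurityManager`.

```java
import java.security.Permission;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/** Hypothetical stand-in for the PR's ExitTrappingSecurityManager, not the actual Flink class. */
@SuppressWarnings({"deprecation", "removal"})
class ExitTrapSketch extends SecurityManager {

    // Called with the exit status whenever an exit is attempted.
    private final IntConsumer exitHandler;
    // Previously installed manager to delegate to; may be null.
    private final SecurityManager existing;

    ExitTrapSketch(IntConsumer exitHandler, SecurityManager existing) {
        this.exitHandler = exitHandler;
        this.existing = existing;
    }

    @Override
    public void checkExit(int status) {
        // Let an existing manager veto the exit first (it may throw SecurityException).
        if (existing != null) {
            existing.checkExit(status);
        }
        // Then run the custom exit logic instead of the normal shutdown path.
        exitHandler.accept(status);
    }

    @Override
    public void checkPermission(Permission perm) {
        // Only delegate; without an existing manager every permission is granted.
        if (existing != null) {
            existing.checkPermission(perm);
        }
    }

    @Override
    public void checkPermission(Permission perm, Object context) {
        if (existing != null) {
            existing.checkPermission(perm, context);
        }
    }

    public static void main(String[] args) {
        AtomicInteger trapped = new AtomicInteger(-1);
        // The handler records the status; a forceful variant would call Runtime#halt here.
        ExitTrapSketch manager = new ExitTrapSketch(trapped::set, null);
        // Simulate what the JVM does at the start of System.exit(1) when a manager is installed.
        manager.checkExit(1);
        System.out.println("trapped exit status: " + trapped.get());
    }
}
```

Regarding the question above: if installation is not permitted, `System.setSecurityManager` throws, which is the case the `try`/`catch` in the diff converts into an `IllegalConfigurationException`.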







[GitHub] [flink] zentol commented on a change in pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #13042:
URL: https://github.com/apache/flink/pull/13042#discussion_r465927278



##########
File path: flink-core/src/main/java/org/apache/flink/configuration/ClusterOptions.java
##########
@@ -78,4 +79,13 @@
 				.text("Enable the slot spread out allocation strategy. This strategy tries to spread out " +
 					"the slots evenly across all available %s.", code("TaskExecutors"))
 				.build());
+
+	@Documentation.Section(Documentation.Sections.EXPERT_CLUSTER)
+	public static final ConfigOption<Boolean> HALT_ON_FATAL_ERROR =
+		key("cluster.processes.halt-on-fatal-error")

Review comment:
       You will need to regenerate the documentation.

##########
File path: flink-annotations/src/main/java/org/apache/flink/annotation/docs/Documentation.java
##########
@@ -91,6 +91,7 @@
 		public static final String EXPERT_ZOOKEEPER_HIGH_AVAILABILITY = "expert_high_availability_zk";
 		public static final String EXPERT_SECURITY_SSL = "expert_security_ssl";
 		public static final String EXPERT_ROCKSDB = "expert_rocksdb";
+		public static final String EXPERT_CLUSTER = "expert_cluster";

Review comment:
       You need to include this section into the `config.md` for it to show up.







[GitHub] [flink] flinkbot edited a comment on pull request #13042: [FLINK-16510] Allow configuring shutdown behavior to avoid JVM freeze

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #13042:
URL: https://github.com/apache/flink/pull/13042#issuecomment-667268558


   ## CI report:
   
   * 0d7596f938ca8c655fef4c011d67e921d3a4b946 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5092) 
   * 94ce334e46321af04393fdcf7219f4c8c80ba655 UNKNOWN
   * 8826ac20d30fc87d026d2406bf4de1bdcb4ddeb4 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5214) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

