Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/02/06 15:46:41 UTC

[jira] [Commented] (FLINK-5718) Handle JVM Fatal Exceptions in Tasks

    [ https://issues.apache.org/jira/browse/FLINK-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854227#comment-15854227 ] 

ASF GitHub Bot commented on FLINK-5718:
---------------------------------------

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/3276

    [FLINK-5718] [core] TaskManagers exit the JVM on fatal exceptions.

    *This adds a feature requested by a user for production stability.*
    
    The TaskManager should not attempt to handle certain exceptions, because they indicate that the JVM is corrupt. When a task throws such an exception, the TaskManager forcefully and immediately exits the JVM.
    
    Optionally, `OutOfMemoryError` can also be configured to cause such immediate JVM termination, via the `taskmanager.jvm-exit-on-oom` config option.
    
    
    ### Tests
    
    This adds a test that covers the option and the actual process kill (via a spawned test process).
    
    ### Documentation
    
    This adds the `taskmanager.jvm-exit-on-oom` option to the `setup/config.md` docs.
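
For illustration only, the exit decision described in the pull request text could be sketched roughly as follows. This is not the code from the pull request: the class `FatalExitPolicy`, its methods, the `exitOnOom` field and the exit code `-1` are hypothetical names and values chosen for this sketch. Only the `taskmanager.jvm-exit-on-oom` option and the listed error types come from the text above.

```java
import java.util.function.IntConsumer;

/** Hypothetical helper for illustration; not the implementation from the pull request. */
final class FatalExitPolicy {

    /** Assumed to mirror the meaning of the `taskmanager.jvm-exit-on-oom` option. */
    private final boolean exitOnOom;

    FatalExitPolicy(boolean exitOnOom) {
        this.exitOnOom = exitOnOom;
    }

    /** Returns true if the error should terminate the JVM instead of only failing the task. */
    boolean shouldExitJvm(Throwable t) {
        if (t instanceof OutOfMemoryError) {
            // OutOfMemoryError is fatal only when the option is switched on.
            return exitOnOom;
        }
        // ZipError extends InternalError, so the first check covers it as well.
        return t instanceof InternalError || t instanceof UnknownError;
    }

    /** Runs the exit action for fatal errors and reports whether it was run. */
    boolean handle(Throwable t, IntConsumer exitAction) {
        if (shouldExitJvm(t)) {
            exitAction.accept(-1); // the exit code is an assumption of this sketch
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FatalExitPolicy policy = new FatalExitPolicy(false);
        // A printing exit action keeps the demo from killing the JVM.
        policy.handle(new InternalError("demo"),
                code -> System.out.println("would exit with code " + code));
        System.out.println(policy.shouldExitJvm(new OutOfMemoryError("demo")));
    }
}
```

In a real process the exit action might be `Runtime.getRuntime()::halt`, which terminates the JVM without running shutdown hooks. Whether the pull request uses that call is not stated in this message. Passing the action in as a parameter keeps the decision testable without spawning a separate process.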

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink exit_on_fatal_error

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3276.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3276
    
----
commit 21c08817554e5a66186afa83158ca9c6ac975ba4
Author: Stephan Ewen <se...@apache.org>
Date:   2017-02-06T14:52:39Z

    [FLINK-5718] [core] TaskManagers exit the JVM on fatal exceptions.

----


> Handle JVM Fatal Exceptions in Tasks
> ------------------------------------
>
>                 Key: FLINK-5718
>                 URL: https://issues.apache.org/jira/browse/FLINK-5718
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>
> The TaskManager catches and handles all types of exceptions right now (all {{Throwables}}). The intention behind that is:
>   - Many {{Error}} subclasses are recoverable for the TaskManagers, such as failure to load/link user code
>   - We want to give eager notifications to the JobManager in case something in a task goes wrong.
> However, some exceptions should probably simply terminate the JVM when caught in the task thread, because they may leave the JVM in a dysfunctional limbo state:
>   - {{OutOfMemoryError}}
>   - {{InternalError}}
>   - {{UnknownError}}
>   - {{ZipError}}
> These are basically the subclasses of {{VirtualMachineError}}, except for {{StackOverflowError}}, which is recoverable and usually recovered already by the time the exception has been thrown and the stack unwound.
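
The rule stated in the issue description (every `VirtualMachineError` except `StackOverflowError`) can be written as a single type check. The following is a minimal sketch under that reading. The class `JvmFatalErrors` and the method `isJvmFatal` are hypothetical names, not Flink APIs.

```java
/** Hypothetical classifier following the issue text; not a Flink API. */
final class JvmFatalErrors {

    private JvmFatalErrors() {}

    /**
     * True for VirtualMachineError subclasses other than StackOverflowError.
     * This covers OutOfMemoryError, InternalError, UnknownError and ZipError
     * (ZipError extends InternalError).
     */
    static boolean isJvmFatal(Throwable t) {
        return t instanceof VirtualMachineError && !(t instanceof StackOverflowError);
    }

    public static void main(String[] args) {
        System.out.println(isJvmFatal(new OutOfMemoryError()));      // true
        System.out.println(isJvmFatal(new StackOverflowError()));    // false
        // Linkage failures are Errors, but recoverable according to the issue text.
        System.out.println(isJvmFatal(new NoClassDefFoundError()));  // false
    }
}
```

Checking against the common superclass also classifies any other `VirtualMachineError` subclass as fatal, which is broader than enumerating the four types by name.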


