You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/24 08:55:02 UTC

[jira] [Commented] (FLINK-8466) ErrorInfo needs to hold Exception as SerializedThrowable

    [ https://issues.apache.org/jira/browse/FLINK-8466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337156#comment-16337156 ] 

ASF GitHub Bot commented on FLINK-8466:
---------------------------------------

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/5348

    [FLINK-8466] [runtime] Make sure ErrorInfo references no user-defined classes

    ## What is the purpose of the change
    
    Making sure that ErrorInfo references no user-defined classes and thus does not prevent class unloading.
    
    ## Brief change log
    
      - `ErrorInfo` holds its exceptions as `SerializedThrowable`.
      - Pull the "ground truth" exception into a separate fields, so that the ExecutionGraph logic itself can always assume to have the proper ground-truth exception. That is important, because the ExecutionGraph may react differently to different exception types, and it is critical that the exception does not get lost on occasion (as possible in case of a weak reference).
    
    
    ## Verifying this change
    
    Self contained, adds a unit test.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no)**
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no)**
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes** / no / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no)**
      - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink error_info

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5348.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5348
    
----
commit 9a95c377a8ce713791a8697ca8fa2a5f24dacc4f
Author: Stephan Ewen <se...@...>
Date:   2018-01-23T21:00:06Z

    [FLINK-8466] [runtime] Make sure ErrorInfo references no user-defined classes.
    
    That way, holding on to the ErrorInfo does not prevent class unloading.
    
    However, this implies that the ErrorInfo must not hold strong references to any Exception classes.
    For that reason, the commit pull the "ground truth" exception into a separate fields, so that the
    ExecutionGraph logic itself can always assume to have the proper ground-truth exception.

----


> ErrorInfo needs to hold Exception as SerializedThrowable
> --------------------------------------------------------
>
>                 Key: FLINK-8466
>                 URL: https://issues.apache.org/jira/browse/FLINK-8466
>             Project: Flink
>          Issue Type: Bug
>          Components: Local Runtime
>    Affects Versions: 1.4.0
>            Reporter: Jelmer Kuperus
>            Assignee: Stephan Ewen
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.1
>
>         Attachments: Screen Shot 2018-01-20 at 1.48.33 PM.png
>
>
> MemoryArchivist holding on to last thrown exception prevents the user code classloader from being garbage collected.
> MemoryArchivist holds on to any number of ArchivedExecutionGraph's. These ArchivedExecutionGraph instances contain a failureCause field of type ErrorInfo that wrap the exception in case one was thrown to terminate the job
> This exception class will more often than not have been loaded by a user code classloader, and as long as the MemoryArchivist holds on to this exception, the JVM won't be able to reclaim the resources held by this classloader



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)