Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/02/16 14:43:00 UTC

[jira] [Commented] (FLINK-8673) Don't let JobManagerRunner shut down itself

    [ https://issues.apache.org/jira/browse/FLINK-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367391#comment-16367391 ] 

ASF GitHub Bot commented on FLINK-8673:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/5510

    [FLINK-8673] [flip6] Use JobManagerRunner#resultFuture for success and failure communication

    ## What is the purpose of the change
    
    This commit removes the OnCompletionActions and FatalErrorHandler from the
    JobManagerRunner. Instead, it communicates a successful job execution or the
    failure case through the JobManagerRunner#resultFuture.
    
    Furthermore, this commit no longer allows the JobManagerRunner to shut itself down.
    All shutdown logic must be triggered by the owner of the JobManagerRunner.
    
    This PR is based on #5494.
    
    ## Verifying this change
    
    This change adds tests and can be verified by running `JobManagerRunnerTest`.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
      - The S3 file system connector: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (not applicable)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink jobManagerRunnerShutdown

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5510.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5510
    
----
commit d07221f9c918757a301c37f86ceb72bf5bb2dd0a
Author: gyao <ga...@...>
Date:   2018-02-14T19:47:11Z

    [FLINK-7711][flip6] Implement JarListHandler
    
    This closes #5209.
    This closes #5455.

commit 4a5dad9388d3ea655e045ec91aa9e1d60774100c
Author: zjureel <zj...@...>
Date:   2017-12-19T09:07:56Z

    [FLINK-7857][flip6] Port JobVertexDetailsHandler to REST endpoint

commit 1fdb138fbf8057853008cf80d1ce44acf3af98b6
Author: Till Rohrmann <tr...@...>
Date:   2018-02-15T10:16:12Z

    [FLINK-8612] [flip6] Enable non-detached job mode
    
    The non-detached job mode waits until it has served the JobResult of
    a completed job at least once before it terminates.
    
    This closes #5435.

commit b53051a89e1c03396db25a38e7fe6fb3cb8bf16b
Author: gyao <ga...@...>
Date:   2018-02-15T10:32:15Z

    [FLINK-7857][flip6] Return status 404 if JobVertex is unknown
    
    This closes #5493.
    This closes #5035.

commit dc98857e06dd70df97e86bd606da2121a6ff21e4
Author: Till Rohrmann <tr...@...>
Date:   2018-02-15T10:37:58Z

    [FLINK-8662] [tests] Harden FutureUtilsTest#testRetryWithDelay
    
    This commit moves the start of the time measurement before the triggering of
    the retry with delay operation.
    
    This closes #5494.

commit 1ad474f6820f729a9bc7bcdad26a41fd178c025e
Author: Till Rohrmann <tr...@...>
Date:   2018-02-16T14:04:32Z

    [FLINK-8673] [flip6] Use JobManagerRunner#resultFuture for success and failure communication
    
    This commit removes the OnCompletionActions and FatalErrorHandler from the
    JobManagerRunner. Instead, it communicates a successful job execution or the
    failure case through the JobManagerRunner#resultFuture.
    
    Furthermore, this commit no longer allows the JobManagerRunner to shut itself down.
    All shutdown logic must be triggered by the owner of the JobManagerRunner.

----


> Don't let JobManagerRunner shut down itself
> -------------------------------------------
>
>                 Key: FLINK-8673
>                 URL: https://issues.apache.org/jira/browse/FLINK-8673
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.5.0
>
>
> Currently, the {{JobManagerRunner}} is allowed to shut itself down when a job completes. This, however, can cause problems when the {{Dispatcher}} receives a request for a {{JobMaster}}. If the {{Dispatcher}} is not told about the shutdown of the {{JobMaster}}, it might still try to send requests to it, which leads to timeouts.
> It would be better to not let the {{JobManagerRunner}} shut itself down and to defer the shutdown to its owner (the {{Dispatcher}}). We can do this by listening on the {{JobManagerRunner#resultFuture}}, which the {{JobManagerRunner}} completes on a successful job completion or on a failure. That way we could also get rid of the {{OnCompletionActions}} and the {{FatalErrorHandler}}.
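
The ownership pattern described in the issue can be illustrated with a minimal, self-contained sketch. This is not Flink's code: `SketchRunner` and `SketchOwner` are hypothetical stand-ins for the `JobManagerRunner` and the `Dispatcher`, the result type is simplified to a `String`, and only `java.util.concurrent.CompletableFuture` from the JDK is used. The runner merely completes its result future; the owner listens on that future, stops tracking the runner, and only then shuts it down.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a runner that reports its outcome but never shuts itself down. */
final class SketchRunner {
    private final CompletableFuture<String> resultFuture = new CompletableFuture<>();
    private volatile boolean shutDown = false;

    CompletableFuture<String> getResultFuture() {
        return resultFuture;
    }

    /** Success path: only completes the future, no self-shutdown. */
    void jobFinished(String result) {
        resultFuture.complete(result);
    }

    /** Failure path: also reported through the same future. */
    void jobFailed(Throwable cause) {
        resultFuture.completeExceptionally(cause);
    }

    /** Called by the owner only. */
    void shutDown() {
        shutDown = true;
    }

    boolean isShutDown() {
        return shutDown;
    }
}

/** Hypothetical stand-in for the owner that tracks runners and controls their lifecycle. */
final class SketchOwner {
    private final Map<String, SketchRunner> runners = new ConcurrentHashMap<>();

    void register(String jobId, SketchRunner runner) {
        runners.put(jobId, runner);
        // The callback fires on success and on failure alike.
        runner.getResultFuture().whenComplete((result, failure) -> {
            // Stop tracking first, so no further requests are routed to the runner,
            // then trigger the shutdown from the owner's side.
            runners.remove(jobId);
            runner.shutDown();
        });
    }

    boolean isTracking(String jobId) {
        return runners.containsKey(jobId);
    }
}
```

Because the owner removes the runner from its map before shutting it down, it always knows which runners are still alive, which is the property the issue asks for. A real implementation would additionally have to handle asynchronous termination and threading concerns that this sketch leaves out.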


