Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/15 08:59:00 UTC

[jira] [Commented] (FLINK-8434) The new yarn resource manager should take over the running task managers after failover

    [ https://issues.apache.org/jira/browse/FLINK-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326011#comment-16326011 ] 

ASF GitHub Bot commented on FLINK-8434:
---------------------------------------

GitHub user shuai-xu opened a pull request:

    https://github.com/apache/flink/pull/5297

    [FLINK-8434] Take over the running task manager after yarn app master failover

    
    ## What is the purpose of the change
    
    *This pull request enables the yarn resource manager to take over the running containers from the previous attempt.*
    
    ## Verifying this change
    
    This change is tested manually.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
      - The S3 file system connector: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (not applicable)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shuai-xu/flink jira-8434

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5297.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5297
    
----
commit 2b05e621f57f1d6885d37c3aa7972e6755bc1a20
Author: shuai.xus <sh...@...>
Date:   2018-01-15T08:54:40Z

    [FLINK-8434] Take over the running task manager after yarn app master failover

----


> The new yarn resource manager should take over the running task managers after failover
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-8434
>                 URL: https://issues.apache.org/jira/browse/FLINK-8434
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management
>    Affects Versions: 1.5.0
>            Reporter: shuai.xu
>            Assignee: shuai.xu
>            Priority: Major
>              Labels: flip-6
>
> The app master, which contains the job master and the yarn resource manager, may fail over while running on yarn. The new resource manager should take over the running task managers once it has started. Currently, however, the YarnResourceManager does not record the running containers in workerNodeMap, so when task managers register with it, it rejects them.
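
The behaviour described in the issue can be illustrated with a toy model. This is only a sketch under stated assumptions, not Flink's actual code: the class `WorkerRegistrySketch` and its methods `recoverFromPreviousAttempt` and `register` are hypothetical names, and container ids and hosts are represented as plain strings. Only the field name `workerNodeMap` is taken from the issue text.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model of a resource manager's worker bookkeeping (hypothetical, not Flink code). */
class WorkerRegistrySketch {
    // Stand-in for the workerNodeMap mentioned in the issue: container id -> host.
    private final Map<String, String> workerNodeMap = new HashMap<>();

    /** Hypothetical recovery step: record containers that survived the previous attempt. */
    void recoverFromPreviousAttempt(Map<String, String> runningContainers) {
        workerNodeMap.putAll(runningContainers);
    }

    /** A task manager registration is accepted only if its container is known. */
    boolean register(String containerId) {
        return workerNodeMap.containsKey(containerId);
    }

    public static void main(String[] args) {
        // Assumed example data: one container still running after the failover.
        Map<String, String> survivors = Collections.singletonMap("container_01", "host-a");

        // Without the recovery step, the new resource manager rejects the task manager.
        WorkerRegistrySketch withoutRecovery = new WorkerRegistrySketch();
        System.out.println("without recovery: " + withoutRecovery.register("container_01"));

        // With the recovery step, the same registration is accepted.
        WorkerRegistrySketch withRecovery = new WorkerRegistrySketch();
        withRecovery.recoverFromPreviousAttempt(survivors);
        System.out.println("with recovery: " + withRecovery.register("container_01"));
    }
}
```

In this model the only difference between the two cases is whether the surviving containers were recorded before the task manager tried to register, which is the gap the issue describes.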



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)