Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/09/08 02:49:20 UTC

[jira] [Commented] (FLINK-4537) ResourceManager registration with JobManager

    [ https://issues.apache.org/jira/browse/FLINK-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15472525#comment-15472525 ] 

ASF GitHub Bot commented on FLINK-4537:
---------------------------------------

GitHub user beyond1920 opened a pull request:

    https://github.com/apache/flink/pull/2479

    [FLINK-4537] [cluster management] ResourceManager registration with JobManager

    This pull request implements ResourceManager registration with JobManager, which includes:
    1. Check whether the input resourceManagerLeaderId is the same as the current leadershipSessionId of the ResourceManager. If not, two or more ResourceManagers may exist at the same time and the current ResourceManager is not the proper one, so it rejects or ignores the registration.
    2. Check whether a valid JobMaster exists at the given address by connecting to the address. Reject registrations from invalid addresses. (Hidden in the connect logic.)
    3. Keep the mapping between JobID and JobMasterGateway.
    4. Start a JobMasterLeaderListener for the given JobID to listen to the leadership of the specified JobMaster.
    5. Send a registration-successful ack to the JobMaster.
    
    The main changes are the following 6 points:
    1. Add a getJobMasterLeaderRetriever method, which returns the job master leader retriever, to HighAvailabilityServices, NonHaServices, an inner class in TaskExecutor, and TestingHighAvailabilityServices.
    2. Change the logic of the registerJobMaster method of ResourceManager based on the steps above.
    3. Change the input parameters of the registerJobMaster method in ResourceManager and ResourceManagerGateway to be consistent with registerTaskExecutor: from jobMasterRegistration to resourceManagerLeaderId + jobMasterAddress + jobID.
    4. Change the result type of the registerJobMaster method in ResourceManager and ResourceManagerGateway to be consistent with RetryingRegistration: from org.apache.flink.runtime.resourcemanager.RegistrationResponse to org.apache.flink.runtime.registration.RegistrationResponse.
    5. Add a LeaderRetrievalListener in ResourceManager to listen to the leadership of the JobMaster.
    6. Add a test class for the registerJobMaster method in ResourceManager.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/alibaba/flink jira-4537

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2479.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2479
    
----
commit fa66ac8ae86745dc9daf1fb07c6c96be4f336c90
Author: beyond1920 <be...@126.com>
Date:   2016-09-01T07:27:20Z

    rsourceManager registration with JobManager

commit f5e54a21e4a864b5ac5f2f548b6d3dea3edcb619
Author: beyond1920 <be...@126.com>
Date:   2016-09-07T09:53:44Z

    Add JobMasterLeaderRetriverListener at ResourceManager

----


> ResourceManager registration with JobManager
> --------------------------------------------
>
>                 Key: FLINK-4537
>                 URL: https://issues.apache.org/jira/browse/FLINK-4537
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Cluster Management
>            Reporter: Maximilian Michels
>            Assignee: zhangjing
>
> The ResourceManager keeps track of all JobManagers which execute Jobs. When a new JobManager registers, its leadership status is checked through the HighAvailabilityServices. It will then be registered at the ResourceManager using the {{JobID}} provided with the initial registration message.
> The ResourceManager should use the JobID and the LeaderSessionID (notified by HighAvailabilityServices) to identify a session to a JobMaster.
> When a JobManager registers at the ResourceManager, the registration takes the following 2 input parameters:
> 1. resourceManagerLeaderId: the fencing token for the ResourceManager leader, which is kept by the JobMaster that sends the registration
> 2. JobMasterRegistration: contains the address and the JobID
> The ResourceManager needs to process the registration event with the following steps:
> 1. Check whether the input resourceManagerLeaderId is the same as the current leadershipSessionId of the ResourceManager. If not, two or more ResourceManagers may exist at the same time and the current ResourceManager is not the proper one, so it rejects or ignores the registration.
> 2. Check whether a valid JobMaster exists at the given address by connecting to the address. Reject registrations from invalid addresses. (Hidden in the connect logic.)
> 3. Keep the mapping between JobID and JobMasterGateway.
> 4. Start a JobMasterLeaderListener for the given JobID to listen to the leadership of the specified JobMaster.
> 5. Send a registration-successful ack to the JobMaster.
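
The five registration steps above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code from the pull request: ResourceManagerSketch, Outcome, the canConnect predicate and the use of String for job IDs and addresses are all hypothetical stand-ins, and the leader listener of step 4 is reduced to a comment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Predicate;

/** Hypothetical stand-in for the ResourceManager; not Flink's actual class. */
class ResourceManagerSketch {

    /** Possible results of a registration attempt (names are assumptions). */
    enum Outcome { SUCCESS, DECLINED_WRONG_LEADER, DECLINED_UNREACHABLE }

    // Fencing token of this ResourceManager's current leadership session.
    private final UUID leaderSessionId;
    // Stand-in for "connect to the JobMaster address"; true if a JobMaster answers.
    private final Predicate<String> canConnect;
    // Step 3: job ID -> JobMaster address (the real code would keep a gateway).
    private final Map<String, String> jobMasters = new HashMap<>();

    ResourceManagerSketch(UUID leaderSessionId, Predicate<String> canConnect) {
        this.leaderSessionId = leaderSessionId;
        this.canConnect = canConnect;
    }

    Outcome registerJobMaster(UUID resourceManagerLeaderId, String jobMasterAddress, String jobId) {
        // Step 1: reject callers that hold a token for a different leader session.
        if (!leaderSessionId.equals(resourceManagerLeaderId)) {
            return Outcome.DECLINED_WRONG_LEADER;
        }
        // Step 2: reject addresses where no JobMaster can be reached.
        if (!canConnect.test(jobMasterAddress)) {
            return Outcome.DECLINED_UNREACHABLE;
        }
        // Step 3: remember which JobMaster is responsible for the job.
        jobMasters.put(jobId, jobMasterAddress);
        // Step 4: a leadership listener for jobId would be started here (omitted).
        // Step 5: acknowledge the successful registration.
        return Outcome.SUCCESS;
    }

    boolean isRegistered(String jobId) {
        return jobMasters.containsKey(jobId);
    }

    public static void main(String[] args) {
        UUID current = new UUID(0L, 1L);
        UUID stale = new UUID(0L, 2L);
        ResourceManagerSketch rm =
                new ResourceManagerSketch(current, address -> address.startsWith("akka://"));

        System.out.println(rm.registerJobMaster(stale, "akka://jm-1", "job-1"));
        System.out.println(rm.registerJobMaster(current, "unknown-host", "job-1"));
        System.out.println(rm.registerJobMaster(current, "akka://jm-1", "job-1"));
    }
}
```

Checking the fencing token first means that a request addressed to a former leader never touches the JobID mapping, which is the point of step 1.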



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)