Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/04/06 17:54:41 UTC

[jira] [Commented] (FLINK-5974) Support Mesos DNS

    [ https://issues.apache.org/jira/browse/FLINK-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959444#comment-15959444 ] 

ASF GitHub Bot commented on FLINK-5974:
---------------------------------------

GitHub user vijikarthi opened a pull request:

    https://github.com/apache/flink/pull/3692

    FLINK-5974 Added configurations to support mesos-dns hostname resolution

    This PR addresses FLINK-5974, which calls for dynamic hostname resolution for the JM and TM components, especially in deployment environments such as Mesos/DCOS.
    
    It adds two main pieces of functionality.
    
    a) Dynamic host name configuration
    
    Support for specifying the hostname for the JM/TM is already available through the `jobmanager.rpc.address` and `taskmanager.hostname` configurations.
    
    However, in a Mesos DC/OS type of environment, each task container can be looked up using a hostname alias of the form `<task>.<service>.mesos`, where service discovery is managed through `mesos-dns`. To support this dynamic hostname lookup, we have introduced a new configuration, `mesos.resourcemanager.tasks.hostname`, which takes the format `_TASK.<ANY_VALUE>`.
    
    When this property is supplied, the `_TASK` token is replaced with the `TASK_ID` of the TM container, and the resulting string is used to populate the `taskmanager.hostname` configuration.
    
    For example, in a DCOS setup one could supply the configuration as `-Dmesos.resourcemanager.tasks.hostname=_TASK.{{FRAMEWORK_NAME}}.mesos`, where `FRAMEWORK_NAME` could be `flink`.
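
The `_TASK` substitution described above can be illustrated with a minimal Python sketch. This is not the code from the PR; the function name `resolve_task_hostname` and the task ID `taskmanager-00001` are hypothetical, and replacing every occurrence of the token is an assumption.

```python
# Illustrative sketch only; not the implementation from the PR.
TASK_TOKEN = "_TASK"


def resolve_task_hostname(template: str, task_id: str) -> str:
    """Return the hostname obtained by substituting the task ID for _TASK.

    Assumption: every occurrence of the token is replaced. The PR text
    only describes templates of the form `_TASK.<ANY_VALUE>`.
    """
    return template.replace(TASK_TOKEN, task_id)


# Hypothetical task ID; the real value comes from the TM container.
print(resolve_task_hostname("_TASK.flink.mesos", "taskmanager-00001"))
# prints "taskmanager-00001.flink.mesos"
```

The derived string is what would then be written into `taskmanager.hostname`.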
    
    Please refer to https://docs.mesosphere.com/1.9/usage/service-discovery/mesos-dns/service-naming/#a-records for more details on how Mesos service discovery works.
    
    b) Support for running *any* bootstrap script prior to executing the TM startup script
    
    Currently, the TM boot script `mesos-taskmanager.sh` is the only script that is passed to the Mesos launcher for booting the TM container.
    
    In a DC/OS environment where service discovery is common, we need a mechanism to wait until the service discovery records are available and the hostname is actually resolvable before launching the TM boot script.
    
    A DCOS deployment offers a way to validate and wait for the service discovery records to be available before launching the tasks. Please see the links below for more details on how it works.
    https://mesosphere.github.io/dcos-commons/developer-guide.html#task-bootstrap
    https://github.com/mesosphere/dcos-commons/blob/master/sdk/bootstrap/main.go
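
The general idea of waiting until a hostname resolves can be sketched in Python as follows. This is only an illustration of the technique and does not reproduce the dcos-commons bootstrap tool; the function name, its parameters, and the default timeout values are all assumptions.

```python
# Illustrative sketch only; not the dcos-commons bootstrap logic.
import socket
import time


def wait_until_resolvable(hostname, timeout_s=60.0, interval_s=1.0,
                          resolve=socket.gethostbyname,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll DNS until `hostname` resolves or `timeout_s` elapses.

    Returns True once resolution succeeds and False on timeout.
    `resolve`, `sleep` and `clock` are injectable so the loop can be
    exercised without real DNS lookups or real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            resolve(hostname)
            return True
        except OSError:  # socket.gaierror is a subclass of OSError
            if clock() >= deadline:
                return False
            sleep(interval_s)


# Hypothetical usage before starting the TM boot script:
#   if not wait_until_resolvable("taskmanager-00001.flink.mesos"):
#       raise SystemExit("hostname did not become resolvable in time")
```

A bootstrap executable along these lines would exit non-zero on timeout so that the TM boot script is not started against an unresolvable hostname.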
    
    To support this, we have introduced a new configuration `mesos.resourcemanager.tasks.cmd-prefix=$FLINK_HOME/bin/bootstrap` to provide any executable/script that can be configured to run prior to executing the TM bootstrap command. 
    
    This feature *currently* works *only for Docker-based images*, where the bootstrap script can be pre-baked into a specific location that is then used to configure `mesos.resourcemanager.tasks.cmd-prefix`.
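
How a command prefix might be combined with the TM boot command can be sketched in Python as below. This is a sketch under stated assumptions, not the PR's code: joining the two commands with `&&` (so the TM script runs only if the prefix command succeeds) is an assumption, as are the function name and the `$FLINK_HOME/bin/` location of `mesos-taskmanager.sh`.

```python
# Illustrative sketch only; the joining rule is an assumption.
from typing import Optional


def build_launch_command(tm_command: str,
                         cmd_prefix: Optional[str] = None) -> str:
    """Prepend an optional prefix command to the TM boot command.

    Assumption: the commands are chained with `&&`, so a failing
    prefix command prevents the TM boot command from running.
    """
    if not cmd_prefix or not cmd_prefix.strip():
        return tm_command
    return f"{cmd_prefix.strip()} && {tm_command}"


print(build_launch_command("$FLINK_HOME/bin/mesos-taskmanager.sh",
                           "$FLINK_HOME/bin/bootstrap"))
# prints "$FLINK_HOME/bin/bootstrap && $FLINK_HOME/bin/mesos-taskmanager.sh"
```

When no prefix is configured, the TM boot command is returned unchanged.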
    
    Although both features are aimed at Mesos/DCOS deployments, the implementation is agnostic of these environments and can be used in any deployment that needs such a facility.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vijikarthi/flink FLINK-5974-Master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3692.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3692
    
----
commit aeb432dc7fe8bcdd5faa49b8ad5dfb5630ea0747
Author: Vijay Srinivasaraghavan <vi...@emc.com>
Date:   2017-04-06T16:48:39Z

    FLINK-5974 Added configurations to support mesos-dns hostname resolution

----


> Support Mesos DNS
> -----------------
>
>                 Key: FLINK-5974
>                 URL: https://issues.apache.org/jira/browse/FLINK-5974
>             Project: Flink
>          Issue Type: Improvement
>          Components: Cluster Management, Mesos
>            Reporter: Eron Wright 
>            Assignee: Vijay Srinivasaraghavan
>
> In certain Mesos/DCOS environments, the slave hostnames aren't resolvable.  For this and other reasons, Mesos DNS names would ideally be used for communication within the Flink cluster, not the hostname discovered via `InetAddress.getLocalHost`.
> Some parts of Flink are already configurable in this respect, notably `jobmanager.rpc.address`.  However, the Mesos AppMaster doesn't use that setting for everything (e.g. artifact server), it uses the hostname.
> Similarly, the `taskmanager.hostname` setting isn't used in Mesos deployment mode.   To effectively use Mesos DNS, the TM should use `<task-name>.<framework-name>.mesos` as its hostname.   This could be derived from an interpolated configuration string.


