Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/04/06 17:54:41 UTC
[jira] [Commented] (FLINK-5974) Support Mesos DNS
[ https://issues.apache.org/jira/browse/FLINK-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959444#comment-15959444 ]
ASF GitHub Bot commented on FLINK-5974:
---------------------------------------
GitHub user vijikarthi opened a pull request:
https://github.com/apache/flink/pull/3692
FLINK-5974 Added configurations to support mesos-dns hostname resolution
This PR addresses FLINK-5974, which calls for dynamic hostname resolution for the JM and TM components, especially in deployment environments such as Mesos/DCOS.
It adds two main capabilities.
a) Dynamic host name configuration
Specifying a hostname for the JM/TM is already supported through the `jobmanager.rpc.address` and `taskmanager.hostname` configurations.
However, in a Mesos DC/OS environment, each task container can be looked up using a hostname alias of the form `<task>.<service>.mesos`, where service discovery is managed through `mesos-dns`. To support this dynamic hostname lookup, we have introduced a new configuration, `mesos.resourcemanager.tasks.hostname`, which takes the format `_TASK.<ANY_VALUE>`.
When this property is supplied, the `_TASK` token is replaced with the `TASK_ID` of the TM container, and the resulting string is used to populate the `taskmanager.hostname` configuration.
For example, in a DCOS setup one could supply the configuration as `-Dmesos.resourcemanager.tasks.hostname=_TASK.{{FRAMEWORK_NAME}}.mesos`, where `FRAMEWORK_NAME` could be `flink`.
Please refer to https://docs.mesosphere.com/1.9/usage/service-discovery/mesos-dns/service-naming/#a-records for more details on how Mesos service discovery works.
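The `_TASK` substitution described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the class name, the method name, and the task ID `taskmanager-00001` are hypothetical, and only the `_TASK` token and the `<task>.<service>.mesos` naming come from the description.

```java
public class TaskHostnameSketch {
    // Token named in the PR description; it stands in for the Mesos task ID.
    public static final String TASK_TOKEN = "_TASK";

    /** Replaces every occurrence of the _TASK token in the template with the task ID. */
    public static String resolveHostname(String template, String taskId) {
        return template.replace(TASK_TOKEN, taskId);
    }

    public static void main(String[] args) {
        // Hypothetical task ID; the template assumes FRAMEWORK_NAME is "flink".
        String hostname = resolveHostname("_TASK.flink.mesos", "taskmanager-00001");
        System.out.println(hostname); // prints taskmanager-00001.flink.mesos
    }
}
```

The derived value is what would then be written to `taskmanager.hostname` for that container.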
b) Support for running *any* bootstrap script before the TM startup script
Currently, the TM boot script `mesos-taskmanager.sh` is the only script passed to the Mesos launcher for booting the TM container.
In a DC/OS environment, where service discovery is common, we need a mechanism to wait until the service discovery records are available and the hostname is actually resolvable before launching the TM boot script.
A DCOS deployment offers a way to validate and wait for the service discovery records to become available before launching tasks. Please see the links below for more details on how it works.
https://mesosphere.github.io/dcos-commons/developer-guide.html#task-bootstrap
https://github.com/mesosphere/dcos-commons/blob/master/sdk/bootstrap/main.go
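To make the "wait until the hostname is resolvable" step concrete, here is a rough Java sketch of that kind of check. It is not the dcos-commons bootstrap linked above; the class name, method name, and timeout values are hypothetical and chosen only for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveWaitSketch {
    /**
     * Polls name resolution until the hostname resolves or the timeout elapses.
     * Returns true if the name resolved, false on timeout or interruption.
     * Note: the JVM caches failed lookups for a short time (see the
     * networkaddress.cache.negative.ttl security property), so polling faster
     * than that TTL may not trigger a fresh DNS query each time.
     */
    public static boolean waitUntilResolvable(String hostname, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try {
                InetAddress.getByName(hostname);
                return true;
            } catch (UnknownHostException e) {
                if (System.currentTimeMillis() >= deadline) {
                    return false;
                }
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // The hostname would be the derived <task>.<service>.mesos name; defaults to localhost here.
        String host = args.length > 0 ? args[0] : "localhost";
        boolean ok = waitUntilResolvable(host, 30_000L, 1_000L);
        System.exit(ok ? 0 : 1);
    }
}
```

A non-zero exit code on timeout lets a wrapping launch command skip the TM startup when the record never appears.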
To support this, we have introduced a new configuration, for example `mesos.resourcemanager.tasks.cmd-prefix=$FLINK_HOME/bin/bootstrap`, that names any executable or script to run prior to executing the TM bootstrap command.
This feature *currently* works *only for Docker-based images*, where the bootstrap script can be pre-baked into a specific location that is then used to configure `mesos.resourcemanager.tasks.cmd-prefix`.
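One way to picture how the prefix combines with the TM boot script is the sketch below. The class and method names are hypothetical, and joining the two commands with `&&` is an assumption made for illustration; the description only says the prefix runs before the TM command, and the location of `mesos-taskmanager.sh` under `$FLINK_HOME/bin` is likewise assumed.

```java
public class LaunchCommandSketch {
    /**
     * Builds the container launch command. If a prefix is configured, it runs
     * first and the TM command runs only if the prefix exits successfully
     * (assumed "&&" semantics, not confirmed by the PR description).
     */
    public static String buildCommand(String cmdPrefix, String tmCommand) {
        if (cmdPrefix == null || cmdPrefix.trim().isEmpty()) {
            return tmCommand; // no prefix configured: launch the TM script directly
        }
        return cmdPrefix.trim() + " && " + tmCommand;
    }

    public static void main(String[] args) {
        String cmd = buildCommand("$FLINK_HOME/bin/bootstrap", "$FLINK_HOME/bin/mesos-taskmanager.sh");
        System.out.println(cmd);
        // prints $FLINK_HOME/bin/bootstrap && $FLINK_HOME/bin/mesos-taskmanager.sh
    }
}
```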
Although both changes are aimed at Mesos/DCOS deployments, the implementation is agnostic of these environments and can be used for any deployment that needs such a facility.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vijikarthi/flink FLINK-5974-Master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3692.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3692
----
commit aeb432dc7fe8bcdd5faa49b8ad5dfb5630ea0747
Author: Vijay Srinivasaraghavan <vi...@emc.com>
Date: 2017-04-06T16:48:39Z
FLINK-5974 Added configurations to support mesos-dns hostname resolution
----
> Support Mesos DNS
> -----------------
>
> Key: FLINK-5974
> URL: https://issues.apache.org/jira/browse/FLINK-5974
> Project: Flink
> Issue Type: Improvement
> Components: Cluster Management, Mesos
> Reporter: Eron Wright
> Assignee: Vijay Srinivasaraghavan
>
> In certain Mesos/DCOS environments, the slave hostnames aren't resolvable. For this and other reasons, Mesos DNS names would ideally be used for communication within the Flink cluster, not the hostname discovered via `InetAddress.getLocalHost`.
> Some parts of Flink are already configurable in this respect, notably `jobmanager.rpc.address`. However, the Mesos AppMaster doesn't use that setting for everything (e.g. artifact server), it uses the hostname.
> Similarly, the `taskmanager.hostname` setting isn't used in Mesos deployment mode. To effectively use Mesos DNS, the TM should use `<task-name>.<framework-name>.mesos` as its hostname. This could be derived from an interpolated configuration string.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)